Material Detail

The DANTE database: what it is, how it was created, and what it can contribute to the dictionaries and lexicons of the future

This video was recorded at the conference Electronic lexicography in the 21st century: new applications for new users (eLex2011).

Dante (www.webDante.com) is a lexical database that provides a fine-grained, corpus-based description of the core vocabulary of English. Every fact recorded in the database is derived from, and explicitly supported by, evidence from a 1.7-billion-word corpus of current English, and almost all of these facts are machine-retrievable. Dante (the Database of ANalysed Texts of English) was designed and created for Foras na Gaeilge by Lexicography MasterClass and an 18-strong team of skilled lexicographers. The team used the Sketch Engine (www.sketchengine.co.uk) for corpus querying and IDM's Dictionary Production System (DPS: www.idm.fr) for entry building. The resulting database records the semantic, grammatical, combinatorial, and text-type characteristics of over 42,000 single-word lemmas and 23,000 compounds and phrasal verbs. It also includes over 27,000 idioms and phrases, all underpinned by over 600,000 sentence examples from the corpus.

The project pioneered new approaches to project management, software customisation, text origination, and quality control. Collectively, these initiatives enabled us to achieve significant levels of automation (and hence cost savings) in the lexicographic process, as well as greater systematicity. Most of these innovations are transferable, so our experience on the Dante project has implications for lexicographic methodology as a whole.

Though Dante began life as an 'English framework' destined for the development of a new English-Irish dictionary (http://www.focloir.ie/english.asp), it was designed to be a linguistic resource beyond this primary function. It offers publishers a launchpad for developing or updating monolingual or bilingual dictionaries, and provides rich data for researchers, software developers, and materials writers.

In this talk we will discuss the project's methodological innovations, demonstrate the wealth and range of data in Dante, and reflect on the long-term potential of this unique database.

Keywords:: videolectures, ocwc, oec

Disciplines:

Social Sciences / Linguistics

More about this material

Material Type:: Presentation
Date Added to MERLOT:: February 8, 2015
Date Modified in MERLOT:: February 8, 2015
Author:: Michael Rundell, Lexicography MasterClass
Submitter:: The Open Education Consortium
Primary Audience:: College General Ed, College Lower Division, College Upper Division
Technical Format:: Video

Mobile Compatibility:: Not specified at this time
Language:: English
Cost Involved:: No
Source Code Available:: No
Creative Commons:: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.