We can generate word databases and frequency lists of the most frequent word forms or lemmas (sometimes referred to as dictionaries or lexicons). The lists can be enhanced with further data retrieved from the corpus, such as the part of speech of each entry.
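The difference between a word-form list and a lemma list can be illustrated with a minimal sketch. It assumes the corpus is already lemmatised and tagged and is available as (word_form, lemma, tag) triples; the function name `frequency_lists` and that input format are hypothetical, and real corpus tools work at a much larger scale.

```python
from collections import Counter


def frequency_lists(tokens):
    """Build word-form and lemma frequency lists from a tagged corpus.

    Assumption (hypothetical input format): `tokens` is an iterable of
    (word_form, lemma, tag) triples from a lemmatiser and POS tagger.
    Returns two lists sorted by descending frequency:
      forms:  (word_form, lemma, tag, frequency)
      lemmas: (lemma, tag, frequency)
    """
    form_counts = Counter()
    lemma_counts = Counter()
    for form, lemma, tag in tokens:
        # A word form is counted together with its lemma and tag, so
        # homographs with different analyses remain separate entries.
        form_counts[(form, lemma, tag)] += 1
        # A lemma aggregates all of its inflected forms.
        lemma_counts[(lemma, tag)] += 1
    forms = [(*key, n) for key, n in form_counts.most_common()]
    lemmas = [(*key, n) for key, n in lemma_counts.most_common()]
    return forms, lemmas


# Usage with a tiny hand-made token stream (illustrative only).
tokens = [
    ("esindaja", "esindaja", "S"),
    ("esindajad", "esindaja", "S"),
    ("esineb", "esinema", "V"),
    ("esindaja", "esindaja", "S"),
]
forms, lemmas = frequency_lists(tokens)
print(forms)   # word-form list: esindaja and esindajad are separate entries
print(lemmas)  # lemma list: both forms are merged under esindaja
```

Here the lemma esindaja receives 3 occurrences, while the word-form list splits them into esindaja (2) and esindajad (1).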

Word databases of all words in a language

Our corpora are large enough to generate a database of all words in a language. Such a database can contain millions of words. It can be filtered according to the customer's criteria and prepared for download in a number of formats.

We can meet any formatting requirements specified by the customer.
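As a rough sketch of the filtering and export step, the code below keeps only rows that meet two example criteria (a minimum frequency and a set of allowed tags) and writes them as delimited text with Python's standard `csv` module. The criteria, the function names `filter_rows` and `export`, and the column header are hypothetical illustrations, not a description of the actual delivery pipeline. The rows are copied from the Estonian example list on this page.

```python
import csv
import io

# Rows in the shape (word_form, lemma, tag, frequency), taken from the
# Estonian example list.
ROWS = [
    ("eestlased", "eestlane", "S", 12529),
    ("esineb", "esinema", "V", 12126),
    ("ehitada", "ehitama", "V", 11763),
]


def filter_rows(rows, min_freq=0, tags=None):
    """Keep rows with frequency >= min_freq and, if given, a tag in `tags`.

    Both criteria are hypothetical examples of customer filters.
    """
    return [
        row for row in rows
        if row[3] >= min_freq and (tags is None or row[2] in tags)
    ]


def export(rows, delimiter="\t"):
    """Serialise rows with a header line; a tab gives TSV, a comma gives CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["word_form", "lemma", "tag", "frequency"])
    writer.writerows(rows)
    return buf.getvalue()


# Verbs with a frequency of at least 12,000, exported as CSV.
verbs = filter_rows(ROWS, min_freq=12000, tags={"V"})
print(export(verbs, delimiter=","))
```

Only esineb passes both filters; ehitada is a verb but falls below the frequency threshold. Other formats (XML, JSON, spreadsheets) would replace only the `export` step.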

An example of an Estonian frequency word list showing the word form, lemma, grammatical tag and frequency.

word form   lemma      tag  frequency
eestlased   eestlane   S    12529
esindaja    esindaja   S    12471
edukalt     edukalt    D    12419
eestlaste   eestlane   S    12370
esineb      esinema    V    12126
esindajad   esindaja   S    11809
ehitada     ehitama    V    11763
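The Estonian excerpt also shows how a word-form list relates to a lemma list: eestlased and eestlaste are both forms of eestlane. The sketch below parses the excerpt as whitespace-separated columns and sums the frequencies per lemma and tag. The function names are hypothetical, and the totals cover only the forms that appear in this excerpt; a real lemma list counts every inflected form in the corpus, so its figures would normally be higher.

```python
from collections import Counter

# The Estonian excerpt: word form, lemma, tag, frequency per line.
SAMPLE = """\
eestlased eestlane S 12529
esindaja esindaja S 12471
edukalt edukalt D 12419
eestlaste eestlane S 12370
esineb esinema V 12126
esindajad esindaja S 11809
ehitada ehitama V 11763
"""


def parse_list(text):
    """Parse a word-form frequency list into (form, lemma, tag, freq) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        form, lemma, tag, freq = line.split()
        rows.append((form, lemma, tag, int(freq)))
    return rows


def lemma_totals(rows):
    """Sum word-form frequencies per (lemma, tag), most frequent first."""
    totals = Counter()
    for _form, lemma, tag, freq in rows:
        totals[(lemma, tag)] += freq
    return totals.most_common()


rows = parse_list(SAMPLE)
for (lemma, tag), freq in lemma_totals(rows):
    print(lemma, tag, freq)
```

Within this excerpt, eestlane reaches 12529 + 12370 = 24899 and esindaja reaches 12471 + 11809 = 24280, so both move ahead of lemmas represented by a single form.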

Word database, lexicon or dictionary available in these languages

Please contact us if you need a word database for another language.
