We provide large, high-quality word databases, lexical data, word lists and lexicons in many languages. Our data are generated from text corpora, which are large databases of authentic text. The largest corpora contain texts with a total length of 60 billion words. Corpora of this size allow us to generate databases of millions or even hundreds of millions of items while preserving accuracy and reliability. Our customers are software developers, publishers of dictionaries and language teaching materials, and anyone else who needs reliable language data.
The databases we supply can be enriched with related linguistic data such as synonyms, collocations, example sentences and morphological and statistical information.
We also provide solutions in the areas of full-text search, terminology extraction, document classification and categorization, data mining and information retrieval.
Data samples
Word frequency lists: English, Spanish, French, Arabic, Russian, Portuguese, Hindi.
Bigram databases: English, Spanish, German, Russian.
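To illustrate what a word frequency list and a bigram database contain, here is a minimal sketch of how such counts could be derived from a small text sample. It is not a description of our production pipeline: the function names `tokenize` and `frequency_lists`, the simple regular-expression tokenizer and the two-sentence sample are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase the text and keep runs of letters,
    # optionally joined by an apostrophe. Real corpora need
    # language-specific tokenization.
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())


def frequency_lists(sentences):
    # Count single words and adjacent word pairs (bigrams).
    # Bigrams are counted within a sentence only, never across sentences.
    words = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        words.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return words, bigrams


# Illustrative sample, not real corpus data.
sample = ["The cat sat on the mat.", "The cat slept."]
words, bigrams = frequency_lists(sample)

for word, count in words.most_common(3):
    print(word, count)
for pair, count in bigrams.most_common(3):
    print(" ".join(pair), count)
```

On a real corpus the same counts would typically be normalized, for example to occurrences per million words, so that figures from corpora of different sizes can be compared.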