Language databases, tools and solutions

Language databases in 100+ languages

We provide language databases, such as word lists, n-gram lists, word databases and lexicons, for use in text-processing products, typing-correction and prediction software, and similar applications.
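As an illustration only, an n-gram list pairs each sequence of n consecutive words with its frequency in a corpus. The sketch below assumes naive lowercase regex tokenization; `ngram_counts` is a hypothetical helper, not our production pipeline.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count word n-grams, using naive lowercase regex tokenization (an assumption)."""
    tokens = re.findall(r"\w+", text.lower())
    # Slide a window of n tokens across the text; each window is one n-gram.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


sample = "The cat sat on the mat and the cat slept."
bigrams = ngram_counts(sample, 2)
for gram, freq in bigrams.most_common(3):
    print(" ".join(gram), freq)
```

Real n-gram lists are built from multi-billion-word corpora with proper tokenization for each language, so this toy tokenizer is only meant to show the data structure.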

Text Analytics API & text analysis tools

Our text analytics API can retrieve data from multi-billion-word text corpora in 100+ languages for use in other software.

Custom dictionary and glossary building

We specialize in building bespoke dictionaries and glossaries to the customer’s specification using the efficient and cost-effective Dictionary Express method.

Language solutions

Our team includes leading NLP experts who are ready to contribute to projects involving text processing, text categorization, search solutions and information retrieval.

Language tools

We have developed a number of open-source NLP tools for building text corpora, covering boilerplate removal, character encoding detection, deduplication, web crawling and other tasks. Corpora built with these tools can be searched in NoSketch Engine, an open-source corpus query system, and automated corpus building with these tools is integrated into Sketch Engine.
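To give a rough idea of what deduplication means in corpus building, here is a minimal sketch that drops exact duplicate paragraphs after light normalization (lowercasing and collapsing whitespace). It is a simplified illustration, not the algorithm used by our tools, which may also handle near-duplicates; `normalize` and `deduplicate` are hypothetical names.

```python
import hashlib
import re


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def deduplicate(paragraphs: list[str]) -> list[str]:
    """Keep the first occurrence of each paragraph; drop later exact duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for p in paragraphs:
        # Hash the normalized text so memory use does not grow with paragraph length.
        digest = hashlib.sha1(normalize(p).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(p)
    return kept


docs = ["Hello   world.", "hello world.", "Another paragraph."]
print(deduplicate(docs))
```

The second paragraph is dropped because it normalizes to the same text as the first; the original spelling of the first occurrence is preserved.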

New words

New words is a service dedicated to discovering new and trending words in a language. By combining an automatic tool for diachronic analysis with our expertise in text analysis, we are able to provide exceptionally valuable content for websites and content marketing.
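One simple way to surface trending words diachronically is to compare each word's relative frequency in a newer corpus against an older one. The sketch below uses a smoothed ratio of per-million frequencies, in the spirit of the "simple maths" keyword score; it is an assumption for illustration, not the method behind the New words service, and `trending`, `smoothing` and `min_count` are hypothetical names. The counts are made-up toy data.

```python
from collections import Counter


def trending(old: Counter, new: Counter, smoothing: float = 1.0,
             min_count: int = 2) -> list[tuple[str, float]]:
    """Rank words by growth in relative frequency from an older to a newer corpus.

    Score = (per-million freq in new + smoothing) / (per-million freq in old + smoothing).
    Assumes both corpora are non-empty.
    """
    old_total = sum(old.values())
    new_total = sum(new.values())
    scores = []
    for word, count in new.items():
        if count < min_count:
            continue  # ignore very rare words, which give noisy ratios
        old_pm = old.get(word, 0) * 1_000_000 / old_total
        new_pm = count * 1_000_000 / new_total
        scores.append((word, (new_pm + smoothing) / (old_pm + smoothing)))
    # Highest growth first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


old = Counter({"the": 920, "phone": 50, "letter": 30})
new = Counter({"the": 940, "phone": 40, "selfie": 20})
for word, score in trending(old, new):
    print(f"{word}\t{score:.2f}")
```

The smoothing constant keeps words absent from the older corpus from producing a division by zero, and raising it shifts the ranking toward more frequent words.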

Language modelling

We provide word embedding models trained with fastText on the corpora available in Sketch Engine.

Download our language models for dozens of languages.
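As a minimal sketch of how downloaded vectors might be used, the code below assumes the model is in fastText's plain-text `.vec` format (a header line `<count> <dim>`, then one word and its vector components per line) and compares words by cosine similarity. The loader name `load_vec` and the three toy vectors are hypothetical; check the actual file format of the model you download.

```python
import io
import math
from typing import TextIO


def load_vec(stream: TextIO) -> dict[str, list[float]]:
    """Parse vectors in the fastText text format: header "<count> <dim>", then
    "<word> <v1> ... <vdim>" per line (trailing spaces are tolerated)."""
    _count, dim = map(int, stream.readline().split())
    vectors: dict[str, list[float]] = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy 2-dimensional "model" standing in for a real downloaded file.
toy = io.StringIO("3 2\nking 0.9 0.1\nqueen 0.85 0.2\napple 0.1 0.95\n")
vecs = load_vec(toy)
print(cosine(vecs["king"], vecs["queen"]) > cosine(vecs["king"], vecs["apple"]))
```

With a real file, the same loader can be called on an open handle, e.g. `with open("model.vec", encoding="utf-8") as f: vecs = load_vec(f)` (the filename is hypothetical). For large models, a dedicated library or NumPy arrays will be far more memory-efficient than Python lists.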

Text corpus building

We build, manage and analyze very large text collections (text corpora) of billions of words. Sketch Engine, our flagship product, hosts corpora in 100+ languages, which are used by publishers, linguists, universities, software developers, teachers and students to learn how language works.
