Language databases
in 90+ languages

We provide language databases such as word lists, n-gram lists, word databases and lexicons for use in text processing products, typing correction and prediction software, and similar applications.

API

The Sketch Engine API can retrieve language data from multi-billion-word text corpora in 90+ languages for use in other software.

Language solutions

Our team includes leading NLP experts ready to contribute to projects involving text processing, text categorization, search solutions and information retrieval.

Language tools

We have developed a number of open-source NLP tools for boilerplate removal, character encoding detection, deduplication, web crawling and other tasks involved in building text corpora. These corpora can be searched in NoSketch Engine, an open-source corpus query system. Automated corpus building with these tools is integrated into Sketch Engine.

Text corpus building

We build, manage and analyze very large text corpora containing billions of words. Sketch Engine, our flagship product, hosts corpora in 90+ languages, which publishers, linguists, universities, software developers, teachers and students use to learn how language works.

Consultancy & legal counselling

We have served as consultants in lexicography and corpus linguistics, advising on dictionary projects and corpus building.

We provide legal counselling and expert witness services.