API

for text analytics, Natural Language Processing (NLP), corpus building and searching

The text analytics API was developed to enable other software to exploit the NLP functionality of Sketch Engine. It gives programmatic access to the complete suite of text analysis tools in Sketch Engine and mirrors the functionality available through the web interface.
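To illustrate how other software might call the API, the sketch below only builds an HTTP GET request and does not send it. The base URL `https://api.sketchengine.eu/bonito/run.cgi`, the `wsketch` endpoint, the `corpname`, `lemma` and `format` parameters, the corpus name `preloaded/bnc2` and HTTP Basic authentication with a username and API key are all assumptions made for this example. Check them against the official API documentation before relying on them.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL of the API; verify against the official documentation.
BASE_URL = "https://api.sketchengine.eu/bonito/run.cgi"


def build_request(endpoint: str, params: dict, username: str, api_key: str) -> Request:
    """Build (but do not send) a GET request to an assumed API endpoint.

    Parameter names such as 'corpname' and 'format' are assumptions.
    """
    # Ask for JSON output unless the caller overrides it (assumed parameter).
    query = {"format": "json", **params}
    url = f"{BASE_URL}/{endpoint}?{urlencode(query)}"
    # HTTP Basic authentication with username and API key (assumed scheme).
    token = base64.b64encode(f"{username}:{api_key}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")


if __name__ == "__main__":
    req = build_request(
        "wsketch",  # assumed endpoint name for a word sketch
        {"corpname": "preloaded/bnc2", "lemma": "book"},  # hypothetical corpus and query
        username="YOUR_USERNAME",
        api_key="YOUR_API_KEY",
    )
    print(req.full_url)
    # To actually send it: urllib.request.urlopen(req), which needs valid credentials.
```

Keeping request construction separate from sending makes it easy to inspect the exact URL before spending any API quota.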

Text analysis API demo

Register for a 30-day free trial and explore the text analysis tools via the API.

We offer complete functionality for at least 25 major languages. Certain features (e.g. morphological analysis or part-of-speech tagging) might not be available for all of the 100+ supported languages. Please contact us to check the extent of support for your language.

One-off jobs

If you need a large amount of language data processed, analysed or generated only once, it is usually more practical and cost-effective to request a one-off job. Examples include generating a database of all words in a language or processing a multi-billion-word data set into a text corpus. The Lexical Computing team will process the language data and deliver it in a format that meets your specification.

Examples of text analytics API use

Retrieval of:

  • keywords and terms
  • collocations (co-occurrences)
  • synonyms (thesaurus)
  • Good Dictionary EXamples (GDEX)

Text processing to obtain:

  • part-of-speech tagging
  • lemmatization
  • frequency counts (word lists)
  • keywords and terms for the purpose of topic modelling
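To show what a frequency word list contains, here is a minimal local sketch. It is not the Sketch Engine API and does not reproduce its tokenizers: it uses a naive regular expression that treats runs of letters as words, lowercases them and counts surface word forms only, with no lemmatization or part-of-speech tagging. The function name `word_list` is hypothetical.

```python
import re
from collections import Counter


def word_list(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n most frequent lowercase word forms in text.

    Naive tokenization: a token is a run of Unicode letters
    (digits, underscores and punctuation are skipped).
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # Counter.most_common sorts by descending frequency;
    # ties keep the order in which words first appeared.
    return Counter(tokens).most_common(top_n)


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat slept."
    print(word_list(sample, 2))
```

For the sample sentence this prints the two most frequent forms, "the" (3 occurrences) and "cat" (2). A real corpus tool would also handle lemmas, multi-word units and language-specific tokenization.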

Supported languages

Please check which languages are supported and the size of the corpora we already have for each language.

The text analytics API is available for the languages listed below.

Complete functionality is available for 25+ major languages. Some features might not be available for the remaining supported languages. Please contact us for details.


Afrikaans
Albanian
Amazigh
Amharic
Ancient Greek
Arabic
Armenian
Azerbaijani
Basque
Belarusian
Bengali
Bosnian
Breton
Bulgarian
Burmese
Cantonese
Catalan
Cebuano
Chinese Simplified
Chinese Traditional
Croatian
Czech
Danish
Dutch
English
Esperanto
Estonian
Filipino
Finnish
French
Frisian
Georgian
German
Greek
Gujarati
Hausa (Boko)
Hebrew
Hindi
Hungarian
Icelandic
Igbo
Indonesian
Irish
Italian
Japanese
Kalaamaya
Kannada
Kazakh
Khmer
Korean
Kurdish (Kurmanji)
Kurdish (Sorani)
Kuwarra
Kyrgyz
Lao
Latin
Latvian
Limburgish
Lithuanian
Macedonian
Maduwongga
Malay
Malayalam
Maldivian
Maltese
Mankulatjarra
Manyjiljar
Maori
Marathi
Marlpa
Mirning
Mongolian
Montenegrin
N'Ko
Ndebele
Nepali
Newspeak
Ngaanyatjarra
Ngaju
Ngalia
Nganta
Northern Sotho
Norwegian Bokmål
Norwegian
Norwegian Nynorsk
Nyakinyaki
Oromo
Pashto
Pintupi
Pitjantjatjara
Polish
Portuguese
Punjabi (Gurmukhi)
Punjabi (Shahmukhi)
Romanian
Russian
Samoan
Sanskrit (romanised)
Scottish Gaelic
Serbian
Serbian (Latin)
Setswana
Sinhalese
Slovak
Slovenian
Somali
Spanish
Swahili
Swazi
Swedish
Syriac
Tagalog
Tajik
Talysh
Tamil
Tatar
Telugu
Thai
Tibetan
Tigrinya
Tjalkatjarra
Tjupan
Tsonga
Turkish
Turkmen
Ukrainian
Urdu
Uzbek
Vietnamese
Wangkatja
Warlpiri
Welsh
Wudjaarri
Xhosa
Yankunytjatjara
Yiddish
Yoruba
Zulu