Language model

A language model is a probability distribution over sequences of words: it assigns each sequence a probability that reflects how likely the sequence is to occur. Language models are used in many Natural Language Processing applications, such as machine translation, speech recognition, part-of-speech tagging and parsing.
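To make the idea of a probability over word sequences concrete, here is a minimal sketch of a bigram model estimated by simple counting. It is an illustration only, not how the Sketch Engine models are built; the three-sentence corpus and the function names `train_bigram_model` and `sequence_probability` are invented for this example.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count contexts and bigrams; <s> and </s> mark sentence boundaries."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams

def sequence_probability(sentence, contexts, bigrams):
    """Multiply the maximum-likelihood estimates P(word | previous word)."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for previous, word in zip(tokens, tokens[1:]):
        if contexts[previous] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        probability *= bigrams[(previous, word)] / contexts[previous]
    return probability

# Hypothetical toy corpus, far too small for real use.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
contexts, bigrams = train_bigram_model(corpus)

p_seen = sequence_probability("the cat sat", contexts, bigrams)
p_unseen = sequence_probability("cat the sat", contexts, bigrams)
print(p_seen, p_unseen)
```

A word order that occurs in the corpus receives a non-zero probability, while the scrambled order receives zero. Practical models add smoothing so that unseen sequences are not ruled out entirely.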

Word embeddings

The Sketch Engine team has prepared word embeddings: language models trained with fastText on the multi-billion-word corpora available in Sketch Engine. In a nutshell, an embedding represents each word as a vector of numbers, and the lengths and directions of these vectors capture relations between words.

Try word embeddings

Practical uses of word embeddings include building a thesaurus and solving word analogies, that is, finding words that stand in the same relation to each other (e.g. king is to man as queen is to woman). See the example from our embeddings viewer for the query king -man +woman (the results include queen, princess, …).
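The analogy query can be sketched as vector arithmetic followed by a nearest-neighbour search using cosine similarity. The three-dimensional vectors below are invented for illustration (real fastText vectors have hundreds of dimensions), and the helper names `cosine` and `analogy` are hypothetical; this is not the implementation behind the embeddings viewer.

```python
import math

# Toy vectors made up for this example; the dimensions loosely stand for
# (royalty, male, female). They do not come from any trained model.
VECTORS = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "princess": [0.8, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "child": [0.0, 0.5, 0.5],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(positive, negative, vectors, topn=2):
    """Add the positive words, subtract the negative ones, rank the rest."""
    dimensions = len(next(iter(vectors.values())))
    target = [0.0] * dimensions
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    query_words = set(positive) | set(negative)
    scored = [
        (word, cosine(target, vector))
        for word, vector in vectors.items()
        if word not in query_words  # the query words themselves are skipped
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

results = analogy(["king", "woman"], ["man"], VECTORS)
for word, score in results:
    print(word, round(score, 3))
```

With these toy vectors the two best matches are queen and princess, mirroring the viewer example. For a downloaded model, a library such as gensim offers the same kind of query through `KeyedVectors.most_similar(positive=[...], negative=[...])`, assuming the file is in a format the library can load.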

Language models for download are available for:

  • English (Modern English, Early Modern English)
  • Arabic
  • Chinese
  • Czech
  • Danish
  • French
  • German
  • Italian
  • Korean
  • Portuguese
  • Russian
  • Spanish (American, European)

The models are available for the lemma or word form attributes.