Language model & word embedding ~ Lexical Computing

Language model

A language model is a probability distribution over sequences of words: it describes how likely a particular sequence of words is to occur. Language models are used in many Natural Language Processing applications, such as machine translation, speech recognition, part-of-speech tagging and parsing.
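To make the idea of a probability distribution over word sequences concrete, here is a minimal sketch of a bigram model estimated from raw counts. It is an illustration only, not the method behind the Sketch Engine models: the toy corpus, the function names and the `<s>`/`</s>` boundary markers are all assumptions made for this example.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Hypothetical boundary markers so that sentence starts and ends are modeled too.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return context_counts, bigram_counts


def sequence_probability(sentence, context_counts, bigram_counts):
    """Multiply maximum-likelihood bigram probabilities (chain rule, no smoothing)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # context never seen in training
        prob *= bigram_counts[(prev, curr)] / context_counts[prev]
    return prob


# Toy corpus, invented for illustration only.
toy_corpus = ["the cat sat", "the cat slept", "the dog sat"]
context_counts, bigram_counts = train_bigram_model(toy_corpus)

p_seen = sequence_probability("the cat sat", context_counts, bigram_counts)
p_unseen = sequence_probability("sat the cat", context_counts, bigram_counts)
print(p_seen, p_unseen)
```

A word order that matches the training data receives a higher probability than one that does not. Real models add smoothing or use neural networks so that unseen sequences do not drop to zero.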

Word embeddings

The Sketch Engine team has prepared word embeddings: language models trained with fastText on the multi-billion-word corpora available in Sketch Engine. In a nutshell, an embedding represents each word as a vector, so that relations between words are expressed numerically as lengths and directions.

Try word embeddings

Practical uses of word embeddings include building a thesaurus and solving word analogies, i.e. finding word pairs that stand in the same relation (e.g. king is to man as queen is to woman). See the example in our embeddings viewer for the query king -man +woman (you will get queen, princess, …).
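The arithmetic behind a query such as king -man +woman can be sketched as follows. This is a minimal illustration, not the embeddings viewer's implementation: the three-dimensional vectors are invented (real fastText vectors have hundreds of dimensions), and the `cosine` and `analogy` helpers are hypothetical names chosen for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity: compares the direction of two vectors, ignoring their length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, positive, negative, topn=2):
    """Add the positive word vectors, subtract the negative ones, rank the rest by cosine."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    query_words = set(positive) | set(negative)
    scored = [
        (cosine(target, vec), word)
        for word, vec in vectors.items()
        if word not in query_words  # do not return the query words themselves
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:topn]]


# Invented toy vectors; the dimensions loosely stand for (royalty, male, female).
toy_vectors = {
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "princess": [0.7, 0.1, 0.9],
    "apple": [0.1, 0.5, 0.1],
}

result = analogy(toy_vectors, positive=["king", "woman"], negative=["man"])
print(result)
```

With these toy vectors the nearest neighbours of the combined vector are queen and princess, mirroring the viewer example. The same ranking without the subtraction step gives a simple thesaurus lookup.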

Language models are available for download for the following languages:

English (Modern English, Early Modern English)
Arabic
Chinese
Czech
Danish
French
German
Italian
Korean
Portuguese
Russian
Spanish (American, European)

The models are trained on the lemma or word form attribute.
