Russian word frequency lists

We produce high-quality word frequency lists for Russian (and many other languages). The lists are generated from large databases of authentic text (text corpora) produced by real users of Russian. Our largest Russian corpus contains texts with a total length of 14,000,000,000 words.

Data quality

A relatively small corpus is sufficient to generate a list of the 3,000, 5,000 or 10,000 most frequent Russian words, because such words appear frequently enough in any text.

However, an enormous text database (corpus) is required to ensure reliable frequency information for rare and infrequently used words as well. The only viable way to build corpora of billions of words is to download content from the web automatically. Lexical Computing has developed a sophisticated procedure for collecting only linguistically valuable content from the web. A series of tools is used to focus on the right content and to perform deduplication and cleaning, which ensures that the statistics are not skewed. This blog post gives more details.

Wordlist size

We are able to generate word frequency lists of millions of unique words in Russian. The actual size depends on the specifications. By default, we will not include any word which appears fewer than 5 times in the corpus. Such words are typically noise without any linguistic value. We are able to accommodate any requirements specified by the customer.
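As a rough illustration of the default threshold, the Python sketch below counts word forms in a tiny made-up corpus and drops every form seen fewer than 5 times. The tokenizer pattern, the function name build_wordlist and the sample text are assumptions made for this example only; they do not describe the production pipeline.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters (Cyrillic or Latin), optionally hyphenated.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")

def build_wordlist(texts, min_freq=5):
    """Count word forms (case-sensitive) and keep those seen at least min_freq times."""
    counts = Counter()
    for text in texts:
        counts.update(TOKEN_RE.findall(text))
    kept = [(word, n) for word, n in counts.items() if n >= min_freq]
    # Most frequent first; ties are broken alphabetically for a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept

# Toy corpus: two copies of one sentence plus a single rare word.
sample = ["и в не и в и на и в не"] * 2 + ["редкость"]
wordlist = build_wordlist(sample)
for word, n in wordlist:
    print(word, n)
```

With min_freq=1 the same function returns every word form, which is one way a customer-specific threshold could be expressed.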

Enriched word frequency wordlists

The word frequency lists can be enriched with additional information such as POS tags, lemmas, probabilities of the next word, or any other statistics or morphological information.
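To make "probabilities of the next word" concrete, here is a minimal sketch that estimates them as relative bigram frequencies: the number of times a word is followed by a given word, divided by the number of times it is followed by anything. The function name next_word_probabilities and the toy token sequence are hypothetical, and a real list might use a different estimator or smoothing.

```python
from collections import Counter, defaultdict

def next_word_probabilities(tokens):
    """Estimate P(next | current) from adjacent token pairs (no smoothing)."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        pair_counts[current][following] += 1
    probs = {}
    for word, followers in pair_counts.items():
        total = sum(followers.values())
        probs[word] = {nxt: n / total for nxt, n in followers.items()}
    return probs

# Toy token sequence, already tokenized for simplicity.
tokens = "я люблю чай я люблю кофе я пью чай".split()
probs = next_word_probabilities(tokens)
print(probs["я"])
```

In this toy data the word "я" is followed by "люблю" twice and by "пью" once, so the estimates are 2/3 and 1/3.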

Wordlist sample

The easiest way to obtain a sample is to register a free trial account in Sketch Engine and use the wordlist tool to generate a wordlist. The advanced tab of the wordlist tool allows detailed specifications to be set.

Wordlist prices

We will provide a quotation based on the exact specifications and the intended use of the wordlist.

Wordlist download

Your Russian word frequency list will be made available for download via a dedicated link within the agreed period of time. It normally takes a week or two to generate the data. Very complex wordlists can be computationally demanding and can take longer to produce.

Russian word frequency list

A random sample of words from the frequency list of Russian word forms with part-of-speech tags. The list can be delivered in the required format and supplemented with statistical, morphological and other linguistic information.


Russian word frequency list sample

Download a spreadsheet with a sample of the last 100 words in each thousand between 1,000 and 100,000. The list is case-sensitive. Lists with specific criteria and filtering options can be generated to your requirements.
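The sampling scheme can be sketched in Python as follows, assuming that "the last 100 words in each thousand" means frequency ranks 901 to 1,000, 1,901 to 2,000, and so on up to 99,901 to 100,000. That reading, and the names sample_ranks and sample_wordlist, are assumptions made for illustration.

```python
def sample_ranks(block=1000, tail=100, max_rank=100_000):
    """Return the 1-based ranks of the last `tail` items in each `block` of ranks."""
    ranks = []
    for end in range(block, max_rank + 1, block):
        ranks.extend(range(end - tail + 1, end + 1))
    return ranks

def sample_wordlist(wordlist, **kwargs):
    """Pick the sampled ranks from a list sorted by descending frequency.

    Rank 1 corresponds to index 0; ranks beyond the end of the list are skipped.
    """
    return [(rank, wordlist[rank - 1])
            for rank in sample_ranks(**kwargs)
            if rank <= len(wordlist)]

ranks = sample_ranks()
print(len(ranks), ranks[0], ranks[-1])
```

Under this reading the sample covers 100 blocks of 100 ranks each, which gives 10,000 rows in total.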