Digital-Pushkin-Lab/Russian-Word-Frequency-Lists-for-Children

Russian frequency lists for children

This repository contains word lists compiled from DetCorpus, a corpus of Russian literature for children.

Wordlist_Detcorpus_50000 is a list of 50,000 lemmas with their frequencies, drawn from the corpus of Russian literature for children. The corpus includes more than 2,097 prose works written in Russian between the 1920s and 2010s and aimed at children and adolescents.

Wordlist_Detcorpus_nonfiction is a list of the 20,000 most frequent lemmas from the non-fiction subcorpus of DetCorpus.

Columns in the word lists

lemma is the normalized word form; lemmatization was performed with the Mystem analyzer. abs_frequency is the raw, absolute frequency, showing how many times the lemma occurs in the corpus. ipm (items per million) is the normalized frequency, that is, the number of occurrences per million words of the corpus.
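The relationship between the two frequency columns can be sketched in Python. This is a minimal illustration, not the procedure used to build the lists: the column names `lemma`, `abs_frequency`, and `ipm` come from the description above, while the comma-delimited layout with a header row, the helper names `ipm` and `read_wordlist`, the corpus size, and the sample rows are all assumptions made up for the example. Check the actual files for their delimiter and header before reusing it.

```python
import csv
import io


def ipm(abs_frequency: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return abs_frequency * 1_000_000 / corpus_size


def read_wordlist(handle, delimiter=","):
    """Read rows with lemma, abs_frequency and ipm columns.

    Assumes a header row and the given delimiter; both are guesses
    about the file layout, not documented facts.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        {
            "lemma": row["lemma"],
            "abs_frequency": int(row["abs_frequency"]),
            "ipm": float(row["ipm"]),
        }
        for row in reader
    ]


# Made-up sample for a hypothetical corpus of 10,000 tokens.
sample = io.StringIO(
    "lemma,abs_frequency,ipm\n"
    "и,500,50000.0\n"
    "дом,20,2000.0\n"
)
rows = read_wordlist(sample)

# Recompute ipm from the raw counts and compare with the stored value.
HYPOTHETICAL_CORPUS_SIZE = 10_000
for row in rows:
    recomputed = ipm(row["abs_frequency"], HYPOTHETICAL_CORPUS_SIZE)
    print(row["lemma"], row["ipm"], recomputed)
```

Because ipm is independent of corpus size, it is the column to use when comparing a lemma's frequency across the full list and the non-fiction list; abs_frequency is only comparable within a single list.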
