Lexical Feature Matrix
Afrikaans (38,000 forms)
Iso: AF | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: Yes | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 20K | Total Number of Forms: 38K |
Albanian (284,000 forms)
Iso: SQ | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 35K | Total Number of Forms: 284K |
Amharic (230,000 forms)
Iso: AM | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: Yes | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 16K | Total Number of Forms: 230K |
Arabic (17 million forms)
Iso: AR | Tier: 3 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 22K | Total Number of Forms: 17M |
Armenian (150,000 forms)
Iso: HY | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 6K | Total Number of Forms: 150K |
Assamese (1.26 million forms)
Iso: AS | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 30K | Total Number of Forms: 1.26M |
Azeri (1.1 million forms)
Iso: AZ | Tier: 3 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: No | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 14K | Total Number of Forms: 1.1M |
Basque (25 million forms)
Iso: EU | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: No | Number: Yes | Gender: No | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 45K | Total Number of Forms: 25M |
Belarusian (1 million forms)
Iso: BE | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 66K | Total Number of Forms: 1M |
Bengali (1.47 million forms)
Iso: BN | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: No | Definiteness State: Yes | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 54K | Total Number of Forms: 1.47M |
Bulgarian (800,000 forms)
Iso: BG | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 75K | Total Number of Forms: 800K |
Burmese (30,000 forms)
Iso: MY | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: No |
Aspect: No | Mood: No | Person: No | Number: No | Gender: No | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 30K | Total Number of Forms: 30K |
Catalan (1.5 million forms)
Iso: CA | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: Yes | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 35K | Total Number of Forms: 1.5M |
Chinese (75,000 forms)
Iso: ZH | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: No |
Aspect: No | Mood: No | Person: No | Number: No | Gender: No | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 75K | Total Number of Forms: 75K |
Croatian (434,000 forms)
Iso: HR | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: No | Contractions: Yes | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 44K | Total Number of Forms: 434K |
Czech (4 million forms)
Iso: CS | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 55K | Total Number of Forms: 4M |
Danish (700,000 forms)
Iso: DA | Tier: 1 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 60K | Total Number of Forms: 700K |
Dutch (500,000 forms)
Iso: NL | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: Yes | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 90K | Total Number of Forms: 500K |
English (180,000 forms)
Iso: EN | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: No |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: Yes | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 60K | Total Number of Forms: 180K |
Esperanto (400,000 forms)
Iso: EO | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: No | Number: Yes | Gender: No | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: Yes | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 50K | Total Number of Forms: 400K |
Estonian (7 million forms)
Iso: ET | Tier: 3 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 85K | Total Number of Forms: 7M |
Finnish (80 million forms)
Iso: FI | Tier: 3 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 70K | Total Number of Forms: 80M |
French (1.4 million forms)
Iso: FR | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: Yes | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 60K | Total Number of Forms: 1.4M |
Galician (5 million forms)
Iso: GL | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 45K | Total Number of Forms: 5M |
Georgian (500,000 forms)
Iso: KA | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 23K | Total Number of Forms: 500K |
German (2.5 million forms)
Iso: DE | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: Yes | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 100K | Total Number of Forms: 2.5M |
Greek (500,000 forms)
Iso: EL | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 27K | Total Number of Forms: 500K |
Gujarati (2.5 million forms)
Iso: GU | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 45K | Total Number of Forms: 2.5M |
Hebrew (12 million forms)
Iso: HE | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: No |
Degree: No | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 23K | Total Number of Forms: 12M |
Hindi (500,000 forms)
Iso: HI | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 30K | Total Number of Forms: 500K |
Hungarian (18 million forms)
Iso: HU | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 75K | Total Number of Forms: 18M |
Icelandic (1.75 million forms)
Iso: IS | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 50K | Total Number of Forms: 1.75M |
Indonesian (150,000 forms)
Iso: ID | Tier: 1 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: No |
Aspect: Yes | Mood: No | Person: No | Number: Yes | Gender: No | Case: No |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 35K | Total Number of Forms: 150K |
Irish Gaelic (1.5 million forms)
Iso: GA | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: Yes | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: Yes | Total Number of Lemmas: 30K | Total Number of Forms: 1.5M |
Italian (1.4 million forms)
Iso: IT | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: Yes | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 65K | Total Number of Forms: 1.4M |
Japanese (9.4 million forms)
Iso: JA | Tier: 3 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: No | Person: No | Number: No | Gender: No | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 450K | Total Number of Forms: 9.4M |
Kannada (500,000 forms)
Iso: KN | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 40K | Total Number of Forms: 500K |
Kazakh (2 million forms)
Iso: KK | Tier: 3 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 10K | Total Number of Forms: 2M |
Khmer (30,000 forms)
Iso: KM | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: No |
Aspect: No | Mood: No | Person: No | Number: No | Gender: No | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 30K | Total Number of Forms: 30K |
Korean (6.25 million forms)
Iso: KO | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: No | Number: No | Gender: No | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 75K | Total Number of Forms: 6.25M |
Kyrgyz (2 million forms)
Iso: KY | Tier: 3 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 10K | Total Number of Forms: 2M |
Lao (45,000 forms)
Iso: LO | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: No | Number: No | Gender: No | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 45K | Total Number of Forms: 45K |
Latvian (2.37 million forms)
Iso: LV | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 42K | Total Number of Forms: 2.37M |
Lithuanian (26 million forms)
Iso: LT | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 44K | Total Number of Forms: 26M |
Macedonian (150,000 forms)
Iso: MK | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: No |
Degree: Yes | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 30K | Total Number of Forms: 150K |
Malay (120,000 forms)
Iso: MS | Tier: 1 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: No |
Aspect: Yes | Mood: No | Person: No | Number: Yes | Gender: No | Case: No |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 45K | Total Number of Forms: 120K |
Malayalam (500,000 forms)
Iso: ML | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: Yes | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 35K | Total Number of Forms: 500K |
Marathi (17 million forms)
Iso: MR | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: No | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 19K | Total Number of Forms: 17M |
Mongolian (500,000 forms)
Iso: MN | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 23K | Total Number of Forms: 500K |
Nepali (1 million forms)
Iso: NE | Tier: 3 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: Yes | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 15K | Total Number of Forms: 1M |
Norwegian Bokmål (500,000 forms)
Iso: NB | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 45K | Total Number of Forms: 500K |
Norwegian Nynorsk (400,000 forms)
Iso: NN | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 75K | Total Number of Forms: 400K |
Oriya (63,000 forms)
Iso: OR | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: No | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 33K | Total Number of Forms: 63K |
Persian (400,000 forms)
Iso: FA | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: No |
Degree: Yes | Definiteness State: Yes | Negative: Yes | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 10K | Total Number of Forms: 400K |
Polish (1.45 million forms)
Iso: PL | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 95K | Total Number of Forms: 1.45M |
Portuguese (3.5 million forms)
Iso: PT | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 40K | Total Number of Forms: 3.5M |
Punjabi (240,000 forms)
Iso: PA | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 30K | Total Number of Forms: 240K |
Romanian (300,000 forms)
Iso: RO | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 36K | Total Number of Forms: 300K |
Russian (1.5 million forms)
Iso: RU | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 50K | Total Number of Forms: 1.5M |
Serbian (1.5 million forms)
Iso: SR | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: No | Contractions: Yes | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 45K | Total Number of Forms: 1.5M |
Sindhi (451,000 forms)
Iso: SD | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 17K | Total Number of Forms: 451K |
Sinhala (916,000 forms)
Iso: SI | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: No | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 30K | Total Number of Forms: 916K |
Slovak (1.5 million forms)
Iso: SK | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 45K | Total Number of Forms: 1.5M |
Slovenian (178,000 forms)
Iso: SL | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 22K | Total Number of Forms: 178K |
Spanish (2.5 million forms)
Iso: ES | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 60K | Total Number of Forms: 2.5M |
Swahili (650,000 forms)
Iso: SW | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: No | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 34K | Total Number of Forms: 650K |
Swedish (500,000 forms)
Iso: SV | Tier: 1 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: Yes | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 70K | Total Number of Forms: 500K |
Tagalog (90,000 forms)
Iso: TL | Tier: 2 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: No |
Aspect: Yes | Mood: Yes | Person: No | Number: No | Gender: No | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 40K | Total Number of Forms: 90K |
Tamil (1 million forms)
Iso: TA | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 27K | Total Number of Forms: 1M |
Telugu (1.5 million forms)
Iso: TE | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: No | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 30K | Total Number of Forms: 1.5M |
Thai (40,000 forms)
Iso: TH | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: No |
Aspect: No | Mood: No | Person: No | Number: No | Gender: No | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: Yes |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 40K | Total Number of Forms: 40K |
Turkish (3.5 million forms)
Iso: TR | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: No | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 300K | Total Number of Forms: 3.5M |
Ukrainian (650,000 forms)
Iso: UK | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 40K | Total Number of Forms: 650K |
Urdu (200,000 forms)
Iso: UR | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: Yes | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 15K | Total Number of Forms: 200K |
Uzbek (1 million forms)
Iso: UZ | Tier: 3 | Lemma: Yes | POS: Yes | Voice: Yes | Tense: Yes |
Aspect: Yes | Mood: Yes | Person: Yes | Number: Yes | Gender: No | Case: Yes |
Degree: Yes | Definiteness State: No | Negative: Yes | Contractions: No | Pronominal Clitics: Yes | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 11K | Total Number of Forms: 1M |
Vietnamese (40,000 forms)
Iso: VI | Tier: 2 | Lemma: Yes | POS: Yes | Voice: No | Tense: No |
Aspect: No | Mood: No | Person: No | Number: No | Gender: No | Case: No |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 34K | Total Number of Forms: 40K |
Zulu (1 million forms)
Iso: ZU | Tier: 3 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |
Aspect: No | Mood: Yes | Person: No | Number: Yes | Gender: Yes | Case: Yes |
Degree: No | Definiteness State: No | Negative: No | Contractions: No | Pronominal Clitics: No | Formality: No |
Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | Total Number of Lemmas: 10K | Total Number of Forms: 1M |
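
Reading the matrix programmatically

Each entry above follows the same layout: a heading line with the language name, then lines of "Field: Value" cells separated by pipes. The sketch below shows one possible way to load an entry into a dictionary and turn the abbreviated totals (K for thousand, M for million) into integers. The function names parse_count and parse_record are hypothetical helpers introduced for illustration; they are not part of any published tool for this data.

```python
import re

def parse_count(text):
    """Convert an abbreviated count such as '38K' or '1.26M' to an int."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KkMm]?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized count: {text!r}")
    scale = {"": 1, "k": 1_000, "m": 1_000_000}[match.group(2).lower()]
    return round(float(match.group(1)) * scale)

def parse_record(heading, field_lines):
    """Parse one language entry into a dict of field name -> value."""
    # The language name is everything before the parenthesized form count.
    record = {"Language": heading.split("(")[0].strip()}
    for line in field_lines:
        for cell in line.split("|"):
            if ":" not in cell:
                continue  # skip the empty cell after a trailing pipe
            key, value = cell.split(":", 1)
            record[key.strip()] = value.strip()
    return record

# Two of the four field lines from the Afrikaans entry, used as sample input.
entry = parse_record(
    "Afrikaans (38,000 forms)",
    [
        "Iso: AF | Tier: 1 | Lemma: Yes | POS: Yes | Voice: No | Tense: Yes |",
        "Frequency: Yes | Named Entities: Yes | Offensive: Yes | Category: No | "
        "Total Number of Lemmas: 20K | Total Number of Forms: 38K |",
    ],
)
forms = parse_count(entry["Total Number of Forms"])
lemmas = parse_count(entry["Total Number of Lemmas"])
print(entry["Language"], entry["Iso"], forms / lemmas)
```

The forms-per-lemma ratio printed at the end is only a rough indicator of how heavily a language inflects in this dataset; it is derived from the rounded totals listed in each entry.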