Bitext NLP Data Overview

Lexical Level and Lemmatization
At the lexical level, the main component is the lemmatizer, which integrates tools for decompounding or word segmentation (a step some languages require before words can be lemmatized correctly).
The lemmatizer can also be packaged to cover the full language-analysis pipeline, from sentence segmentation to full parsing, and includes tools such as spell-checking.
The two components of the lemmatizer, data and software, can be distributed together or separately. All these tools are available in 77 languages and 25 language variants.
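To illustrate why decompounding comes before lemmatization in some languages, here is a minimal sketch in Python. It is not the Bitext lemmatizer or its API: the `VOCAB` and `LEMMAS` tables, and the `decompound` and `lemmatize` functions, are hypothetical toy stand-ins, and the splitting strategy (longest known prefix with backtracking) is just one simple approach assumed for the example.

```python
from typing import Dict, List, Set

# Hypothetical toy data; a production lemmatizer relies on a large lexicon.
VOCAB: Set[str] = {"house", "boat", "boats", "run", "mouse"}
LEMMAS: Dict[str, str] = {"houses": "house", "boats": "boat", "mice": "mouse", "ran": "run"}


def decompound(word: str, vocab: Set[str]) -> List[str]:
    """Split a word into known parts, preferring the longest known prefix.

    Returns [word] unchanged when no full split into known parts exists.
    """
    if word in vocab:
        return [word]
    # Try the longest prefix first, then back off to shorter ones.
    for end in range(len(word) - 1, 0, -1):
        head = word[:end]
        if head in vocab:
            tail = decompound(word[end:], vocab)
            if all(part in vocab for part in tail):
                return [head] + tail
    return [word]


def lemmatize(word: str) -> List[str]:
    """Decompound first, then map each part to its lemma (or keep it as is)."""
    parts = decompound(word.lower(), VOCAB)
    return [LEMMAS.get(part, part) for part in parts]


if __name__ == "__main__":
    for w in ["Houseboats", "mice", "xyz"]:
        print(w, "->", lemmatize(w))
```

In this sketch, a compound that is missing from the lemma table as a whole can still be lemmatized part by part once it has been split, which is the reason the two steps are integrated.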
Syntactic Level and Parsing
At the syntactic level, the parser is the main component. The parser analyzes the structure of the sentences in a text and is used for tasks such as POS tagging and phrase extraction. It also serves as the base component for several semantic-level tasks, such as Named Entity Recognition (NER), topic-level sentiment analysis, and generation of synthetic text. We have developed parsers for 21 languages and continue to add more.
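As a rough illustration of how phrase extraction can build on POS tagging, the sketch below groups already-tagged tokens into simple noun phrases. It is not the Bitext parser: the function name `extract_noun_phrases`, the tag labels (`DET`, `ADJ`, `NOUN`), and the pattern used (optional determiners and adjectives followed by one or more nouns) are assumptions made for the example, and the input is assumed to be tagged by some upstream component.

```python
from typing import List, Tuple


def extract_noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Collect runs of (DET|ADJ)* NOUN+ from a list of (token, tag) pairs."""
    phrases: List[str] = []
    current: List[str] = []
    has_noun = False

    for token, tag in tagged:
        if tag == "NOUN":
            current.append(token)
            has_noun = True
        elif tag in ("DET", "ADJ") and not has_noun:
            # Modifiers may precede the first noun of a phrase.
            current.append(token)
        else:
            # Any other tag (or a modifier after a noun) closes the phrase.
            if has_noun:
                phrases.append(" ".join(current))
            current = [token] if tag in ("DET", "ADJ") else []
            has_noun = False

    if has_noun:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    sentence = [
        ("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumps", "VERB"),
        ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN"),
    ]
    print(extract_noun_phrases(sentence))
```

A full parser produces much richer structure than this flat pattern match, but the example shows the dependency: phrase boundaries are decided from the tags, not from the raw words.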
For a full list of services at the lexical, syntactic, and semantic levels, see our linguistic services.
