
Machine Learning & Deep Linguistic Analysis in Text Analytics

Text analysis is becoming a pervasive task in many business areas. Machine Learning is the most common approach used in text analysis and is based on statistical and mathematical models.

Linguistic approaches, which are based on knowledge of language and its structure, are far less frequently used. The two are often seen as alternatives, or even as competitors.

This view is a major obstacle to the progress of the Big Data industry, where text makes up a large share of the data.

The two approaches are in fact complementary: when properly combined, they provide the most effective way of extracting high-quality insights from big data.

The misconception that these two approaches compete predominates in the industry. We disagree: machine learning and linguistic approaches can work together.

In fact, they should: linguistic approaches are ideal for understanding language and providing it with structure; machine learning cannot understand this structure but needs it to extract accurate insights from text data. So each discipline has a “sweet spot”.

Linguistic Analysis is in a better position to extract structure from text. On the one hand, Machine Learning typically handles text in a “naïve” way, as a flat collection of words (using different versions of the classical “bag of words” approach). As a result, sentences like “dog bites man” and “man bites dog” look the same.

This poses a limitation on the amount of information that Machine Learning can extract.
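The word-order problem is easy to reproduce. The snippet below is a minimal sketch of a plain bag-of-words representation, assuming whitespace tokenization and lowercase normalization; the helper name `bag_of_words` is ours for illustration and is not taken from any particular toolkit.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Count lowercase whitespace-separated tokens; word order is discarded."""
    return Counter(sentence.lower().split())


first = bag_of_words("dog bites man")
second = bag_of_words("man bites dog")

# Both sentences contain the same three tokens once each,
# so the two representations compare equal.
print(first == second)
```

Any model that only sees these counts has no way to tell who bit whom.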

On the other hand, Deep Linguistic Analysis is based on knowledge about language (grammar, ontologies and dictionaries) and it can handle the structure of language at all levels (morphology, syntax and semantics).

By taking into account the structure of language, Deep Linguistic Analysis can accurately handle complex phenomena like negation (“I never liked it”) and conditionality (“I’d like it if it were cheaper”). This matters most in complex cases where two sentences have similar wording but entirely different meanings, like “I don’t plan to buy this product” and “if I don’t buy this product today I can buy it tomorrow”.

So Deep Linguistic Analysis is specifically designed to find the structure in (apparently) unstructured text.

However, Machine Learning is in a better position to extract insights, provided it works on text that has already been analyzed and structured rather than on raw text, while Linguistics by itself does not address insight extraction.

And we can take advantage of these two facts if we do things in the right order.

First, Deep Linguistic Analysis generates a rich and accurate representation of the structure of texts; second, Machine Learning uses this structure as its features to extract insights, which is the task that it naturally excels at.
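The two-step order can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any actual product pipeline: the `ANALYSES` table stands in for the output of a linguistic analyzer and is written by hand here, and the function names `flat_features` and `structured_features` are hypothetical.

```python
from collections import Counter

# Assumption: hand-written (subject, verb lemma, object) analyses that a real
# linguistic analyzer would produce with a parser and a grammar.
ANALYSES = {
    "dog bites man": ("dog", "bite", "man"),
    "man bites dog": ("man", "bite", "dog"),
}


def flat_features(sentence: str) -> Counter:
    """Bag-of-words features: token counts with no structure."""
    return Counter(sentence.lower().split())


def structured_features(sentence: str) -> dict:
    """Features derived from the (hypothetical) linguistic analysis."""
    subject, verb, obj = ANALYSES[sentence]
    return {f"subj={subject}": 1, f"verb={verb}": 1, f"obj={obj}": 1}


a, b = "dog bites man", "man bites dog"
print(flat_features(a) == flat_features(b))              # identical inputs for a model
print(structured_features(a) == structured_features(b))  # distinguishable inputs
```

Either feature dictionary could be handed to a standard classifier; only the structured one gives it the information needed to separate the two sentences.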

 

admin
