
Why Linguistics for Text Analysis?

In previous posts, we outlined the crucial role of Machine Learning for Analytics (in “How to Make Machine Learning more Effective using Linguistic Analysis?”) and the implications of using Machine Learning for analyzing and structuring text (in “How Phrase Structure helps Machine Learning?”).

In a later post, we will explain how Linguistics can complement Machine Learning and how the two can be integrated into the same technology stack.

Recapping, the main limitation of Machine Learning for text analytics is that it is “blind” to text structure. And text structure is essential for moving towards text understanding.

This is the first benefit Linguistics provides to data scientists: it helps X-ray the internal structure of text.

As the science of language, Linguistics collects knowledge about language (grammars, ontologies, lexicons). This knowledge allows us to understand the structure of language and decompose it into different layers (morphology, syntax, semantics).

By uncovering the structure of a sentence, Linguistics helps us handle complex phenomena accurately, especially where similar wordings carry entirely different meanings:

  • negation: “I never enjoyed it” as opposed to “I enjoyed it like never before”
  • conditionality: “I’ll buy it if they change their pricing policy”
  • comparison: “ACME R3 is much better than the Samsung Galaxy”
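To make the negation case concrete, here is a minimal Python sketch. It is a toy illustration and not the engine described in this post: the word lists and both function names are hypothetical. It contrasts a structure-blind check (any negator anywhere flips the polarity) with a simple positional rule (a negator only counts when it directly precedes the opinion verb).

```python
import re

# Hypothetical mini-lexicons, for illustration only.
NEGATORS = {"never", "not", "no"}
POSITIVE_VERBS = {"enjoyed", "liked", "loved"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_polarity(text):
    """Structure-blind baseline: a negator anywhere flips the polarity."""
    tokens = tokenize(text)
    if not any(t in POSITIVE_VERBS for t in tokens):
        return "neutral"
    return "negative" if any(t in NEGATORS for t in tokens) else "positive"

def structural_polarity(text):
    """Toy structural rule: a negator only flips the verb it directly precedes."""
    tokens = tokenize(text)
    for i, tok in enumerate(tokens):
        if tok in POSITIVE_VERBS:
            negated = i > 0 and tokens[i - 1] in NEGATORS
            return "negative" if negated else "positive"
    return "neutral"

for sentence in ("I never enjoyed it", "I enjoyed it like never before"):
    print(sentence, "|", keyword_polarity(sentence), "|", structural_polarity(sentence))
```

Both functions agree on the first sentence, but only the positional rule reads the second one as positive. A real linguistic engine would rely on a syntactic parse rather than token adjacency; the adjacency rule is just the smallest stand-in for "knowing what the negation attaches to".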

 

Besides, understanding structure allows Linguistics to provide granularity. Granularity is about reading a sentence like “the screen is wonderful but I hate the on-screen keyboard” and identifying the topics being discussed (“screen”, “on-screen keyboard”) and the opinions about those topics (“is wonderful”, “I hate”).

Granularity is about detecting that there are two opinions about two topics within the same sentence.
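As a rough sketch of what granularity means in practice, the Python snippet below splits the example sentence into clauses and pairs each topic with its opinion. The lexicons, the split on “but”, and the function names are all assumptions made for illustration; a production engine would use a real parser and far richer resources.

```python
import re

# Hypothetical lexicons, for illustration only. Topics are listed longest-first
# so that "on-screen keyboard" wins over the shorter "screen".
TOPICS = ["on-screen keyboard", "screen"]
OPINION_WORDS = {"wonderful": "positive", "hate": "negative"}

def split_clauses(sentence):
    """Split on the contrastive conjunction 'but' (a stand-in for real parsing)."""
    parts = re.split(r"\bbut\b", sentence.lower())
    return [p.strip() for p in parts if p.strip()]

def extract_opinions(sentence):
    """Return one (topic, polarity) pair for each clause where both are found."""
    results = []
    for clause in split_clauses(sentence):
        words = clause.split()
        topic = next((t for t in TOPICS if t in clause), None)
        polarity = next((p for w, p in OPINION_WORDS.items() if w in words), None)
        if topic and polarity:
            results.append((topic, polarity))
    return results

print(extract_opinions("the screen is wonderful but I hate the on-screen keyboard"))
# [('screen', 'positive'), ('on-screen keyboard', 'negative')]
```

The point of the sketch is the shape of the output: two separate topic-opinion pairs from one sentence, instead of a single sentence-level score in which the positive and negative opinions would cancel out.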

Another advantage that Linguistics provides is the ability to analyze different types of text: from short and informal tweets to lengthy formal legal documents or newswires.

Considering the variety of texts involved in Big Data projects, this is a critical advantage that saves significant effort in text tagging and algorithm training.

Additionally, engines based on Linguistics make incremental and consistent improvements easy.

Fixes can be implemented by adding new rules or modifying existing ones, all with predictable results. So moving from the “usual” 70% accuracy to over 90% is a matter of customizing the engine.
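The idea of a predictable, incremental fix can be sketched with a toy ordered rule list in Python. Everything here (the rule format, the rule names, the labels) is hypothetical and only illustrates the workflow: one rule is added to fix one case, and a small regression set confirms that earlier results are unchanged.

```python
import re

def make_rule(name, pattern, label):
    """Build a rule as (name, compiled pattern, label)."""
    return (name, re.compile(pattern, re.IGNORECASE), label)

def classify(text, rules, default="neutral"):
    """Return (label, rule name) for the first rule that matches."""
    for name, pattern, label in rules:
        if pattern.search(text):
            return label, name
    return default, None

# Hypothetical starting rule set.
base_rules = [
    make_rule("negated_enjoy", r"\bnever enjoyed\b", "negative"),
    make_rule("enjoy", r"\benjoyed\b", "positive"),
    make_rule("buy", r"\bI['\u2019]ll buy\b", "purchase_intent"),
]

# The fix: a conditional purchase should not count as firm intent.
# The new rule goes first so it takes priority over the general "buy" rule.
new_rule = make_rule("conditional_buy", r"\bI['\u2019]ll buy\b.*\bif\b", "conditional_intent")
fixed_rules = [new_rule] + base_rules

# Regression check: sentences the new rule does not target keep their labels.
regression_set = ["I never enjoyed it", "I enjoyed it like never before", "I'll buy it"]
unchanged = all(classify(s, base_rules) == classify(s, fixed_rules) for s in regression_set)

target = "I'll buy it if they change their pricing policy"
print(classify(target, base_rules), "->", classify(target, fixed_rules), "| unchanged:", unchanged)
```

Because each result names the rule that produced it, a change in behavior can be traced back to a specific rule, which is what makes this kind of fix predictable.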

In summary, Linguistics provides an understanding of text structure that is the basis for tackling many different business applications (understanding customers, preventing churn, generating sales leads, detecting risk of loan defaults, etc.), and it is likely most beneficial when integrated with Machine Learning techniques.


admin
