How to solve data scarcity for AI

Data scarcity is one of the major bottlenecks keeping Artificial Intelligence (AI) from reaching production. The reason is simple: lack of data is the number one reason why AI and Natural Language Understanding (NLU) projects fail. So the AI community is working hard to come up with solutions.

The range of proposed solutions is wide. These are the two main trends:

Data simulation via software: This approach uses advanced Machine Learning (ML) techniques, such as Transfer Learning or Active Learning, and other next-generation AI algorithms. The biggest issue is that it’s difficult to predict in which cases these techniques will work, so it takes multiple rounds of experimentation, evaluation and retraining, with no guarantee of significant improvement.
Manual data creation or labelling: A wide range of companies and crowdsourcing platforms, starting with Amazon Mechanical Turk, create or label data from scratch. This approach produces customized data on demand. The main issue is how to scale it. The data is also hard to edit and reuse for retraining or adjusting when results are not quite right.
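
To make the Active Learning idea more concrete, here is a minimal sketch of one common selection strategy, least-confidence sampling: the model scores unlabeled utterances, and only the ones it is least sure about are sent for manual labelling. The function name, the intent names and the probabilities are hypothetical, chosen only for illustration; this is not a description of any particular product.

```python
from typing import Dict, List, Tuple


def least_confident(
    predictions: Dict[str, Dict[str, float]], budget: int
) -> List[Tuple[str, float]]:
    """Pick the `budget` utterances whose top predicted intent has the lowest probability."""
    # Confidence of an utterance = probability of its most likely intent.
    scored = [(text, max(probs.values())) for text, probs in predictions.items()]
    # Lowest confidence first: these are the examples worth labelling by hand.
    scored.sort(key=lambda pair: pair[1])
    return scored[:budget]


# Hypothetical model outputs for three unlabeled utterances (assumed values).
predictions = {
    "cancel my order": {"cancel_order": 0.95, "track_order": 0.05},
    "where is it": {"cancel_order": 0.45, "track_order": 0.55},
    "i want my money": {"cancel_order": 0.50, "track_order": 0.50},
}

to_label = least_confident(predictions, budget=2)
for text, confidence in to_label:
    print(f"{confidence:.2f}  {text}")
```

Each round of selecting, labelling and retraining is one of the iterations mentioned above, and whether the extra labels improve the model still has to be measured each time.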

As an intermediate path, a new trend is gaining traction: synthetic/artificial data generation. This approach actually “writes” the new data using software rather than manual effort. In some cases, the data is produced already labeled, using NLP technologies. This approach is promising because it merges the best of both worlds: the scalability of an automatic approach and the data transparency and explainability of a manual approach.
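
As a rough illustration of what “writing” labeled data with software can look like, the sketch below expands a few hand-written templates into labeled chatbot utterances. It is a deliberately simple template-filling approach, not a description of how Bitext generates its data; the intents, templates and slot values are all hypothetical.

```python
from itertools import product
from typing import Dict, List

# Hypothetical templates per intent; {polite} and {item} are slots to fill.
TEMPLATES: Dict[str, List[str]] = {
    "cancel_order": ["{polite}cancel my {item}", "{polite}I want to cancel my {item}"],
    "track_order": ["{polite}where is my {item}", "{polite}track my {item}"],
}

# Hypothetical slot values used to vary each template.
SLOTS: Dict[str, List[str]] = {
    "polite": ["", "please "],
    "item": ["order", "purchase"],
}


def generate(
    templates: Dict[str, List[str]], slots: Dict[str, List[str]]
) -> List[Dict[str, str]]:
    """Expand every template with every combination of slot values."""
    names = sorted(slots)
    rows = []
    for intent, patterns in templates.items():
        for pattern in patterns:
            for values in product(*(slots[name] for name in names)):
                text = pattern.format(**dict(zip(names, values)))
                # The label comes for free: it is the intent the template belongs to.
                rows.append({"text": text, "intent": intent})
    return rows


dataset = generate(TEMPLATES, SLOTS)
for row in dataset[:4]:
    print(row["intent"], "|", row["text"])
```

Because every utterance traces back to a template and a set of slot values, the output can be inspected, edited and regenerated, which is the transparency benefit described above.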

At Bitext, we work in this space, focusing on Human-Machine Interaction (HMI) and chatbots. You can download a test dataset to see how synthetic/artificial data works for your case.

For more information, visit www.bitext.com, and follow Bitext on Twitter or LinkedIn.

