Evaluate the Quality of your Chatbots and Conversational Agents

It is always important to evaluate the quality of your chatbots and conversational agents in order to know their real health, accuracy and efficiency.

Chatbot accuracy can only be increased by constantly evaluating the bot and retraining it with new data that answers your customers' queries.

Chatbots require large amounts of training data to perform correctly. If you want your chatbot to recognize a specific intent, you need to provide a large number of sentences that express that intent, usually generated by hand. This manual generation is error-prone and can cause erroneous results.

How can we solve it?

With artificially-generated data. Since Dialogflow is one of the most popular chatbot-building platforms, we chose to perform our tests using it.

We tested how Dialogflow can benefit from the Artificial Training Data approach, comparing chatbots trained with hand-tagged sentences against chatbots trained with automatically generated data. Our tests show that bots trained with only 2 or 3 example sentences per intent in Dialogflow perform poorly, and that increasing to 10 sentences per intent brings only minimal improvement.

On the other hand, extending these hand-tagged corpora with additional variants automatically generated by the Artificial Training Data service produces a much larger improvement in overall chatbot accuracy.
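To make the idea concrete, here is a toy sketch of how seed sentences can be expanded into many surface variants by combining slot alternatives. The intent name, templates and word lists below are invented for illustration only; the real Bitext service relies on linguistic knowledge rather than simple slot combination.

```python
from itertools import product

# Hypothetical templates for a single lighting intent: each slot lists
# interchangeable alternatives, and every combination yields one variant.
TEMPLATES = {
    "turn_on_light": [
        ("turn on", "switch on", "put on"),     # verb variants
        ("the",),
        ("kitchen", "bedroom", "living room"),  # room variants
        ("light", "lights", "lamp"),            # object variants
    ],
}

def expand(intent: str) -> list[str]:
    """Combine the alternatives of every slot into full training sentences."""
    return [" ".join(parts) for parts in product(*TEMPLATES[intent])]

variants = expand("turn_on_light")
print(len(variants))  # 3 * 1 * 3 * 3 = 27 variants from one pattern
```

Even this naive combinatorial scheme turns a handful of seed words into dozens of training sentences, which is why automatically generated variants scale so much better than hand-tagging.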

We carried out two different tests (A and B), both using the same 5 intents related to house lighting management. In each test, we trained two different bots:

  • In test A, the first bot (A1) was trained with only 12 hand-tagged sentences (2 to 3 sentences per intent). Using those sentences as input, our Bitext Artificial Training Data service generated 391 sentences which, combined with the 12 sentences from bot A1, were used to train a second bot (A2) with around 80 sentences per intent.
  • Test B was very similar; the only difference was the size of the hand-tagged training set. Here the first bot (B1) was trained with 50 hand-tagged sentences (10 per intent). Using those sentences as input, the service generated 798 sentences which, combined with the 50 sentences from bot B1, were used to train a second bot (B2) with around 170 sentences per intent. We used the same 100 evaluation sentences from test A as the evaluation set.
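The evaluation step above boils down to scoring each bot's predictions against gold labels for the evaluation sentences. A minimal sketch of how such an intent-detection score can be computed; the intent names and labels here are hypothetical, not taken from the actual benchmark.

```python
def intent_accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of evaluation sentences whose predicted intent matches gold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted labels must align one-to-one")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# Hypothetical labels for 5 of the evaluation sentences.
gold = ["turn_on", "turn_off", "dim", "turn_on", "status"]
pred = ["turn_on", "turn_off", "dim", "turn_off", "status"]
print(intent_accuracy(gold, pred))  # 4 correct out of 5 -> 0.8
```

Slot filling can be scored the same way, with each slot value compared against its gold annotation instead of the intent label.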

In both tests, the bot trained with the extended corpus showed a significant improvement, reaching at least 90% accuracy in both intent detection and slot filling. Do you want to see the results for yourself? Download our Dialogflow Full Benchmark Dataset now.

The Bitext Artificial Training Data service lets you create large training sets with no effort. Even if you only write one or two sentences per intent, our service can generate the rest of the variants needed to go from poor results to great chatbot accuracy.


If you would like further details, you can check our additional tools.
