It is always important to evaluate the quality of your chatbots and conversational agents in order to know their real health, accuracy and efficiency.
Chatbot accuracy can only be improved by constantly evaluating the bot and retraining it with new data that answers your customers' queries.
Chatbots require large amounts of training data to perform correctly. If you want your chatbot to recognize a specific intent, you need to provide a large number of sentences that express that intent, usually written by hand. This manual generation is error-prone and leads to inconsistent results.
That problem can be solved with artificially generated data. Since Dialogflow is one of the most popular chatbot-building platforms, we chose to perform our tests on it.
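To make the setup concrete, the sketch below shows one way to register an intent with a handful of hand-written training phrases in Dialogflow ES, using the official google-cloud-dialogflow Python client. The project ID, intent name and example phrases are placeholders for illustration, not part of our benchmark.

```python
# A minimal sketch of creating a Dialogflow ES intent with a few
# hand-written training phrases, using the official Python client.
# Project ID, intent name and phrases below are placeholders.
from google.cloud import dialogflow


def create_intent(project_id, display_name, phrases, reply):
    intents_client = dialogflow.IntentsClient()
    parent = dialogflow.AgentsClient.agent_path(project_id)

    # Each example sentence becomes one training phrase for the intent.
    training_phrases = [
        dialogflow.Intent.TrainingPhrase(
            parts=[dialogflow.Intent.TrainingPhrase.Part(text=p)]
        )
        for p in phrases
    ]

    message = dialogflow.Intent.Message(
        text=dialogflow.Intent.Message.Text(text=[reply])
    )
    intent = dialogflow.Intent(
        display_name=display_name,
        training_phrases=training_phrases,
        messages=[message],
    )
    intents_client.create_intent(request={"parent": parent, "intent": intent})


# Example: a lighting intent trained with only three hand-written sentences.
create_intent(
    project_id="my-gcp-project",          # placeholder
    display_name="turn_on_light",
    phrases=["turn on the light",
             "switch the kitchen light on",
             "lights on please"],
    reply="OK, turning the light on.",
)
```

With only two or three phrases like these per intent, the bot has very little evidence of how users actually phrase their requests, which is exactly the situation our tests measure.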
We tested how Dialogflow can benefit from the Artificial Training Data approach, comparing chatbots trained with hand-tagged sentences against chatbots trained with automatically generated data. Our tests show that bots trained with only 2 or 3 example sentences per intent in Dialogflow perform poorly, and even with 10 sentences per intent the improvement is minimal.
On the other hand, extending these hand-tagged corpora with additional variants automatically generated by Artificial Training Data produces a much larger improvement in overall chatbot accuracy.
We carried out two different tests (A and B), both using the following 5 intents related to house lighting management. In the first test (A), we trained two different bots:
In both tests, we observed a significant improvement, with chatbot accuracy reaching at least 90% in both intent detection and slot filling. Do you want to see the results for yourself? Download our Dialogflow Full Benchmark Dataset now.
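For readers who want to reproduce this kind of measurement, the sketch below shows one way to score intent detection and slot filling against a labelled test set with Dialogflow's detect_intent call. The project ID, test utterances and expected labels are illustrative placeholders, not our actual benchmark data.

```python
# A minimal sketch of measuring intent-detection and slot-filling accuracy
# on a labelled test set via Dialogflow ES detect_intent.
# Project ID, utterances and expected labels are placeholders.
import uuid
from google.cloud import dialogflow
from google.protobuf.json_format import MessageToDict


def evaluate(project_id, test_set, language_code="en"):
    sessions_client = dialogflow.SessionsClient()
    session = sessions_client.session_path(project_id, str(uuid.uuid4()))

    intent_hits = slot_hits = 0
    for case in test_set:
        query_input = dialogflow.QueryInput(
            text=dialogflow.TextInput(text=case["utterance"],
                                      language_code=language_code)
        )
        result = sessions_client.detect_intent(
            request={"session": session, "query_input": query_input}
        ).query_result

        # Intent detection: predicted intent name matches the expected one.
        if result.intent.display_name == case["intent"]:
            intent_hits += 1

        # Slot filling: every expected parameter was extracted correctly.
        params = MessageToDict(result._pb.parameters)
        if all(params.get(k) == v for k, v in case["slots"].items()):
            slot_hits += 1

    n = len(test_set)
    print(f"intent detection accuracy: {intent_hits / n:.0%}")
    print(f"slot filling accuracy:     {slot_hits / n:.0%}")


# Example with two hand-labelled test utterances (placeholders).
evaluate("my-gcp-project", [
    {"utterance": "dim the bedroom lights to fifty percent",
     "intent": "set_light_level",
     "slots": {"room": "bedroom"}},
    {"utterance": "switch off the lamp in the hall",
     "intent": "turn_off_light",
     "slots": {"room": "hall"}},
])
```

Running the same script against a bot trained with 2-3 hand-written sentences per intent and against one extended with generated variants makes the accuracy gap directly visible.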
The Bitext Artificial Training Data service lets you create large training sets with no effort. If you only want to write one or two sentences per intent, our service can generate the rest of the variants needed to go from poor results to great chatbot accuracy.
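The Bitext service relies on linguistic resources rather than simple templates, but as a toy illustration of the general idea of expanding a couple of seed sentences into many training variants, here is a small template-based sketch. The phrasing patterns are made up for illustration and are not how our service actually generates data.

```python
# A toy, template-based illustration of expanding a few seed phrases into
# many training variants. This only conveys the idea of corpus expansion;
# it is NOT the method used by the Bitext Artificial Training Data service.
from itertools import product


def expand_variants(verbs, objects, locations, politeness):
    variants = []
    for verb, obj, loc, polite in product(verbs, objects, locations, politeness):
        sentence = f"{polite} {verb} the {obj} {loc}"
        variants.append(" ".join(sentence.split()))  # normalize spacing
    return variants


# Two or three hand-written seeds suggest the building blocks below;
# combining them yields dozens of variants for a single intent.
variants = expand_variants(
    verbs=["turn on", "switch on", "put on"],
    objects=["light", "lights", "lamp"],
    locations=["", "in the kitchen", "in the living room"],
    politeness=["", "please", "could you"],
)
print(len(variants), "generated variants, e.g.:", variants[:3])
```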
If you would like further details, you can check our additional tools and resources.