Databricks Partner
As a Databricks partner for data and fine-tuning of LLMs, Bitext integrates open-source machine learning advancements and Lakehouse models for data warehousing. Our work with data integration tools such as dbt and Fivetran, together with Databricks SQL, strengthens our data management capabilities.
With the significant growth in the use of SaaS LLM APIs and ML applications, dbt has emerged as the most rapidly expanding tool, indicating a surge in efficient data manipulation within the Databricks Lakehouse. The shift towards Delta Lake from on-prem and cloud data warehouses signifies an industry move towards more efficient data science (DS) methodologies aimed at accelerating growth, improving predictability, and enhancing customer experiences.
Our expertise spans a range of DS/ML applications, such as Speech Recognition, Simulations & Optimizations, Recommender Systems, Natural Language Processing, and Industry Data Modeling. We tackle complex DS/ML projects with specialized Python libraries, including NLTK, Transformers, and FuzzyWuzzy, and with LLM tools like LangChain. We rely on the MLflow Tracking Server and MLflow Model Registry to track projects and manage the model lifecycle.
At Bitext, our collaboration with Databricks goes beyond data integration and business intelligence (BI). It’s about leveraging the latest in data science and machine learning to offer solutions that are not just successful but also sustainable, ensuring our clients can efficiently adapt to new technologies and methodologies in AI and ML.
Consulting
Bitext Builds Datasets for Fine-tuning LLMs
Our Data Preparation Solution Has Two Components
1. We Leverage Your Internal Data Sources
Data Collection
We help you identify and collect high-quality datasets that align with your specific application and domain. Our team of experts assists in sourcing diverse and relevant data, ensuring that you have a robust foundation for training your LLM.
Data Cleaning and Preprocessing
Our advanced data cleaning and preprocessing techniques ensure that your training data is of the highest quality. We apply data cleaning algorithms, handle noisy or irrelevant samples, and perform any necessary data transformations to optimize the dataset for fine-tuning.
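As a rough illustration of what a basic cleaning pass over raw utterances can look like, the sketch below normalizes whitespace, drops very short samples, and removes case-insensitive duplicates. The function name `clean_utterances` and the `min_words` threshold are assumptions made for this example, not a description of Bitext's actual pipeline.

```python
import re


def clean_utterances(utterances, min_words=2):
    """Return utterances with whitespace normalized, short samples and duplicates removed."""
    seen = set()
    cleaned = []
    for text in utterances:
        # Collapse runs of whitespace and trim the ends.
        text = re.sub(r"\s+", " ", text).strip()
        key = text.casefold()  # case-insensitive key for duplicate detection
        # Skip samples that are too short (assumed to be uninformative) or already seen.
        if len(text.split()) < min_words or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


raw = ["  I want   to pay ", "i want to pay", "ok", "Cancel my order"]
print(clean_utterances(raw))
```

A real pipeline would add domain-specific filters on top of this; the point here is only the order of operations: normalize first, then filter and deduplicate.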
Annotation and Labeling
If your LLM requires annotated or labelled data, we offer efficient annotation services. Our experienced annotators precisely label the data based on your specific requirements, whether it’s sentiment analysis, named entity recognition, or any other custom annotation task.
2. We Expand Your Internal Data with Synthetic Text (NLG)
Data Augmentation
Enhance the diversity and richness of your training data through our data augmentation techniques. We generate synthetic samples, perform data synthesis, and apply data augmentation algorithms to expand the size and variety of your dataset.
Privacy and Compliance
We understand the importance of data privacy and compliance. Rest assured that your data will be handled with the utmost confidentiality and in compliance with applicable data protection regulations.
Customization and Flexibility
We tailor our data preparation services to meet your unique needs. Whether you require domain-specific data, specific data formats, or custom preprocessing steps, we work closely with you to deliver a solution that aligns with your objectives.
Collaboration and Support
Our dedicated team of data scientists and engineers collaborate closely with you throughout the data preparation process. We provide guidance, support, and expertise to ensure that your data is prepared to maximize the performance of your LLM.
Datasets for Fine-tuning LLMs
Training LLMs to respond accurately and efficiently across a variety of communication scenarios demands meticulous attention and linguistic capabilities. The multilingual datasets we provide are specifically tailored to enhance the performance of these advanced NLP and Generative AI models.
Our datasets are distinguished by:
- Extensive Contextual Variety: We develop datasets that reflect wide-ranging interaction scenarios. This allows LLMs to adapt and be effective in countless environments, from customer support to business data management.
- Linguistic Diversity and Register: We account for the various ways users communicate, whether in a formal tone or in everyday colloquial language, ensuring the models are prepared for any type of interaction.
- Innovation in Realistic Noise Generation: We incorporate “noisy” elements, such as common spelling and punctuation errors found in human communication, to strengthen the robustness of the models when faced with imperfect data.
- Adaptation to Constant Changes: Industries evolve, and so do the ways we communicate. Therefore, we continually update our datasets to keep LLMs abreast of current linguistic trends and needs.
The excellence of our datasets for LLMs is the direct result of decades of research and development in computational linguistics. Our expertise in creating hybrid data, which blends advanced synthetic techniques with meticulous expert supervision, has set new standards in the training and fine-tuning of language models. This allows AI systems to process and understand human language in its full complexity and nuance.
Language Register Variations – Tailored Communication
Creating conversational agents that can smoothly interact with users requires a deep understanding of language registers. Our datasets are enriched with a spectrum of linguistic registers, ranging from formal business exchanges to casual everyday conversations. This enables the fine-tuning of Large Language Models (LLMs) to fit the tone and style appropriate for diverse communication contexts.
Recognizing the tone, employing the right language, and grasping the context are key for AI to resonate with users from various cultural backgrounds. Whether it’s an official inquiry or an informal chat, our datasets equip LLMs to respond suitably, enhancing the user experience and the accuracy of AI conversations.
For a comprehensive view of how these linguistic attributes are annotated and tailored within our datasets to meet the dynamic needs of language-based AI applications, please see our focused exposition:
Explore the Linguistic Features
With Bitext’s tools at your disposal, you can confidently fine-tune your AI to provide cohesive and contextually aware communication, mirroring the richness and diversity of human interaction.
Realism through Noise – Enhanced Robustness
To make the training data more robust and lifelike, we introduce noise, such as spelling mistakes, spacing errors, and missing punctuation. This prepares our Prebuilt Chatbots to handle the type of “noisy” input they might encounter in real-life interactions.
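To make the idea of noise injection concrete, here is a minimal sketch that produces the three kinds of noise mentioned above: spelling mistakes (modeled as swapped adjacent letters), spacing errors (dropped spaces, which create run-on words), and missing punctuation. The function `add_noise` and its `rate` parameter are assumptions for illustration only; the actual noise model behind the datasets is not described here.

```python
import random


def add_noise(text: str, rng: random.Random, rate: float = 0.1) -> str:
    """Return a noisy copy of `text`; each character is perturbed with probability `rate`."""
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if rng.random() < rate:
            if ch in ".,!?;:":
                # Missing punctuation: drop the mark.
                i += 1
                continue
            if ch == " ":
                # Spacing error: drop the space, producing a run-on word.
                i += 1
                continue
            if ch.isalpha() and i + 1 < len(chars) and chars[i + 1].isalpha():
                # Spelling mistake: swap two adjacent letters.
                out.append(chars[i + 1])
                out.append(ch)
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)


rng = random.Random(42)  # fixed seed so the noise is reproducible
print(add_noise("Hello, I need help with my order!", rng, rate=0.2))
```

Passing an explicit `random.Random` instance keeps the noise reproducible, which matters when the same noisy set has to be regenerated for a later evaluation round.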
List of Fine-Tuning LLM Verticals
At Bitext, we understand that specialization and adaptability are essential for the seamless operation of automated customer support services. That’s why we are dedicated to fine-tuning large language models (LLMs) to deliver precise, industry-tailored results. Regardless of whether you’re in the automotive sector, academia, or even the intricate world of healthcare, Bitext has specialized datasets to meet the specific needs of any vertical.
We meticulously cater to each industry to facilitate understanding and improve responses to the most common inquiries. By integrating our vertical datasets, we ensure that your customer support systems are equipped to interact with and satisfy a wide array of linguistic demands. Simulating linguistic variations and common writing errors also contributes to the resilience of your system against the unpredictability of everyday language.
We encourage you to explore our range of verticals and download our datasets for evaluation. Learn more about us and discover how vertical-specific data optimization can strengthen the effectiveness of your customer support systems.
Bitext’s LLM Evaluation Methodology for Conversational AI
Overview
Our main advantage is that Bitext automates most steps in the evaluation pipeline, including the generation of an evaluation dataset, which is a critical step in the absence of historical evaluation data.
This semi-supervised process is based on standard accuracy metrics, such as the F1-score, which combines precision and recall. The analysis of these metrics is then compiled into a report highlighting strengths and weaknesses, at both the bot level and the intent level.
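For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes all three per intent from a list of gold labels and a list of predicted labels. The function name `per_intent_f1` and the sample data are assumptions for this example; the intent names are borrowed from the intent lists on this page.

```python
from collections import Counter


def per_intent_f1(gold, predicted):
    """Compute precision, recall and F1 for every intent seen in gold or predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1  # correct prediction for intent g
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was expected but missed
    scores = {}
    for intent in set(gold) | set(predicted):
        pred_total = tp[intent] + fp[intent]
        gold_total = tp[intent] + fn[intent]
        precision = tp[intent] / pred_total if pred_total else 0.0
        recall = tp[intent] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[intent] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


gold = ["PAYMENT_pay", "PAYMENT_pay", "CONTACT_human_agent", "CONTACT_human_agent"]
predicted = ["PAYMENT_pay", "CONTACT_human_agent", "CONTACT_human_agent", "CONTACT_human_agent"]
for intent, s in sorted(per_intent_f1(gold, predicted).items()):
    print(intent, s)
```

Averaging the per-intent F1 values gives one possible bot-level figure (macro F1), while the per-intent rows point at the specific intents that need attention.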
The process combines software tools, evaluation data and expert insights in one single methodology. This methodology is transparent and easy to explain to end users.
The Evaluation Dataset for Fine-tuning LLMs
Data & Flags
The key to this process is a rich proprietary dataset designed for evaluation that contains thousands of utterances per intent. These utterances are tagged with intent information, so there is no need to manually tag them.
These utterances are also categorized with flags according to their linguistic features:
- Language register: colloquial, formal…
- Regional variant: UK/US English; Spain/Mexico Spanish; Canada/France French …
- And more: offensive language, spelling errors, punctuation errors…
These flags are key to automatically evaluating the accuracy of the chatbot in different use environments; they permit the chatbot to perform seamlessly with users of virtually any demographic.
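One way to use such flags in an automatic evaluation is to break accuracy down by flag, so that a drop on, say, colloquial or misspelled input becomes visible. The sketch below assumes each evaluation record carries a gold intent, a predicted intent, and a list of flags; the record layout and the flag names (`formal`, `colloquial`, `spelling_errors`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def accuracy_by_flag(records):
    """Return {flag: accuracy} over all records carrying that flag."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        hit = rec["gold"] == rec["predicted"]
        for flag in rec["flags"]:
            total[flag] += 1
            correct[flag] += int(hit)
    return {flag: correct[flag] / total[flag] for flag in total}


records = [
    {"gold": "PAYMENT_pay", "predicted": "PAYMENT_pay", "flags": ["formal"]},
    {"gold": "PAYMENT_pay", "predicted": "PAYMENT_pay", "flags": ["colloquial"]},
    {"gold": "PAYMENT_pay", "predicted": "CONTACT_human_agent", "flags": ["colloquial", "spelling_errors"]},
    {"gold": "CONTACT_human_agent", "predicted": "CONTACT_human_agent", "flags": ["spelling_errors"]},
]
print(accuracy_by_flag(records))
```

A record with several flags counts toward each of them, so the per-flag totals can add up to more than the number of records.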
The Evaluation Methodology
The evaluation methodology is built on an iterative process to train the conversational AI model, evaluate performance, retrain and remeasure performance. This iterative process provides systematic performance improvements.
The evaluation system is designed as a continuous improvement process that is implemented in cycles:
- Select training dataset
- Train conversational AI model
- Select evaluation dataset
- Evaluate trained conversational AI model
- Identify accuracy gaps
- Identify problems and fixes
- Re-train with new fixes
- Re-evaluate to measure improvements
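The cycle above can be sketched as a simple loop. Everything in the block below is a toy stand-in: the "model" just memorizes utterances, and the "fix" step reuses the failed examples directly, which a real project would replace with new training data covering the failing phenomena. The names `train`, `predict`, `evaluate`, and `improvement_cycle` are assumptions for this illustration.

```python
def train(training_set):
    """Toy 'model': memorize each utterance with its intent."""
    return dict(training_set)


def predict(model, utterance):
    return model.get(utterance, "UNKNOWN")


def evaluate(model, evaluation_set):
    """Return accuracy and the list of misclassified (utterance, intent) pairs."""
    errors = [(u, i) for u, i in evaluation_set if predict(model, u) != i]
    accuracy = 1 - len(errors) / len(evaluation_set)
    return accuracy, errors


def improvement_cycle(training_set, evaluation_set, target=0.95, max_cycles=5):
    """Train, evaluate, find gaps, add fixes, and repeat; return accuracy per cycle."""
    training_set = list(training_set)
    history = []
    for _ in range(max_cycles):
        model = train(training_set)                          # train the model
        accuracy, errors = evaluate(model, evaluation_set)   # evaluate it
        history.append(accuracy)
        if accuracy >= target or not errors:                 # no gaps left to close
            break
        # Toy fix: add the failed examples. In practice the fix would be new
        # training data for the failing phenomenon, not the evaluation items.
        training_set.extend(errors)
    return history


training = [("i want to pay", "PAYMENT_pay")]
evaluation = [("i want to pay", "PAYMENT_pay"), ("talk to a person", "CONTACT_human_agent")]
print(improvement_cycle(training, evaluation))
```

Keeping the accuracy of every cycle in `history` is what makes the improvements measurable from one iteration to the next.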
Datasets
Pre-Built Datasets to train your LLMs
- Data Services for Enterprise Generative AI: Data Creation & Evaluation, Model Finetuning & Verticalization
- Text Annotation Tools to tag your data with Linguistic Knowledge: POS, NER, Topic
- Lexical and Semantic Data for NLP applications in 77 languages and 25 variants
- Synthetic Text Generation Tools to produce custom data with NLG technology
- Pre-Built Datasets to train and evaluate your assistant/chatbot
Datasets Available
Our Prebuilt QA datasets are designed to deal with language register variations including polite/formal, colloquial and offensive language. We have profiled the language register use in user queries from a wide range of business sectors, and we use this information to generate training data with a similar profile, ensuring maximum linguistic coverage.
We also introduce noise into the training data, including spelling mistakes, run-on words and missing punctuation. This realistic data makes our Prebuilt Datasets more resilient in the face of “noisy” input that is common in real life.
Automotive (List of Intents)
English |
CATEGORY_intent |
APPOINTMENT_cancel_appointment |
APPOINTMENT_reschedule_appointment |
APPOINTMENT_schedule_appointment |
BILLING_change_billing_information |
BILLING_dispute_invoice |
BILLING_invoices |
BILLING_set_up_billing_information |
CONTACT_customer_service |
CONTACT_finance_department |
CONTACT_human_agent |
CONTACT_roadside_assistance |
CONTACT_service_center |
DEALERSHIP_availability_vehicle |
DEALERSHIP_find_dealer |
FINANCE_buy_vehicle |
FINANCE_information_payment_in_full |
FINANCE_leasing |
FINANCE_pay_in_installments |
INFORMATION_diesel_vehicles |
INFORMATION_electric_vehicles |
INFORMATION_hybrid_electric_vehicles |
INFORMATION_hybrid_vehicles |
INFORMATION_offers |
INFORMATION_petrol_vehicles |
INFORMATION_pre-owned_vehicles |
LEASE_change_due_date |
LEASE_change_leasing_information |
MAINTENANCE_cancel_maintenance_plan |
MAINTENANCE_get_manual |
MAINTENANCE_information_maintenance_plans |
MAINTENANCE_sign_up_maintenance_plan |
PARTS_ACCESSORIES_buy_accessories |
PARTS_ACCESSORIES_buy_parts |
PARTS_ACCESSORIES_information_accessories |
PARTS_ACCESSORIES_information_parts |
PAYMENT_pay |
PAYMENT_payment_methods |
PAYMENT_report_payment_issue |
REPAIRS_check_status_repairs |
REPAIRS_find_closest_garage |
REPAIRS_loaner_vehicle |
REPAIRS_request_loaner_vehicle |
SERVICE_cancel_service_appointment |
SERVICE_cancel_service_plan |
SERVICE_information_service_plans |
SERVICE_reschedule_service_appointment |
SERVICE_schedule_service_appointment |
SERVICE_sign_up_service_plan |
WARRANTY_buy_extended_coverage |
WARRANTY_check_coverage |
WARRANTY_check_start_date_warranty |
WARRANTY_download_warranty |
Retail Banking (List of Intents)
English |
CATEGORY_intent |
ACCOUNT_check_recent_transactions |
ACCOUNT_close_account |
ACCOUNT_create_account |
ATM_dispute_ATM_withdrawal |
ATM_recover_swallowed_card |
CARD_activate_card |
CARD_activate_card_international_usage |
CARD_block_card |
CARD_cancel_card |
CARD_check_card_annual_fee |
CARD_check_current_balance_on_card |
CONTACT_customer_service |
CONTACT_human_agent |
FEES_check_fees |
FIND_find_ATM |
FIND_find_branch |
LOAN_apply_for_loan |
LOAN_apply_for_mortgage |
LOAN_cancel_loan |
LOAN_cancel_mortgage |
LOAN_check_loan_payments |
LOAN_check_mortgage_payments |
PASSWORD_get_password |
PASSWORD_set_up_password |
TRANSFER_cancel_transfer |
TRANSFER_make_transfer |
Education (List of Intents)
English |
CATEGORY_intent |
ACCOMMODATION_accommodation |
AFTER_ADMISSION_change_program |
AFTER_ADMISSION_decline_admission_offer |
APPLICATION_INFORMATION_REQUEST_admission_requirements |
APPLICATION_INFORMATION_REQUEST_admission_requirements_international_students |
APPLICATION_INFORMATION_REQUEST_application_deadlines |
APPLICATION_INFORMATION_REQUEST_contact_admission_counseling_service |
APPLICATION_INFORMATION_REQUEST_documents_required_apply |
APPLICATION_INFORMATION_REQUEST_language_requirements |
APPLICATION_INFORMATION_REQUEST_medical_requirements |
CONTACT_human_agent |
DEGREE_INFORMATION_REQUEST_career_opportunities |
DEGREE_INFORMATION_REQUEST_information_degree |
FINANCIAL_AID_accept_admission_offer |
FINANCIAL_AID_apply_loan |
FINANCIAL_AID_information_scholarships |
FINANCIAL_AID_requirements_loan |
PAYMENT_pay |
PAYMENT_payment_methods |
PAYMENT_report_payment_issue |
POLICIES_university_polices |
STUDENT_PORTAL_course_schedule |
STUDENT_PORTAL_find_student_ID |
STUDENT_PORTAL_grades_report |
STUDENT_PORTAL_recover_password |
STUDENT_PORTAL_report_student_portal_issue |
STUDENT_PORTAL_sign_up_student_portal |
STUDENT_SUPPORT_contact_student_support |
UNIVERSITY_APPLICATION_PROCESS_application_status |
UNIVERSITY_APPLICATION_PROCESS_change_application |
UNIVERSITY_APPLICATION_PROCESS_sign_up_course |
UNIVERSITY_APPLICATION_PROCESS_submit_application |
UNIVERSITY_APPLICATION_PROCESS_withdraw_application |
UNIVERSITY_INFORMATION_REQUEST_information_campus |
UNIVERSITY_INFORMATION_REQUEST_information_programs |
UNIVERSITY_INFORMATION_REQUEST_information_registration_fees |
UNIVERSITY_INFORMATION_REQUEST_information_university |
Events & Ticketing (List of Intents)
English |
CATEGORY_intent |
CANCELLATIONS_cancel_ticket |
CANCELLATIONS_check_cancellation_fee |
CANCELLATIONS_check_cancellation_policy |
CANCELLATIONS_track_cancellation |
CONTACT_customer_service |
CONTACT_event_organizer |
CONTACT_human_agent |
DELIVERY_delivery_options |
DELIVERY_delivery_period |
EVENTS_find_upcoming_events |
EVENTS_information_about_type_events |
PAYMENT_pay |
PAYMENT_payment_methods |
PAYMENT_report_payment_issue |
POLICY_check_privacy_policy |
REFUNDS_check_refund_policy |
REFUNDS_get_refund |
REFUNDS_track_refund |
TICKETS_buy_ticket |
TICKETS_change_personal_details_on_ticket |
TICKETS_find_ticket |
TICKETS_information_about_tickets |
TICKETS_sell_ticket |
TICKETS_transfer_ticket |
TICKETS_upgrade_ticket |
Field Service (List of Intents)
English |
CATEGORY_intent |
APPOINTMENT_cancel |
APPOINTMENT_place |
APPOINTMENT_quote |
APPOINTMENT_reschedule |
APPOINTMENT_schedule |
APPOINTMENT_technician |
APPOINTMENT_time_arrival |
BILLING_check_bill |
CONTACT_customer_service |
CONTACT_human_agent |
CONTACT_technical_support |
FEEDBACK_file_complaint |
FEEDBACK_leave_review |
GENERAL_INFORMATION_location |
GENERAL_INFORMATION_rates |
GENERAL_INFORMATION_service_hours |
PAYMENT_pay |
PAYMENT_payment_methods |
QUOTE_accept_quote |
QUOTE_change_quote |
QUOTE_decline_quote |
SERVICES_emergencies |
SERVICES_information |
SERVICES_inspection |
SERVICES_installation |
SERVICES_maintenance |
SERVICES_repairs |
Healthcare (List of Intents)
English |
CATEGORY_intent |
ADMISSION_PROCESS_information_about_the_admission_process |
APPOINTMENT_cancel_appointment |
APPOINTMENT_request_a_referral |
APPOINTMENT_reschedule_appointment |
APPOINTMENT_schedule_appointment |
BILLING_change_billing_information |
BILLING_dispute_invoice |
BILLING_invoices |
BILLING_set_up_billing_information |
CONTACT_admissions_office |
CONTACT_billing_department |
CONTACT_contact_information |
CONTACT_health_professional |
CONTACT_patient |
CONTACT_technical_support |
EMERGENCY_emergencies |
EMERGENCY_get_directions |
EMERGENCY_information_about_emergency_rooms |
HEALTH_INFORMATION_clinical_trials |
HEALTH_INFORMATION_events |
HEALTH_INFORMATION_health_advice |
LAB_RESULTS_information_lab_results |
LAB_RESULTS_see_lab_results |
LEGAL_medical_records |
LEGAL_patient_rights |
LEGAL_privacy_policy |
LOCATION_AND_DIRECTION_check_location |
LOCATION_AND_DIRECTION_directions |
LOCATION_AND_DIRECTION_find_healthcare_center |
LOCATION_AND_DIRECTION_parking |
LOCATION_AND_DIRECTION_public_transportation |
PATIENT_PORTAL_access_patient_portal |
PATIENT_PORTAL_information_about_patient_portal |
PAYMENT_pay |
PAYMENT_payment_methods |
PAYMENT_report_payment_issue |
REVIEW_file_complaint |
REVIEW_leave_review |
VISITING_INFORMATION_patient_&_visitor_guide |
VISITING_INFORMATION_visiting_hours |
Hospitality (List of Intents)
English |
CATEGORY_intent |
BILLING_invoices |
CANCELLATION_FEES_cancellation_fees |
CHECK_IN_check_in |
CHECK_OUT_check_out |
CONTACT_human_agent |
EVENT_host_event |
FEEDBACK_file_complaint |
FEEDBACK_leave_review |
HOTEL_book_hotel |
HOTEL_cancel_hotel_reservation |
HOTEL_change_hotel_reservation |
HOTEL_check_hotel_facilities |
HOTEL_check_hotel_offers |
HOTEL_check_hotel_prices |
HOTEL_check_hotel_reservation |
HOTEL_search_hotel |
LUGGAGE_store_luggage |
MENU_check_menu |
NIGHT_add_night |
PARKING_SPACE_book_parking_space |
PETS_bring_pets |
POINTS_redeem_points |
REFUND_get_refund |
SHUTTLE_SERVICE_shuttle_service |
Insurance (List of Intents)
English |
CATEGORY_intent |
AUTO_INSURANCE_information_auto_insurance |
CLAIMS_accept_settlement |
CLAIMS_file_claim |
CLAIMS_negotiate_settlement |
CLAIMS_receive_payment |
CLAIMS_reject_settlement |
CLAIMS_track_claim |
COMPLAINTS_appeal_denied_insurance_claim |
COMPLAINTS_dispute_invoice |
COMPLAINTS_file_complaint |
CONTACT_customer_service |
CONTACT_human_agent |
CONTACT_insurance_representative |
COVERAGE_change_coverage |
COVERAGE_check_coverage |
COVERAGE_downgrade_coverage |
COVERAGE_upgrade_coverage |
ENROLLMENT_buy_insurance_policy |
ENROLLMENT_cancel_insurance_policy |
ENROLLMENT_cancellation_fees |
ENROLLMENT_compare_insurance_policies |
GENERAL_INFORMATION_general_information |
HEALTH_INSURANCE_information_health_insurance |
HOME_INSURANCE_information_home_insurance |
INCIDENTS_report_incident |
INCIDENTS_schedule_appointment |
LIFE_INSURANCE_information_life_insurance |
PAYMENT_check_payments |
PAYMENT_pay |
PAYMENT_payment_methods |
PAYMENT_report_payment_issue |
PAYMENT_schedule_payments |
PET_INSURANCE_information_pet_insurance |
POLICY_change_personal_details |
QUOTE_calculate_insurance_quote |
QUOTE_check_rates |
RENEW_renew_insurance_policy |
TRAVEL_INSURANCE_information_travel_insurance |
Legal Services (List of Intents)
English |
CATEGORY_intent |
BILLING_change_billing_information |
BILLING_dispute_invoice |
BILLING_invoices |
BILLING_set_up_billing_information |
CLAIMS_file_claim |
CLAIMS_track_claim |
CONTACT_customer_service |
CONTACT_human_agent |
CONTACT_lawyer |
GENERAL_INFORMATION_law_firm |
GENERAL_INFORMATION_legal_services |
GENERAL_INFORMATION_litigation_process |
GENERAL_INFORMATION_work_with_lawyer |
LEGAL_PLANS_buy_legal_plan |
LEGAL_PLANS_cancel_legal_plan |
LEGAL_PLANS_change_coverage |
LEGAL_PLANS_change_personal_details |
LEGAL_PLANS_check_coverage |
LEGAL_PLANS_compare_legal_plans |
LEGAL_PLANS_downgrade_legal_plan |
LEGAL_PLANS_information_legal_plans |
LEGAL_PLANS_renew_legal_plan |
LEGAL_PLANS_upgrade_legal_plan |
PAYMENT_fees |
PAYMENT_pay |
PAYMENT_payment_methods |
PAYMENT_report_payment_issue |
PRIVACY_AND_SECURITY_privacy_policy |
PRIVACY_AND_SECURITY_security |
Manufacturing (List of Intents)
English |
CATEGORY_intent |
BILLING_change_billing_information |
BILLING_dispute_invoice |
BILLING_invoices |
BILLING_set_up_billing_information |
COMPANY_brand |
COMPANY_company |
COMPANY_customers |
COMPANY_earnings |
COMPANY_events |
COMPANY_facilities |
COMPANY_latest_news |
COMPANY_management_team |
CONTACT_customer_service |
CONTACT_human_agent |
CONTACT_sales_representative |
DELIVERY_delivery_period |
LEGAL_certifications |
ORDER_cancel_order |
ORDER_place_order |
ORDER_product_configuration |
ORDER_track_order |
PRODUCT_description |
PRODUCT_download_documentation |
PRODUCT_warranty |
QUOTE_accept_quote |
QUOTE_change_quote |
QUOTE_decline_quote |
QUOTE_request_quote |
SERVICES_information_services |
SHIPPING_change_shipping_adress |
SHIPPING_set_up_shipping_adress |
SHIPPING_shipping_points |
SUPPLY_CHAIN_product_process |
SUPPLY_CHAIN_supply_chain |
Media Streaming (List of Intents)
English |
CATEGORY_intent |
CONTACT_customer_service |
CONTACT_human_agent |
CONTENT_report_copyright_infringement |
CONTENT_report_inappropiate_content |
FUNCTIONING_devices |
FUNCTIONING_general_use |
FUNCTIONING_quickstart_guide |
PAYMENT_pay |
PAYMENT_payment_methods |
PAYMENT_report_payment_issue |
PROGRAM_SCHEDULE_program_schedule |
PROGRAM_SCHEDULE_releases |
SETTINGS_change_language |
SETTINGS_change_subtitle_language |
SETTINGS_parental_control |
SETTINGS_recover_password |
SUBSCRIPTION_cancel_subscription |
SUBSCRIPTION_change_subscription |
SUBSCRIPTION_free_trial |
SUBSCRIPTION_premium_subscription |
SUBSCRIPTION_renew_subscription |
SUBSCRIPTION_subscribe |
SUBSCRIPTION_subscription |
SUBSCRIPTION_subscription_prices |
Mortgages & Loans (List of Intents)
English |
CATEGORY_intent |
CONTACT_contact_agent |
CONTACT_customer_service |
CONTACT_human_agent |
FEES_check_late_payment_fee |
FEES_lock_interest_rate |
INFORMATION_REQUEST_borrowing_limit |
INFORMATION_REQUEST_check_application_process |
INFORMATION_REQUEST_check_application_requirements |
INFORMATION_REQUEST_check_fees |
INFORMATION_REQUEST_check_loans |
INFORMATION_REQUEST_compare_loans |
INFORMATION_REQUEST_estimate_loan_payment |
LOAN_APPLICATION_PROCESS_change_application |
LOAN_APPLICATION_PROCESS_check_application_status |
LOAN_APPLICATION_PROCESS_closing |
LOAN_APPLICATION_PROCESS_submit_documentation |
LOAN_APPLICATION_PROCESS_withdraw_application |
LOAN_APPLICATION_apply_for_joint_loan |
LOAN_APPLICATION_apply_for_loan |
LOAN_APPLICATION_consolidate_debt |
LOAN_APPLICATION_reapply_for_loan |
LOAN_MODIFICATIONS_add_co-borrower |
LOAN_MODIFICATIONS_change_due_date |
LOAN_MODIFICATIONS_extend_loan |
PAYMENT_check_loan_terms |
PAYMENT_check_repayment_methods |
PAYMENT_make_additional_payments |
PAYMENT_pay_off_loan |
PAYMENT_refinance_loan |
PAYMENT_request_payment_arrangement |
PAYMENT_split_payment |
PAYMENT_turn_off_recurring_payments |
PAYMENT_turn_on_recurring_payments |
PERSONAL_INFORMATION_change_personal_data |
PERSONAL_INFORMATION_change_preferred_bank_account |
PERSONAL_INFORMATION_check_credit_report |
PERSONAL_INFORMATION_check_credit_score |
PERSONAL_INFORMATION_check_loan_details |
PERSONAL_INFORMATION_check_privacy_policy |
Moving & Storage (List of Intents)
English |
CATEGORY_intent |
COMPANY_check_company |
COMPANY_check_moves |
COMPLAINT_file_complaint |
CONTACT_customer_service |
CONTACT_human_agent |
FEEDBACK_file_complaint |
MOVE_MANAGEMENT_cancel_move |
MOVE_MANAGEMENT_delay_move |
MOVE_PREPARATION_check_delivery_options |
MOVE_PREPARATION_information_quotes |
MOVE_PREPARATION_prepare_move |
MOVE_PREPARATION_request_quote |
MOVING_PROCESS_information_delivery |
MOVING_PROCESS_information_pick_up |
MOVING_PROCESS_search_for_tracking_number |
MOVING_PROCESS_track_shipment |
PACKING_AND_ITEMS_information_packing |
PACKING_AND_ITEMS_move_dangerous_items |
PACKING_AND_ITEMS_move_special_items |
PACKING_AND_ITEMS_transport_pets |
PAPERWORK_AND_DOCUMENTS_check_insurance |
PAPERWORK_AND_DOCUMENTS_information_bill_of_lading |
PAPERWORK_AND_DOCUMENTS_information_order_for_service |
PAPERWORK_AND_DOCUMENTS_report_contract_issue |
PAPERWORK_AND_DOCUMENTS_sign_order_for_service |
PAYMENT_pay |
PAYMENT_payment_methods |
PAYMENT_report_payment_issue |
STORAGE_rent_storage_unit |
Real Estate/Construction (List of Intents)
English |
CATEGORY_intent |
ACCOUNT_change_account |
ACCOUNT_create_account |
ACCOUNT_delete_account |
ACCOUNT_edit_account |
APPOINTMENT_cancel_appointment |
APPOINTMENT_reschedule_appointment |
APPOINTMENT_schedule_appointment |
CHARACTERISTICS_check_accessibility |
CHARACTERISTICS_check_asking_price |
CHARACTERISTICS_check_availability |
CHARACTERISTICS_check_characteristics |
CHARACTERISTICS_check_equipment |
CHARACTERISTICS_check_location |
CHARACTERISTICS_check_number_of_rooms |
CHARACTERISTICS_check_size |
CONTACT_customer_service |
CONTACT_human_agent |
CONTACT_owner |
LIST_PROPERTY_add_pictures |
LIST_PROPERTY_change_asking_price |
LIST_PROPERTY_change_rent_to_sale |
LIST_PROPERTY_create_listing |
LIST_PROPERTY_delete_pictures |
LIST_PROPERTY_edit_listing |
LIST_PROPERTY_remove_listing |
LOOK_FOR_PROPERTY_look_for_property |
REPORT_report_listing |
VISITING_HOURS_visiting_hours |
Restaurant & Bar Chains (List of Intents)
English |
CATEGORY_intent |
CATERING_cancel_catering |
CATERING_change_catering |
CATERING_information_about_catering |
CATERING_order_catering |
COMPANY_information_about_company |
COMPANY_locations |
CONTACT_customer_service |
CONTACT_human_agent |
EVENTS_events |
FEEDBACK_file_complaint |
FEEDBACK_leave_review |
FRANCHISE_apply_for_franchise |
FRANCHISE_find_franchise |
FRANCHISE_franchising |
LEGAL_privacy_policy |
MENU_check_menu |
MENU_check_offers |
MENU_information_about_allergens |
ONLINE_ORDER_cancel_order |
ONLINE_ORDER_change_order |
ONLINE_ORDER_delivery_time |
ONLINE_ORDER_order_food_online |
ONLINE_ORDER_order_issue |
ONLINE_ORDER_track_order |
PAYMENT_payment_methods |
PAYMENT_report_payment_issue |
RESERVATIONS_cancel_reservation |
RESERVATIONS_change_reservation |
RESERVATIONS_make_reservation |
RESTAURANT_find_restaurant |
Retail/E-commerce (List of Intents)
English |
CATEGORY_intent |
ACCOUNT_change_account |
ACCOUNT_order_history |
ACCOUNT_recover_password |
APP_WEBSITE_technical_issue |
APP_WEBSITE_website_functionality |
CONTACT_human_agent |
DELIVERY_damaged_item |
DELIVERY_damaged_package |
DELIVERY_delivery_issue |
DELIVERY_missing_item |
DELIVERY_shipping_costs |
DELIVERY_wrong_item |
FEEDBACK_submit_consumer_feedback |
ORDER_cancel_order |
ORDER_change_order |
ORDER_order_status |
ORDER_request_invoice |
PAYMENT_pay |
PAYMENT_report_payment_issue |
PRODUCT_availability |
PRODUCT_exchange_product |
PRODUCT_exchange_status |
PRODUCT_product_information |
PRODUCT_product_issue |
PRODUCT_refund_policy |
PRODUCT_refund_status |
PRODUCT_request_refund |
PRODUCT_return_order |
PRODUCT_return_policy |
PRODUCT_submit_product_feedback |
PRODUCT_submit_product_idea |
STORE_store_location |
STORE_store_opening_hours |
USER_request_right_to_rectification |
Telecommunications (List of Intents)
English |
CATEGORY_intent |
BILLING_check_bill |
BILLING_dispute_bill |
COMPLAINTS_get_compensation |
COMPLAINTS_report_poor_signal_coverage |
COMPLAINTS_report_problem |
CONSUMPTION_check_excess_data_charges |
CONSUMPTION_check_usage |
CONSUMPTION_set_usage_limits |
CONTACT_customer_service |
CONTACT_human_agent |
PAYMENT_pay |
PAYMENT_payment_methods |
PAYMENT_schedule_payments |
SERVICES_activate_call_management_services |
SERVICES_activate_phone |
SERVICES_activate_roaming |
SERVICES_check_internet_availability |
SERVICES_check_signal_coverage |
SERVICES_deactivate_call_management_services |
SERVICES_deactivate_phone |
SERVICES_install_internet |
SUBSCRIPTION_cancel_plan |
SUBSCRIPTION_change_plan |
SUBSCRIPTION_change_provider |
SUBSCRIPTION_check_cancellation_fee |
SUBSCRIPTION_sign_up_for_plan |
Travel (List of Intents)
English |
CATEGORY_intent |
ARRIVAL_TIME_check_arrival_time |
BAGGAGE_checked_baggage_allowance |
BOARDING_PASS_get_boarding_pass |
BOARDING_PASS_print_boarding_pass |
BOOK_book_flight |
BOOK_book_trip |
CANCELLATION_FEES_cancellation_fees |
CANCEL_cancel_flight |
CANCEL_cancel_trip |
CHANGE_change_flight |
CHANGE_change_trip |
CHECK_IN_check_in |
CHECK_PRICES_check_flight_prices |
CONTACT_human_agent |
DEPARTURE_TIME_check_departure_time |
FLIGHT_STATUS_check_flight_status |
INSURANCE_check_flight_insurance_coverage |
INSURANCE_check_trip_insurance_coverage |
INSURANCE_purchase_flight_insurance |
INSURANCE_purchase_trip_insurance |
INSURANCE_search_flight_insurance |
INSURANCE_search_trip_insurance |
OFFERS_check_flight_offer |
OFFERS_check_trip_offers |
PRICES_check_trip_prices |
REFUND_get_refund |
RESERVATION_check_flight_reservation |
SEARCH_search_flight |
SEARCH_search_trip |
SEAT_change_seat |
SEAT_choose_seat |
TRIP_DETAILS_check_trip_details |
TRIP_PLAN_check_trip_plan |
Utilities (List of Intents)
English |
CATEGORY_intent |
ACCOUNT_change_account_holder |
BILLING_invoices |
COMPLAINTS_complaints |
CONSUMPTION_consumption |
CONTACT_customer_service |
CONTACT_human_agent |
CONTRACT_cancel_contract |
HOUSE_moving_house |
INSPECTION_request_inspection |
MAINTENANCE_maintenance |
OUTAGES_check_outages |
PAYMENT_pay |
RATE_check_rates |
RATE_compare_rates |
REPAIR_available_repair_times |
REPAIR_cost_repair |
REPAIR_request_repair |
SERVICE_service |
SIGN_UP_sign_up_services |
SUBSCRIPTION_cancellation_fees |
SWITCH_switch_provider |
Wealth Management (List of Intents)
English |
CATEGORY_intent |
ACCOUNT_create_account |
ACCOUNT_delete_account |
ACCOUNT_recover_password |
ACCOUNT_requirements_to_create_account |
BECOME_CLIENT_arrange_meeting |
BECOME_CLIENT_become_client |
BECOME_CLIENT_calculate_portfolio_risk |
BECOME_CLIENT_check_fees |
BECOME_CLIENT_check_services |
BECOME_CLIENT_get_manager |
BECOME_CLIENT_minimum_amount_to_invest |
BECOME_CLIENT_run_simulator |
CONTACT_contact_manager |
CONTACT_customer_service |
CONTACT_human_agent |
MANAGEMENT_implement_own_plan |
MANAGEMENT_transfer_money_to_account |
MANAGEMENT_withdraw_money_from_account |
PORTFOLIO_check_balances |
PORTFOLIO_check_portfolio |
PORTFOLIO_portfolio_performance |
PORTFOLIO_portfolio_value |
PORTFOLIO_search_for_stocks |
PORTFOLIO_set_price_alert |
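The labels in the intent lists above appear to follow the `CATEGORY_intent` pattern given in each list header: an uppercase category (possibly several words, as in `PARTS_ACCESSORIES`) followed by a mostly lowercase intent name. Assuming that convention holds, a label can be split by taking the leading run of all-uppercase tokens as the category. The helper `split_label` is an assumption for this example, not part of any Bitext tool.

```python
from collections import defaultdict


def split_label(label):
    """Split 'CATEGORY_intent' into (category, intent) at the first non-uppercase token."""
    tokens = label.split("_")
    n = 0
    while n < len(tokens) and tokens[n].isupper():
        n += 1
    return "_".join(tokens[:n]), "_".join(tokens[n:])


labels = [
    "PARTS_ACCESSORIES_buy_parts",
    "PARTS_ACCESSORIES_buy_accessories",
    "ATM_dispute_ATM_withdrawal",
    "PAYMENT_pay",
]

# Group intents under their category.
by_category = defaultdict(list)
for label in labels:
    category, intent = split_label(label)
    by_category[category].append(intent)

for category, intents in by_category.items():
    print(category, intents)
```

Splitting on the leading uppercase run rather than the first underscore keeps multi-word categories intact and tolerates uppercase abbreviations inside the intent name, such as the second `ATM` in `ATM_dispute_ATM_withdrawal`.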
How did we select these intents?
1. Select a representative set of texts about the domain
2. Extract frequent actions by parsing the texts above and extracting the most common triples SUBJECT + VERB + OBJECT
3. Normalize frequent actions by analyzing vertical-specific synonyms: “purchase + item” and “buy + product” can be normalized under “purchase + product”
4. Build a bottom-up knowledge graph to automatically structure intents through their SUBJECT + VERB + OBJECT triples
5. Optionally, curate a custom ontology specific to each client/chatbot
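Steps 2 and 3 above can be sketched as follows, assuming the SUBJECT + VERB + OBJECT triples have already been extracted by a parser (the parsing itself is not shown). The synonym table is a hypothetical example built from the "purchase + item" / "buy + product" case mentioned in step 3; a real table would be vertical-specific.

```python
from collections import Counter

# Hypothetical vertical-specific synonym table: variant -> normalized form.
SYNONYMS = {"buy": "purchase", "item": "product"}


def normalize(triple):
    """Map the verb and object of a (subject, verb, object) triple to normalized forms."""
    subject, verb, obj = triple
    return subject, SYNONYMS.get(verb, verb), SYNONYMS.get(obj, obj)


def frequent_actions(triples, top_n=3):
    """Count normalized (verb, object) pairs and return the most common ones."""
    counts = Counter((verb, obj) for _, verb, obj in map(normalize, triples))
    return counts.most_common(top_n)


triples = [
    ("customer", "purchase", "item"),
    ("user", "buy", "product"),
    ("customer", "cancel", "order"),
]
print(frequent_actions(triples))
```

After normalization, the first two triples collapse into the same action, "purchase + product", which is what lets frequent actions surface as candidate intents.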
Bitext NLP Data Overview
Bitext’s Deep Linguistic Analysis
Bitext develops comprehensive NLP datasets and multilingual tools (like lexical, semantic, and syntactic annotation tools) in up to 77 languages.
Bitext offers multilingual datasets, designed for enterprise use, to analyze and tag text at three levels:
- Lexical
- Syntactic
- Semantic
Lexical Level and Lemmatization
The lemmatizer can be additionally packaged to cover the full pipeline of language analysis, from sentence segmentation to full parsing, and includes tools like spell-checking.
Both components of the lemmatizer, data and software, can be distributed integrated or separately. All these tools are available in 77 languages and 25 language variants.
Syntactic Level and Parsing
For a full list of services, at the lexical, syntactic and semantic levels, check our linguistic services.
Example of dataset for Customer Service