A robust discussion persists within the technical and academic communities about the suitability of LLMs for tasks like Named Entity Recognition (NER). While LLMs have demonstrated extraordinary capabilities across a wide range of language-related tasks, several concerns remain, including:
- LLMs were designed primarily for generation tasks, rather than classification tasks.
- LLMs are not the most efficient approach, given the significant computational resources they require.
- LLMs may pose privacy concerns, as they typically involve sending data to the cloud.
In practical terms, these factors mean that in some cases, integrating a classical NLP solution with an LLM is the optimal approach—especially when computational resources and privacy are critical considerations. Classical NLP solutions, such as SDK-based tools that can be installed locally for enhanced privacy and require minimal hardware resources, offer an attractive alternative. Fortunately, such solutions can be integrated with any LLM, combining the power of LLMs with efficient NER functionality.
With real-time data streaming into government software, resolving ambiguities in entity identification is crucial, particularly for investigations into activities like money laundering. Bitext NAMER addresses these challenges by:
1. Correctly identifying generic names.
2. Assigning them a type: person, place, time, organization…
3. Resolving aliases (“also known as” names, or AKAs) and pseudonyms.
4. Distinguishing similar names linked to potentially unrelated entities (e.g., “Levo Chan”).
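The four capabilities above suggest a particular shape for an entity record: a type, a set of aliases, and a stable identifier that keeps same-named entities apart. The following is a minimal Python sketch of that idea. The `Entity` class, its fields, the `candidates` helper, and the sample names are all hypothetical and do not represent Bitext NAMER's actual output format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical record layout, for illustration only; not Bitext NAMER's real schema.
@dataclass
class Entity:
    entity_id: str        # stable ID, so entities sharing a name stay distinct
    canonical_name: str
    entity_type: str      # e.g. "person", "place", "time", "organization"
    aliases: set[str] = field(default_factory=set)

    def matches(self, mention: str) -> bool:
        """Case-insensitive match against the canonical name or any alias."""
        names = {self.canonical_name, *self.aliases}
        return mention.casefold() in {n.casefold() for n in names}


def candidates(mention: str, entities: list[Entity]) -> list[Entity]:
    """Return every known entity a mention could refer to, in input order."""
    return [e for e in entities if e.matches(mention)]


# Invented sample data: two unrelated entities that share one alias.
entities = [
    Entity("E1", "Jane Example", "person", {"J. Example", "The Broker"}),
    Entity("E2", "Broker Holdings Ltd", "organization", {"The Broker"}),
]

ambiguous = candidates("the broker", entities)
print([(e.entity_id, e.entity_type) for e in ambiguous])
```

A mention that returns more than one candidate is exactly the ambiguous case in item 4: the shared name alone cannot decide between the entities, so a downstream step would need context to choose.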
Bitext’s proprietary methods support more than 20 languages, with an additional 30 languages available on request.
Two Integration Approaches for Bitext NAMER and LLMs
There are two primary approaches to integrating Bitext’s NER system with an LLM like GPT or Llama:
- Pre-processing the input text:
In this approach, entities are annotated using the NER system before feeding the pre-annotated text to the LLM as part of the input prompt or context. This is particularly beneficial for larger systems, where explicitly connecting entities to existing knowledge graphs or databases is advantageous. This method is compatible with virtually any language model.
- Model-driven integration:
Here, the LLM is configured to call the NER system directly when needed. This approach requires an LLM platform that supports external API calls or “function calling,” such as GPT, Jamba, Mistral, and Llama. This method is ideal for use cases where end users interact directly with the LLM, and the model orchestrates a workflow based on user input.
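The two approaches can be sketched side by side in Python. This is an illustration under stated assumptions, not Bitext's API: `extract_entities` is a hard-coded stand-in for the real NER call, and the bracket annotation format, the `NER_TOOL` description, and `handle_tool_call` are hypothetical. The tool description uses a JSON Schema parameter block, which is the general shape function-calling platforms expect, although the exact envelope fields vary by platform.

```python
import json


# Stand-in for the real NER call. The name, signature, and returned fields are
# assumptions; a real deployment would call the locally installed NER SDK here.
def extract_entities(text: str) -> list[dict]:
    known = {"Acme Bank": "organization", "Lisbon": "place"}
    found = []
    for name, etype in known.items():
        start = text.find(name)  # toy lookup: first occurrence only
        if start != -1:
            found.append({"text": name, "type": etype, "start": start})
    return sorted(found, key=lambda e: e["start"])


# Approach 1: pre-processing. Annotate entities inline, then build the prompt.
def annotate(text: str) -> str:
    out, cursor = [], 0
    for ent in extract_entities(text):
        out.append(text[cursor:ent["start"]])
        out.append(f'[{ent["text"]}|{ent["type"]}]')
        cursor = ent["start"] + len(ent["text"])
    out.append(text[cursor:])
    return "".join(out)


def build_prompt(text: str, question: str) -> str:
    return f"Annotated text:\n{annotate(text)}\n\nQuestion: {question}"


# Approach 2: model-driven. Describe the NER call as a tool the model may
# request, then run the request and hand the result back as a JSON string.
NER_TOOL = {
    "name": "extract_entities",
    "description": "Find named entities in a text and return their types.",
    "parameters": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}


def handle_tool_call(name: str, arguments_json: str) -> str:
    if name != NER_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    return json.dumps(extract_entities(args["text"]))


sample = "Acme Bank opened a branch in Lisbon."
print(build_prompt(sample, "Who opened the branch?"))
print(handle_tool_call("extract_entities", json.dumps({"text": sample})))
```

In the first approach the application decides when NER runs, so it works with any model that accepts text. In the second, the model decides, which is why the platform must support tool or function calls.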
Leveraging Data from Bitext NAMER
The data generated by Bitext’s NAMER system can be utilized in various ways, including:
- Generating an entity list with metadata for use as a content library.
- Integrating the output directly into an index or knowledge base.
- Maintaining Bitext NAMER output as a separate file for on-demand access by analysts, researchers, or investigators.
By grouping name variants, Bitext NAMER refines search queries, enabling analysts or software agents to retrieve documents or content objects in a single step—eliminating the need for time-consuming and often frustrating iterative querying. Furthermore, Bitext NAMER enhances existing systems by linking entities to knowledge graphs or databases, facilitating the creation of a “semantic layer” tailored to an organization’s specific needs or investigations.
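The single-step retrieval described above amounts to expanding a query term into every known variant of the same entity before searching. The following Python sketch shows the idea under stated assumptions: the `ALIAS_GROUPS` layout, the function names, and the sample documents are invented for illustration and do not reflect Bitext NAMER's actual output or any particular search engine.

```python
# Hypothetical alias groups keyed by entity ID; the names are invented.
ALIAS_GROUPS = {
    "E1": ["Jane Example", "J. Example", "The Broker"],
}


def build_alias_index(groups: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lower-cased name variant to the entity IDs that use it."""
    index: dict[str, set[str]] = {}
    for entity_id, names in groups.items():
        for name in names:
            index.setdefault(name.casefold(), set()).add(entity_id)
    return index


def expand_query(term: str, groups: dict[str, list[str]],
                 index: dict[str, set[str]]) -> list[str]:
    """Return all variants for every entity the term may refer to."""
    variants: list[str] = []
    for entity_id in sorted(index.get(term.casefold(), ())):
        for name in groups[entity_id]:
            if name not in variants:
                variants.append(name)
    return variants or [term]  # unknown terms are searched as typed


def search(documents: list[str], variants: list[str]) -> list[str]:
    """Naive substring search: keep documents mentioning any variant."""
    lowered = [v.casefold() for v in variants]
    return [d for d in documents if any(v in d.casefold() for v in lowered)]


documents = [
    "Payment approved by J. Example.",
    "The Broker met the courier.",
    "Unrelated shipping notice.",
]

index = build_alias_index(ALIAS_GROUPS)
variants = expand_query("jane example", ALIAS_GROUPS, index)
print(search(documents, variants))
```

Without the alias groups, an analyst searching for the canonical name would match neither of the first two documents and would have to discover and try each variant by hand.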