Datafari RagAPI - RAG - ALPHA VERSION
Valid from Datafari 6.1 (to be confirmed)
This feature is a work in progress and is subject to change. The documentation visibility should be set to “Community Edition” once the API is published.
Introduction
While working on the implementation of a RAG solution in Datafari, we came up with a new API: “Datafari RagAPI”. This feature is natively implemented in Datafari, and is meant to interact with an external web service that uses an LLM to retrieve information.
What is RAG?
RAG stands for Retrieval-Augmented Generation. It uses a Large Language Model to generate the response to a user question or prompt, leveraging only the contextual information provided with the prompt (this contextual information can be relevant documents or chunks of documents, coming from sources such as Datafari search results). Although many use cases rely on vector search as a first step, our approach to RAG is to use the “classic BM25 search” method: Datafari retrieves documents using a BM25 algorithm (note that this may evolve in the future with the implementation of vector search in Solr). The documents are then sent to the External LLM Service, which may or may not use an embedding and vector storage solution before extracting a response with the Large Language Model (we decided to remain flexible here, and consider the External LLM Service as a black box).
Classic search VS Vector Search
The “Retrieve” part of RAG is an important step. In this step, a search is run to identify a list of documents that may contain the wanted information, and to extract relevant fragments that can be interpreted by the LLM. In our own terms, the “classic” method is the keyword search implemented in Datafari. Vector search relies on Machine Learning technologies to capture the meaning and the context of unstructured data, converted into numerical vectors. The advantage of vector search is that it “understands” natural language queries, and thus finds more relevant documents, which may not necessarily use the same terms as the ones in the query.
Currently, as vector search is not implemented yet in Datafari and because direct vector search does not easily scale to millions of indexed documents, this technology is not fully available in the initial version of our RAG solution. However, we provide an optional hybrid solution. When activated, the RAG process triggers a classic keyword search to find the N most relevant documents (by default, 3 documents). These documents are then stored in a vector database and queried to extract relevant snippets, which are sent to the LLM to generate a proper response.
What does Datafari-RagAPI do?
Datafari-RagAPI is a new action in our Datafari API Search endpoint. When called, it will first run a classic search with the user prompt. Then it will send a POST HTTP request to an external web service, including a JSON containing the user prompt, as well as a list of extracts of documents retrieved during the first search. The JSON may also contain parameters specific to the webservice and/or the Large Language Model (LLM), such as the temperature.
Then, Datafari-RagAPI will format the webservice response into a standard JSON.
What is this external webservice?
Our solution currently supports two types of webservices: OpenAI API (or similar) and our own custom work-in-progress “Datafari External LLM Service”.
OpenAI API is a service provided by OpenAI. Its chat completion endpoint can be used to process RAG searches.
Datafari AI Agent is an experimental solution developed by France Labs. It is a simple Python API, hosting at least one Large Language Model. This solution is currently mostly used for RAG search, but will be extended to more features (categorization, summarization, vector embeddings).
In both cases, we use the LLM to extract a response to the user question from the provided documents.
Endpoints
METHOD | URL | DESCRIPTION | QUERY BODY | RESPONSE | PROTECTED | EDITION |
---|---|---|---|---|---|---|
GET | search/select?q={prompt}&action=rag or search/select?q={prompt}&action=rag&format={format}&lang={lang} | More details about this API further down in this documentation. | | Specific response format. See below for more explanations. | | CE |
Parameters
The endpoint above handles multiple parameters.
action : Mandatory, must be set to “rag”.
prompt : Mandatory. It contains the prompt or the search query written by the user, and is passed in the q parameter.
format : Optional. This parameter allows the user to define the format of the generated response. Allowed values for this field are “bulletpoint”, “text”, “stepbystep” or “default”. It can also be left blank or unset.
lang : Optional. The expected language of the response. Requests from DatafariUI should specify the user’s preferred language.
Allowed values are “en”, “fr”, “it”, “pt”, “pt_br”, “de”, “es”, and “ru”.
If no language is selected, the API will try to retrieve the logged-in user’s preferred language.
If the user is not logged in, English will be used by default.
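As an illustration of the parameters above, here is a minimal Python sketch that builds the request URL. The base URL used in the usage note is hypothetical (the actual host and API prefix depend on your installation), and the `build_rag_url` helper is not part of Datafari; only the `q`, `action`, `format` and `lang` parameters and their allowed values come from this page.

```python
from urllib.parse import urlencode

# Allowed values as listed in the "Parameters" section.
ALLOWED_FORMATS = {"bulletpoint", "text", "stepbystep", "default"}
ALLOWED_LANGS = {"en", "fr", "it", "pt", "pt_br", "de", "es", "ru"}


def build_rag_url(base_url, prompt, fmt=None, lang=None):
    """Build the GET URL for a RAG query; optional parameters are omitted when unset."""
    if not prompt:
        raise ValueError("prompt is mandatory")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if lang is not None and lang not in ALLOWED_LANGS:
        raise ValueError(f"unsupported lang: {lang}")
    # The prompt travels in the "q" parameter; "action" must be "rag".
    params = {"q": prompt, "action": "rag"}
    if fmt:
        params["format"] = fmt
    if lang:
        params["lang"] = lang
    return f"{base_url.rstrip('/')}/search/select?{urlencode(params)}"
```

For example, `build_rag_url("https://datafari.example.com/api", "When does the plane take off?", lang="en")` produces a URL ending in `search/select?q=When+does+the+plane+take+off%3F&action=rag&lang=en`.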
Response structure
The responses are formatted using the following template:
{
  "status": "OK|ERROR",
  "content": {
    "message": "...",
    "documents": {
      "0": {
        "url": "...",
        "id": "...",
        "title": "...",
        "content": "..."
      },
      "1": {
        "url": "...",
        "id": "...",
        "title": "...",
        "content": "..."
      }
      ...
    }
  }
}
In this case, the “message” contains the text generated by the Large Language Model. The “documents” section contains information about the documents sent to the LLM and used to retrieve the answer; in the template above, it is an object whose keys are the document indexes (“0”, “1”, ...). Each document has an ID (id), a URL (url), a title (title) and a content (content). The content may be the “exactContent” of the document, the “preview_content”, or the Solr “highlightings”, depending on the RAG configuration. More fields may be added in future developments.
For errors, the content has the following structure:
{
  "code": {int},
  "reason": {String}
}
Example of a valid response:
{
  "status": "OK",
  "content": {
    "message": "According to the document, the plane for Nice takes off at 15:50.",
    "documents": {
      "0": {
        "id": "file://///localhost/fileshare/volotea.pdf",
        "title": "CHECK-IN",
        "url": "file://///localhost/fileshare/volotea.pdf",
        "content": "Boarding pass Vos bagages 1 Bagage(s) cabine + 1 accessoire SEC. AF1599:026 Total 12 Kg John Doe Bagage cabine 55 x 35 x 25 cm max. Vol Brest Nice AF1599DépartBES NIC 15:50 / 06 FEBEffectué par HOP ..."
      }
    }
  }
}
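A client can read such a response along the following lines. This is a minimal sketch, not Datafari code: the `parse_rag_response` helper is hypothetical, and it assumes the response follows the template and error structure described above (it also tolerates “documents” arriving as a plain list, in case the serialization changes).

```python
import json


def parse_rag_response(raw):
    """Return (message, documents) for an OK response; raise RuntimeError otherwise."""
    payload = json.loads(raw)
    content = payload.get("content", {})
    if payload.get("status") != "OK":
        # Error content is expected to carry "code" and "reason".
        raise RuntimeError(f"RAG error {content.get('code')}: {content.get('reason')}")
    docs = content.get("documents", {})
    if isinstance(docs, list):
        ordered = docs
    else:
        # Keys are "0", "1", ... in the template; sort them numerically.
        ordered = [docs[key] for key in sorted(docs, key=int)]
    return content.get("message", ""), ordered
```

The returned documents keep the order of their index keys, so the first entry is the document listed under “0”.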
Example of an error response:
The “message” field is not localized and is purely informative. To allow translations, we recommend using the “label” instead. Here are the different labels and the associated messages.
Example of a valid formatted response
You may notice that line breaks are written as “\\n”. To display the response in an HTML page, you might need to replace them with “<br/>”. This is subject to change, and might be managed directly in Datafari RagAPI in the future.
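The replacement described above could look like the following Python sketch. The `rag_message_to_html` helper is hypothetical; it escapes the text before inserting the tags, and handles both a literal backslash followed by “n” and a real newline character, since the exact encoding may depend on how the client decodes the JSON.

```python
import html


def rag_message_to_html(message):
    """Escape the text for HTML, then turn line breaks into <br/> tags."""
    escaped = html.escape(message)
    # Replace the two-character sequence backslash + "n" first, then real newlines.
    return escaped.replace("\\n", "<br/>").replace("\n", "<br/>")
```

Escaping first ensures that any markup produced by the LLM is displayed as text rather than interpreted by the browser.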
Configuration
Multiple configurable parameters can be set in the rag.properties file. This file can be found:
In git repository:
On the Datafari server:
This file contains the parameters you need to call your LLM solution. You can either use the OpenAI API (or similar), or the provided Datafari AI Agent.
Global RAG properties
rag.enabled : Set to “true” to enable RAG features (default: true).
rag.enable.log : Enable processing logs (default: false).
rag.enable.chunking : Enable chunking. Highly recommended if vector search is disabled. (default: true).
rag.enable.vector.search : Enable vector search. This significantly increases processing time, but also greatly improves the quality of the response. (default: true).
Web services parameters
rag.api.endpoint : The URL of the API you want to call. By default, OpenAI LlmService uses https://api.openai.com/v1/. If you are using Datafari AI Agent, consider using http://your-aiagent-ws-address:8888/batch
rag.api.token : Your API token. Required to use OpenAI services. Please use your own token.
rag.llm.service : The API you are using. Currently accepted values are “datafari” (for AI Agent API) and “openai” (for OpenAI API) (default: openai).
LLM parameters
rag.model : The LLM model to be used. Can be left blank to use the API’s default model. (default: gpt-3.5-turbo).
rag.temperature : Temperature controls the randomness of the text that the LLM generates. With GPT, you can set a temperature between 0 and 2 (the default is 1). With other LLMs, it can be set between 0 and 1. We recommend setting it to 0. (default: 0)
rag.maxTokens : Integer value. The maximum number of tokens in the response. (default: 200)
Datafari RAG pre-processing properties
rag.maxFiles : Integer value. The maximum number of files that should be included in the process. Allowing too many files decreases performance. (default: 3)
rag.chunk.size : Integer value. The maximum length, in characters, of a chunk that should be handled by the LLM. Setting too high a value may cause errors or malfunctions. (default: 30000)
rag.operator : “AND” or “OR”. This field defines the operator (q.op in Solr) used in the search process in Datafari. Using “AND” may increase the relevancy of a response, but decreases the chance of finding relevant sources. (default: OR)
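To check a configuration before deploying it, the properties above can be loaded and sanity-checked with a short script. This is a sketch under assumptions: it supposes that rag.properties uses plain `key=value` lines with `#` comments, and the `load_rag_properties` helper is hypothetical. The default values and allowed ranges are the ones listed on this page.

```python
# Defaults as documented in the property lists above.
DEFAULTS = {
    "rag.enabled": "true",
    "rag.enable.log": "false",
    "rag.enable.chunking": "true",
    "rag.enable.vector.search": "true",
    "rag.llm.service": "openai",
    "rag.model": "gpt-3.5-turbo",
    "rag.temperature": "0",
    "rag.maxTokens": "200",
    "rag.maxFiles": "3",
    "rag.chunk.size": "30000",
    "rag.operator": "OR",
}


def load_rag_properties(text):
    """Parse key=value lines, fall back to documented defaults, and validate a few values."""
    props = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    if props["rag.operator"] not in ("AND", "OR"):
        raise ValueError("rag.operator must be AND or OR")
    if props["rag.llm.service"] not in ("openai", "datafari"):
        raise ValueError("rag.llm.service must be openai or datafari")
    if not 0 <= float(props["rag.temperature"]) <= 2:
        raise ValueError("rag.temperature must be between 0 and 2")
    for key in ("rag.maxTokens", "rag.maxFiles", "rag.chunk.size"):
        if int(props[key]) <= 0:
            raise ValueError(f"{key} must be a positive integer")
    return props
```

Keys that are not set in the file simply keep their documented default.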
Examples of configuration
For OpenAI
For Datafari AI Agent services
Technical specification
Process description
The client sends a query to the Datafari API, using the “RAG” action.
The most important part here is the prompt, which will be used to retrieve documents and snippets.
A search query is processed based on the user prompt, in order to retrieve a list of potentially relevant documents. The data is extracted and formatted.
Vector search is optional. If it is activated, retrieved documents will be embedded and stored in a vector database. Then, a query will be processed to retrieve relevant snippets from those documents.
This step significantly increases the processing time, but provides better results, in particular with natural language. Also, as vector search provides short relevant snippets, it makes chunking optional.
The vector search can return up to 5 text segments. This value is currently hard-coded in the VectorUtils class.
Documents might be too big to be handled at once by the LLM. Chunking cuts large documents into smaller pieces so that they can be processed sequentially.
This feature is optional, but is highly recommended if you don’t use vector search. In the future, this feature might be improved. See this link for more chunking strategies.
During prompting, the list of documents/snippets is converted into a list of prompts that will be processed by the LLM. Each prompt contains instructions (defined in rag-instructions.txt), document extracts, and the user prompt as a question.
If documents are short enough, they might be merged into one single prompt to improve performance.
The expected language of the response is defined in the prompt. It uses the preferred language of the user if it is set. Otherwise, it uses the browser language.
Our solution is designed to interact with various LLM APIs. The dispatcher selects the proper connector to interact with the configured LLM API.
A connector is a Java class implementing our LlmService interface. It contains the “invoke” method, taking as parameter a list of prompts, and returning a simple String response from the LLM.
To this day, we provide two LlmServices:
OpenAILlmService, that requires an API key to interact with OpenAI API
DatafariLlmService, that allows you to call our custom LLM API, “Datafari AI Agent”.
The selected connector prepares one or multiple HTTP/HTTPS queries, one for each prompt from the list. Then, it calls the external LLM API, and extracts the response.
If the list of prompts contains multiple entries, all the responses are concatenated and sent to the LLM again, to generate a final summary.
The format of the JSON depends on the configured template. In this step, the context is cleaned: it must not contain special characters that could break the webservice (\n, \b, \\...).
The response is formatted in JSON and sent back to the user.
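The chunking, cleaning and prompting steps described above can be sketched as follows in Python. This is an illustration only, not the Datafari implementation (which is written in Java): the function names, the fixed-size character chunking, the replacement of special characters by spaces, and the prompt layout are all assumptions.

```python
def chunk_text(text, chunk_size):
    """Cut text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def clean_context(text):
    """Replace characters that could break the request body with spaces."""
    for bad in ("\n", "\b", "\\"):
        text = text.replace(bad, " ")
    return text


def build_prompts(instructions, documents, question, chunk_size):
    """Merge cleaned chunks into as few contexts as fit in chunk_size, one prompt each."""
    contexts, current = [], ""
    for doc in documents:
        for chunk in chunk_text(clean_context(doc), chunk_size):
            # Start a new context when adding this chunk would exceed the limit.
            if current and len(current) + len(chunk) + 1 > chunk_size:
                contexts.append(current)
                current = ""
            current = f"{current} {chunk}".strip()
    if current:
        contexts.append(current)
    return [
        f"{instructions}\n\nDocuments: {ctx}\n\nQuestion: {question}"
        for ctx in contexts
    ]
```

With a chunk size of 10, the documents `"aaaa"`, `"bbbb"` and a 12-character document yield three prompts: the two short documents share the first one, and the long document is split across the other two.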
Available LlmServices
An LlmService is a class that implements our “LlmService.java” interface, and is used to call an external LLM API, like OpenAI API or Datafari AI Agent. The entry point of these classes is the invoke() method.
This method takes, as parameter, a list of String prompts ready to be sent.
The invoke() method sends each prompt to the associated LLM API. If there is only one prompt in the list, then the String response is directly returned. Otherwise, responses are concatenated, and the API is called one last time to create a summary response.
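The control flow of invoke() can be illustrated with the Python sketch below. It is not the Java interface itself: the `call_llm` callable stands in for the HTTP call made by a connector, and the wording of the summary prompt is hypothetical.

```python
from typing import Callable, List


def invoke(prompts: List[str], call_llm: Callable[[str], str]) -> str:
    """Send each prompt to the LLM; summarize the partial answers if there are several."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    responses = [call_llm(prompt) for prompt in prompts]
    if len(responses) == 1:
        # A single prompt: its response is returned directly.
        return responses[0]
    # Several prompts: concatenate the responses and ask for a final summary.
    # The wording of this summary prompt is an assumption.
    summary_prompt = (
        "Summarize these partial answers into one response:\n" + "\n".join(responses)
    )
    return call_llm(summary_prompt)
```

With N prompts (N > 1), the LLM is therefore called N + 1 times, which is one reason why merging short documents into a single prompt improves performance.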
Currently there are two available LlmServices.
OpenAI LLM connector
This connector can be used to access OpenAI API, or any other API that uses the OpenAI signature. The default model, gpt-3.5-turbo, can be changed by editing the rag.model property.
If you are planning to use your own OpenAI-like solution, edit the rag.api.endpoint property. The default value is https://api.openai.com/v1/
Datafari LLM connector
DatafariLlmService is the default service. It allows you to interact with our LLM solution, Datafari AI Agent, by generating the proper JSON request body.
See https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/3522854915 for more information about the web services.
The URL to be set in rag.api.endpoint should look like:
Security
Security is a major concern, and was a central element in our technical decisions. One of the main advantages of Retrieval-Augmented Generation is that the LLM only uses the data it is provided with to answer a question. As long as we control the data sent to the model, we can prevent any leaks.
“Prompt injection” is a set of techniques used to override the original instructions in the prompt through the user input. As our prompt does not contain secret, confidential or structural elements, we consider that it is not a serious issue if a user is able to read it.
Datafari Enterprise Edition provides a security solution, allowing enterprises to set up access restrictions on their files in Datafari. In any case, our RAG solution must not break this security. If security is configured, any user can run a RAG search on the server. However, a “classic search” is processed first to retrieve the available documents. If the user is not allowed to see a document, this document will not be retrieved and won’t be sent to the Datafari AI Agent services. That way, it is impossible for users to use the RAG tools to retrieve information they should not be able to access.
More information about Datafari Enterprise Edition is available here.