Info |
---|
Valid from Datafari 6.1 (to be confirmed) |
Note |
---|
This feature is a work in progress, and is subject to change. The documentation visibility should be set to “Community Edition” once the API is published. |
Introduction
As part of our work on implementing a RAG solution in Datafari, we designed a new API: “Datafari-RagAPI”. This feature is natively implemented in Datafari, and is meant to interact with an external webservice that uses an LLM to retrieve information.
What is RAG?
RAG stands for Retrieval-Augmented Generation. It consists of using a Large Language Model to generate the response to a user question or prompt, leveraging only the contextual information provided with the prompt (this contextual information can be relevant documents or chunks of documents, coming from sources such as Datafari search results). Although many RAG implementations rely on vector search for the first step, our approach is to use the “classic BM25 search” method: Datafari retrieves documents using a BM25 algorithm (note that this may evolve in the future with the implementation of vector search in Solr). Then, the documents are sent to the External LLM Service, which may or may not use embeddings and a vector storage solution before extracting a response with the Large Language Model (we decided to remain flexible here, and consider the External LLM Service as a black box).
Classic search VS Vector Search
The “Retrieve” part of RAG is an important step. In this step, a search is run to identify a list of documents that may contain the wanted information, and to extract relevant fragments that can be interpreted by the LLM. In our own terms, the “classic” method is the keyword search implemented in Datafari. Vector search relies on Machine Learning technologies to capture the meaning and the context of unstructured data, converted into numerical vectors. The advantage of vector search is that it “understands” natural language queries, and thus finds more relevant documents, which may not necessarily use the same terms as the ones in the query. However, as vector search is not implemented yet in Datafari, and because direct vector search does not easily scale to millions of indexed documents, this technology is not available in the initial version of our RAG solution.
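To make the “classic” keyword approach more concrete, here is a toy BM25 scorer. It is only an illustrative sketch: the whitespace tokenizer, the function name `bm25_scores` and the sample documents are assumptions made for this example, and the actual scoring in Datafari is delegated to Solr, whose analysis chain is much richer. The `k1=1.2` and `b=0.75` values are the commonly used BM25 defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    # Naive tokenization (assumption): lowercase + whitespace split.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "boarding pass for the flight to nice",
    "a nice rabbit ate a carrot",
    "recipe for a cherry clafoutis",
]
scores = bm25_scores("flight nice", docs)
```

The first document matches both query terms and gets the highest score, the second matches only one term, and the third matches none. A purely keyword-based scorer like this one cannot tell that “nice” is a city in one document and an adjective in the other, which is the kind of limitation vector search aims to address.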
What does Datafari-RagAPI do?
Datafari-RagAPI is a new action in our Datafari API Search endpoint. When called, it will first run a classic search with the user prompt. Then it will send a POST HTTP request to an external webservice, including a JSON containing the user prompt, as well as a list of extracts of documents retrieved during the first search. The JSON may also contain parameters specific to the webservice and/or the Large Language Model (LLM), such as the temperature.
Then, Datafari-RagAPI will format the webservice response into a standard JSON.
What is this external webservice?
Our solution currently supports two types of webservices: OpenAI API (or similar) and our own custom work-in-progress “Datafari External LLM Service”.
OpenAI API is a service provided by OpenAI. Its chat completion endpoint can be used to process RAG searches.
Datafari External LLM Service is an experimental solution developed by France Labs. It is a simple Python webservice, hosting a Large Language Model. This solution is currently dedicated to RAG search, but could be extended to more features.
In both cases, we use the LLM to extract a response to the user question from the provided documents.
Endpoints
METHOD | URL | DESCRIPTION | QUERY BODY | RESPONSE | PROTECTED | EDITION |
---|---|---|---|---|---|---|
GET | search/select?q={prompt}&action=rag search/select?q={prompt}&action=rag&format={format} | More details about this API further down in this documentation. | | SPECIFIC RESPONSE FORMAT. See below for more explanations. | | CE |
Parameters
The endpoint above handles multiple parameters.
action : Mandatory, must be set to “rag”.
prompt : Mandatory. It contains the prompt or the search query written by the user, and is passed in the “q” query parameter.
format : Optional. This parameter allows the user to define the format of the generated response. Allowed values for this field are “bulletpoint”, “text”, “stepbystep” or “default”. It can also be left blank or unset.
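As an illustration of how these parameters fit together, the following sketch builds the request URL for a RAG search. The helper name `build_rag_url` is hypothetical, and the `/Datafari/rest/v2.0/` base path is taken from the examples found elsewhere in this documentation; adapt it to your own deployment.

```python
from urllib.parse import quote, urlencode

# Values accepted by the optional "format" parameter.
ALLOWED_FORMATS = {"bulletpoint", "text", "stepbystep", "default"}


def build_rag_url(host, prompt, response_format=None):
    """Build the GET URL for a RAG search (hypothetical helper)."""
    if not prompt:
        raise ValueError("the prompt (q) is mandatory")
    params = {"action": "rag", "q": prompt}
    if response_format:
        if response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {response_format}")
        params["format"] = response_format
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode(params, quote_via=quote)
    return f"https://{host}/Datafari/rest/v2.0/search/select?{query}"


url = build_rag_url("DATAFARI_HOST", "when does the plane for Nice take off", "text")
```

The resulting URL can then be requested with any HTTP client. Leaving `response_format` unset simply omits the parameter, which the documentation above describes as allowed.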
Response structure
The responses are formatted using the following template:
Code Block |
---|
{ "status": "OK|ERROR", "content": { "message": "...", "documents": { "0": { "url": "...", "id": "...", "title": "...", "content": "..." }, "1": { "url": "...", "id": "...", "title": "...", "content": "..." } ... } } } |
In this case, the “message” contains the text generated by the Large Language Model. The “documents” section is a JSONArray containing information about the documents sent to the LLM and used to retrieve the answer. Each document has an ID (id), a URL (url), a title (title) and a content (content). The content may be the “exactContent” of the document, the “preview_content”, or the Solr “highlightings”, depending on the RAG configuration. Some fields may be added in future developments.
For errors, the content has the following structure:
Code Block |
---|
{ "code": {int}, "reason": {String} } |
Example of a valid response:
Code Block |
---|
{ "status": "OK", "content": { "message": "According to the document, the plane for Nice takes off at 15:50.", "documents": { "0": { "id": "file://///localhost/fileshare/volotea.pdf", "title":"CHECK-IN", "url":"file://///localhost/fileshare/volotea.pdf", "content":"Boarding pass Vos bagages 1 Bagage(s) cabine + 1 accessoire SEC. AF1599:026 Total 12 Kg John Doe Bagage cabine 55 x 35 x 25 cm max. Vol Brest Nice AF1599DépartBES NIC 15:50 / 06 FEBEffectué par HOP ..." } } } } |
Example of an error response:
Code Block |
---|
{ "status": "ERROR", "content": { "error": { "code": 428, "reason": "The query cannot be answered because no associated documents were found." } } } |
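A client consuming this API needs to distinguish the two cases. The sketch below is one possible way to do it, not part of Datafari itself: `RagError` and `parse_rag_response` are hypothetical names. Since the template above shows “documents” as an object keyed by index while other examples show an array, the sketch accepts both shapes.

```python
import json


class RagError(Exception):
    """Raised when the API answers with status ERROR (hypothetical class)."""

    def __init__(self, code, reason):
        super().__init__(f"{code}: {reason}")
        self.code = code
        self.reason = reason


def parse_rag_response(raw):
    """Return (message, documents) or raise RagError."""
    payload = json.loads(raw)
    content = payload.get("content", {})
    if payload.get("status") != "OK":
        # The error details may be nested under "error" or sit directly in "content".
        error = content.get("error", content)
        raise RagError(error.get("code"), error.get("reason"))
    documents = content.get("documents", [])
    if isinstance(documents, dict):
        # Index-keyed object ("0", "1", ...): convert it to an ordered list.
        documents = [documents[key] for key in sorted(documents, key=int)]
    return content.get("message", ""), documents


raw_ok = '{"status": "OK", "content": {"message": "15:50", "documents": {"0": {"id": "a", "url": "a", "title": "CHECK-IN", "content": "..."}}}}'
message, documents = parse_rag_response(raw_ok)
```

Raising an exception for the error case keeps the calling code simple: the caller either gets a message and its source documents, or handles a `RagError` carrying the `code` and `reason` fields.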
Example of a valid formatted response
Code Block |
---|
GET https://DATAFARI_HOST/Datafari/rest/v2.0/search/select?action=rag&q=comment%20faire%20un%20clafoutis%20%C3%A0%20la%20cerise&format=stepbystep |
Code Block |
---|
{ "status": "OK", "content": { "documents": [ { "content": "Recettes Recettes Catégories Salades Les bases Apéritif Entrées Plats ... Le clafoutis à la cerise ... ", "id": "https://www.cuisineaz.com/recettes/clafoutis-a-la-cerise-65064.aspx", "title": "Recette Clafoutis à la cerise", "url": "https://www.cuisineaz.com/recettes/clafoutis-a-la-cerise-65064.aspx" }, { "content": "Recettes Recettes Catégories Salades Les bases Apéritif Entrées Plats ... Les 100 recettes préférées des français ... ", "id": "https://www.cuisineaz.com/les-100-recettes-preferees-des-francais-p425", "title": "Les 100 recettes préférées des français", "url": "https://www.cuisineaz.com/les-100-recettes-preferees-des-francais-p425" }, { "content": "Recettes Recettes Catégories Salades Les bases Apéritif Entrées Plats ... Les aliments de saisons en juillet ... ", "id": "https://www.cuisineaz.com/les-aliments-de-saison-en-juillet-p685", "title": "Aliments de saison en juillet : calendrier fruits, légumes de juillet", "url": "https://www.cuisineaz.com/les-aliments-de-saison-en-juillet-p685" } ], "message": "Pour réaliser un clafoutis à la cerise, suivez ces étapes :\\n\\n 1. Préchauffez votre four à 350°F (180°C).\\n\\n 2. Battrez les œufs dans un saladier jusqu'à l'obtention d'une texture lisse. Ajoutez la farine, le sucre et le lait. Mélangez bien pour obtenir une pâte lisse. Salez légèrement si vous le souhaitez.\\n\\n 3. Graissez votre moule ou plaque de four avec du beurre. Disposez les cerises dans le fond du moule, laissant un petit espace entre elles. Versez la pâte dessus, en la répartissant régulièrement.\\n\\n 4. Mettez au four pendant 35 à 40 minutes, jusqu'à ce que la brochette en bois retirée du centre vienne propre. Retirez le four et saupoudrez-en de sucre glace en haut. 
Laissez refroidir un peu avant de servir.\\n\\n Astuces :\\n\\n - Vous pouvez remplacer les cerises par d'autres fruits tels que des pommes, des abricots ou des myrtilles pour une variante classique du clafoutis.\\n - Pour ajouter plus de saveur, vous pouvez également ajouter un sachet de sucre vanillé, 1 cc de vanille en poudre, ou 1 cc d'extrait de vanille à la batterie.\\n - Si vous préférez ne pas utiliser du lactose, vous pouvez remplacer le lait par un alternative sans lactose comme du lait de noix ou du lait de soja." } } |
Info |
---|
You may notice that line breaks are written as “\\n”. To display the response in an HTML page, you might need to replace these with “<br/>”. This is subject to change, and might be managed directly in Datafari RagAPI in the future. |
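A minimal sketch of this replacement is shown below. The function name `message_to_html` is hypothetical, and escaping the message before inserting the `<br/>` tags is an extra precaution assumed here, since the generated text may contain characters such as “<” or “&”.

```python
import html


def message_to_html(message):
    """Escape the message for HTML, then turn line-break markers into <br/>."""
    escaped = html.escape(message)
    # Handle both the literal two-character sequence (backslash + n)
    # and real newline characters.
    return escaped.replace("\\n", "<br/>").replace("\n", "<br/>")


sample = "1. Preheat the oven.\\n2. Mix eggs & flour."
rendered = message_to_html(sample)
```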
Configuration
Multiple configurable parameters can be set in the rag.properties file. This file can be found:
In git repository:
Code Block datafari-ce/datafari-webapp/src/main/resources/rag.properties
On the Datafari server:
Code Block /opt/datafari/tomcat/webapps/Datafari/WEB-INF/classes/rag.properties
This file contains the parameters you will use to call your webservice. You can either use the OpenAI API (or similar), or the provided Datafari RAG Webservices. Some parameters are specific to OpenAI, others are common.
Common parameters
rag.api.endpoint : The URL of the API you want to call. By default, the OpenAI chat completion service (https://api.openai.com/v1/chat/completions).
rag.temperature : Temperature controls the randomness of the text that GPT generates. With GPT, you can set a temperature between 0 and 2 (the default is 1).
rag.maxTokens : Integer value. The maximum number of tokens in the response. Recommended value is “100”.
rag.maxFiles : Integer value. The maximum number of files that should be included in the request. Allowing too many files increases the risk of errors linked to the number of tokens received by the Large Language Model. This value will be subject to experimentation, but we currently recommend setting it to “3”.
rag.maxFiles : Integer value. The maximum length in characters of the JSON that is sent to the External RAG Webservice. Default value is 150000000.
rag.addInstructions : "true" or "false". If true, the instructions defined in rag-instructions.txt will be attached to the request sent to the webservice. If you are using the OpenAI API, this field must be set to true.
rag.template : “datafari” or “openai”. The template defines how the request body will be formatted. “openai” should be used if you are using an OpenAI-like API. If you are using the “Datafari-RAG webservice”, then this parameter should be set to “datafari”.
rag.maxJsonLength : Integer value. Maximum size in characters of the document extracts that will be sent to the webservice. If a document exceeds this value, it will be truncated. Default value is 150000000.
rag.solrField : “highlighting”, “preview_content” or “exactContent”. This parameter indicates where the document extracts come from. The relevancy of the response depends on the quality and on the length of the extracts. However, larger extracts will result in longer processing times.
rag.hl.fragsize : Integer value, optional. If rag.solrField is set to "highlighting", this parameter defines the size in characters of the extract. Default value in Datafari: “200”. Recommended value to increase the chances of getting a response: “500”.
rag.operator : “AND” or “OR”. This field defines the operator (q.op in Solr) used in the search process in Datafari.
rag.enable.logs : “true” or “false”. Enable or disable logs from RAG processing.
Specific to OpenAI and similar
rag.model : The LLM model to be used. By default, gpt-3.5-turbo.
rag.api.token : Your OpenAI API token. To use OpenAI services, you will have to change it and use your own token.
Examples of configuration
For OpenAI
Code Block |
---|
rag.api.token=sk-12345notarealapikey6789
rag.api.endpoint=https://api.openai.com/v1/chat/completions
rag.temperature=0.7
rag.maxTokens=250
rag.maxFiles=3
rag.maxJsonLength=150000000
rag.model=gpt-3.5-turbo
rag.addInstructions=true
rag.template=openai
rag.solrField=preview_content
rag.hl.fragsize=0
rag.enable.logs=true
rag.operator=AND |
For Datafari External LLM service
Code Block |
---|
rag.api.token=
rag.api.endpoint=https://<LLM_SERVICE_HOST>:8888/invoke
rag.temperature=0.7
rag.maxTokens=250
rag.maxFiles=3
rag.maxJsonLength=150000000
rag.model=
rag.addInstructions=false
rag.template=datafari
rag.solrField=highlighting
rag.hl.fragsize=800
rag.enable.logs=true
rag.operator=AND |
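Before deploying a configuration, it can be useful to check it against the constraints described in this section. The following sketch is a standalone helper, not a Datafari tool: `load_properties` and `check_rag_config` are hypothetical names, and the parser only handles simple `key=value` lines (it ignores other features of the Java properties format, such as line continuations).

```python
def load_properties(text):
    """Parse simple key=value lines; blank lines and #/! comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_rag_config(props):
    """Return a list of problems found (empty when none are found)."""
    problems = []
    template = props.get("rag.template")
    if template not in ("datafari", "openai"):
        problems.append("rag.template must be 'datafari' or 'openai'")
    # The documentation states that instructions are required with the OpenAI API.
    if template == "openai" and props.get("rag.addInstructions") != "true":
        problems.append("rag.addInstructions must be true with the OpenAI API")
    if props.get("rag.solrField") not in ("highlighting", "preview_content", "exactContent"):
        problems.append("rag.solrField has an unsupported value")
    if props.get("rag.operator") not in ("AND", "OR"):
        problems.append("rag.operator must be 'AND' or 'OR'")
    return problems


sample = """\
rag.api.endpoint=https://api.openai.com/v1/chat/completions
rag.template=openai
rag.addInstructions=false
rag.solrField=preview_content
rag.operator=AND
"""
problems = check_rag_config(load_properties(sample))
```

In this sample, the only reported problem is the `rag.addInstructions=false` line, which conflicts with the “openai” template.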
Technical specification
Process
The client sends a query to the Datafari API, using the “RAG” action
Code Block https://{DATAFARI_HOST}/Datafari/rest/v2.0/search/select?q={prompt}&action=rag
The “config.properties” file is retrieved to set up the configuration.
A search query is processed based on the user prompt, in order to retrieve a list of potentially relevant documents.
Some pieces of the documents are extracted and stored in a list. These pieces can be extracted from the Solr fields “preview_content” or “exactContent”, or from the Solr Highlighting depending on your configuration.
A request is built with a JSON body (see the format below, in the “Request and response structures” section). This JSON contains:
- the context (the list of extracts of documents, and the instructions if rag.addInstructions is set to true)
- the user prompt
The format of the JSON depends on the configured template. In this step, the context is cleaned: it must not contain special characters that could break the webservice (\n, \b, \\ ...).
The external LLM service processes the context and the prompt, and responds with a generated answer embedded in a JSON body.
The response is extracted as a String from the JSON.
The response is sent back to the user in a Datafari formatted JSON.
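The context cleaning step mentioned in the process above can be sketched as follows. The exact rules applied by Datafari are not detailed here, so this is only an assumption of what such a cleanup might look like: the hypothetical `clean_context` function replaces backslashes and ASCII control characters with spaces, then collapses repeated spaces.

```python
import re


def clean_context(text):
    """Remove characters that could break the JSON sent to the webservice.

    Assumption: backslashes and ASCII control characters (newline,
    backspace, tab, ...) are replaced with a space.
    """
    text = re.sub(r"[\\\x00-\x1f\x7f]", " ", text)
    # Collapse the runs of spaces left by the replacements.
    return re.sub(r" {2,}", " ", text).strip()


cleaned = clean_context("Boarding pass to Nice\n. Gate\\B2\b ")
```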
Request and response structures
The JSON sent to the LLM service won’t be the same for OpenAI API and Datafari External LLM service.
For OpenAI
Code Block |
---|
{ "model": "gpt-3.5-turbo", "messages": [ {"role": "system", "content": "{instructions}"}, {"role": "user", "content": "{document_1}"}, {"role": "user", "content": "{document_2}"}, ... {"role": "user", "content": "{prompt}"} ], "temperature": 0.7 } |
Example:
Code Block |
---|
{ "model": "gpt-3.5-turbo", "messages": [ {"role": "system", "content": "{instructions}"}, {"role": "user", "content": "Document 1. Once upon a time, a nice rabbit called Paris ate a carrot."}, {"role": "user", "content": "Document 2. Boarding pass to Nice\\n. Your flight will take off at 15h50 from terminal 2..."}, {"role": "user", "content": "when does the plane for Nice take off"} ], "temperature": 0.7 } |
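The structure above can be assembled programmatically. The sketch below is an illustration in Python, not the actual Datafari implementation; `build_openai_body` is a hypothetical helper, and its default values mirror the example.

```python
import json


def build_openai_body(prompt, extracts, instructions,
                      model="gpt-3.5-turbo", temperature=0.7):
    """Build a chat completion request body (hypothetical helper)."""
    # The instructions go first, as the system message.
    messages = [{"role": "system", "content": instructions}]
    # Each document extract is sent as its own user message.
    messages += [{"role": "user", "content": extract} for extract in extracts]
    # The user prompt always comes last.
    messages.append({"role": "user", "content": prompt})
    return {"model": model, "messages": messages, "temperature": temperature}


body = build_openai_body(
    "when does the plane for Nice take off",
    ["Document 1. ...", "Document 2. ..."],
    "Answer using only the provided documents.",
)
payload = json.dumps(body)
```

The instructions text used here is a placeholder; in Datafari, the instructions come from the rag-instructions.txt file when rag.addInstructions is set to true.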
For Datafari External LLM service
Code Block |
---|
{ "input": { "context": "{optional_instructions + documents}", "max_tokens": 200, "question": "{prompt}", "temperature": 0.4 } } |
Example
Code Block |
---|
{ "input": { "context": "Document 1. Once upon a time, a nice rabbit called Paris ate a carrot.\\n Document 2. Boarding pass to Nice\\n. Your flight will take off at 15h50 from terminal 2...", "max_tokens": 200, "question": "when does the plane for Nice take off", "temperature": 0.4 } } |
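For comparison, here is a sketch of how the “datafari” template body could be assembled. Again, `build_datafari_body` is a hypothetical helper; in particular, the separator used to join the instructions and the extracts into a single context string is an assumption.

```python
def build_datafari_body(prompt, extracts, instructions="",
                        max_tokens=200, temperature=0.4, separator="\n "):
    """Build a request body for the Datafari External LLM Service (sketch)."""
    # Optional instructions come first, followed by the document extracts.
    parts = ([instructions] if instructions else []) + list(extracts)
    return {
        "input": {
            "context": separator.join(parts),
            "max_tokens": max_tokens,
            "question": prompt,
            "temperature": temperature,
        }
    }


body = build_datafari_body(
    "when does the plane for Nice take off",
    ["Document 1. ...", "Document 2. ..."],
)
```

Unlike the “openai” template, all extracts are merged into a single “context” field, and the prompt is sent separately in the “question” field.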
The response format will also differ:
For OpenAI
Code Block |
---|
{ "id": "...", "object": "chat.completion", "created": ..., "model": "...", "usage": { "prompt_tokens": ..., "completion_tokens": ..., "total_tokens": ... }, "choices": [ { "message": { "role": "assistant", "content": "{llm_response}" }, "logprobs": null, "finish_reason": "stop", "index": 0 } ] } |
Example:
Code Block |
---|
{ "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1677858242, "model": "gpt-3.5-turbo-0613", "usage": { "prompt_tokens": 13, "completion_tokens": 7, "total_tokens": 20 }, "choices": [ { "message": { "role": "assistant", "content": "The plane to Nice will take off at 15:50, according to the document 2:\"Your flight will take off at 15h50\"." }, "logprobs": null, "finish_reason": "stop", "index": 0 } ] } |
For Datafari External LLM service (subject to changes)
Code Block |
---|
{ "output": "{llm_response}", "metadata": { ... } } |
Example:
Code Block |
---|
{ "output": "The plane to Nice will take off at 15:50, according to the document 2:\"Your flight will take off at 15h50\".", "metadata": { "run_id":"66337767-fa38-428a-ab2a-d06c750d43b", "feedback_tokens":[ ] } } |
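Since the two response formats differ, the generated answer has to be read from a different location depending on the template. The following sketch shows one way to do it; `extract_llm_answer` is a hypothetical name, and the Datafari External LLM Service format is subject to change as noted above.

```python
def extract_llm_answer(response, template):
    """Return the generated text from a decoded JSON response (sketch)."""
    if template == "openai":
        # The answer is the content of the first choice's message.
        return response["choices"][0]["message"]["content"]
    if template == "datafari":
        return response["output"]
    raise ValueError(f"unknown template: {template}")


openai_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "At 15:50."}, "index": 0}
    ]
}
datafari_response = {"output": "At 15:50.", "metadata": {}}

answer_openai = extract_llm_answer(openai_response, "openai")
answer_datafari = extract_llm_answer(datafari_response, "datafari")
```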
Security
Security is a major concern, and was a central element in our technical decisions. One of the main advantages of Retrieval-Augmented Generation is that the LLM only uses the data it is provided with to answer a question. As long as we control the data sent to the model, we can prevent any leaks.
“Prompt injection” is a set of techniques used to override the original instructions in the prompt through the user input. As our prompt does not contain secret, confidential or structural elements, we consider that it is not a serious issue if a user is able to read it.
Datafari Enterprise Edition offers a security solution, allowing enterprises to set up access restrictions on their files in Datafari. In any case, our RAG solution must not break this security. If security is configured, any user can still run a RAG search on the server. However, a “classic search” is processed first to retrieve the documents available to that user. If the user is not allowed to see a document, this document will not be retrieved and will not be sent to the External RAG Webservice. That way, it is impossible for a user to use the RAG tools to retrieve information they should not be able to access.
More information in Datafari Enterprise Edition here.