Info

Valid from Datafari 6.1 (to be confirmed)

Note

This feature is a work in progress and is subject to change. The documentation visibility should be set to “Community Edition” when the API is published.


RagAPI is a collection of Java classes and methods designed to handle various AI-related processes within Datafari. Currently, RagAPI supports RAG (Retrieval Augmented Generation) and document summarization, with future plans to integrate translation capabilities.

RagAPI is used in two main contexts:

  • Search API: Enables RAG functionality within the search engine.

  • AiPowered API: Supports both RAG and additional AI-powered features, such as document summarization.

All our AI features can be used by calling the proper API endpoint, or by using the AI Assistant tool from the Datafari UI.

At the core of RagAPI are LLM Services, a set of classes that act as interfaces between Datafari and external APIs leveraging Large Language Models (LLMs). These services allow integration with third-party AI providers like the OpenAI API, as well as Datafari AI Agent, our in-house LLM API solution.

Info

This documentation covers the details of the RAG processes, the functioning of LLM Services, and the common configuration for all AI-related features.

What is RAG?

RAG stands for Retrieval-Augmented Generation: a Large Language Model generates the response to a user question or prompt using only contextual information provided with the prompt. This contextual information can be relevant documents or chunks of documents, coming from sources such as Datafari search results.


Classic search VS Vector Search

The “Retrieve” part of RAG is an important step. In this step, a search is processed to identify a list of documents that may contain the desired information, and to extract relevant fragments that can be interpreted by the LLM. In our own terms, the “classic” method is the keyword-based search implemented in Datafari. Vector search is based on Machine Learning technologies that capture the meaning and the context of unstructured data, converted into numerical vectors. The advantage of vector search is that it “understands” natural language queries, thus finding more relevant documents that may not necessarily use the same terms as the query.

Datafari currently offers three different approaches to RAG retrieval:

  • Keyword-based Search (classic BM25): Documents are retrieved using a traditional BM25 Datafari search, followed by a chunking process.

  • InMemory Vector Search (hybrid solution): An initial BM25 search is performed, retrieving relevant documents. These documents are then chunked and stored in an "InMemoryEmbeddingsStore". A secondary vector search is executed within this in-memory store to refine the results and extract a limited number of relevant chunks.

  • Solr Vector Search: During indexing, documents are pre-chunked, and each chunk is vectorized. The classic keyword-based search is replaced by a fully vector-based retrieval process, using Text to Vector Solr features.
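For intuition, the hybrid (InMemory) approach can be sketched in a few lines of Python. Everything here is illustrative: the bag-of-words `embed()` is a stand-in for a real embeddings model, and the function names are hypothetical, not Datafari's actual classes.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts, standing in for a real
    # embeddings model (this is NOT Datafari's actual store).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_retrieve(query, bm25_docs, chunk_size=100, max_chunks=5):
    # Step 1: chunk the documents returned by the BM25 search.
    chunks = []
    for doc in bm25_docs:
        text = doc["content"]
        chunks += [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Steps 2-3: embed chunks in memory, rank them against the query
    # vector, and keep only the top matches.
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)
    return ranked[:max_chunks]
```

The point of the second (vector) pass is that only a handful of query-relevant chunks reach the LLM, instead of the full text of every BM25 hit.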

How does RAG work in Datafari?

The RAG process can be started through two different Datafari API endpoints. It then proceeds as follows:

  1. Query reception

A “RAG” query is received from the user, through one of the API endpoints.

  2. Retrieval

The first step of the process is to retrieve a list of relevant chunks, using one of the three methods listed above. Depending on the selected method, the search results may need further processing.

  3. Prompting

A list of prompts (including instructions for the model, relevant document chunks and the user query) is prepared and sent to the LLM External Service.

If the prompts exceed the length limit for a single request, each chunk is processed separately. Once all chunks have been handled, the LLM is invoked again to generate a final, consolidated response.

  4. Response Generation

The LLM generates a text response, citing the sources it used to answer the user query.

  5. Response formatting

Datafari formats the webservice response into a standard JSON, attaches the relevant sources, and sends it to the user.
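The steps above can be sketched as a minimal pipeline. `call_llm` is a placeholder for the external LLM service and `max_request_size` mirrors the idea of a per-request length limit; none of these names come from the Datafari code base.

```python
def rag_pipeline(query, chunks, call_llm, max_request_size=40000):
    # Prompting: if the combined context fits in one request, send a
    # single prompt; otherwise, one prompt per chunk (steps 3-4 above).
    combined = "\n".join(chunks)
    if len(combined) <= max_request_size:
        prompts = [f"Context:\n{combined}\n\nQuestion: {query}"]
    else:
        prompts = [f"Context:\n{c}\n\nQuestion: {query}" for c in chunks]

    answers = [call_llm(p) for p in prompts]
    if len(answers) > 1:
        # Consolidation pass: ask the LLM to merge the partial answers
        # into one final response.
        return call_llm("Summarize into one answer:\n" + "\n".join(answers))
    return answers[0]
```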


What is this external webservice?

Our solution currently supports OpenAI-compatible APIs. For example:

  • OpenAI API is a service provided by OpenAI. Its chat completion endpoint can be used to process RAG searches.

  • Datafari AI Agent is an experimental solution developed by France Labs. It is a simple Python OpenAI-like API, hosting at least one Large Language Model. This solution currently supports text generation and vector embeddings.

In both cases, we use the LLM to extract a response to the user question from the provided document.

Endpoints

METHOD: GET

URL:
search/select?q={prompt}&action=rag
search/select?q={prompt}&action=rag&format={format}&lang={lang}

DESCRIPTION: Performs a search query before sending the results to the LLM. More details about this API are provided further down in this documentation.

QUERY BODY: (empty)

RESPONSE: SPECIFIC RESPONSE FORMAT. See below for more explanations.

PROTECTED: (empty)

EDITION: CE

Parameters

The endpoint above handles multiple parameters.

action : Mandatory, must be set to “rag”.

prompt : Mandatory. It contains the prompt or the search written by the user.

format : Optional. This parameter allows the user to define the format of the generated response. Allowed values for this field are "bulletpoint", "text", "stepbystep" and “default”. It can also be left blank or unset.

lang : Optional. The expected language of the response. Requests from DatafariUI should specify the user’s preferred language.

  • Allowed values are “en”, “fr”, “it”, “pt”, “pt_br”, “de”, “es”, and “ru”.

  • If no language is selected, the API will try to retrieve the logged-in user’s preferred language.

  • If the user is not logged in, English will be used by default.
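A minimal sketch of building a RAG query URL with these parameters; the host name is a placeholder, and the helper function is illustrative, not part of Datafari:

```python
from urllib.parse import urlencode

def build_rag_url(host, prompt, fmt=None, lang=None):
    # q and action=rag are mandatory; format and lang are optional.
    params = {"q": prompt, "action": "rag"}
    if fmt:
        params["format"] = fmt
    if lang:
        params["lang"] = lang
    return f"https://{host}/Datafari/rest/v2.0/search/select?" + urlencode(params)
```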

Response structure

The responses are formatted using the following template:

Code Block
{
  "status": "OK|ERROR",
  "content": {
      "message": "...",
      "documents": [
        {
          "url": "...",
          "id": "...",
          "title": "...",
          "content": "..."
        }
      ]
  }
}

In this case, the “message” contains the text generated by the Large Language Model. The “documents” section is a JSONArray containing information about the documents sent to the LLM and used to retrieve the answer. Each document has an ID (id), a URL (url), a title (title) and a content (content). The content may be the “exactContent” of the document, the “preview_content”, or the Solr “highlightings”, depending on the RAG configuration. Some fields may be added in future developments.
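Assuming the template above, a client could extract the answer and its sources like this (a sketch, not an official Datafari client):

```python
import json

def extract_answer(raw):
    # Returns (message, list of document titles) from a RAG response;
    # raises if the status is not OK.
    data = json.loads(raw)
    if data["status"] != "OK":
        raise RuntimeError(data["content"].get("message", "RAG error"))
    content = data["content"]
    return content["message"], [d["title"] for d in content["documents"]]
```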

For errors, the content follows this structure:

Code Block
{
  "code": {int},
  "reason": {String}
}
  • Example of a valid response:

Code Block
{
  "status": "OK",
  "content": {
      "message": "According to the document, the plane for Nice takes off at 15:50.",
      "documents": [
        {
          "id": "file://///localhost/fileshare/volotea.pdf",
          "title": "CHECK-IN",
          "url": "file://///localhost/fileshare/volotea.pdf",
          "content": "Boarding pass Vos bagages 1 Bagage(s) cabine + 1 accessoire SEC. AF1599:026 Total 12 Kg John Doe Bagage cabine 55 x 35 x 25 cm max. Vol Brest Nice AF1599DépartBES NIC 15:50 / 06 FEBEffectué par HOP ..."
        }
      ]
  }
}

  • Example of an error response:

Code Block
{
  "status": "ERROR",
  "content": {
      "message": "Sorry, I could not find an answer to your question.",
      "documents": [],
      "error": {
        "code": 428,
        "label": "ragNoValidAnswer"
      }
  }
}

The “message” field is not localized; it only has an informative value. In order to allow translations, we recommend using the “label” field. Here are the different labels and their associated messages.

Code Block
"ragErrorNotEnabled": "Sorry, it seems the feature is not enabled."
"ragNoFileFound": "Sorry, I couldn't find any relevant document to answer your request."
"ragTechnicalError": "Sorry, I met a technical issue. Please try again later, and if the problem remains, contact an administrator."
"ragNoValidAnswer": "Sorry, I could not find an answer to your question."
  • Example of a valid formatted response

Code Block
GET https://DATAFARI_HOST/Datafari/rest/v2.0/search/select?action=rag&q=comment%20faire%20un%20clafoutis%20%C3%A0%20la%20cerise&format=stepbystep
Code Block
{
  "status": "OK",
  "content": {
    "documents": [
        {
            "content": "Recettes Recettes Catégories Salades Les bases Apéritif Entrées Plats ... Le clafoutis à la cerise ... ",
            "id": "https://www.cuisineaz.com/recettes/clafoutis-a-la-cerise-65064.aspx",
            "title": "Recette Clafoutis à la cerise",
            "url": "https://www.cuisineaz.com/recettes/clafoutis-a-la-cerise-65064.aspx"
        },
        {
            "content": "Recettes Recettes Catégories Salades Les bases Apéritif Entrées Plats ... Les 100 recettes préférées des français ... ",
            "id": "https://www.cuisineaz.com/les-100-recettes-preferees-des-francais-p425",
            "title": "Les 100 recettes préférées des français",
            "url": "https://www.cuisineaz.com/les-100-recettes-preferees-des-francais-p425"
        },
        {
            "content": "Recettes Recettes Catégories Salades Les bases Apéritif Entrées Plats ... Les aliments de saisons en juillet ... ",
            "id": "https://www.cuisineaz.com/les-aliments-de-saison-en-juillet-p685",
            "title": "Aliments de saison en juillet : calendrier fruits, légumes de juillet",
            "url": "https://www.cuisineaz.com/les-aliments-de-saison-en-juillet-p685"
        }
    ],
    "message": "Pour réaliser un clafoutis à la cerise, suivez ces étapes :\\n\\n 1. Préchauffez votre four à 350°F (180°C).\\n\\n 2. Battrez les œufs dans un saladier jusqu'à l'obtention d'une texture lisse. Ajoutez la farine, le sucre et le lait. Mélangez bien pour obtenir une pâte lisse. Salez légèrement si vous le souhaitez.\\n\\n 3. Graissez votre moule ou plaque de four avec du beurre. Disposez les cerises dans le fond du moule, laissant un petit espace entre elles. Versez la pâte dessus, en la répartissant régulièrement.\\n\\n 4. Mettez au four pendant 35 à 40 minutes, jusqu'à ce que la brochette en bois retirée du centre vienne propre. Retirez le four et saupoudrez-en de sucre glace en haut. Laissez refroidir un peu avant de servir.\\n\\n Astuces :\\n\\n - Vous pouvez remplacer les cerises par d'autres fruits tels que des pommes, des abricots ou des myrtilles pour une variante classique du clafoutis.\\n - Pour ajouter plus de saveur, vous pouvez également ajouter un sachet de sucre vanillé, 1 cc de vanille en poudre, ou 1 cc d'extrait de vanille à la batterie.\\n - Si vous préférez ne pas utiliser du lactose, vous pouvez remplacer le lait par un alternative sans lactose comme du lait de noix ou du lait de soja."
  }
}
Info

You may notice that line breaks are written as “\\n”. To display the response in an HTML page, you might need to replace these by “<br/>”. This is subject to change, and might be managed directly in Datafari RagAPI in the future.
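A minimal sketch of that client-side replacement, assuming the escape sequences arrive as literal “\n” text (it also handles real newlines, in case the JSON parser already unescaped them):

```python
def to_html(message):
    # Replace literal "\n" escape sequences (and any real newlines)
    # with HTML line breaks for display.
    return message.replace("\\n", "<br/>").replace("\n", "<br/>")
```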



Configuration

This configuration applies not only to RAG but also to the features provided by the AiPowered API.

Multiple configurable parameters can be set in the rag.properties file. This file can be found:

  • In git repository:

    Code Block
    datafari-ce/datafari-tomcat/conf-datafari/rag.properties
  • On the Datafari server:

    Code Block
    /opt/datafari/tomcat/conf/rag.properties

This file contains the parameters you need to call your LLM solution, enable or disable AI features, and configure their processes.

Global AI/RAG properties

  • ai.enable.rag: Set to “true” to enable RAG features (default: true).

  • ai.enable.summarization: Set to “true” to enable the document summarization feature (default: false).

Web services parameters

  • rag.api.endpoint: The URL of the API you want to call. By default, OpenAI LlmService uses https://api.openai.com/v1/. If you are using Datafari AI Agent, consider using http://[your-aiagent-ws-address]:8888

  • rag.api.token: Your API token. Required to use OpenAI services. Please use your own token.

  • rag.llm.service: The API you are using. Currently, the only accepted value is "openai" (default: openai).

LLM parameters

  • rag.model: The LLM model to be used. Can be left blank to use the API’s default model. (default: gpt-3.5-turbo).

  • rag.temperature: Temperature controls the randomness of the text that the LLM generates. With GPT, the temperature can be set between 0 and 2 (the default is 1). With other LLMs, it can be set between 0 and 1. We recommend setting it to 0. (default: 0)

  • rag.maxTokens: Integer value. The maximum number of tokens in the LLM response. (default: 200)

Datafari RAG pre-processing properties

  • rag.enable.vector.search: Enable vector search using a local vector store. This significantly increases processing time, but also greatly improves the quality of the response. Recommended, unless you are using Solr Vector Search. (default: true)

  • rag.enable.logs: Enable processing logs. The log level must be set to DEBUG for these logs to appear. (default: false)

  • rag.enable.chunking: Enable chunking. Highly recommended if vector search is disabled. (default: true).

  • rag.maxFiles : Integer value. The maximum number of files that should be included in the process. Allowing too many files decreases performance. (default: 3)

  • rag.maxChunks : Integer value. The maximum number of document chunks that should be sent to the LLM when Vector Search is enabled. Allowing too many chunks decreases performance. (default: 5)

  • rag.chunk.size : Integer value. The maximum length in characters of a chunk. Setting an excessive value may cause errors or malfunctions. (default: 30000)

  • rag.max.request.size : Integer value. The maximum total length in characters of all the messages that can be handled by the LLM in a single request. Setting an excessive value may cause errors or malfunctions. (default: 40000)

  • rag.operator : “AND” or “OR”. This field defines the operator (q.op in Solr) used in the Datafari search process. Using “AND” may increase the relevancy of a response, but significantly decreases the chance of finding relevant sources. (default: OR)

Solr Vector Search

  • solr.enable.vector.search: Set to “true” to enable Solr Vector Search in the RAG process (default: false).

  • solr.embeddings.model: The name of the model uploaded to Solr through a JSON file (default: default_model).

  • solr.vector.field: The Solr field name containing the vector. Must be an existing DenseVectorField, with a dimension that is compatible with the provided model. Datafari comes with a list of common vector fields (“vector_1536”, “vector_512”…), but you may need to create a new one depending on the embeddings model you are using. (default: vector_1536)

  • solr.topK: The expected number of search results (default: 10).

Note: If you intend to use Solr Vector Search, refer to /wiki/spaces/DATAFARI/pages/3503751175 for configuring chunking.

Examples of configuration

For OpenAI API, using Solr Vector Search

Code Block
##############################################
###        GLOBAL AI/RAG PROPERTIES        ###
##############################################

ai.enable.rag=true
ai.enable.summarization=true


##############################################
###        WEB SERVICES PARAMETERS         ###
##############################################

rag.api.endpoint=
rag.api.token=sk-xxxxxxxxxxxxxxxxxx  # Use your own OpenAI API Token
rag.llm.service=openai

##############################################
###             LLM PARAMETERS             ###
##############################################

rag.model=
rag.temperature=0.2
rag.maxTokens=200

##############################################
### DATAFARI RAG PRE-PROCESSING PROPERTIES ###
##############################################

rag.enable.vector.search=false
rag.enable.logs=false
rag.enable.chunking=false
rag.maxFiles=3
rag.maxChunks=5
rag.chunk.size=30000
rag.max.request.size=40000
rag.operator=OR

##############################################
###           SOLR VECTOR SEARCH           ###
##############################################

solr.enable.vector.search=true
solr.embeddings.model=my-solr-openai-model
solr.vector.field=vector_1536
solr.topK=5

For Datafari AI Agent services, using InMemory Vector Search

Code Block
##############################################
###        GLOBAL AI/RAG PROPERTIES        ###
##############################################

ai.enable.rag=true
ai.enable.summarization=true


##############################################
###        WEB SERVICES PARAMETERS         ###
##############################################

rag.api.endpoint=http://my-ai-agent.com:8888
rag.api.token=
rag.llm.service=openai

##############################################
###             LLM PARAMETERS             ###
##############################################

rag.model=mistral-7b-openorca.Q4_0.gguf
rag.temperature=0
rag.maxTokens=400

##############################################
### DATAFARI RAG PRE-PROCESSING PROPERTIES ###
##############################################

rag.enable.vector.search=true
rag.enable.logs=false
rag.enable.chunking=false
rag.maxFiles=3
rag.maxChunks=5
rag.chunk.size=30000
rag.max.request.size=40000
rag.operator=OR

##############################################
###           SOLR VECTOR SEARCH           ###
##############################################

solr.enable.vector.search=false
solr.embeddings.model=default-model
solr.vector.field=vector_1536
solr.topK=5
Info

The default URL for the OpenAI service is https://api.openai.com/v1/. If you are using your own OpenAI-compatible API, you can set your own URL in the rag.api.endpoint property.

Technical specification

Process description

Depending on the configuration and on the Retrieval approach, the global RAG process can take three forms.

Process diagrams: image-20250213-163907.png, image-20250213-163840.png, image-20250213-163829.png

  1. The client sends a query to the Datafari API, using one of the “RAG” endpoints:

    Code Block
    GET https://{DATAFARI_HOST}/Datafari/rest/v2.0/search/select?q={prompt}&action=rag
    Code Block
    POST https://{DATAFARI_HOST}/Datafari/rest/v2.0/ai/rag

Parameters are extracted from the HTTPS request, and configuration is retrieved from rag.properties.
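As an illustration, the GET endpoint above can be called by building the URL with standard query-string encoding. The host name and prompt below are placeholders, not values from a real deployment:

```python
# Illustrative sketch: building the "RAG via search" GET request URL.
from urllib.parse import urlencode

def build_rag_search_url(host: str, prompt: str) -> str:
    """Return the full URL for a RAG query through the Search API."""
    params = urlencode({"q": prompt, "action": "rag"})
    return f"https://{host}/Datafari/rest/v2.0/search/select?{params}"

url = build_rag_search_url("datafari.example.com", "What is our leave policy?")
```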

  2. A search query is processed using Search API methods, based on the user prompt, in order to retrieve a list of potentially relevant documents. This search can be either a keyword-based BM25 search or a Solr Vector Search.

In the first case, the search will return whole documents, which will require chunking to be processed by the LLM.

In the case of Vector Search, Solr will return a number of length-limited excerpts.

  3. The “InMemory Vector Search” is managed in the VectorUtils class. It receives documents from a BM25 search, chunks them into short semantic excerpts, embeds them, and stores them into an “In Memory Embeddings Store”. Then, it runs a vector search using the embedded user query. Finally, it returns a list of N relevant excerpts (N is defined in rag.properties as rag.maxChunks). Since the excerpts are short, a chunking strategy is not required if “InMemory Vector Search” is enabled.

Enable or disable this step by editing the rag.enable.vector.search property in rag.properties. This step is not related to Solr Vector Search.

Info

If the feature is enabled, “Solr Vector Search” and “Chunking” options should be disabled.
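The principle behind this step can be sketched as follows. This is a simplified illustration only: the actual implementation relies on Langchain4j's InMemoryEmbeddingStore and a real embedding model, whereas the toy embed() below merely counts words:

```python
# Simplified illustration of the "InMemory Vector Search" principle:
# embed the chunks, embed the query, rank chunks by cosine similarity.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real model returns dense float vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(query: str, chunks: list[str], max_chunks: int) -> list[str]:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:max_chunks]  # max_chunks corresponds to rag.maxChunks

chunks = ["Datafari indexes enterprise documents.",
          "Vector search ranks chunks by semantic similarity.",
          "Lunch menu for Friday."]
best = top_chunks("how does vector search rank results", chunks, 1)
```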

  4. Documents from the BM25 search might be too big to be handled in a single request by the LLM. Chunking cuts large documents into smaller pieces so they can be processed sequentially.

Enable or disable this step by editing the rag.enable.chunking property in rag.properties.

Info

This feature is optional, but is highly recommended if you don’t use a vector search solution. The chunking uses Langchain4j DocumentByParagraphSplitter class. See this link for more chunking strategies.

If “Solr Vector Search” or “InMemory Vector Search” are enabled, don’t use this solution.
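As a rough sketch of paragraph-based chunking (the real implementation uses Langchain4j's DocumentByParagraphSplitter), a splitter can cut on blank lines and greedily pack paragraphs up to a maximum size, in the spirit of the rag.chunk.size property:

```python
# Illustrative paragraph chunking: split on blank lines, then pack
# paragraphs into chunks no longer than max_chars characters.
import re

def chunk_by_paragraph(text: str, max_chars: int) -> list[str]:
    paragraphs = [p.strip() for p in re.split(r"\n{2,}", text) if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks

doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = chunk_by_paragraph(doc, 40)
```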

  5. During prompting, the list of documents/snippets is converted into a list of prompts that will be processed by the LLM. Each prompt contains instructions (defined in the /opt/datafari/tomcat/webapps/Datafari/WEB-INF/classes/prompts folder), document excerpts, and the user prompt as a question.

If the prompts are short enough, they might be sent to the LLM in one single request to improve performance. If that is not the case, we use a “Stuff Chain” method to process all chunks.

In the future, we should conduct a benchmark to compare the "Stuff Chain" and "Refining" methods for RAG. Read more about those chunking strategies here: /wiki/spaces/DATAFARI/pages/3462168580
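The "Stuff Chain" packing described above can be sketched as follows. The function name and the character-based size check are illustrative assumptions, not Datafari's actual code:

```python
# Hedged sketch of the "Stuff Chain" idea: stuff as many chunks as possible
# into each LLM request (bounded by a character limit, in the spirit of
# rag.max.request.size), then summarize the partial answers later.
def pack_requests(instructions: str, question: str,
                  chunks: list[str], max_request_chars: int) -> list[list[str]]:
    """Group chunks so each request (instructions + chunks + question) fits."""
    overhead = len(instructions) + len(question)
    requests, current, size = [], [], overhead
    for chunk in chunks:
        if current and size + len(chunk) > max_request_chars:
            requests.append([instructions] + current + [question])
            current, size = [], overhead
        current.append(chunk)
        size += len(chunk)
    if current:
        requests.append([instructions] + current + [question])
    return requests

reqs = pack_requests("Answer using only the context.", "What is RAG?",
                     ["chunk-a" * 10, "chunk-b" * 10, "chunk-c" * 10], 200)
```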

  6. Our solution is designed to interact with various LLM APIs. The dispatcher selects the proper connector (LlmService) to interact with the configured LLM API.

A connector is a Java class implementing our LlmService interface. It contains the “generate” method, which takes a list of prompts as a parameter and returns a simple String response from the LLM.

To this day, we provide only one LlmService: OpenAiLlmService (described below).

  7. The selected connector prepares one or multiple HTTP/HTTPS queries, one for each prompt in the list. Then, it calls the external LLM API and extracts the response.

If the list of prompts contains multiple entries, all the responses are concatenated and sent to the LLM again, to generate a final summary.

  8. The response is formatted in JSON and sent back to the user.

Chunking

Most documents stored in the FileShare Solr collection are too large to be processed in a single request by a Large Language Model. To address this, implementing a chunking strategy is essential, allowing us to work with manageable, concise, and contextually relevant text snippets.

The chunking strategy depends on the Retrieval method. The three cases are detailed below.

Case 1: BM25 Search

The BM25 Search returns whole, potentially large documents from FileShare. Since no vector search solution is enabled in that scenario, the chunking option must be enabled in rag.properties:

rag.enable.chunking=true

When enabled, all retrieved documents are processed by the ChunkUtils Java class. The chunkContent() method uses a Langchain4j solution: DocumentByParagraphSplitter. This splitter divides a document into paragraphs (defined by two or more consecutive newline characters).

The size of the chunks (currently in characters, but expected to be in tokens in the future) can be configured in rag.properties:

rag.chunk.size=30000

Case 2: InMemory Vector Search

Since this method uses documents retrieved by a BM25 search from the FileShare collection, it also requires a chunking solution.

This solution is handled by the InMemoryStoreIngester itself. When documents are ingested, they are automatically chunked by the default Ingester DocumentSplitter.

Those chunks are then embedded and stored into the InMemoryEmbeddingStore, before the vector search returns some of them.

All associated code can be found in the VectorUtils class.

Case 3: Solr Vector Search

In this scenario, chunking occurs during document indexing within the /wiki/spaces/DATAFARI/pages/3503751175. All files uploaded to FileShare are processed and split into smaller chunks using the DocumentByParagraphSplitter.

These chunks are then stored as new "child" documents, inheriting their parent's metadata. The chunked content replaces the original content in the child documents.

The child documents are stored in a separate Solr collection, VectorMain. Once created, each child’s content is embedded using the Solr TextToVectorUpdateProcessor.

When Vector Search is executed in Datafari, it retrieves documents from the VectorMain collection instead of FileShare, eliminating the need for additional chunking steps.

Prompts

Prompts are a collection of "Message" objects sent to the LLM. Each "Message" contains:

  • A role: "user" for the user query and document content, "assistant" for AI-generated messages, or "system" for instructions.

  • Content: The body of the message, which may include instructions, the user query, or document content.

If the RAG process needs to manage snippets that are too numerous or too large, it may not be able to fit all of them into one single LLM request. In this situation, a prompt strategy is required. Our current approach is the Stuff Chain method, but that is subject to change. Read more about chunking management strategies in the /wiki/spaces/DATAFARI/pages/3462168580.
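For illustration, a minimal prompt chain using the three roles might look like this (the actual instruction texts live in Datafari's prompts folder and differ):

```python
# Illustrative prompt chain as a list of role/content Messages.
prompt_chain = [
    {"role": "system",
     "content": "Answer the question using only the provided documents."},
    {"role": "user",
     "content": "Document excerpt: Datafari is an open source enterprise search engine."},
    {"role": "user",
     "content": "Question: What is Datafari?"},
]
roles = {m["role"] for m in prompt_chain}
```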

Below are the three prompt chains associated with a RAG query.

Case 1: All prompts can fit in a single LLM request

The LLM is called only once, with the following prompt chain.

image-20250214-123249.png

Case 2: Prompts are too large to be processed at once (Stuff Chain method)

First, the LLM is called once per chunk, each time with the following Message list:

image-20250214-123234.png

Then, the LLM is called one final time:

image-20250214-123217.png

Case 3: Prompts are too large (Refine method)

Note

This feature has not been implemented yet, and the prompts have not been tested.

First, the LLM is called with the first N chunks (as many chunks as a request can fit):

image-20250214-123249.png

Then, it recursively calls the LLM for each remaining chunk (or pack of chunks), with the following Message list:

image-20250214-124432.png
Info

To determine whether the prompt chain can fit within a single request, we use our own size calculator. It compares the total size of the message contents in the collection to the maximum allowed size (in characters), as defined by the rag.max.request.size parameter in rag.properties.
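A minimal sketch of such a size check, assuming messages are simple role/content pairs and sizes are counted in characters as described:

```python
# Compare the total character length of all message contents to the
# maximum allowed size (the rag.max.request.size property).
def fits_in_single_request(messages: list[dict], max_request_size: int) -> bool:
    return sum(len(m["content"]) for m in messages) <= max_request_size

messages = [{"role": "system", "content": "x" * 100},
            {"role": "user", "content": "y" * 50}]
```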

Available LlmServices

An LlmService is a class that implements our “LlmService.java” interface and acts as an interface between Datafari and external APIs leveraging Large Language Models (LLMs).

All “LlmService” classes must implement the generate() method.

Code Block
/**
 *
 * @param prompts A list of prompts. Each prompt contains instructions for the model, document content and the user query
 * @return The string LLM response
 */
String generate(List<Message> prompts, HttpServletRequest request) throws IOException;

The generate() method takes as a parameter a list of Message prompts ready to be sent. All the prompts are sent to the associated LLM API, and a single String response is returned.

Message is a Datafari Java class, with the following attributes:

  • String role: The role associated to the Message. Either “user”, “system”, or “assistant”.

  • String content: The content of the message.

The LlmService must fulfill multiple tasks:

  • Override the generate() method.

  • Provide default configuration if relevant (default model, maxToken, specific configuration…)

  • Call an external LLM service, using the proper configuration as defined in the rag.properties file (endpoint, temperature, maxTokens, API key…).

  • Format the LLM response to a simple String.

  • Implement a constructor taking at least a RagConfiguration object as parameter.
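For reference, an OpenAI-compatible connector typically posts a JSON body of this shape to the /chat/completions endpoint. The field names below follow the public OpenAI chat completions API; the helper function itself is illustrative, not Datafari's actual code:

```python
# Hedged sketch of the request body an OpenAI-compatible connector sends,
# built from rag.properties values (rag.model, rag.temperature, rag.maxTokens).
import json

def build_chat_request(model: str, messages: list[dict],
                       temperature: float, max_tokens: int) -> str:
    payload = {"model": model, "messages": messages,
               "temperature": temperature, "max_tokens": max_tokens}
    return json.dumps(payload)

body = build_chat_request("gpt-3.5-turbo",
                          [{"role": "user", "content": "What is RAG?"}],
                          0, 200)
```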

Info

Currently there is only one available LlmService: OpenAiLlmService. It can be used with any OpenAI-compatible API, including our Datafari AI Agent. More may be developed in the future, implementing the LlmService Java interface.

OpenAI LLM service

This connector can be used to access the OpenAI API, or any other API that uses the OpenAI signature, including our /wiki/spaces/DATAFARI/pages/3522854915. The default model, gpt-3.5-turbo, can be changed by editing the rag.model property.

If you are planning to use your own OpenAI-compatible solution, edit the rag.api.endpoint property. The default value is https://api.openai.com/v1/.

Vector Search

Note

This feature is still in early stages of development, and is subject to changes.

To enhance the relevance of document excerpts sent to the LLM, we have implemented vector search solutions. This machine learning-based approach represents semantic concepts as vectors, offering more accurate results than traditional keyword-based search. Additionally, vector search improves retrieval quality in multilingual datasets, as it relies on semantic meaning rather than exact wording.

InMemory Vector Search

As a first step, we implemented a simple, local vector-store solution provided by Langchain4j: InMemoryEmbeddingStore.

In this scenario, documents are first retrieved with a keyword-based BM25 search. Then they are processed by the EmbeddingStoreIngestor: they are chunked, the chunks are converted into vectors, and the vectors are stored in the local vector database.

Then, the user query is embedded and used to retrieve relevant chunks. This solution can be considered a Hybrid Search approach, since it combines keyword-based search and vector search.

Solr Vector Search

Solr Vector Search uses the new text-to-vector feature provided by Solr 9.8. The purpose is to replace the current BM25 search and the local vector store with a full vector search solution (and, in the future, a hybrid search solution for even more relevant results).

Our VectorUpdateProcessor processes all documents that are indexed into the FileShare Solr collection. Documents are chunked, embedded, and stored into the VectorMain collection.

Those chunks can now be searched using our new “/vector” handler.

The following query can be used to process a vector search through the API.

Code Block
https://{DATAFARI_HOST}/Datafari/rest/v2.0/search/vector?queryrag={prompt}&topK={topK}
  • queryrag or q (required) : The user query. The "queryrag" parameter is required by Solr; however, if it is missing, Datafari will automatically populate it with the value of "q".

  • topK (optional) : The number of results to return. (default: 10)
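For example, a vector search call can be built like this (the host and query values are placeholders):

```python
# Illustrative construction of a vector search API call.
from urllib.parse import urlencode

host = "datafari.example.com"
params = urlencode({"queryrag": "how to configure chunking", "topK": 5})
vector_url = f"https://{host}/Datafari/rest/v2.0/search/vector?{params}"
```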

Security

Security is a major concern, and was a central element in our technical decisions. One of the main advantages of Retrieval-Augmented Generation is that the LLM only uses the data it is provided to answer a question. As long as we control the data sent to the model, we can prevent any leaks.

“Prompt injection” is a set of techniques used to override the original instructions of the prompt through user input. As our prompts do not contain secret, confidential, or structural elements, we consider it not a serious issue if a user is able to read them.

Datafari Enterprise Edition provides a security solution, allowing enterprises to set up access restrictions on their files in Datafari. In any case, our RAG solution must not break this security. If security is configured, any user can still run a RAG search on the server. However, the search (BM25 or vector) is processed through the Datafari SearchAPI first to retrieve available documents. If the user is not allowed to see a document, this document will not be retrieved and won’t be sent to the external LLM service. That way, it is impossible for a user to use the RAG tools to retrieve information they should not be able to access.

More information about Datafari Enterprise Edition is available here.