Valid from Datafari 6.2

Introduction

While implementing a RAG (Retrieval-Augmented Generation) solution in Datafari, we introduced a new feature: “Datafari RagAPI”. RagAPI is a collection of Java classes and methods designed to handle RAG-related processes within Datafari. For more AI-related features, see also AI Powered Datafari API.

RAG processes can be triggered from two different API contexts:

Search API: Enables RAG functionality within the search engine. RAG through Search API is deprecated.
AiPowered API: Supports both RAG and additional AI-powered features, such as document summarization.

All our AI features can be used by calling the proper API endpoint, or through the AI chatbot widget available in Datafari UIv2.

At the core of RagAPI are LLM Services, a set of classes that act as interfaces between Datafari and external APIs leveraging Large Language Models (LLMs). These services allow integration with third-party AI providers like OpenAI API, as well as Datafari AI Agent, our in-house LLM (and embedding models) API solution.

This documentation covers the details of the RAG processes, the functioning of LLM Services, and the common configuration for all AI-related features.

To read more about AI-related features, check our AI Powered Datafari API documentation.
Summarization and categorization can be handled during indexing, thanks to our LLM Transformation Connector.
For more information about Solr Vector Search and its associated document chunking method, see Vector Update Processor - BETA VERSION.

What is RAG?

RAG stands for Retrieval-Augmented Generation. It uses a Large Language Model to generate the response to a user question or prompt, relying only on contextual information provided with the prompt (this contextual information can be relevant documents or chunks of documents, coming from sources such as Datafari search results).

Here are some sources, for a better understanding of RAG:

Classic search (BM25) VS Vector Search

The “Retrieve” part of RAG is an important step. In this step, a search is run to identify a list of documents that may contain the desired information, and to extract relevant fragments that can be interpreted by the LLM. In our own terms, the “classic” method is the keyword-based search implemented in Datafari. Vector search relies on Machine Learning algorithms to capture the meaning and context of unstructured data, converted into vectors. Its advantage is that it “understands” natural language queries, and can thus find more relevant documents that do not necessarily use the same terms as the query.

Datafari currently offers two different approaches to RAG retrieval:

Keyword-based Search (classic BM25): Documents are retrieved using a traditional BM25 Datafari search, followed by a chunking process.
Solr Vector Search: During indexing, documents are pre-chunked, and each chunk is vectorized. The classic keyword-based search is replaced by a fully vector-based retrieval process, using the Text to Vector Solr features. Short chunks are returned instead of whole documents.

How does RAG work in Datafari?

The RAG process can be started through two different Datafari API endpoints:

“/search/*” from Search API (GET) (see details in the “Endpoints” section). This endpoint is deprecated. The /rag endpoint is the recommended way.
“/rag” from AiPowered API (POST), documented in AI Powered Datafari API.

Query reception

A “RAG” query is received from the user, through one of the API endpoints.

History retrieval (optional)

If “chat memory” is enabled, the chat history is retrieved from the request to be used in the prompts.

Query rewriting (optional)

If “query rewriting” is enabled, the search query is rewritten by the LLM before the Solr search (source retrieval). This only applies to the search step: the initial user query is still used in the rest of the RAG process. If “chat memory” is enabled, the conversation history is used for query rewriting.

Source retrieval

Documents are retrieved from Solr using Datafari search. The retrieval process can use Vector Search technology, or classic BM25 Search. If the “query rewriting” feature is enabled, the rewritten query is used for the search. Otherwise, the initial user query is used.

Chunking

Any document content (or document extract, in case of vector search) larger than the maximum chunk size defined in configuration is chunked into smaller pieces. Each piece is called a “chunk”.

Prompting

A list of prompts (including instructions for the model, relevant document chunks and the user query) is prepared and sent to the LLM External Service.

If the prompt exceeds the length limit for a single request, each chunk is processed separately. Once all chunks have been handled, the LLM is invoked again to generate a final, consolidated response.
This process should be optimized soon to process multiple chunks at once.
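
The per-chunk processing followed by a final consolidation can be sketched as below. This is only an illustration of the general pattern, not Datafari's Java implementation: the `map_reduce_answer` function, the prompt wording and the `fake_llm` stub are all assumptions made for the example.

```python
from typing import Callable, List


def map_reduce_answer(question: str, chunks: List[str],
                      llm: Callable[[str], str]) -> str:
    """Answer `question` from `chunks`: one call per chunk, then one final call."""
    # Map step: each chunk is sent to the model separately.
    partial_answers = [
        llm(f"Context:\n{chunk}\n\nQuestion: {question}") for chunk in chunks
    ]
    if len(partial_answers) == 1:
        # A single chunk needs no consolidation.
        return partial_answers[0]
    # Reduce step: the model is invoked again on the partial answers.
    joined = "\n".join(f"- {answer}" for answer in partial_answers)
    return llm(f"Partial answers:\n{joined}\n\nQuestion: {question}")


# Hypothetical stand-in for a real model call; it only records the prompts.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer {len(calls)}"


final = map_reduce_answer("When does the plane take off?",
                          ["chunk A", "chunk B", "chunk C"], fake_llm)
```

With three chunks, the stub is called four times: three times for the chunks and once for the consolidation.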

Response Generation

The LLM generates a text response, citing the sources it used to generate the response to the user query.

Response formatting

Datafari will format the webservice response into a standard JSON, attach the relevant sources, and send it to the user.

Simplified view of the RAG process with Vector Search

LLM external webservice?

Our solution currently supports OpenAI-compatible APIs. For example:

OpenAI API is a service provided by OpenAI. Its chat completion endpoint can be used to process RAG searches.
Datafari AI Agent is an experimental solution developed by France Labs. It is a Python-based OpenAI-like API, hosting at least one model (an LLM for the RAG case, but it can also host an embedding model for vector search). This solution currently supports text generation and vector embeddings.

In both cases, we use the LLM to extract a response to the user question from the provided document chunks.

You can use different models for text generation tasks (RAG, summarization…) and for vector embeddings. For example, you can use OpenAI API to generate responses, and a locally installed Datafari AI Agent for embeddings (which does not require a GPU).

Endpoints

METHOD	URL	DESCRIPTION	QUERY BODY	RESPONSE	PROTECTED	EDITION
POST	ai/rag	More details about this API here: AI Powered Datafari API	{"query": "[user_query]", "id": "[any_solr_document_id]", "lang": "[language_code]"} or {"query": "[user_query]", "lang": "[language_code]"}			CE
GET	search/select?q={user_query}&action=rag or search/select?q={user_query}&action=rag&format={format}&lang={lang} (Deprecated)	Perform a search query before sending the results to the LLM. This endpoint is deprecated; the /rag endpoint is the recommended way. More details about this API further down in this documentation.		SPECIFIC RESPONSE FORMAT. See below for more explanations.		CE

Deprecated

RAG via search endpoint is deprecated, and will be disabled soon.
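
As an illustration of the recommended POST endpoint, here is a minimal client-side sketch that builds the request with the Python standard library. The host name, the `build_rag_request` helper and the `Content-Type` header are assumptions made for the example; the URL path and the body fields come from this page.

```python
import json
import urllib.request
from typing import Optional


def build_rag_request(host: str, query: str, lang: str = "en",
                      doc_id: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a POST request for the ai/rag endpoint."""
    body = {"query": query, "lang": lang}
    if doc_id is not None:
        # Optional Solr document id, as in the first body variant.
        body["id"] = doc_id
    return urllib.request.Request(
        f"https://{host}/Datafari/rest/v2.0/ai/rag",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_rag_request("datafari.example.com",
                            "When does the plane for Nice take off?")
# Sending requires a reachable Datafari server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```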

Parameters for the /search/*?action=rag endpoint

The endpoint above handles the following parameters.

action : Mandatory, must be set to rag.

user_query: Mandatory, passed as the q parameter. This parameter contains the user’s query (natural language or keyword-based). Avoid line breaks in the query, and characters that might cause issues or be misinterpreted by the LLM (such as \<>^|{}~ ). We recommend sticking to alphanumeric characters and common punctuation marks such as ?'.:!',”-_%&().

format : Optional. This parameter allows the user to enforce a format of the generated response. Allowed values for this field are "bulletpoint", "text", "stepbystep" or “default”. It can be left blank or unset if you don’t need a specific format.

If you wish to use the format parameter, make sure that your instruction prompt template contains the {format} tag. In the final prompt, this tag is replaced by one of the following statements:

stepbystep: " If relevant, your response should take the form step-by-step instructions.\n"
bulletpoint: " If relevant, your response should take the form of a bullet-point list.\n"
text: " If relevant, your response should take the form of a text.\n"
default (or empty): ""
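
The tag substitution can be pictured with the short sketch below. It is not the Datafari code: the `apply_format` helper and the sample template are hypothetical, and treating an unknown value like “default” is an assumption. The statements themselves are copied from the list above.

```python
FORMAT_STATEMENTS = {
    "stepbystep": " If relevant, your response should take the form step-by-step instructions.\n",
    "bulletpoint": " If relevant, your response should take the form of a bullet-point list.\n",
    "text": " If relevant, your response should take the form of a text.\n",
    "default": "",
}


def apply_format(template: str, fmt: str = "") -> str:
    """Replace the {format} tag with the statement matching `fmt`."""
    # Empty or unknown values fall back to the empty "default" statement.
    statement = FORMAT_STATEMENTS.get(fmt or "default", "")
    return template.replace("{format}", statement)


prompt = apply_format("Answer using the provided documents.{format}", "bulletpoint")
```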

lang : Optional. The expected language of the response. Requests from DatafariUI specify the user’s preferred language here.

Allowed values are “en”, “fr”, “it”, “pt”, “pt_br”, “de”, “es”, and “ru”.
When using the chatbot tool, this field is set with the language selected in Datafari UI.
If no language is selected:
- The API tries to retrieve the logged user’s preferred language.
- If the user is not logged in, the API tries to use the browser language from the HTTPS request.
- If for any reason no language can be retrieved, English will be used by default.
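
The fallback order described above can be summarized by the following sketch. The `resolve_language` function is hypothetical, and skipping unsupported values is an assumption; only the list of allowed values and the order of the fallbacks come from this page.

```python
from typing import Optional

ALLOWED_LANGUAGES = {"en", "fr", "it", "pt", "pt_br", "de", "es", "ru"}


def resolve_language(requested: Optional[str],
                     user_preference: Optional[str],
                     browser_language: Optional[str]) -> str:
    """Return the first supported language, falling back to English."""
    for candidate in (requested, user_preference, browser_language):
        if candidate and candidate.lower() in ALLOWED_LANGUAGES:
            return candidate.lower()
    return "en"
```

For example, `resolve_language(None, None, "fr")` would pick the browser language, while three missing values lead to “en”.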

Response structure for the /search/*?action=rag endpoint

Responses are formatted using the following template:

{
  "status": "OK|ERROR",
  "content": {
      "message": "...",
      "documents": [
        {
          "url": "...",
          "id": "...",
          "title": "...",
          "content": "..."
        },
        {
          "url": "...",
          "id": "...",
          "title": "...",
          "content": "..."
        },
        ...
      ]
  }
}

In this case, the “message” contains the text generated by the Large Language Model. The “documents” section is a JSONArray containing information about the document chunks used to generate the answer and mentioned by the LLM. Each document has an ID (id), a URL (url), a title (title) and a content (content). The content comes from the document's exactContent field. If vector search is enabled in your RAG configuration, only the chunk content is returned, rather than the entire document.

For errors, the content contains an additional “error” object with the following structure:

{
  "code": {int},
  "label": {String}
}

Example of a valid response:

{
  "status": "OK",
  "content": {
      "message": "According to the document titled 'CHECK-IN', the plane for Nice takes off at 15:50.",
      "documents": [
        {
          "id": "file://///localhost/fileshare/volotea.pdf",
          "title":"CHECK-IN",
          "url":"file://///localhost/fileshare/volotea.pdf",
          "content":"Boarding pass Vos bagages 1 Bagage(s) cabine + 1 accessoire SEC. AF1599:026 Total 12 Kg John Doe Bagage cabine 55 x 35 x 25 cm max. Vol Brest Nice AF1599DépartBES NIC 15:50 / 06 FEBEffectué par HOP ..."
        }
      ]
  }
}
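
On the client side, a response of this shape can be read as in the sketch below. The `parse_rag_response` helper and the choice to raise an exception on an “ERROR” status are assumptions made for the example; the field names come from the template above.

```python
import json
from typing import List, Tuple


def parse_rag_response(raw: str) -> Tuple[str, List[dict]]:
    """Return (message, documents) from a RAG response, or raise on error."""
    payload = json.loads(raw)
    content = payload.get("content", {})
    if payload.get("status") != "OK":
        raise RuntimeError(content.get("message", "RAG request failed"))
    return content.get("message", ""), content.get("documents", [])


sample = json.dumps({
    "status": "OK",
    "content": {
        "message": "The plane for Nice takes off at 15:50.",
        "documents": [{"id": "doc-1", "title": "CHECK-IN",
                       "url": "file://///localhost/fileshare/volotea.pdf",
                       "content": "Boarding pass ..."}],
    },
})
message, documents = parse_rag_response(sample)
titles = [document["title"] for document in documents]
```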

Example of a valid formatted response, with a “stepbystep” format.

GET https://DATAFARI_HOST/Datafari/rest/v2.0/search/select?action=rag&q=comment%20faire%20un%20clafoutis%20%C3%A0%20la%20cerise&format=stepbystep

{
  "status": "OK",
  "content": {
    "documents": [
        {
            "content": "Recettes Recettes Catégories Salades Les bases Apéritif Entrées Plats ... Le clafoutis à la cerise ... ",
            "id": "https://www.cuisineaz.com/recettes/clafoutis-a-la-cerise-65064.aspx",
            "title": "Recette Clafoutis à la cerise",
            "url": "https://www.cuisineaz.com/recettes/clafoutis-a-la-cerise-65064.aspx"
        }
    ],
    "message": "Pour réaliser un clafoutis à la cerise, suivez ces étapes :\n\n 1. Préchauffez votre four à 350°F (180°C).\n\n 2. Battrez les œufs dans un saladier jusqu'à l'obtention d'une texture lisse. Ajoutez la farine, le sucre et le lait. Mélangez bien pour obtenir une pâte lisse. Salez légèrement si vous le souhaitez.\n\n 3. Graissez votre moule ou plaque de four avec du beurre. Disposez les cerises dans le fond du moule, laissant un petit espace entre elles. Versez la pâte dessus, en la répartissant régulièrement.\n\n 4. Mettez au four pendant 35 à 40 minutes, jusqu'à ce que la brochette en bois retirée du centre vienne propre. Retirez le four et saupoudrez-en de sucre glace en haut. Laissez refroidir un peu avant de servir.\\n\\n Astuces :\\n\\n - Vous pouvez remplacer les cerises par d'autres fruits tels que des pommes, des abricots ou des myrtilles pour une variante classique du clafoutis.\n - Pour ajouter plus de saveur, vous pouvez également ajouter un sachet de sucre vanillé, 1 cc de vanille en poudre, ou 1 cc d'extrait de vanille à la batterie.\n - Si vous préférez ne pas utiliser du lactose, vous pouvez remplacer le lait par un alternative sans lactose comme du lait de noix ou du lait de soja."
  }
}

In this example, the generated response contains a list of “step-by-step” instructions.

You may notice that line breaks are written as “\n”. To display the response properly in an HTML page, you might need to replace these with “<br/>”.
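
A minimal sketch of that conversion is shown here. The `message_to_html` helper is hypothetical; escaping the text before inserting the line-break tags is a precaution added for the example, since the generated text is not guaranteed to be HTML-safe.

```python
import html


def message_to_html(message: str) -> str:
    """Escape the generated text, then turn line breaks into <br/> tags."""
    escaped = html.escape(message)
    return escaped.replace("\r\n", "\n").replace("\n", "<br/>")


rendered = message_to_html("1. Preheat the oven.\n2. Mix eggs & flour.")
```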

Example of an error response:

{
  "status": "ERROR",
  "content": {
      "message": "Sorry, I could not find an answer to your question.",
      "documents":[],
      "error": {
        "code": 428,
        "label": "ragNoValidAnswer"
      }
  }
}

The “message” field is not localized and is purely informative. To allow translations, we recommend using the “label” field. Here are the different labels and their associated messages.

"ragErrorNotEnabled": "Sorry, it seems the feature is not enabled."
"ragNoFileFound": "Sorry, I couldn't find any relevant document to answer your request."
"ragTechnicalError": "Sorry, I met a technical issue. Please try again later, and if the problem remains, contact an administrator."
"ragNoValidAnswer": "Sorry, I could not find an answer to your question."
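
A client could use the label as a lookup key in its own translation catalog, as in this sketch. The `localized_error` helper and the catalog structure are assumptions; the labels and English messages are the ones listed above.

```python
from typing import Dict

MESSAGES_EN: Dict[str, str] = {
    "ragErrorNotEnabled": "Sorry, it seems the feature is not enabled.",
    "ragNoFileFound": "Sorry, I couldn't find any relevant document to answer your request.",
    "ragTechnicalError": "Sorry, I met a technical issue. Please try again later, and if the problem remains, contact an administrator.",
    "ragNoValidAnswer": "Sorry, I could not find an answer to your question.",
}


def localized_error(content: dict, catalog: Dict[str, str]) -> str:
    """Look up the error label in `catalog`, else use the raw message."""
    label = content.get("error", {}).get("label", "")
    return catalog.get(label, content.get("message", ""))


error_content = {
    "message": "Sorry, I could not find an answer to your question.",
    "documents": [],
    "error": {"code": 428, "label": "ragNoValidAnswer"},
}
text = localized_error(error_content, MESSAGES_EN)
```

A catalog for another language would use the same keys with translated values.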

Configuration

This configuration applies not only to RAG but also to the features provided by the AiPowered API.

Via Admin UI

The easiest and fastest way to configure RAG and other AI-powered features is to use the dedicated page on the Admin interface. This page can be found in the section Extra Functionalities > RAG & AI configuration. See the “Via properties file” section for more information about each parameter.

Field label	Input Type	Associated property in rag.properties	Description
Enable RAG endpoint	Checkbox (On/Off)	ai.enable.rag (true/false)	Enable the `POST /rag` endpoint from Datafari AI-Powered API.
Enable summarization endpoint	Checkbox (On/Off)	ai.enable.summarization (true/false)	Enable the `POST /summarization` endpoint from Datafari AI-Powered API.
External service endpoint	Text	ai.api.endpoint	The base URL of the API you want to call. Default value for OpenAI is: `https://api.openai.com/v1/`. If you are using a local Datafari AI Agent, consider using: `http://localhost:8888`
Service API key	Password	ai.api.token	Your API token. Required to use OpenAI services. Please use your own token. If you are using Datafari AI Agent as your LLM inference engine, you must use a fake API token because as of now, this parameter cannot be empty (example: `XXX`).
Type of service	Select (`OpenAI`)	ai.llm.service (`openai`)	The type of API you are using. Use `OpenAI` for any OpenAI-compatible API, such as Datafari AI Agent. Currently, the only available service is for OpenAI-compatible APIs, associated to the OpenAI LlmService.
Large Language Model (e.g.: gpt4o-mini, mistral7B.gguf...)	Text	llm.model	The LLM model to be used. Can be left blank to use the service’s default model. OpenAI: `gpt-4o-mini` Datafari AI Agent: default model defined in the Agent’s `.env`.
Temperature (between 0 and 1, recommended value is 0)	Number (decimal, between 0 and 1)	llm.temperature	Temperature controls the level of randomness of the text that the LLM generates. It can be set between 0 (low randomness) and 1 (very random). We recommend setting it to 0. (default: 0)
Max size (in tokens) of the LLM responses	Number (integer, greater than 0)	llm.maxTokens	The maximum number of tokens in the LLM response. Default value has been arbitrarily set to 200.
Chunk management strategy	Select	prompt.chunking.strategy	The strategy for chunks management. Read more about chunk management strategies in the Prompt section.
Maximum size in characters of the requests sent to the LLM.	Number (integer, greater than 0)	prompt.max.request.size
Maximum size (in characters) of the chunks	Number (integer, greater than 0)	chunking.chunk.size
Enable query rewriting (recommended with chat memory)	Checkbox (On/Off)	chat.query.rewriting.enabled
Enable chat memory	Checkbox (On/Off)	chat.memory.enabled
History size (maximum number of messages)	Number (integer, greater than 0)	chat.memory.history.size
Retrieval method	Select (`BM25` / `Vector Search`)	solr.enable.vector.search	true: “Vector Search”; false: “BM25”
Embeddings model (e.g.: text-embedding-3-small, all-MiniLM-L6-v2.Q8_0.gguf...)	Text	solr.embeddings.model
Vector Field	Text (readonly)	solr.vector.field	Vector Search only! Read-only. This field shows the Solr field containing the semantic vectors generated by the active embeddings model. It can only be changed from the “Solr Vector Search” AdminUI.
Number of document snippets (chunks) retrieved for RAG vector search.	Number (integer, greater than 0)	solr.topK	Vector Search only !
Maximum number of files processed by the LLM	Number (integer, greater than 0)	chunking.maxFiles	BM25 only !
Search operator (q.op Solr parameter)	Select (`OR` / `AND`)	rag.operator	BM25 only !

Via properties file

Configuration properties related to RAG and other AI features in Datafari are stored in the rag.properties file. Those can be directly edited without using the dedicated AdminUI. This file can be found:

In git repository:
datafari-ce/datafari-tomcat/conf-datafari/rag.properties
On the Datafari server:
/opt/datafari/tomcat/conf/rag.properties

This file contains the parameters you need to call your LLM solution, enable or disable AI features, and configure their processes.
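
To inspect this file from a script, a minimal reader such as the one below can be enough. It is a simplified sketch, not a full implementation of the Java properties format (no line continuations, escapes or “:” separators), and the `parse_properties` helper is hypothetical.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and comment lines."""
    properties: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue
        key, _, value = stripped.partition("=")
        properties[key.strip()] = value.strip()
    return properties


sample = """
# Global AI/RAG properties
ai.enable.rag=true
llm.temperature=0
solr.embeddings.model=
"""
config = parse_properties(sample)
# On a Datafari server, the text would come from the file itself, for example:
# with open("/opt/datafari/tomcat/conf/rag.properties", encoding="utf-8") as f:
#     config = parse_properties(f.read())
```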

Global AI/RAG properties

ai.enable.rag: Set to “true” to enable RAG features (default: false).
ai.enable.summarization: Set to “true” to enable the summarization endpoint from AI Powered API (default: false).

LLM Web services parameters

These parameters are related to the LLM Service that will be called to respond to a user request (more precisely, to respond to the prompt sent to it).

ai.llm.service: The type of API you are using. Currently, the only accepted value is "openai" for OpenAI-compatible APIs, associated to the OpenAI LlmService (default: openai).
ai.api.endpoint: The URL of the API you want to call. By default, OpenAI LlmService uses https://api.openai.com/v1/. If you are using Datafari AI Agent, consider using http://[your-aiagent-ws-address]:8888
ai.api.token: Your API token. Required to use OpenAI services. Please use your own token. If you are using Datafari AI Agent as your LLM inference engine, you must use a fake API token because, as of now, this parameter cannot be empty (example: ai.api.token=X).

LLM parameters

These parameters are for the configuration of the LLM model used for the text generation phase (not the retrieval phase of the RAG). Must be consistent with the LLM service configured above.

llm.model: The LLM model to be used. Can be left blank to use the API’s default model. (default: gpt-3.5-turbo, which means that by default it can only work when calling the OpenAI cloud API).
llm.temperature: Temperature controls the randomness of the text that the LLM generates. With OpenAI cloud models, you can set a temperature between 0 and 2. With other LLMs, it can be set between 0 and 1. We recommend setting it to 0. (default: 0)
llm.maxTokens: Integer value. The maximum number of tokens in the LLM response. (default value has been arbitrarily set to 200).

Datafari RAG pre-processing properties

chunking.enable: Enable document chunking in the RAG process, AFTER the retrieval step. If enabled, documents returned by the retrieval step are chunked. Required if retrieval is done using BM25 (Solr Vector Search already provides a chunking solution) (default: true).
chunking.chunk.size : Integer value. The maximum length, in characters, of a chunk (chunks here are the ones described in chunking.enable above). Setting too high a value may cause exceptions in the LLM, sending a “ragTechnicalError” message to the user. (default value arbitrarily set to 3000, which corresponds to approximately 750 words/1000 tokens per chunk)
chunking.maxFiles : Integer value. The maximum number of documents retrieved with BM25 search. Allowing too many files decreases performance (default: 3).
rag.operator : “AND” or “OR”. This field defines the operator (q.op in Solr) used in the BM25 search process in Datafari. Using “AND” may increase the relevancy of a response, but significantly decreases the chance of finding relevant sources. (default: OR)

Prompting

prompt.chunking.strategy : The strategy for chunks management. Accepted values are mapreduce for “Map-Reduce” method, and refine for “Iterative Refining” method. Read more about chunk management strategies in the Prompt section (default: refine)
prompt.max.request.size : Integer value (required). The maximum total length in characters of the prompt to be sent to the LLM in a single request (including instructions, sources, query, and chat history if enabled). Setting too high a value may cause an exception in the LLM, returning a “ragTechnicalError” message to the user. Each request will contain as many snippets/chunks as possible without exceeding this limit (minimum one chunk per request). (recommended value arbitrarily set to 40000, which corresponds to approximately 10,000 words/13,000 tokens)
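
The “as many chunks as possible, minimum one chunk per request” rule can be illustrated with a greedy packing sketch. The `pack_chunks` function and its `overhead` parameter (standing for the size of instructions, query and history) are assumptions made for the example, not the actual Datafari code.

```python
from typing import List


def pack_chunks(chunks: List[str], max_request_size: int,
                overhead: int = 0) -> List[List[str]]:
    """Group chunks into requests whose total size stays under the limit."""
    requests: List[List[str]] = []
    current: List[str] = []
    current_size = overhead
    for chunk in chunks:
        # Start a new request when adding this chunk would exceed the limit,
        # unless the current request is still empty (minimum one chunk).
        if current and current_size + len(chunk) > max_request_size:
            requests.append(current)
            current, current_size = [], overhead
        current.append(chunk)
        current_size += len(chunk)
    if current:
        requests.append(current)
    return requests


groups = pack_chunks(["aaaa", "bbbb", "cc"], max_request_size=8)
```

Here the first two chunks share a request and the third one goes into a second request.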

Chat Memory

chat.memory.enabled : Enable chat memory. See Chat Memory section for more information (default: false).
chat.query.rewriting.enabled : Enable query rewriting. See Query Rewriting section for more information (default: false).
chat.memory.history.size : Integer value. The maximum number of messages from the chat history sent to the LLM (default value arbitrarily set to 6). See Chat Memory section for more information.
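
The effect of the history size can be pictured as follows. This sketch assumes that the most recent messages are the ones kept, which this page does not state explicitly; the `trim_history` helper and the message dictionaries are hypothetical.

```python
from typing import Dict, List


def trim_history(history: List[Dict[str, str]],
                 history_size: int = 6) -> List[Dict[str, str]]:
    """Keep at most `history_size` messages, assuming the latest ones win."""
    if history_size <= 0:
        return []
    return history[-history_size:]


conversation = [
    {"role": "user", "content": "Which flights go to Nice?"},
    {"role": "assistant", "content": "Flight AF1599 goes to Nice."},
    {"role": "user", "content": "When does it take off?"},
]
recent = trim_history(conversation, history_size=2)
```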

Solr Vector Search

Parameters for vector search via Solr. Solr must be configured accordingly (see Note below)

solr.enable.vector.search : Set to “true” to enable Solr Vector Search into RAG process (default: false).
solr.embeddings.model : The name of the model configuration uploaded to Solr through a JSON file. (default: default_model). Must reflect what has been configured in Solr, see Note below. This field can be edited using the “Solr Vector Search” AdminUI.
solr.vector.field : The Solr field name containing the vector. Must be an existing DenseVectorField, with a dimension that is compatible with the provided model. Datafari comes with a list of common vector fields (“vector_1536”, “vector_512”…), but you may need to create a new one depending on the embeddings model you are using. (default: vector_1536). Must reflect what has been configured in Solr. This field can be edited using the “Solr Vector Search” AdminUI.
solr.topK : The expected number of relevant documents for the vector search in RAG process. Also used as default topK value for non-RAG vector search. (default: 10)

Note: If you intend to use Solr Vector Search, refer to:

Vector Update Processor - BETA VERSION for configuring chunking.
Datafari Vector Search for configuring Solr to manage Vector Search and indexing vectors.
See more on the Solr query further in this document, in the “Solr Vector Search” section.

Examples of configuration

For OpenAI API, using Solr Vector Search	For Datafari AI Agent services, using InMemory Vector Search

##############################################
###        GLOBAL AI/RAG PROPERTIES        ###
##############################################

ai.enable.rag=true
ai.enable.summarization=true

##############################################
###        WEB SERVICES PARAMETERS         ###
##############################################

# No endpoint provided, default OpenAI URL is used.
ai.api.endpoint=
# Use your own OpenAI API Token
ai.api.token=sk-xxxxxxxxxxxxxxxxxx
ai.llm.service=openai

##############################################
###             LLM PARAMETERS             ###
##############################################

llm.model=gpt-4o-mini
llm.temperature=0.2
llm.maxTokens=200

##############################################
### DATAFARI RAG PRE-PROCESSING PROPERTIES ###
##############################################

chunking.maxFiles=
chunking.chunk.size=4000
rag.operator=OR

##############################################
###               PROMPTING                ###
##############################################

prompt.chunking.strategy=mapreduce
prompt.max.request.size=40000

##############################################
###              CHAT MEMORY               ###
##############################################

chat.memory.enabled=true
chat.query.rewriting.enabled=true
chat.memory.history.size=6

##############################################
###           SOLR VECTOR SEARCH           ###
##############################################

solr.enable.vector.search=true
solr.embeddings.model=my-solr-model
solr.vector.field=vector_1536
solr.topK=5

For Datafari AI Agent services, using InMemory Vector Search:

##############################################
###        GLOBAL AI/RAG PROPERTIES        ###
##############################################

ai.enable.rag=true
ai.enable.summarization=true

##############################################
###        WEB SERVICES PARAMETERS         ###
##############################################

ai.api.endpoint=http://my-ai-agent.com:8888
# Use a placeholder API Token
ai.api.token=XXX
ai.llm.service=openai

##############################################
###             LLM PARAMETERS             ###
##############################################

llm.model=mistral-7b-openorca.Q4_0.gguf
llm.temperature=0
llm.maxTokens=300

##############################################
### DATAFARI RAG PRE-PROCESSING PROPERTIES ###
##############################################

chunking.maxFiles=3
chunking.chunk.size=3000
rag.operator=OR

##############################################
###               PROMPTING                ###
##############################################

prompt.chunking.strategy=refine
prompt.max.request.size=30000

##############################################
###              CHAT MEMORY               ###
##############################################

chat.memory.enabled=false
chat.query.rewriting.enabled=false
chat.memory.history.size=

##############################################
###           SOLR VECTOR SEARCH           ###
##############################################

solr.enable.vector.search=false
solr.embeddings.model=
solr.vector.field=
solr.topK=

The default URL for the OpenAI Service is https://api.openai.com/v1/. If you are using your own OpenAI-compatible API, you can set your own URL in ai.api.endpoint.

Technical specification

Process description

Depending on the configuration and on the Retrieval approach, the global RAG process can take three forms.

The client sends a query to the Datafari API, using one of the “RAG” endpoints:
GET https://{DATAFARI_HOST}/Datafari/rest/v2.0/search/select?q={prompt}&action=rag
POST https://{DATAFARI_HOST}/Datafari/rest/v2.0/ai/rag

Parameters are extracted from the HTTPS request, and configuration is retrieved from rag.properties.

A search query is processed using Search API methods, based on the user prompt, in order to retrieve a list of potentially relevant documents. This search can be either a keyword-based BM25 search, or a Solr Vector Search.

In the first case, the search will return whole documents, that will require chunking to be processed by the LLM.

In the case of Vector Search, Solr will return a number of length-limited excerpts.

The “InMemory Vector Search” is managed in the VectorUtils class. It receives the documents returned by a BM25 search, chunks them into short semantic excerpts, embeds them and stores them in an “In Memory Embeddings Store”. Then, it runs a vector search using the embedded user query. Finally, it returns a list of N relevant excerpts (N is defined in rag.properties, as inMemory.maxChunks). Since the excerpts are short, a chunking strategy is not required if “InMemory Vector Search” is enabled.

Enable or disable this step by editing the rag.enable.vector.search property in rag.properties. This step is not related to Solr Vector Search.

If the feature is enabled, “Solr Vector Search” and “Chunking” options must be disabled.
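
The ranking part of such an in-memory search can be sketched in a few lines. This is not the VectorUtils implementation: the toy two-dimensional vectors stand in for real embeddings, the use of cosine similarity is an assumption, and the `top_excerpts` helper is hypothetical (its `max_chunks` argument plays the role of N).

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_excerpts(query_vector: Sequence[float],
                 store: List[Tuple[str, Sequence[float]]],
                 max_chunks: int) -> List[str]:
    """Return the `max_chunks` excerpts most similar to the query vector."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:max_chunks]]


# Toy store: (excerpt, embedding) pairs with made-up vectors.
store = [("flight times", [1.0, 0.0]),
         ("recipes", [0.0, 1.0]),
         ("boarding pass", [0.9, 0.1])]
best = top_excerpts([1.0, 0.0], store, max_chunks=2)
```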

Documents from BM25 search might be too big to be handled by the LLM in a single request. Chunking cuts large documents into smaller pieces so they can be processed sequentially.

Enable or disable this step by editing the chunking.enable property in rag.properties.

This feature is optional, but is highly recommended if you don’t use a vector search solution. The chunking uses Langchain4j DocumentSplitters.recursive(...) splitter. See this link for more information about chunking strategies.

If “Solr Vector Search” or “InMemory Vector Search” is enabled, don’t use this solution.

During prompting, the list of documents/snippets is converted into a list of prompts that will be processed by the LLM. Each prompt contains instructions (instructions are defined in the /opt/datafari/tomcat/webapps/Datafari/WEB-INF/classes/prompts folder), document excerpts, and the user prompt as a question.

If prompts are short enough, they may be sent to the LLM in a single request, which can improve performance. If that is not the case, we use a “Stuff Chain” method to process all chunks.

In the future, we should conduct a benchmark to compare the "Stuff Chain" and "Refining" methods for RAG. Read more about those chunking strategies here: LLM Transformation Connector

Our solution is designed to interact with various LLM APIs. The dispatcher selects the proper connector (LlmService) to interact with the configured LLM API.

A connector is a Java class implementing our LlmService interface. It contains the “invoke” method, taking as parameter a list of prompts, and returning a simple String response from the LLM.

To this day, we only provide one LlmService:

OpenAILlmService, compatible with OpenAI API and any other OpenAI-like API (including Datafari AI Agent)
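
The connector and dispatcher idea can be pictured with the Python analogue below. The real LlmService is a Java interface whose exact signature is not shown on this page, so everything here is an assumption for illustration: the `EchoLlmService` class is a stand-in that calls no external API, and the `dispatch` function and `SERVICES` registry are hypothetical.

```python
from typing import Dict, List, Protocol


class LlmService(Protocol):
    """Analogue of the connector contract: a list of prompts in, a string out."""

    def invoke(self, prompts: List[str]) -> str:
        ...


class EchoLlmService:
    """Hypothetical connector for this sketch; it only echoes its prompts."""

    def invoke(self, prompts: List[str]) -> str:
        return " | ".join(prompts)


# Registry keyed by the ai.llm.service value ("openai" is the only one today).
SERVICES: Dict[str, LlmService] = {"openai": EchoLlmService()}


def dispatch(service_name: str) -> LlmService:
    """Select the connector matching the configured service type."""
    try:
        return SERVICES[service_name.lower()]
    except KeyError:
        raise ValueError(f"Unsupported ai.llm.service value: {service_name}") from None


answer = dispatch("openai").invoke(["instructions", "question"])
```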

The selected connector prepares one or multiple HTTP/HTTPS queries, one for each prompt from the list. Then, it calls the external LLM API, and extracts the response.

If the list of prompts contains multiple entries, all the responses are concatenated and sent to the LLM again, to generate a final summary (see prompting strategies).

The response is formatted in JSON and sent back to the user.

Chunking

Most documents stored in the FileShare Solr collection are too large to be processed in a single request by a Large Language Model. To address this, implementing a chunking strategy is essential, allowing us to work with manageable, concise, and contextually relevant text snippets.

The chunking strategy depends on the Retrieval method. The two cases are detailed below.

Case 1: BM25 Search	Case 2: Solr Vector Search

The BM25 Search returns large and whole documents from FileShare. Those documents are chunked into smaller pieces during the chunking step of the RAG process.

All retrieved documents are processed by the ChunkUtils Java class. The chunkContent() method uses a Langchain4j solution: Recursive DocumentSplitters. This splitter recursively divides a document into paragraphs (defined by two or more consecutive newline characters), lines, sentences, words (…), in order to fit as much content as possible without exceeding the configured chunk size limit[1].

[1] The size of the chunks (currently in characters, but should be in tokens in the future) can be configured in the AdminUI, or in rag.properties.

“RAG & AI configuration” AdminUI

rag.properties:

chunking.chunk.size=4000
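
To give an intuition of recursive splitting, here is a simplified Python sketch. It is not the Langchain4j algorithm used by Datafari: the separator list, the greedy packing and the `split_recursive` function are assumptions that only mimic the idea of trying paragraphs first, then lines, sentences and words.

```python
from typing import List

SEPARATORS = ["\n\n", "\n", ". ", " "]  # paragraphs, lines, sentences, words


def split_recursive(text: str, max_size: int,
                    separators: List[str] = SEPARATORS) -> List[str]:
    """Split `text` into chunks of at most `max_size` characters."""
    if len(text) <= max_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to fixed-size slices.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]
    separator, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(separator):
        candidate = part if not current else current + separator + part
        if len(candidate) <= max_size:
            current = candidate  # keep filling the current chunk
            continue
        if current:
            chunks.append(current)
        if len(part) <= max_size:
            current = part
        else:
            # The part alone is too large: retry with finer separators.
            chunks.extend(split_recursive(part, max_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


pieces = split_recursive("aaa bbb\n\nccc ddd eee", max_size=7)
```

In this toy example, the first paragraph fits in one chunk, while the second one has to be split again at word level.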

With Solr Vector Search, chunking occurs during document indexing, within the VectorUpdateProcessor. All files uploaded to FileShare are processed and split into smaller chunks using the DocumentByParagraphSplitter.

These chunks are then stored as new "child" documents, inheriting their parent's metadata. The chunked content replaces the original content in the child documents.

The child documents are stored in a separate Solr collection, VectorMain. Once created, each child’s content is embedded using the Solr TextToVectorUpdateProcessor.

When Vector Search is executed in Datafari, it retrieves documents from the VectorMain collection instead of FileShare. Since these documents are already chunks, no additional chunking should be needed.

The chunking step described in Case 1 is nevertheless still applied to the documents retrieved by Vector Search. Depending on your configuration, this may have no effect, since the retrieved contents are probably short enough already.

Chunking solution with properties description.
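To illustrate the recursive splitting idea used in Case 1, here is a simplified, dependency-free sketch. It is not the Langchain4j implementation used by ChunkUtils: the class name RecursiveChunkingSketch, the separator list, and the greedy packing rule are assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RecursiveChunkingSketch {

    // Separators tried in order: paragraphs, lines, sentences, words.
    private static final String[] SEPARATORS = {"\n\n", "\n", ". ", " "};

    /** Splits text into chunks of at most maxChars characters. */
    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        split(text, maxChars, 0, chunks);
        return chunks;
    }

    private static void split(String text, int maxChars, int level, List<String> out) {
        if (text.length() <= maxChars) {
            if (!text.isBlank()) {
                out.add(text);
            }
            return;
        }
        if (level >= SEPARATORS.length) {
            // No separator left: fall back to a hard cut every maxChars characters.
            for (int i = 0; i < text.length(); i += maxChars) {
                out.add(text.substring(i, Math.min(text.length(), i + maxChars)));
            }
            return;
        }
        String sep = SEPARATORS[level];
        StringBuilder current = new StringBuilder();
        for (String piece : text.split(Pattern.quote(sep))) {
            if (piece.length() > maxChars) {
                // Oversized piece: emit what we have, then retry with a finer separator.
                flush(current, out);
                split(piece, maxChars, level + 1, out);
            } else if (current.length() == 0) {
                current.append(piece);
            } else if (current.length() + sep.length() + piece.length() <= maxChars) {
                // The piece still fits: pack it into the current chunk.
                current.append(sep).append(piece);
            } else {
                flush(current, out);
                current.append(piece);
            }
        }
        flush(current, out);
    }

    /** Emits the current chunk (if not blank) and resets the buffer. */
    private static void flush(StringBuilder current, List<String> out) {
        if (!current.toString().isBlank()) {
            out.add(current.toString());
        }
        current.setLength(0);
    }
}
```

In this sketch the separator is dropped at chunk boundaries and restored between pieces packed into the same chunk. With a limit of 9 characters, a two-paragraph text is first cut at the blank line, and the oversized second paragraph is then cut at word level.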

Prompts

Prompts are a collection of "Message" objects sent to the LLM. Each "Message" contains:

A role: "user" for the user query and document content, "assistant" for AI-generated messages, or "system" for instructions.
Content: The body of the message, which may include instructions, the user query, or document content.

If the RAG process needs to manage too many or too large snippets, it may not be able to fit all of them into a single LLM request. In this situation, a prompt strategy is required. Our current approach is the Stuff Chain method, but that is subject to change. Read more about chunking management strategies in the LLM Transformation Connector documentation.

Below are three prompt chains, associated with a RAG query.

Case 1: All prompts can fit in a single LLM request

The LLM is called only once, with the following prompt chain.

Case 2: Prompts are too large to be processed at once (Map-Reduce method)

First, the LLM is called once per chunk set, each time with the following Message list:

Then, the LLM is called one final time:

Case 3: Prompts are too large to be processed at once (Iterative Refining method)

First, the LLM is called with the first N chunks (as many chunks as a request can fit).

Then, it will recursively call the LLM for each chunk (or pack of chunks), with the following Message list:

To determine whether the prompt chain can fit within a single request, we use our own size calculator. It compares the total size of the message contents in the collection to the maximum allowed size (in characters), as defined by the llm.max.request.size parameter in rag.properties.
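The size check described above can be sketched as follows. This is only an illustration: the class PromptSizeSketch, its nested Message stand-in, and the choice of an inclusive comparison are assumptions, and the real calculator may count additional elements.

```java
import java.util.List;

class PromptSizeSketch {

    /** Minimal stand-in for Datafari's Message class (role + content). */
    static final class Message {
        final String role;
        final String content;

        Message(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /** Total number of characters across all message contents. */
    static int totalSize(List<Message> prompts) {
        int total = 0;
        for (Message message : prompts) {
            total += message.content.length();
        }
        return total;
    }

    /**
     * @param maxRequestSize the limit in characters (llm.max.request.size)
     * @return true if the whole prompt chain fits in one request
     *         (inclusive comparison is an assumption of this sketch)
     */
    static boolean fitsInSingleRequest(List<Message> prompts, int maxRequestSize) {
        return totalSize(prompts) <= maxRequestSize;
    }
}
```

When fitsInSingleRequest returns false, one of the multi-request strategies listed above would be needed.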

Available LlmServices

An LlmService is a class that implements our “LlmService.java” interface, and acts as an interface between Datafari and external APIs leveraging Large Language Models (LLMs).

All “LlmService” classes should implement the generate() method.

/**
 *
 * @param prompts A list of prompts. Each prompt contains instructions for the model, document content and the user query
 * @return The string LLM response
 */
String generate(List<Message> prompts, HttpServletRequest request) throws IOException;

The generate() method takes, as parameters, a list of Message prompts ready to be sent and an HttpServletRequest. All the prompts are sent to the associated LLM API, and a single String response is returned.

Message is a Datafari Java class, with the following attributes:

String role: The role associated to the Message. Either “user”, “system”, or “assistant”.
String content: The content of the message.

The LlmService must fulfill multiple tasks:

Override the generate() method.
Provide default configuration if relevant (default model, maxToken, specific configuration…)
Call an external LLM Service, using the proper configuration as defined in the rag.properties file (endpoint, temperature, maxTokens, API key…).
Format the LLM response to a simple String.
Implement a constructor taking at least a RagConfiguration object as parameter.

Currently there is only one available LlmService: OpenAiLlmService. It can be used with any OpenAI-compatible API, including our Datafari AI Agent. More may be developed in the future, implementing the LlmService Java interface.

OpenAI LLM service

This connector can be used to access the OpenAI API, or any other API that follows the OpenAI signature, including our Datafari AI Agent. The default model, gpt-3.5-turbo, can be changed by editing the rag.model property.

If you are planning to use your own OpenAI-like solution, edit the rag.api.endpoint property. Default value is https://api.openai.com/v1/
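For readers building their own OpenAI-like solution, the sketch below shows the general shape of a chat completion request in the OpenAI format: a POST to the chat/completions path with a model and a list of role/content messages. How Datafari builds its requests internally is not described here, so the class OpenAiRequestSketch, its nested Message stand-in, and the hand-written JSON encoding are assumptions for illustration only. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

class OpenAiRequestSketch {

    /** Minimal stand-in for Datafari's Message class (role + content). */
    static final class Message {
        final String role;
        final String content;

        Message(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /** Escapes the characters that must be escaped inside a JSON string. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the JSON body: a model name and the list of role/content messages. */
    static String buildBody(String model, List<Message> messages) {
        StringBuilder sb = new StringBuilder("{\"model\":\"")
                .append(escapeJson(model)).append("\",\"messages\":[");
        for (int i = 0; i < messages.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"role\":\"").append(escapeJson(messages.get(i).role))
              .append("\",\"content\":\"").append(escapeJson(messages.get(i).content))
              .append("\"}");
        }
        return sb.append("]}").toString();
    }

    /** Builds (without sending) the POST request for the given endpoint, e.g. rag.api.endpoint. */
    static HttpRequest buildRequest(String endpoint, String apiKey, String body) {
        String base = endpoint.endsWith("/") ? endpoint : endpoint + "/";
        return HttpRequest.newBuilder(URI.create(base + "chat/completions"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

An OpenAI-like server is expected to accept the same path and body under its own base URL, which is why only the endpoint property needs to change.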

Vector Search

To enhance the relevance of document excerpts sent to the LLM, we have implemented vector search solutions. This machine learning-based approach represents semantic concepts as vectors, offering more accurate results than traditional keyword-based search. Additionally, vector search improves retrieval quality in multilingual datasets, as it relies on semantic meaning rather than exact wording.

InMemory Vector Search (deprecated)

In a first alpha version, we implemented a local vector-store solution, provided by Langchain4j: InMemoryEmbeddingStore.

In this scenario, documents were first retrieved with a keyword BM25 search. Then they were processed by the EmbeddingStoreIngestor: they were chunked, chunks were translated to vectors, and stored in the local vector database.

Then, the user query was embedded and used to retrieve relevant chunks. This solution could be considered a Hybrid Search approach, since it combined keyword-based search and vector search.

However, this solution performed poorly and has been replaced with Solr Vector Search. It is now deprecated, and will be removed in a future version.

Solr Vector Search

Solr Vector Search uses the new text-to-vector feature provided by Solr 9.8. The purpose is to replace the current BM25 search and the local vector store by a full vector search solution (and in the future, a hybrid search solution for even more relevant results).

Our VectorUpdateProcessor processes all documents that are indexed into the FileShare Solr collection. Documents are split into chunks, which are embedded and stored in the VectorMain collection.

Those chunks can now be searched using our new “/vector” handler.

The following query can be used to process a vector search through the API.

https://{DATAFARI_HOST}/Datafari/rest/v2.0/search/vector?queryrag={prompt}&topK={topK}

queryrag or q (required) : The user query. The "queryrag" parameter is required by Solr; however, if it is missing, Datafari will automatically populate it with the value of "q".
topK (optional) : The number of results to return. (default: 10, editable in “RAG & AI configuration” AdminUI)
model (optional) : The active embeddings model name, as defined in Solr. By default, Datafari automatically uses the value stored in solr.embeddings.model in rag.properties (editable in “Solr Vector Search” AdminUI). Unless you are experimenting with multiple models, or you are directly requesting Solr API (and bypassing Datafari API), you probably don’t need to use this parameter.
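As a small illustration, the vector search URL above can be assembled as follows, with the user query URL-encoded. The class name VectorSearchUrlSketch and the host used in the usage note are placeholders, not part of Datafari.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class VectorSearchUrlSketch {

    /**
     * Builds the vector search URL for the given host, query and topK.
     * The query is URL-encoded so that spaces and special characters are safe.
     */
    static String buildUrl(String host, String query, int topK) {
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return "https://" + host + "/Datafari/rest/v2.0/search/vector"
                + "?queryrag=" + encodedQuery
                + "&topK=" + topK;
    }
}
```

For example, buildUrl("datafari.example.com", "what is RAG?", 5) yields a URL whose query string is queryrag=what+is+RAG%3F&topK=5.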

Read more about Solr Vector Search set-up and configuration in the dedicated documentation: Datafari Vector Search

Chat Memory (for RAG)

For models that support conversational context, it is possible to enable chat memory within the RAG process.

As of April 2025, no back-end storage is provided. The chat history must be managed client-side, typically in the UI or frontend application.

Enable Chat Memory

To activate chat memory:

Enable the option in the AdminUI or in rag.properties:

In “RAG & AI configuration” AdminUI:

In rag.properties:

chat.memory.enabled=true

Define the maximum number of messages to include in the context with:

In “RAG & AI configuration” AdminUI:

In rag.properties:

chat.memory.history.size=8

By default, 6 messages are included: 3 user messages + 3 assistant responses.

Keep in mind: all chat history is included in the prompt and consumes part of the model’s context window.

Therefore, prefer using models with a large context length based on your needs, and adjust chat.memory.history.size and prompt.max.request.size accordingly.
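To illustrate the effect of chat.memory.history.size, the sketch below keeps only the last messages of a history. That the most recent messages are the ones kept is an assumption of this example, as are the names ChatHistorySketch and lastMessages.

```java
import java.util.List;

class ChatHistorySketch {

    /**
     * Returns at most maxSize messages, taken from the end of the history.
     *
     * @param history the full chat history, oldest message first
     * @param maxSize the maximum number of messages to keep (chat.memory.history.size)
     */
    static <T> List<T> lastMessages(List<T> history, int maxSize) {
        if (maxSize <= 0) {
            return List.of();
        }
        int from = Math.max(0, history.size() - maxSize);
        return history.subList(from, history.size());
    }
}
```

Since the history is managed client-side, the same trimming can also be applied in the frontend before calling the API, which keeps the request payload small.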

Using Chat Memory in API calls

To include chat history when calling the /ai/rag endpoint, use the optional history field in your JSON payload. Chat history will be added to the LLM context during RAG Generation processes.

Example:

POST https://DATAFARI_HOST/Datafari/rest/v2.0/ai/rag

{
    "query": "What is my dog's name ?",
    "lang": "fr",
    "history": [
        {
            "role":"user",
            "content": "I just adopted a black labrador. I called her Jumpy."
        },
        {
            "role":"assistant",
            "content": "How nice ! I am sure she will be happy with you."
        },
        {
            "role":"user",
            "content": "What is the capital of France?"
        },
        {
            "role":"assistant",
            "content": "La capitale de la France est Paris, d'après le document `Capitale de la France`."
        }
    ]
}

This chat history will be included in the prompt and passed to the LLM (in each request), providing contextual awareness for more coherent and personalized responses.

Datafari will run a full RAG process, based on the query "What is my dog's name ?". Optionally, the query may be dynamically rewritten to take the chat history into account, using the query rewriting method. If not, the chat history is only used AFTER the chunks have been retrieved. This also means that the document chunks retrieved for earlier questions from the user (in the same discussion or any other) are NOT used again: only the document chunks retrieved for the nth query are used for the nth answer.

To summarise:

Case 1: query rewriting is not activated

Step 1.1: only the query is used for retrieving document chunks
Step 1.2: the query is combined with past [query/generated response] pairs
Step 1.3: this combined query is sent to the LLM

Case 2: query rewriting is activated

Step 2.1: the initial query is rewritten to take into account past [query/generated response] pairs
Step 2.2: this rewritten query is used for retrieving document chunks
Step 2.3: the initial query is combined with past [query/generated response] pairs
Step 2.4: this combined query is sent to the LLM
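The two cases can be sketched as a single flow in which only the retrieval query differs. This is a simplified illustration: the class RagFlowSketch, the functional parameters standing in for the rewriter, the retriever and the LLM, and the way the final prompt is assembled are all assumptions.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

class RagFlowSketch {

    /**
     * @param query            the initial user query
     * @param history          past messages, already formatted as text lines
     * @param rewritingEnabled whether query rewriting is activated
     * @param rewriter         hypothetical stand-in: (query, history) -> search query
     * @param retriever        hypothetical stand-in: search query -> document chunks
     * @param llm              hypothetical stand-in: prompt -> generated response
     */
    static String answer(String query,
                         List<String> history,
                         boolean rewritingEnabled,
                         BiFunction<String, List<String>, String> rewriter,
                         Function<String, List<String>> retriever,
                         Function<String, String> llm) {
        // Case 2 only (step 2.1): the search query is rewritten using the history.
        String searchQuery = rewritingEnabled ? rewriter.apply(query, history) : query;

        // Steps 1.1 / 2.2: retrieval uses the (possibly rewritten) search query.
        List<String> chunks = retriever.apply(searchQuery);

        // Steps 1.2 / 2.3: in both cases the INITIAL query is combined with the history.
        String prompt = String.join("\n", history) + "\n"
                + String.join("\n", chunks) + "\n"
                + query;

        // Steps 1.3 / 2.4: the combined prompt is sent to the LLM.
        return llm.apply(prompt);
    }
}
```

In both cases the LLM receives the user's original wording. The rewritten query only influences which chunks are retrieved.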

Here is the response to the example request.

{
    "content": {
        "documents": [],
        "message": "Le nom de votre chien est Jumpy."
    },
    "status": "OK"
}

Query rewriting

The user queries sent to the RAG endpoint are written into a chatbot, and therefore may not be proper search queries. That is why we added an optional “query rewriting” step, which calls the LLM in order to generate a new search query, based on the chat history (if provided) and the initial user query. We use the following prompt (template-rewriteSearchQuery.txt):

Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base.
######
Conversation history:
{conversation}
######
New question:
- user: {userquery}
######

You have access to a Search Engine index with 100's of documents.
Generate a search query based on the conversation and the new question.
Do not include cited source filenames and document names e.g info.txt or doc.pdf in the search query terms.
Do not include any text inside [] or <<>> in the search query terms.
Do not include any special characters like '+'.
If the question is not in English, translate the question to English
before generating the search query.
If you cannot generate a search query, return just the number 0.

Sources: RAG techniques: Cleaning user questions with an LLM

The {conversation} tag is replaced with multiple lines (one per message from the provided history), each using the following format:

- {message.role}: {message.content}

If no history is provided, the “conversation” remains empty.
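The substitution of the {conversation} and {userquery} tags can be sketched as follows. The class RewritePromptSketch, its nested Message stand-in, and the method names are assumptions made for the example, not the actual Datafari implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

class RewritePromptSketch {

    /** Minimal stand-in for Datafari's Message class (role + content). */
    static final class Message {
        final String role;
        final String content;

        Message(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /** One line per message, in the "- role: content" format; empty if no history. */
    static String formatConversation(List<Message> history) {
        return history.stream()
                .map(m -> "- " + m.role + ": " + m.content)
                .collect(Collectors.joining("\n"));
    }

    /** Fills the template tags with the user query and the formatted history. */
    static String fillTemplate(String template, List<Message> history, String userQuery) {
        // The query is substituted first: a sanitized query contains no braces,
        // so it cannot introduce a {conversation} tag before the second replace.
        return template
                .replace("{userquery}", userQuery)
                .replace("{conversation}", formatConversation(history));
    }
}
```

With an empty history, formatConversation returns an empty string, which matches the behaviour described above.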

Enable query rewriting in the “RAG & AI configuration” AdminUI, or in the rag.properties configuration file:

In “RAG & AI configuration” AdminUI:

In rag.properties:

chat.query.rewriting.enabled=true

For a better user experience, we highly recommend enabling this feature if chat memory is enabled.

Security

Document security

Security is a major concern, and was a central element in our technical decisions. One of the main advantages of Retrieval Augmented Generation is that the LLM only uses the data it is provided with to answer a question. As long as we control the data sent to the model, we can prevent any leaks.

“Prompt injection” is a set of techniques used to override original instructions in the prompt, through the user input. As our prompt does not contain secret, confidential or structural elements, we consider that it is not a serious issue if a user is able to read it.

Datafari Enterprise Edition provides a security solution, allowing enterprises to set up access restrictions on their files in Datafari. Our RAG solution respects this security. If security is enabled, any user can still run a RAG search on Datafari Enterprise Edition. However, the retrieval part of the RAG (BM25 or Vector) is processed through Datafari SearchAPI, which only returns the documents available to that user. If the user is not allowed to see a document, this document is not retrieved and is not sent to the external LLM service. That way, it is impossible for a user to use the RAG tools to retrieve information they should not be able to access.

More information in Datafari Enterprise Edition here.

Prompt security

To reduce the risk of malformed prompts caused by poor-quality (or malicious) user input or indexed content, both are cleaned before being added to the prompt.

The following method is applied to RAG sources (document chunks):

    /**
     * @param context The context, containing documents content
     * @return A clean context, with no characters or element that could cause an error or a prompt injection
     */
    public static String cleanContext(String context) {
        context = context.replace("\\", "/")
                .replace("\n", "\\n")
                .replace("\r", " ")
                .replace("\t", " ")
                .replace("\b", "")
                .replace("'''", "")
                .replace("######", "") // This string is specifically used as separator in our default prompts, and should be avoided in context
                .replace("\"", "`");
        return context;
    }

The user query is sanitized with the following method:

    /**
     * @param query The user query
     * @return A clean query, with no characters or element that could cause an error or a prompt injection
     */
    public static String sanitizeInput(String query) {
        if (query == null || query.isEmpty()) {
            return "";
        }

        // Normalize Unicode characters (é → e)
        query = Normalizer.normalize(query, Normalizer.Form.NFD);
        query = query.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");

        // Remove control characters (including newlines)
        query = query.replaceAll("\\p{Cntrl}", " ");

        // Escape or neutralize Lucene/Solr special characters
        // These characters have special meaning in Solr query parsers (e.g., edismax)
        // They may also have side effects with the LLM
        // Here, we replace them by space to avoid misparsing
        String[] specialChars = {
                "+", "&&", "||", "{", "}", "[", "]", "\n",
                "^", "~", "*", "\\", "<", ">", "=", "#"
        };
        for (String ch : specialChars) {
            query = query.replace(ch, " ");
        }
        // Replace multiple whitespace with single space
        query = query.replaceAll("\\s+", " ");
        // Length limit for the user query arbitrarily set to 500 char
        int maxLength = 500;
        if (query.length() > maxLength) {
            query = query.substring(0, maxLength);
        }

        return query.trim();
    }
