AI Powered Datafari API

Valid from Datafari 7.0

Introduction

This feature is part of the available Datafari APIs, and it may later be merged into the Datafari API page.

This part of the API contains all the AI-related endpoints, which may be called by the Datafari Chat Bot.

Configuration

Before using the AiPowered endpoint, your Datafari needs to be properly configured. Use the “RAG & AI configuration” AdminUI to set up your environment. You will also need an AI web service that can run Large Language Models.

API endpoints

Every endpoint in this API is prefixed by /rest/v2.0/.

Also remember to add the relevant path to your Datafari web app (https://[datafaridomain]/Datafari/rest/v2.0/ai/[endpoint]).

Currently, the AiPowered API provides two endpoints:

POST /ai

POST /ai/stream

Description

The “no-stream” endpoint takes a JSON request body, and returns a full JSON response once the process is done.

The “stream” endpoint takes a JSON request body, and streams NDJSON (Newline-Delimited JSON) events to the client, allowing progressive rendering of the messages and transparency on the progression in the Chat Bot.

Payload structure

The endpoints take a JSON input body. Each parameter is detailed below. Some may be optional or required depending on the action.

{
  "query": "[user_query]",
  "action": "[rag|summarize|synthesize|agentic|search]",
  "id": "[any_solr_document_id]",
  "agent": "[agent_name]",
  "lang": "[language_code]",
  "history": [ [chat_history] ],
  "filters": {
    "id": [ [list_of_ids] ],
    "fq": [ [list_of_solr_filter_queries] ]
  },
  "conversationId": "[conversationId]"
}

Response structure

The returned response is a standard Datafari API JSON response, with a status (OK or ERROR) and a content (which can contain an “error” object).

{
  "status": "OK|ERROR",
  "content": {
    "message": "[ai_generated_response]",
    "conversationId": "[conversationId]",
    "sources": [ [list_of_retrieved_sources] ],
    "docs": [ [search_results] ],
    "error": {
      "code": "[error_code]",
      "label": "[error_label]",
      "message": "[error_userfriendly_message]",
      "reason": "[technical_error_desc]"
    }
  }
}

The server progressively streams events in the form of Newline-Delimited JSON (application/x-ndjson). Each line corresponds to one event. Each event has the following properties:

  • type: The type of event (stream.started, sources.add, messages.add, tool.call…)

  • data: The payload of the event. It is a key → object map, containing information and details about the event.

  • ts: The timestamp of the event.

{"type":"[event_type]","data":{ [event_payload] },"ts":[timestamp]}

The different event types and their associated payloads are detailed in the “Stream events” section.

The connection can be closed on the client side when an “error” or “stream.completed” event is received.

{"type":"stream.completed","data":{"status":"OK"},"ts":1759406565419}
{"type":"error","data":{"code":"[err_code]","label":"[err_label]",...},"ts":1759406565419}
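As a minimal sketch of how a client could consume this stream, the following Python function parses NDJSON lines (one JSON object per line, e.g. as yielded by `requests.post(..., stream=True).iter_lines()`) and stops on a terminal event. The sample lines are illustrative, not actual server output.

```python
import json

def read_stream_events(lines):
    """Parse NDJSON stream events (one JSON object per line) and stop
    when a terminal event ("stream.completed" or "error") is received."""
    events = []
    for line in lines:
        if not line.strip():
            continue  # skip keep-alive blank lines
        event = json.loads(line)
        events.append(event)
        if event.get("type") in ("stream.completed", "error"):
            break  # terminal event: the client can close the connection
    return events

# Simulated stream for illustration:
sample = [
    '{"type":"stream.started","data":{},"ts":1759406565000}',
    '{"type":"message.final","data":{"text":"Hello"},"ts":1759406565200}',
    '{"type":"stream.completed","data":{"status":"OK"},"ts":1759406565419}',
]
events = read_stream_events(sample)
```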

In case of error, the API provides a default response in English that can be displayed via a chatbot. However, since this “error.message” is not localized, we recommend using “error.label” as an i18n translation key to render the message in the proper language.
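The recommended fallback can be sketched as follows, assuming a hypothetical client-side translation table keyed by “error.label” (the table content here is illustrative, not part of Datafari):

```python
# Hypothetical French translation table; keys mirror "error.label" values.
I18N_FR = {
    "ragNoFileFound": "Désolé, je n'ai trouvé aucun document pertinent.",
}

def render_error(error, translations):
    """Pick the message to display: prefer a localized translation keyed
    by error["label"], fall back to the default English error["message"]."""
    label = error.get("label")
    if label in translations:
        return translations[label]
    return error.get("message", "")

error = {"label": "ragNoFileFound",
         "message": "Sorry, I couldn't find any relevant document to answer your request."}
```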

Request body fields description

Field

Required for

Optional for

Description

action

SUMMARIZE, AGENTIC, SEARCH, SYNTHESIZE

RAG

The action that must be processed. Available actions are:

“rag”: (default) Retrieval Augmented Generation.

“summarize”: Document summarization.

“agentic”: Call an existing Agent.

“search”: Run a search in Datafari.

query

RAG, AGENTIC, SEARCH

SUMMARIZE, SYNTHESIZE

The user query, for RAG or AGENTIC.

For SUMMARIZE or SYNTHESIZE, an optional query can be provided to set the content of the message stored in database.

id

SUMMARIZE

RAG

The ID of a Solr document.

  • For summarization: defines the document to summarize.

  • For RAG and AGENTIC: the services only use the associated document.

filters

SYNTHESIZE

RAG, AGENTIC

Add one or more filters to restrict document retrieval in RAG and agentic processes.

Current allowed values are:

  • id: a list of document IDs

  • fq: a list of Solr filter queries. It can be used to provide facets filters.

You can provide search parameters other than “fq” to customize retrieval operations. However, we highly recommend sticking to “fq” to avoid search failures.

The LLM will not be able to use excluded documents.

For SYNTHESIZE action, the filters.id field is required, and must contain the IDs of all the documents to synthesize.

Example:

"filters": {
  "id": ["docIdNumber1", "docIdNumber2", "docIdNumber6"],
  "fq": ["repo_source:FileShare", "{!tag=language}(language:\"en\")"]
}

In the example above, any document whose ID is not in the list, that does not belong to the “FileShare” repository, or that is not in English will be excluded.

agent

 

AGENTIC

Only for AGENTIC. Select one of the available agents (default: “rag”, for Agentic RAG).

Currently, only the “rag” agent is available. More agents may be added in the future.

lang

 

ALL SERVICES

A two-letter language code specifying the expected language of the response (“en” for English, “fr” for French…).

If empty, the server tries to retrieve the user’s favorite language from the database. English is used by default.

conversationId

ALL SERVICES

 

The ID of the conversation.

  • Only for logged-in users.

  • This ID can be retrieved from a previous response, a “conversation” stream event, or by calling the /users/conversations API endpoint.

  • If no ID is provided, conversation storage is enabled, and the user is logged in, a new conversation is created. The ID of the created conversation is returned in the server response.

  • This ID is used to save chat messages in the database. Anonymous users’ conversations can’t be saved.

history

 

RAG, AGENTIC

Include the chat history in the request. If enabled in the RAG configuration, the LLM can use this history during query rewriting and response generation.

This field is a list of ChatMessage objects, with a “role” property (either “user” for user queries or “assistant” for AI generated messages), and a “content” property containing the text of the message.

[
  { "role": "user", "content": "what is enron ?" },
  { "role": "assistant", "content": "ENRON is a multinational (...)" }
]
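The required/optional rules above can be sketched as a small client-side helper. This is a hypothetical convenience function, not part of Datafari; it only assembles the request body and enforces two of the constraints documented in the table (summarize needs “id”, synthesize needs “filters.id”).

```python
def build_ai_request(action="rag", query=None, doc_id=None, agent=None,
                     lang=None, history=None, filters=None, conversation_id=None):
    """Assemble the JSON body for POST /ai (or /ai/stream), keeping only
    the fields that are actually set."""
    if action == "summarize" and doc_id is None:
        raise ValueError('"summarize" requires the "id" field')
    if action == "synthesize" and not (filters or {}).get("id"):
        raise ValueError('"synthesize" requires "filters.id"')
    body = {
        "action": action, "query": query, "id": doc_id, "agent": agent,
        "lang": lang, "history": history, "filters": filters,
        "conversationId": conversation_id,
    }
    # Drop unset fields so optional parameters are simply omitted.
    return {k: v for k, v in body.items() if v is not None}

rag_body = build_ai_request(query="What is enron ?", lang="fr", history=[])
```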

Error messages and label

Below is the list of existing error labels, with their associated default message (in English):

  • ragErrorNotEnabled: Sorry, it seems the feature is not enabled.

  • ragNoFileFound: Sorry, I couldn't find any relevant document to answer your request.

  • ragTechnicalError: Sorry, I met a technical issue. Please try again later, and if the problem remains, contact an administrator.

  • ragNoValidAnswer: Sorry, I could not find an answer to your question.

  • ragBadRequest: Sorry, it appears there is an issue with the request. Please try again later, and if the problem remains, contact an administrator.

  • summarizationErrorNotEnabled: Sorry, it seems the feature is not enabled.

  • summarizationNoFileFound: Sorry, I cannot find this document.

  • summarizationTechnicalError: Sorry, I met a technical issue. Please try again later, and if the problem remains, contact an administrator.

  • summarizationBadRequest: Sorry, it appears there is an issue with the request. Please try again later, and if the problem remains, contact an administrator.

  • summarizationEmptyFile: Sorry, I am unable to generate a summary, since the file has no content.

  • synthesisErrorNotEnabled: Sorry, it seems the feature is not enabled.

  • synthesisNoFileContent: Sorry, I am unable to generate a synthesis of those documents, since the files are missing or have no available content.

  • synthesisTechnicalError: Sorry, I met a technical issue. Please try again later, and if the problem remains, contact an administrator.

 

Available services

WORK IN PROGRESS

Both endpoints (/ai and /ai/stream) can be used to call any available AI service, by providing the action parameter.

Service

Value of “action” field

Description

Parameters

RAG

“rag”

The service runs a search in Datafari to retrieve relevant sources, and provides them to the LLM so it can answer the user query or question.

More information about RAG here: Retrieval-Augmented Generation (RAG)

  • action (optional): “rag”

  • query: The user query.

  • id (optional): If provided, the LLM will read the whole document, and only this one, to answer the user query (by “document” we mean the extracted text stored in the index, not the binary of the source document).

  • filters(optional): If provided, the LLM will only be able to retrieve the documents that match the specified conditions.

  • lang (optional): The expected language of the response.

  • history (optional): The conversation history.

  • conversationId(optional): The conversation ID.

 

SUMMARIZE

“summarize”

The service retrieves or generates the summary of a document.

  • action: “summarize”

  • id: The document is retrieved, and recursively summarized using the “iterative refine” method.

  • lang (optional): The expected language of the generated summary.

  • conversationId(optional): The conversation ID.

 

SYNTHESIZE

“synthesize”

The service generates a synthesis of multiple documents, based on individual summaries.

  • action: “synthesize”

  • filters: The “id” field contains the list of document IDs to include in the synthesis.

  • lang (optional): The expected language of the generated synthesis.

  • conversationId(optional): The conversation ID.

 

AGENTIC

“agentic”

The service calls an “agent”, an AI Augmented program that is able to use various tools at its disposal (such as search, entity extraction, RAG, summarization…) to answer the user query.

Currently, only the “rag” agent is available. More agents may be added in the future.

  • action: “agentic”

  • query: The user query.

  • agent (optional): Select an existing agent. Default is “rag” for Agentic RAG.

  • filters(optional): If provided, the LLM will only be able to retrieve the documents that match the specified conditions.

  • lang (optional): The expected language of the response.

  • history (optional): The conversation history. Can be read by the agent using a dedicated tool.

  • conversationId(optional): The conversation ID.

 

SEARCH

“search”

Runs a simple search in Datafari. The search results are stored in the “docs” section of the API response or stream event.

  • action: “search”

  • query: The search query.

  • conversationId(optional): The conversation ID.

 

No-stream responses

Response structure

The POST /ai endpoint returns a JSON with the following structure:

{
  "status": "OK|ERROR",
  "content": {
    "message": "[ai_generated_response]",
    "conversationId": "[conversationId]",
    "sources": [ [list_of_retrieved_sources] ],
    "docs": [ [list_of_search_results] ],
    "error": {
      "code": "[error_code]",
      "label": "[error_label]",
      "message": "[error_userfriendly_message]",
      "reason": "[technical_error_desc]"
    }
  }
}

This structure is always the same, whatever the action is.
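A minimal client-side dispatch over this structure could look as follows. This is an illustrative sketch (the function name and sample data are hypothetical): on OK it returns the message and sources to render, on ERROR it falls back to the default English error message.

```python
def handle_ai_response(response):
    """Dispatch a no-stream /ai response: return the text to render and
    the sources to display as links. Falls back to error.message on ERROR."""
    content = response.get("content", {})
    if response.get("status") == "OK":
        return content.get("message", ""), content.get("sources") or []
    error = content.get("error", {})
    return error.get("message", ""), []

# Illustrative sample response:
ok = {"status": "OK",
      "content": {"message": "Enron est...",
                  "sources": [{"url": "file://doc", "title": "doc"}]}}
text, sources = handle_ai_response(ok)
```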

Field

Description

Application in the Chatbot

status

“ERROR” or “OK”. The status indicates if the process was successful, or if it met an error.

-

content.message

The AI generated message. Can be null or empty in case of error.

If the message is neither null nor empty, it must be rendered in the chatbot, and added to the chat history as an “assistant” message.

content.conversationId

The conversation ID. It is only present for logged-in users.

The conversationId must be included in the next request to the server, in order to ensure conversation continuity.

Only for logged users.

content.docs

Used for search results. If provided, content.message must be empty.

It contains a list of documents, using the following structure:

"docs": [
  {
    "url": "...",     // The URL of the document
    "docId": "...",   // The Solr ID of the document
    "title": "...",   // The first title of the document
    "content": "..."  // A truncated part of the document
  },
  ...
]

When handling a message with search results, the chatbot renders a formatted assistant message instead of a regular message.

content.sources

A list of sources retrieved during the process. Sources use the following structure:

"sources": [
  {
    "url": "...",     // The URL of the document
    "id": "...",      // The Solr ID of the document
    "title": "...",   // The first title of the document
    "content": "..."  // A truncated part of the document
  },
  ...
]

This field can be null or empty.

The sources must be displayed as clickable links, as long as the URL is not empty.

<a target="_blank" rel="noopener noreferrer" href="${url}">${title}</a>

 

content.error

Only present in case of error or exception.

Is null if the process is successful.

-

content.error.code

The HTTP status code of the error (e.g. “500”, “400”…).

-

content.error.label

A code that can be used as an i18n key for translation, when rendering a message in the chatbot.

If content.message is null or empty, the chatbot uses the label as an i18n key and attempts to find a localized message to render. If it fails, it uses the content.error.message instead.

content.error.message

An English, user-friendly message that can be displayed in the chatbot if no translation is available.

If content.message is null or empty, the chatbot uses the label as an i18n key and attempts to find a localized message to render. If it fails, it uses the content.error.message instead.

content.error.reason

A technical description of the error.

-

Examples

Request

Response (OK)

Response (ERROR)

RAG

curl -k -X POST "https://localhost/Datafari/rest/v2.0/ai" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is enron ?",
    "action": "rag",
    "lang": "fr",
    "history": []
  }'

Success

{ "content": { "message": "Enron est l'une des principales entreprises mondiales dans le secteur de l'énergie (...) Enron a été reconnue par le magazine Fortune comme \"l'entreprise la plus innovante d'Amérique\" pendant six années consécutives.", "sources": [ { "content": "Enron Offgrid\n\tEnron Financial(...) \nRenewable Power Desk\nEnron North A…", "id": "file://///localhost/enron/ElliotRPD%20Overview%205_18_012.ppt", "title": "ElliotRPD Overview 5_18_012.ppt", "url": "file://///localhost/enron/ElliotRPD%20Overview%205_18_012.ppt" }, { "content": "ENRON RESERVATION PRICE\t\tCOUNTDOWN (...) AUCTION NEWS\n\t\t\tHOW TO SUBMIT A BID OR OFFE…", "id": "file://///localhost/enron/Emissions%20Auction%20SiteText.doc", "title": "Emissions Auction SiteText.doc", "url": "file://///localhost/enron/Emissions%20Auction%20SiteText.doc" }, (...) { "content": "113\n7\nContinental Power\n(Millions MWh)(...) and Physical Settled Volumes\n1999\n200…", "id": "file://///localhost/enron/EGM_Final.ppt", "title": "EGM_Final.ppt", "url": "file://///localhost/enron/EGM_Final.ppt" } ] }, "status": "OK", "conversationId": "ag9dfoilb-bdun2hh5j-15vio77fz" }

Error: No document retrieved.

{
  "content": {
    "error": {
      "code": "428",
      "label": "ragNoFileFound",
      "message": "Sorry, I couldn't find any relevant document to answer your request.",
      "reason": "The query cannot be answered because no associated documents were found."
    },
    "sources": [ ]
  },
  "status": "ERROR"
}

SUMMARIZE

curl -k -X POST "https://localhost/Datafari/rest/v2.0/ai" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "file://///localhost/enron/ELPASO.pdf",
    "action": "summarize",
    "lang": "en",
    "conversationId": "ag9dfoilb-bdun2hh5j-15vio77fz"
  }'

Success

{
  "content": {
    "message": "The document provides a detailed report on gas capacity and deliveries (...) useful for analyzing trends in the energy sector.",
    "sources": [ ],
    "conversationId": "ag9dfoilb-bdun2hh5j-15vio77fz"
  },
  "status": "OK"
}

Error: Invalid document ID

{
  "content": {
    "error": {
      "code": "422",
      "label": "summarizationNoFileFound",
      "message": "The document cannot be retrieved.",
      "reason": "Index 0 out of bounds for length 0"
    },
    "sources": [ ],
    "conversationId": "ag9dfoilb-bdun2hh5j-15vio77fz"
  },
  "status": "ERROR"
}

AGENTIC

curl -k -X POST "https://localhost/Datafari/rest/v2.0/ai" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Cite moi le nom de trois employés féminins de ENRON",
    "action": "agentic",
    "agent": "rag",
    "lang": "fr"
  }'

Success

{ "content": { "message": "Trois employés féminins de Enron sont :\n\n1. Cindy Olson - Vice-présidente des ressources humaines.\n2. Wanda Curry - Vice-présidente.\n3. Peggy Fowler - Vice-présidente et conseillère générale.", "sources": [ { "content": "Cindy Olson is Enron’s EEO officer. As Executive Vice President (...) we achiev…", "id": "file://///localhost/enron/EEO.doc", "title": "EEO.doc", "url": "file://///localhost/enron/EEO.doc" }, { "content": "ENRON\nENRON ENERGY SERVICES (...) Wanda Curry\nVice Presi…", "id": "file://///localhost/enron/EES%20Org%20Chart.ppt", "title": "EES Org Chart.ppt", "url": "file://///localhost/enron/EES%20Org%20Chart.ppt" }, (...) { "content": "ENRON EMPLOYEE REFERRAL INCENTIVE PROGRAM\n\nEnron Corp.,(...) To provide incentives for cu…", "id": "file://///localhost/enron/Employee%20Referral.doc", "title": "Employee Referral.doc", "url": "file://///localhost/enron/Employee%20Referral.doc" } ] }, "status": "OK" }

Error: Disabled feature

{
  "content": {
    "error": {
      "code": "422",
      "label": "ragErrorNotEnabled",
      "message": "Sorry, it seems the feature is not enabled.",
      "reason": "Agentic service is disabled in configuration."
    },
    "sources": [ ]
  },
  "status": "ERROR"
}

SEARCH

curl -k -X POST "https://localhost/Datafari/rest/v2.0/ai" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "enron",
    "action": "search"
  }'
{ "status": "OK", "content": { "message": "", "sources": [ { "id": "83e03438969c5702091caf38e9e49b5ccd868a68f2689c24dac46561b2315115", "title": "McConnell.Shankman%20List.doc", "url": "file://///fileshare.datafari.com/share/McConnell.Shankman%20List.doc", "content": "[\"Director \/ Officer Positions, etc... \\nNovember 26, 2001\\n\\nMichael S. McConnell \\n\\tCompany\/Title\\n\\t\\n\\t\\n\\t\\n\\n\\nECT Overseas Holding Corp.\\n\\n\\tDirector\\n\\t\\n\\t\\n\\n\\tChairman and Preside…" }, { "id": "cc34c60dcdb23f401267349e621bf59705751cff2b473200ecf0cbe3d23e49b0", "title": "eNovateFinSvcsAgrmt.doc", "url": "file://///fileshare.datafari.com/share/eNovateFinSvcsAgrmt.doc", "content":"[\"FINANCIAL SERVICES AGREEMENT\\n\\n\\nTHIS FINANCIAL SERVICES AGREEMENT (this \“Agreement\”), dated as of April ___, 2001, by and between ENRON CORP., an Oregon corporation (\“Enron\”), …" } (...) ], "docs": [ { "docId": "83e03438969c5702091caf38e9e49b5ccd868a68f2689c24dac46561b2315115", "title": "McConnell.Shankman%20List.doc", "url": "file://///fileshare.datafari.com/share/McConnell.Shankman%20List.doc", "content": "Director / Officer Positions, etc... \nNovember 26, 2001\n\nMichael S. McConnell \n\tCompany/Title\n\t\n\t\n\t\n\n\nECT Overseas Holding Corp.\n\n\tDirector\n\t\n\t\n\n\tChairman and President\n\t\n\t\n\n\nEI Global Fuels Ltd.\n\n\tDirector\n\t\n\t\n\n\tChairman\n\t\n\t\n\n\nEnron (Bermuda) Limited\n\n\tDirector\n\t\n\t\n\n\tChairman\n\t\n\t\n\n\nEnron A..." }, { "docId": "cc34c60dcdb23f401267349e621bf59705751cff2b473200ecf0cbe3d23e49b0", "title": "eNovateFinSvcsAgrmt.doc", "url": "file://///fileshare.datafari.com/share/eNovateFinSvcsAgrmt.doc", "content": "FINANCIAL SERVICES AGREEMENT\n\n\nTHIS FINANCIAL SERVICES AGREEMENT (this “Agreement”), dated as of April ___, 2001, by and between ENRON CORP., an Oregon corporation (“Enron”), and ENRON SUB [insert name of the Enron entity that is the managing member of enovate, LLC], a ____________ (“Enron Sub”)..." }, (...) 
{ "docId": "5c96241ada7160ebbad05f4cbbaf903b5757a9f8484123396f155a4fb1af8f92", "title": "form%20financial%20services.doc", "url": "file://///fileshare.datafari.com/share/form%20financial%20services.doc", "content": "FINANCIAL SERVICES AGREEMENT\n\n\nTHIS FINANCIAL SERVICES AGREEMENT (this “Agreement”), dated as of April ___, 2001, by and between ENRON CORP., an Oregon corporation (“Enron”), and ENRON MW, L.L.C., a Delaware limited liability company (“Enron MW”).\n\nW I T N E S S E T H:\n\nWHEREAS, Enron MW is the m..." } ] } }


Streaming events

Traditional API calls return the full response only once the processing is complete. When using a slow model API, this can lead to noticeable latency, especially when heavy pipelines or Agents are involved.

Streaming solves this problem by:

  • Reducing perceived latency: tokens appear as they are generated, and the progression of the process is visible to the user.

  • Increasing transparency: intermediate events (tool calls, source retrieval) are visible as they happen.

  • Improving UX: users receive immediate feedback and understand what the system is doing.
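The progressive-rendering idea can be sketched as follows: concatenate the text fragments of “message.delta” events, and let a “message.final” event (when present, e.g. for services that do not stream token by token) replace the accumulated text. This is an illustrative client-side helper, not part of Datafari.

```python
def accumulate_message(events):
    """Rebuild the assistant message from stream events: concatenate
    "message.delta" fragments; a "message.final" event, when present,
    replaces the accumulated text with the full response."""
    text = ""
    for event in events:
        if event["type"] == "message.delta":
            text += event["data"]["text"]
        elif event["type"] == "message.final":
            text = event["data"]["text"]
    return text

# Simulated events for illustration:
deltas = [
    {"type": "stream.started", "data": {}},
    {"type": "message.delta", "data": {"text": "Hel"}},
    {"type": "message.delta", "data": {"text": "lo"}},
]
```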

Event types

The response stream is composed of a sequence of events. Each event has:

  • type: the type of event (string)

  • data: the associated payload (object)

  • ts: the timestamp of the event

{"type":"<TYPE>","data":{ <DATA> },"ts":<TIMESTAMP>}

The following events can be sent by any streaming chatbot-related endpoint (RAG, agentic, summarization…). More types can be implemented in the future.

type

data

Description

Application in chatbot

“stream.started”

{}

The connection is open.

The “Source section” is cleared.

“phase”

{ "name": "<phase>" }

Indicates the phase of the current process.

The data contains a single String. Below is a non-exhaustive list of phase name examples:

"agent:start", "agent:done", "rag:retrieval", "rag:response generation", "summarize:start", "summarize:4/11", "summarize:done"

The phase name is displayed in the “Phase indicator”.

“message.delta”

{ "text": "<text_fragment>" }

Add a response fragment (token) to progressively display the response tokens, one by one.

The token is added to the assistant’s message in the chat window.

Due to the recursive algorithms used in our processes, the summarize and RAG services do not support progressive response writing, and do not send message.delta events.

The generated response is sent in a message.final event.

“message.final”

{ "text": "<full_response>" }