AI Powered Datafari API

Valid from Datafari 6.2

Introduction

This feature is a work-in-progress, experimental collection of API endpoints intended to be added to the Datafari API.

See our RAG feature: Datafari RagAPI - RAG. As RAG is considered part of the search process, it is not included in the AiPowered API. However, both share common elements, such as their configuration.

Configuration

Before using the AiPowered endpoints, your Datafari instance needs to be properly configured. Use the “RAG & AI configuration” AdminUI to set up your environment. You will also need an AI web service that can run Large Language Models.

Prefix

Every endpoint in this API is prefixed with /rest/v2.0/ai.

Also remember to include the path of your Datafari web app: https://[datafaridomain]/Datafari/rest/v2.0/ai/[endpoint]

Response format

Responses may differ depending on the endpoint. All responses are JSON.

{ "status":"OK", "content": { ... } }

This is the template for error responses:

{ "status":"ERROR", "content":{ "message":"...", "documents": [], "error":{ "code":..., "label":"...", "message": "..." } } }

The “content.message” field provides a default response in English that can be displayed in a chatbot. However, since this message is not localized, we recommend using “error.label” as a translation key to display a message in the proper language.

The “error.message” field is only used in case of exceptions, and provides a technical description of the incident.

  • status: "OK" in case of success, "ERROR" in case of failure.

  • content.message: The response generated by the LLM, to be displayed to the user. In case of error, it provides a default message in English, safe to display in a chatbot UI.

  • content.documents: An array, always present to maintain the response structure. It can be empty, or contain a list of documents (e.g. the sources used by the LLM to generate a RAG response).

  • content.error.code: Numeric error code, for internal or programmatic handling.

  • content.error.label: Translation key, used to display a localized, user-friendly message (instead of content.message).

  • content.error.message: Technical error description. Optional, and not user-friendly.

{ "status":"ERROR", "content":{ "message":"Sorry, I couldn't find any relevant document to answer your request.", "documents": [], "error":{ "code":428, "label":"ragNoFileFound" } } }

Below is the list of the existing error labels, with their associated default messages (in English):

  • ragErrorNotEnabled: Sorry, it seems the feature is not enabled.

  • ragNoFileFound: Sorry, I couldn't find any relevant document to answer your request.

  • ragTechnicalError: Sorry, I met a technical issue. Please try again later, and if the problem remains, contact an administrator.

  • ragNoValidAnswer: Sorry, I could not find an answer to your question.

  • ragBadRequest: Sorry, It appears there is an issue with the request. Please try again later, and if the problem remains, contact an administrator.

  • summarizationErrorNotEnabled: Sorry, it seems the feature is not enabled.

  • summarizationNoFileFound: Sorry, I cannot find this document.

  • summarizationTechnicalError: Sorry, I met a technical issue. Please try again later, and if the problem remains, contact an administrator.

  • summarizationBadRequest: Sorry, It appears there is an issue with the request. Please try again later, and if the problem remains, contact an administrator.

  • summarizationEmptyFile: Sorry, I am unable to generate a summary, since the file has no content.

Endpoints

All the available AiPowered API endpoints are documented below.

  • POST /summarize: Returns the summary of a Solr document if it exists, or generates it otherwise.

  • POST /rag: Processes a RAG search. If an ID is provided, only the associated document is used for information retrieval.


POST /summarize

Returns the summary of a Solr document if it exists, or generates it otherwise. The document is retrieved from the FileShare collection using Datafari search, so security is maintained.

REQUEST

POST https://[datafaridomain]/Datafari/rest/v2.0/ai/summarize

Request body:

{ "id": "[any_solr_document_id]", "lang": "[language_code]" }

RESPONSE

POST https://[datafaridomain]/Datafari/rest/v2.0/ai/summarize

Response body:

{ "content": { "message": "The document provides detailed financial information and ...[truncated]... & Trade Resources." } "status":"OK" }

Parameters

  • id: The Solr ID of the document to summarize. Required.

  • lang: The code of the expected response language. Allowed values are “en”, “fr”, “it”, “pt”, “de”, “es”, and “ru”. Optional (defaults to “en” if the user has no preferred language).

Processing details

As most documents are too large to be processed at once by a Large Language Model, the service requires a chunking solution.

  1. The document is retrieved from Solr by its ID, using Datafari search.

  2. The document content is extracted from the exactContent field.

  3. The content is chunked into smaller sub-documents. The chunk size (in characters) can be configured in the AdminUI (RAG & AI configuration), or in rag.properties through the chunking.chunk.size property (a minimal chunking sketch is shown after this list).

  4. Each chunk is sent to an LLM for summarization.

  5. The LLM is called one more time (if needed) to generate a global summary from its previous responses.
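As an illustration of the chunking step, here is a minimal character-based sketch (plain Python; the function name and the 4000-character size are illustrative assumptions, not Datafari's actual implementation):

def chunk_content(text: str, chunk_size: int = 4000) -> list[str]:
    """Split a document's text into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Each chunk is summarized separately, then a final LLM call (if needed)
# merges the partial summaries into a global one.
chunks = chunk_content("very long document content ...", chunk_size=4000)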

Summary generation can be disabled from the “RAG & AI configuration” AdminUI, or by directly setting the ai.summarization.enabled property to false in rag.properties.
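For reference, the corresponding entries in rag.properties could look like this (the values shown are illustrative; only the property names come from this page):

ai.summarization.enabled=false
chunking.chunk.size=4000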


POST /rag

Processes a RAG (Retrieval Augmented Generation) request. The LLM generates a response to the user query/question, based on retrieved documents.

REQUEST

POST https://[datafaridomain]/Datafari/rest/v2.0/ai/rag

Request body:

{ "query": "[user_query]", "id": "[any_solr_document_id]", "lang": "[language_code]", "history": [] }

Or

{ "query": "[user_query]", "lang": "[language_code]", "history": [] }

RESPONSE

POST https://[datafaridomain]/Datafari/rest/v2.0/ai/rag

Response body:

{ "status": "OK|ERROR", "content": { "message": "...", "documents": [ { "url": "...", "id": "...", "title": "...", "content": "..." }, { "url": "...", "id": "...", "title": "...", "content": "..." }, ... ] } }

Parameters

  • query: The user query. Required.

  • id: A file ID. If set, the LLM will only use the associated document (if any) to answer the user query. If not set, the service processes a classic search. Optional (none).

  • lang: The code of the expected response language. Allowed values are “en”, “fr”, “it”, “pt”, “de”, “es”, and “ru”. Optional (defaults to “en” if the user has no preferred language).

  • history: If Chat Memory is enabled, the “history” field can take a list of messages to keep track of the chat history. Optional (none).

Processing details

As most documents are too large to be processed at once by a Large Language Model, the service requires a chunking solution.

  1. History retrieval. If “chat memory” is enabled (from the AdminUI or rag.properties), the chat history is retrieved to be used in the prompts.

  2. Source retrieval. Documents are retrieved from Solr using Datafari search.
    If the “id” field is set, only the associated document is retrieved.
    The retrieval step can use Vector Search or BM25 Search. This can be configured in the AdminUI or in rag.properties (solr.enable.vector.search set to true or false).
    Optional: if the chat.query.rewriting.enabled property is enabled, the search query is rewritten by the LLM before the Solr search. This only applies to the search step; the initial user query is still used in the RAG process. If enabled, the history is used for query rewriting.

  3. RAG processing. The documents are processed by RagAPI.

    1. Chunking: any document content (or document extract, in case of vector search) larger than the maximum chunk size defined in the AdminUI/rag.properties (chunking.chunk.size) is chunked into smaller pieces.

    2. Prompting: the relevant prompts for the AI model are prepared in PromptUtils. Prompt templates can be edited in:
      /opt/datafari/tomcat/webapps/Datafari/WEB-INF/classes/prompts

    3. Response generation: the prepared prompts are sent to an external LLM API, using an LlmService.
      Depending on the number and size of the chunks, it may take several LLM requests to process them all. The chunk management strategy (Map-Reduce or Iterative Refining) can be selected in the AdminUI (see the sketch after this list).

  4. The response is formatted and returned.
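To illustrate the two chunk management strategies named in step 3.3, here is a schematic sketch (plain Python; llm stands in for the external LLM API call, and the prompt strings are simplified placeholders, not Datafari's actual prompt templates):

def map_reduce(chunks, llm):
    # Map: answer from each chunk independently...
    partials = [llm(f"Answer from this excerpt:\n{c}") for c in chunks]
    # ...then Reduce: merge the partial answers in a final call.
    return llm("Merge these partial answers:\n" + "\n".join(partials))

def iterative_refining(chunks, llm):
    # Start from the first chunk, then refine the draft with each next chunk.
    draft = llm(f"Answer from this excerpt:\n{chunks[0]}")
    for c in chunks[1:]:
        draft = llm(f"Refine this draft:\n{draft}\nwith this excerpt:\n{c}")
    return draft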

See Datafari RagAPI - RAG documentation for more information about RAG processes.