AI Agent - API documentation

Valid for Datafari 6.2

Introduction

The AI Agent is a feature developed and provided by France Labs to answer various needs requiring Artificial Intelligence, such as RAG (Retrieval Augmented Generation) or vector embeddings (used for semantic vector search).

It is a simple API that encapsulates one or several AI models. The AI Agent can be installed on a server or on a virtual machine.

The purpose of the AI Agent is to improve Datafari’s capabilities by adding a powerful AI layer. The main features of this component are:

  1. Generic LLM services: The web service uses advanced language models. These can be used, for example, for RAG (Retrieval Augmented Generation), summarization or categorization.

  2. Use your favorite LLMs: Our solution allows you to use any compatible Large Language Model (requires GGUF format). Models can be manually downloaded and installed, or dynamically downloaded from Hugging Face.

  3. Vector Embeddings: Converting data, images or concepts into vectors is a central step for Vector Semantic Search, or for Natural Language Processing (NLP).

Find the project on our Gitlab: https://gitlab.datafari.com/sandboxespublic/datafari-ai-agent

Installation documentation: AI Agent - Installation and configuration

Technical documentation: AI Agent - Technical documentation

Datafari AI Agent is an experimental feature, mostly used for internal tests and vector embeddings. Since Datafari 7.0, it is no longer supported for LLM inference, since it does not support streaming or tool calling.

API endpoints

The agent provides multiple endpoints. Each of them is detailed in a dedicated section.

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /models | Returns the list of available models. |
| POST | /embeddings | Converts a text or a list of texts into vectors. This OpenAI-like endpoint can be called using OpenAI-compatible libraries such as Langchain or Langchain4j. |
| POST | /chat/completions | An OpenAI-like endpoint for chat completion, which can be called using OpenAI-compatible libraries such as Langchain or Langchain4j. |
| GET | /health | Checks the status of the API and the size of the queue. |
| GET | /debug/perf | Runs a simple LLM call to debug and check LLM performance. |
| GET | /debug/perf_embeddings | Runs a simple embeddings request to debug and check embeddings performance. |

Any error or exception occurring in the agent generates a JSON object containing the description of the event:

{ "error": "{error_message}" }

 


GET /models

Get the list of available models from the ./models folder. The response also specifies the default model.

REQUEST

GET http://[HOST]:[PORT]/models

RESPONSE

GET http://[HOST]:[PORT]/models

Response body:

{ "models": [ "mistral-7b-openorca.Q4_0.gguf", "llama-2-7b-chat.Q2_K.gguf" ], "default_model":"mistral-7b-openorca.Q4_0.gguf" }

Those are locally installed or downloaded models. They can be used by setting their file name in the "model" parameter of the /chat/completions or /embeddings endpoints. Unless the "ONLY_LOCAL_MODELS" option is enabled in the .env file, you can also specify "model_repository" and "model" to download new models from the Hugging Face platform.
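
As a sketch, the endpoint can be queried from Python with the requests library (the host and port are assumptions, adapt them to your installation):

import requests

BASE_URL = "http://localhost:8888"  # assumed host and port

# List the locally available models and the configured default model.
body = requests.get(BASE_URL + "/models").json()
print("Available models:", body["models"])
print("Default model:", body["default_model"])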


POST /embeddings

This endpoint can be used to convert a text, or a list of texts, into vectors. The dimension of the generated vectors depends on the selected model.

REQUEST

POST http://[HOST]:[PORT]/embeddings

Request body:

{ "input": "Roses are red, violets are blue.", "model": "mistral-7b-instruct-v0.1.Q2_K.gguf", "model_repository": "TheBloke/Mistral-7B-Instruct-v0.1-GGUF", }

The request body is a JSON object.

RESPONSE

POST http://[HOST]:[PORT]/embeddings

Response body:

{ "object": "list", "data": [ { "object": "embedding", "index": 0, "embedding": [ -0.006929283495992422, -0.005336422007530928, ... (omitted for spacing) -4.547132266452536e-05, -0.024047505110502243 ], } ], "model": "./models/mistral-7b-instruct-v0.1.Q2_K.gguf", "usage": { "prompt_tokens": 23, "total_tokens": 23 } }

 

Parameters

| Name | Description | Optional? (Default value) |
|------|-------------|---------------------------|
| input | The text, or the list of texts, to convert into vectors. | No |
| model | The file name of the model to be used. To see available models, use the /models endpoint. If the selected model does not exist, the default model might be used. | Yes (all-MiniLM-L6-v2.Q8_0.gguf) |
| model_repository | Use this if you want to download a new GGUF Hugging Face model that is not listed in the /models endpoint. If you use this parameter, the "model" parameter is required. If this value is set and the AI Agent can't find the model (locally or on Hugging Face), it returns an error and does not fall back to the default model. | Yes (leliuga/all-MiniLM-L6-v2-GGUF) |

The embeddings endpoint uses OpenAI-like API signatures in order to be compatible with existing tools, such as Langchain.
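
For illustration, a minimal /embeddings call with Python's requests library could look like this (the host, port and model name are assumptions, adapt them to your installation):

import requests

BASE_URL = "http://localhost:8888"  # assumed host and port

payload = {
    "input": "Roses are red, violets are blue.",
    "model": "all-MiniLM-L6-v2.Q8_0.gguf",  # optional, omit to use the default model
}
body = requests.post(BASE_URL + "/embeddings", json=payload).json()

# The response follows the OpenAI embeddings format.
vector = body["data"][0]["embedding"]
print("Vector dimension:", len(vector))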


POST /chat/completions

This OpenAI-like endpoint sends one or several chat messages to the LLM and returns its completion.

REQUEST

POST http://[HOST]:[PORT]/chat/completions

Request body:

{ "messages": [ { "role": "system", "content": "You are given a document content. Categorize it in one of the following categories. If it does not fit into a category, just say 'Others': Poem, Invoice, Call for Tenders, Train ticket, Spam " }, { "role": "user", "content": "Roses are red, violets are blue. \nAll those butterflies really wan't to kill you." } ], "temperature": 1, "max_tokens": 10, "model": "mistral-7b-instruct-v0.1.Q2_K.gguf", "model_repository": "TheBloke/Mistral-7B-Instruct-v0.1-GGUF", }

The request body is a JSON object.

RESPONSE

POST http://[HOST]:[PORT]/chat/completions

Response body:

{ "id": "chatcmpl-d7b61a4c-d7f3-4c81-82d5-0399c2988686", "object": "chat.completion", "created": 1733237293, "model": "./models/mistral-7b-instruct-v0.1.Q2_K.gguf", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Poem" }, "logprobs": null, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 120, "completion_tokens": 2, "total_tokens": 122 } }

 

Parameters

| Name | Description | Optional? (Default value) |
|------|-------------|---------------------------|
| max_tokens | The maximum size (in tokens) of the response. Recommended value: 200. | Yes (200) |
| model | The file name of the model to be used. To see available models, use the /models endpoint. If the selected model does not exist, the default model might be used. | Yes (the default model from the .env file is used) |
| model_repository | Use this if you want to download a new GGUF Hugging Face model that is not listed in the /models endpoint. If you use this parameter, the "model" parameter is required. If this value is set and the AI Agent can't find the model (locally or on Hugging Face), it returns an error and does not fall back to the default model. | Yes (the default model from the .env file is used) |
| temperature | Controls the randomness of the LLM answer. It is a number in the range 0 to 1. Recommended value: 0. | Yes (0) |
| messages | Contains a list of messages. Each message is a JSON object with a "role" ("user", "system" or "assistant"; default "user") and a "content" (the string content of the message). | No (at least one message is required) |

The /chat/completions endpoint uses OpenAI-like API signatures in order to be compatible with existing tools, such as Langchain.
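
As an illustration, here is a minimal /chat/completions call with Python's requests library (the host and port are assumptions; the default model from the .env file is used since "model" is omitted):

import requests

BASE_URL = "http://localhost:8888"  # assumed host and port

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one word."},
    ],
    "temperature": 0,
    "max_tokens": 200,
}
body = requests.post(BASE_URL + "/chat/completions", json=payload).json()

# The answer is in the first choice, as in the OpenAI chat format.
print(body["choices"][0]["message"]["content"])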


GET /health

Check the status of the API. The response also reports the current size of the request queue.

REQUEST

GET http://[HOST]:[PORT]/health

RESPONSE

GET http://[HOST]:[PORT]/health

Response body:

{ "status": "ok" "queue_size":0 }

GET /debug/perf

Runs a simple LLM request using the default LLM and returns the duration of the process.

REQUEST

GET http://[HOST]:[PORT]/debug/perf

RESPONSE

GET http://[HOST]:[PORT]/debug/perf

Response body:

{ "latency_seconds": "12.2211" }



GET /debug/perf_embeddings

Runs a simple embeddings request using the default embeddings model, and returns the duration of the process. It also returns the size of the generated vector and its first 5 values.

REQUEST

GET http://[HOST]:[PORT]/debug/perf_embeddings

RESPONSE

GET http://[HOST]:[PORT]/debug/perf_embeddings

Response body:

{ "latency_seconds": "0.1204", "dimensions":394, "first_values": [0.2354, -1.221, -0.9802, 0.0107, -0.9892] }


Start and stop the services

Once the AI Agent is fully installed and configured, you can start and stop the services using the scripts available in the ./bin directory.

Use the following commands from the project root.

To start the services:

bash bin/start.sh

To stop the services:

bash bin/stop.sh

Execution logs can be found in ./logs/aiagent-*.log.

Dynamically add new GGUF models

By default, our solution is configured to download models from Hugging Face. The current default model is bartowski/Ministral-8B-Instruct-2410-GGUF/Ministral-8B-Instruct-2410-Q4_K_M.gguf (from https://huggingface.co/bartowski/Ministral-8B-Instruct-2410-GGUF).

New models can be downloaded via the /chat/completions or /embeddings endpoints, by setting the "model" and "model_repository" parameters in your request.

If the named model does not exist in the models folder AND if the "ONLY_LOCAL_MODELS" option is not set to true, the AI Agent will try to retrieve and download it from Hugging Face. The model will then be available in the models folder and referenced by the /models endpoint.
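
For example, the following sketch triggers such a dynamic download on first use (the host and port are assumptions; the model and repository names are taken from the examples above):

import requests

BASE_URL = "http://localhost:8888"  # assumed host and port

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    # If this model file is not yet in the models folder, the agent tries to
    # download it from the given Hugging Face repository (unless the
    # "ONLY_LOCAL_MODELS" option is enabled in the .env file).
    "model": "mistral-7b-instruct-v0.1.Q2_K.gguf",
    "model_repository": "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
}
body = requests.post(BASE_URL + "/chat/completions", json=payload).json()
print(body["choices"][0]["message"]["content"])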

To manually install new models, or to download models using the model_manager script, check the AI Agent - Installation and configuration documentation.

Deployment options