RAG and other AI-powered features - Recap

This documentation is a recap of our AI-related features (such as RAG, summarization, LLM Transformation Connector…).

RAG and other AI-Powered APIs

The Datafari RagAPI documentation provides technical and functional information about our Retrieval-Augmented Generation (RAG) solution.

  • RAG API endpoint

  • How does RAG work?

  • AI-Related features configuration (rag.properties, AdminUI)

  • Prompting and chunking strategies for RAG

  • What are LlmServices and how to create a new one?

  • Vector Search options: Solr Vector Search or InMemory Vector Search?

  • Security notions in RAG

AI Powered API is a collection of AI-related API endpoints. See Datafari RagAPI for configuration.

Vector Update Processor: Chunking and Solr Vector Search

The Solr Vector Search (used by RAG processes or through the Datafari API) requires that all indexed documents are chunked into short snippets, and that the content of each snippet is converted into a semantic vector. Those snippets are stored in a separate Solr collection: VectorMain.

The VectorUpdateProcessor processes every document added to the main collection: documents are chunked, and the chunks are sent to the VectorMain collection.
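
For illustration only, here is a minimal sketch of that flow. It is not the actual Datafari implementation: the field names, chunk size, Solr URL and the stubbed embed() call are assumptions; in Datafari the embedding is produced by a real embeddings model.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ChunkingSketch {

        // Stand-in for the real embedding step: returns a dummy vector of an assumed dimension.
        static float[] embed(String text) {
            return new float[384];
        }

        static List<Float> toList(float[] v) {
            List<Float> out = new ArrayList<>(v.length);
            for (float f : v) out.add(f);
            return out;
        }

        public static void main(String[] args) throws Exception {
            String docId = "doc-42";                // hypothetical parent document id
            String fullText = "full extracted text of the crawled document";
            int chunkSize = 500;                    // hypothetical chunk size, in characters

            // Split the document into fixed-size snippets and build one Solr document per snippet.
            List<SolrInputDocument> chunks = new ArrayList<>();
            for (int start = 0, n = 0; start < fullText.length(); start += chunkSize, n++) {
                String snippet = fullText.substring(start, Math.min(start + chunkSize, fullText.length()));
                SolrInputDocument chunk = new SolrInputDocument();
                chunk.addField("id", docId + "_chunk_" + n);   // field names are illustrative only
                chunk.addField("parent_id", docId);
                chunk.addField("content_chunk", snippet);
                chunk.addField("vector", toList(embed(snippet)));
                chunks.add(chunk);
            }

            // Send the chunks to the dedicated vector collection.
            try (SolrClient solr = new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
                solr.add("VectorMain", chunks);
                solr.commit("VectorMain");
            }
        }
    }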

  • What is Vector Search?

  • How does it work?

  • Technical specifications

This documentation needs to be updated

This document provides information about the global Solr Vector Search process.
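
As a rough illustration of what a vector search request looks like, the SolrJ sketch below runs a knn query against the chunk collection. The collection name, field names and Solr URL are assumptions, not Datafari's actual configuration, and the query vector is a placeholder: in practice the query text must be embedded with the same model used at indexing time.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class VectorSearchSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder vector; a real query uses the embedding of the user's question.
            String queryVector = "[0.12, -0.34, 0.56]";

            // {!knn} retrieves the topK chunks whose vectors are closest to the query vector.
            SolrQuery q = new SolrQuery("{!knn f=vector topK=10}" + queryVector);
            q.setFields("id", "parent_id", "content_chunk", "score"); // illustrative field names

            try (SolrClient solr = new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
                QueryResponse resp = solr.query("VectorMain", q);
                resp.getResults().forEach(doc ->
                        System.out.println(doc.get("id") + " -> " + doc.get("score")));
            }
        }
    }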

  • How to enable Vector Search features?

  • How does it work?

This documentation needs to be updated

LLM Transformation Connector

The LLM Transformation Connector can be used in a ManifoldCF pipeline to process LLM-related operations on crawled documents (summarization, categorization…).
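
To give an idea of the kind of operation such a pipeline stage performs, here is a hedged sketch of a summarization call through an OpenAI-like API, assuming a Langchain4j 0.x-style API. This is not the connector's actual code: the endpoint, API key, model name and prompt are all assumptions.

    import dev.langchain4j.model.chat.ChatLanguageModel;
    import dev.langchain4j.model.openai.OpenAiChatModel;

    public class SummarizationSketch {
        public static void main(String[] args) {
            // Any OpenAI-like endpoint can be targeted through baseUrl (local or remote).
            ChatLanguageModel llm = OpenAiChatModel.builder()
                    .baseUrl("http://localhost:8080/v1")   // hypothetical endpoint
                    .apiKey("not-needed-for-local")        // many local servers ignore the key
                    .modelName("my-local-model")           // hypothetical model name
                    .build();

            String extractedText = "text extracted from the crawled document";
            String summary = llm.generate("Summarize the following document in 3 sentences:\n" + extractedText);
            System.out.println(summary);
        }
    }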

  • How does it work?

  • How to use it?

  • How to create a “Categories” facet in the UI?

Datafari AI Agent

Our AI-powered features require an external service able to run Large Language Models and/or embeddings models. We currently provide compatibility with any OpenAI-like API, using the Langchain4j library.
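
As a minimal sketch of what that compatibility means in practice, pointing Langchain4j at an OpenAI-like service is mostly a matter of setting its base URL. The example below does this for an embeddings model (the kind of model needed for vector search); the endpoint, API key and model name are assumptions, and it assumes a Langchain4j 0.x-style API.

    import dev.langchain4j.data.embedding.Embedding;
    import dev.langchain4j.model.embedding.EmbeddingModel;
    import dev.langchain4j.model.openai.OpenAiEmbeddingModel;

    public class EmbeddingSketch {
        public static void main(String[] args) {
            // Any OpenAI-compatible service can be used: only the base URL changes.
            EmbeddingModel embeddings = OpenAiEmbeddingModel.builder()
                    .baseUrl("http://localhost:8080/v1")   // hypothetical OpenAI-like endpoint
                    .apiKey("not-needed-for-local")        // many local servers ignore the key
                    .modelName("my-embedding-model")       // hypothetical model name
                    .build();

            Embedding vector = embeddings.embed("Datafari is an open source enterprise search engine.").content();
            System.out.println("Vector dimension: " + vector.dimension());
        }
    }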

We also created our own self-hosted LLM API: Datafari AI Agent (DAIA).

Work in progress

API documentation for the Datafari AI Agent.

  • List and description of API endpoints

  • Start and stop the services

  • Dynamically add new models

Technical documentation for the Datafari AI Agent.

  • Tools and frameworks

  • Supported models

  • How to configure it?

  • How does it work?

  • GPU support

  • Exploitation scripts

Internal only