RAG and other AI-powered features - Recap

This documentation is a recap of our AI-related features (such as RAG, summarization, LLM Transformation Connector…).

RAG and other AI-Powered APIs

The Datafari RagAPI documentation provides technical and functional information about our Retrieval-Augmented Generation (RAG) solution.

  • RAG API endpoint

  • How does RAG work?

  • AI-Related features configuration (rag.properties, AdminUI)

  • Prompting and chunking strategies for RAG

  • What are LlmServices and how to create a new one?

  • Vector Search options: Solr Vector Search or InMemory Vector Search?

  • Security notions in RAG

AI Powered API is a collection of AI-related API endpoints. See Datafari RagAPI for configuration.

Vector Update Processor: Chunking and Solr Vector Search

The Solr Vector Search (used by RAG processes or through the Datafari API) requires that all indexed documents are chunked into short snippets, and that the content of each snippet is converted into a semantic vector. Those snippets are stored in a separate Solr collection: VectorMain.

The VectorUpdateProcessor processes every document added to the main collection: documents are chunked, and the chunks are sent to the VectorMain collection.
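
For illustration only, here is a minimal sketch of that flow. It is not the actual Datafari implementation: the field names, chunk size, Solr URL and the stubbed embed() call are assumptions; in Datafari the embedding is produced by a real embeddings model.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ChunkingSketch {

        // Stand-in for the real embedding step: returns a dummy vector of an assumed dimension.
        static float[] embed(String text) {
            return new float[384];
        }

        static List<Float> toList(float[] v) {
            List<Float> out = new ArrayList<>(v.length);
            for (float f : v) out.add(f);
            return out;
        }

        public static void main(String[] args) throws Exception {
            String docId = "doc-42";                // hypothetical parent document id
            String fullText = "full extracted text of the crawled document";
            int chunkSize = 500;                    // hypothetical chunk size, in characters

            // Split the document into fixed-size snippets and build one Solr document per snippet.
            List<SolrInputDocument> chunks = new ArrayList<>();
            for (int start = 0, n = 0; start < fullText.length(); start += chunkSize, n++) {
                String snippet = fullText.substring(start, Math.min(start + chunkSize, fullText.length()));
                SolrInputDocument chunk = new SolrInputDocument();
                chunk.addField("id", docId + "_chunk_" + n);   // field names are illustrative only
                chunk.addField("parent_id", docId);
                chunk.addField("content_chunk", snippet);
                chunk.addField("vector", toList(embed(snippet)));
                chunks.add(chunk);
            }

            // Send the chunks to the dedicated vector collection.
            try (SolrClient solr = new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
                solr.add("VectorMain", chunks);
                solr.commit("VectorMain");
            }
        }
    }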

  • What is Vector Search?

  • How does it work?

  • Technical specifications

This documentation needs to be updated

This document provides information about the global Solr Vector Search process.
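
As a rough illustration of what a vector search request looks like, the SolrJ sketch below runs a knn query against the chunk collection. The collection name, field names and Solr URL are assumptions, not Datafari's actual configuration, and the query vector is a placeholder: in practice the query text must be embedded with the same model used at indexing time.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class VectorSearchSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder vector; a real query uses the embedding of the user's question.
            String queryVector = "[0.12, -0.34, 0.56]";

            // {!knn} retrieves the topK chunks whose vectors are closest to the query vector.
            SolrQuery q = new SolrQuery("{!knn f=vector topK=10}" + queryVector);
            q.setFields("id", "parent_id", "content_chunk", "score"); // illustrative field names

            try (SolrClient solr = new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
                QueryResponse resp = solr.query("VectorMain", q);
                resp.getResults().forEach(doc ->
                        System.out.println(doc.get("id") + " -> " + doc.get("score")));
            }
        }
    }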

  • How to enable Vector Search features?

  • How does it work?

This documentation needs to be updated

LLM Transformation Connector

The LLM Transformation Connector can be used in a ManifoldCF pipeline to process LLM-related operations on crawled documents (summarization, categorization…).
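
To give an idea of the kind of operation such a pipeline stage performs, here is a hedged sketch of a summarization call through an OpenAI-like API, assuming a Langchain4j 0.x-style API. This is not the connector's actual code: the endpoint, API key, model name and prompt are all assumptions.

    import dev.langchain4j.model.chat.ChatLanguageModel;
    import dev.langchain4j.model.openai.OpenAiChatModel;

    public class SummarizationSketch {
        public static void main(String[] args) {
            // Any OpenAI-like endpoint can be targeted through baseUrl (local or remote).
            ChatLanguageModel llm = OpenAiChatModel.builder()
                    .baseUrl("http://localhost:8080/v1")   // hypothetical endpoint
                    .apiKey("not-needed-for-local")        // many local servers ignore the key
                    .modelName("my-local-model")           // hypothetical model name
                    .build();

            String extractedText = "text extracted from the crawled document";
            String summary = llm.generate("Summarize the following document in 3 sentences:\n" + extractedText);
            System.out.println(summary);
        }
    }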

  • How does it work?

  • How to use it?

  • How to create a “Categories” facet in the UI?

Datafari AI Agent

Our AI-powered features require an external service able to run Large Language Models and/or embeddings models. We currently provide compatibility with any OpenAI-like API, using the Langchain4j library.
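
As a minimal sketch of what that compatibility means in practice, pointing Langchain4j at an OpenAI-like service is mostly a matter of setting its base URL. The example below does this for an embeddings model (the kind of model needed for vector search); the endpoint, API key and model name are assumptions, and it assumes a Langchain4j 0.x-style API.

    import dev.langchain4j.data.embedding.Embedding;
    import dev.langchain4j.model.embedding.EmbeddingModel;
    import dev.langchain4j.model.openai.OpenAiEmbeddingModel;

    public class EmbeddingSketch {
        public static void main(String[] args) {
            // Any OpenAI-compatible service can be used: only the base URL changes.
            EmbeddingModel embeddings = OpenAiEmbeddingModel.builder()
                    .baseUrl("http://localhost:8080/v1")   // hypothetical OpenAI-like endpoint
                    .apiKey("not-needed-for-local")        // many local servers ignore the key
                    .modelName("my-embedding-model")       // hypothetical model name
                    .build();

            Embedding vector = embeddings.embed("Datafari is an open source enterprise search engine.").content();
            System.out.println("Vector dimension: " + vector.dimension());
        }
    }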

We also created our own self-hosted LLM API: Datafari AI Agent (DAIA).

Work in progress

API documentation for the Datafari AI Agent.

  • List and description of API endpoints

  • Start and stop the services

  • Dynamically add new models

Technical documentation for the Datafari AI Agent.

  • Tools and frameworks

  • Supported models

  • How to configure it?

  • How does it work?

  • GPU support

  • Exploitation scripts

Internal only