Datafari Vector Search
Valid from Datafari v6.2
This documentation explains how to install, configure and use the Solr Vector Search within the RAG features or through the Datafari API. It is subject to change.
What is vector search and how is it useful?
Vector search relies on a vectorised representation of documents. More precisely, it relies on a dense vector representation in our case, since BM25 can itself be seen as a sparse vector search mechanism. Dense vector search with certain pre-trained sentence transformers handles semantic search better than BM25, which is why vector search is useful in certain scenarios.
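To make the idea concrete, here is a minimal Python sketch of how dense vectors are compared. The three-dimensional vectors are made-up toy values (real sentence-transformer embeddings have hundreds of dimensions), the document names are hypothetical, and cosine similarity is assumed as the metric; the similarity function actually used depends on how the Solr vector field is configured.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings (hypothetical values, for illustration only).
query = [0.9, 0.1, 0.0]          # e.g. "how do I reset my password"
doc_semantic = [0.8, 0.2, 0.1]   # e.g. "steps to recover account credentials"
doc_unrelated = [0.0, 0.1, 0.9]  # e.g. "quarterly sales report"

# Rank documents by decreasing similarity to the query vector.
ranked = sorted(
    [("doc_semantic", doc_semantic), ("doc_unrelated", doc_unrelated)],
    key=lambda item: cosine_similarity(query, item[1]),
    reverse=True,
)
```

The semantically close document ranks first even though it shares no keyword with the query, which is the case where a keyword-based score such as BM25 would struggle.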
How to enable vector search features?
Since v6.3, the Solr Vector Search feature comes with a dedicated page in the AdminUI. At each of the steps below, we illustrate how to configure it to use the OpenAI cloud with the GPT4oXYZ model and your account token ZYXW.
Go to the Extra Functionalities > Solr Vector Search page, in the Admin Menu.
The textarea is a read-only field that shows the JSON configuration that will be stored in Solr. It can be edited by setting the associated fields above.
More information about the model configuration is available in the Solr Text-to-vector documentation.
Switch the “Enable vector search” button to “On”.
In the “Select an existing model configuration, or create a new one” list, pick “Add a new embeddings model”.
If one or more models are already configured in Solr, they appear in this list. You can select one here and skip the model creation (steps 4 to 8), or edit it. You still need to make sure that it is tagged as “Active model” (step 9).
Select a model configuration template (required). Available templates are:
- OpenAI (for OpenAI API or any other compatible API) => the one we pick for our example
- Datafari AI Agent (same interface as OpenAI, but the template’s default values are for the Datafari AI Agent)
- Hugging Face (for Hugging Face API)
- Mistral (for Mistral Cloud)
- Cohere (for Cohere’s API)
Name the model configuration (required).
Model configuration names are identifiers and must be unique. If you create a new model configuration with the name of an existing one, the existing one will be overwritten.
Only use alphanumerical characters.
The default (and recommended) value is “default_model”.
In our example, we are using the default value: “default_model”.
Write the name of the embeddings model that will be used by the external service (required). Depending on the selected template, a default value is provided.
In our example, we are using gpt4o-XYZ.
Type the base URL of the external service (required). Depending on the selected template, a default value is provided.
In our example, we are using the OpenAI API.
Enter your security token in the “API key” field (required). If you are using Datafari AI Agent, use a placeholder key (e.g. XXXXX).
For our OpenAI example, our key is “ZYXW”. Use the key available in your OpenAI account.
Set this model as the Solr active embeddings model. Unless you do not plan to use the model you are adding for vector embeddings, you probably want to check this option.
Select a vector field (required). The vector field must match the dimension of the vectors generated by the selected model. If you can’t find the dimension you need among the available fields, consider creating it in the Solr configuration.
Supposing that “gpt4o-XYZ” generates 384-dimensional vectors, we will be using the “vector_384” field in our example.
Configure the filters that will be applied during chunking. Content that does not match all requirements will be neither embedded nor indexed into VectorMain. Set a filter to 0 to ignore it.
Save, and wait a few seconds. If everything went fine, the model list should now contain your new model configuration.
The newly created model configuration now appears in the list. As it has been set as the “active model”, it is automatically selected on page loading.
Launch your job in ManifoldCF. The VectorMain should soon be populated with subdocuments, containing semantic vectors.
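For reference, the values entered in the steps above can be sketched as a small Python script that assembles a model definition similar to the JSON shown in the read-only text area. This is only an illustration under stated assumptions: the `class` / `name` / `params` layout and the OpenAI class name follow the Solr Text-to-vector documentation as we understand it, and the exact JSON that Datafari generates may differ. The base URL is an assumed value for the public OpenAI API, `gpt4o-XYZ` and `ZYXW` are the placeholders used on this page, the `vector_<dimension>` naming is inferred from the `vector_384` example, and both helper functions are hypothetical.

```python
import json


def build_model_config(name, model_name, base_url, api_key):
    """Assemble a model definition in the class/name/params layout (assumed)."""
    required = {
        "name": name,
        "model_name": model_name,
        "base_url": base_url,
        "api_key": api_key,
    }
    for label, value in required.items():
        if not value:
            raise ValueError(f"{label} is required")
    return {
        # Class name as given in the Solr Text-to-vector documentation (assumed).
        "class": "dev.langchain4j.model.openai.OpenAiEmbeddingModel",
        "name": name,
        "params": {
            "baseUrl": base_url,
            "apiKey": api_key,
            "modelName": model_name,
        },
    }


def vector_field_for(dimension):
    """Map an embedding dimension to a field name such as vector_384 (assumed naming)."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return f"vector_{dimension}"


config = build_model_config(
    name="default_model",
    model_name="gpt4o-XYZ",                # placeholder model name from this page
    base_url="https://api.openai.com/v1",  # assumed base URL for the OpenAI API
    api_key="ZYXW",                        # placeholder token from this page
)
config_json = json.dumps(config, indent=2)
field = vector_field_for(384)
```

Printing `config_json` gives a JSON document that can be compared with the content of the text area, and `field` holds the vector field name to select for a 384-dimensional model.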
Manual configuration is documented below.
Examples of configuration
Here are two examples of Vector Search configuration. For a quick & easy installation:
Check “Enable vector search”
Select “Add a new embeddings model”.
Check all the checkboxes
Configure the embeddings model:
| | Embeddings with OpenAI | Embeddings with Datafari AI Agent |
|---|---|---|
| Requirements | | |
| Configuration | | |
Save
How does it work?
Work in progress