Datafari API

Valid from Datafari 6.3

The documentation below is valid from Datafari 6.3 onwards. Some parts are still under development.

SECURITY: For Datafari Enterprise Edition only: to manage security and user ACLs in results, see this page: https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/783253542

Introduction

The Datafari API follows RESTful principles as closely as possible and uses JSON to exchange data. Everything returned by the API is JSON, and any payload in POST and PUT requests made to the API must be JSON too.

On GET and DELETE requests, some parameters may be passed through URL parameters, either as part of the URL path or in the query string (anything after the “?”).

Prefix

Every endpoint in this API is prefixed by /rest/vX.Y/ where X and Y are the major and minor versions of this API (for example /rest/v2.0/).

Remember to also prepend the path to your Datafari web app (https://datafaridomain/Datafari/rest/vX.Y/endpoint).
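As an illustration of the prefix rule above, here is a minimal Python sketch that assembles a full endpoint URL. The helper name endpoint_url and its parameters are hypothetical, and the lowercase "v" follows the examples given later on this page.

```python
def endpoint_url(base_url: str, endpoint: str, major: int = 2, minor: int = 0) -> str:
    """Join the web app base URL, the /rest/vX.Y/ prefix and an endpoint path.

    base_url is the Datafari web app URL, e.g. https://datafaridomain/Datafari
    (hypothetical helper, not part of Datafari).
    """
    return f"{base_url.rstrip('/')}/rest/v{major}.{minor}/{endpoint.lstrip('/')}"


print(endpoint_url("https://datafaridomain/Datafari", "users/current/history"))
```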

Response Structure

All responses are formatted using the following template:

{ "status": "OK|ERROR", "content": {} }

In the rest of this page, any object presented as a response shows the format of the content key in the above template.

For errors, the content has the following structure:

{ "code": {int}, "reason": {String}, "extra": {Object} }
  1. code: an integer using the HTTP response code nomenclature

  2. reason: a String in English giving some details if relevant to help solve the problem or give context

  3. extra: a JSON object providing some extra information from the endpoint if relevant, can be null or empty

Keep in mind that on some occasions the server might respond with an HTTP error code without any JSON payload. You must handle this case in your applications.
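A client can handle the three cases described above (success, JSON error, error without JSON) in one place. The following Python sketch is only one possible approach: the DatafariApiError class and the unwrap function are hypothetical names, and the code assumes the caller already has the HTTP status code and the raw response body as a string.

```python
import json
from typing import Any


class DatafariApiError(Exception):
    """Hypothetical exception carrying the fields of an error payload."""

    def __init__(self, code: int, reason: str = "", extra: Any = None):
        super().__init__(f"{code}: {reason}")
        self.code = code
        self.reason = reason
        self.extra = extra


def unwrap(http_status: int, body: str) -> Any:
    """Return the "content" of a successful response, raise otherwise."""
    try:
        payload = json.loads(body)
    except ValueError:
        # The server answered without a JSON payload (e.g. an HTML error page).
        raise DatafariApiError(http_status, "non-JSON response body")
    if not isinstance(payload, dict) or "status" not in payload:
        raise DatafariApiError(http_status, "unexpected response shape")
    content = payload.get("content") or {}
    if payload["status"] == "ERROR":
        # Fall back to the HTTP status if the payload carries no code.
        raise DatafariApiError(
            content.get("code", http_status),
            content.get("reason", ""),
            content.get("extra"),
        )
    return content
```

Raising a single exception type for both JSON and non-JSON failures lets calling code use one except clause, while the code, reason and extra attributes remain available when the server provided them.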

Endpoints

V2.0

METHOD

URL

DESCRIPTION

QUERY BODY

RESPONSE

PROTECTED

EDITION

GET

search/{handler}?{query}

Perform a search or suggest query. This endpoint is detailed further down in this documentation.

 

SPECIFIC RESPONSE FORMAT. See below for more explanations.

 

CE

GET

results/export?query={query}&facetQuery[]={facetQuery}&facetField[]={facetField}&fq[]={fq}&sort={sort}&fl={fl}&nbResults={nbResults}&type=excel

Create an Excel file containing the results of the provided query and query parameters. The number of results is limited by the nbResults parameter. This endpoint is detailed further down in this documentation.

 

A stream of the created export file. See below for more explanations.

 

CE

GET

users/current/history

Retrieve the current user history (past queries information).

 

{ "history":[ { "action":"SEARCH", "query_id": ..., "time_stamp":..., "user_id":..., "parameters":{ "query":"training", "num_hit":0 } }, { "action":"SEARCH", "query_id": ..., "time_stamp":..., "user_id":..., "parameters":{ "query":"exactContent%3Aenergy OR innovation future", "num_hit":0 } }, { ... }] }

History is an array of objects containing information on the last queries of the current user. The user must be authenticated to have a history; an error is returned otherwise.

 

CE

GET

users/current/history?query={query}

Retrieve the current user history (past queries information).

Results will be filtered by query in the API.

 

{ "history":[ { "action":"SEARCH", "query_id": ..., "time_stamp":..., "user_id":..., "parameters":{ "query":"exactContent:energy OR innovation future", "num_hit":0 } }, { ... }] }

Example: users/current/history?query=exactContent%3Aenergy OR innovation future

History is an array of objects containing information on the last queries of the current user. The user must be authenticated to have a history; an error is returned otherwise.

The field stored in parameters.query of the returned entries should contain the user query.

 

CE

POST

/ai/summarize

Returns the summary of a Solr document if it exists, or generates it otherwise.

See https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/3619946497 for documentation.

 

 

CE

POST

/ai/rag

Process a RAG search. If an ID is provided, only the associated document is used for information retrieval.

See https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/3619946497 for documentation.

 

 

CE

GET /rest/v2.0/search/*

This is the endpoint to perform a search request. It expects a search handler as well as all the elements of a Solr query as URL parameters. A simple example of a search query would be:

GET {DATAFARI_BASE_URL}/rest/v2.0/search/select?q=*:*

Note that {DATAFARI_BASE_URL} is by default of the following shape: [DATAFARI_DOMAIN_NAME]/Datafari
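The request above can be assembled programmatically so that special characters in the Solr query are percent-encoded. This is a minimal sketch using only the Python standard library; the function name search_url is hypothetical, and the host used in the usage line is a placeholder.

```python
from urllib.parse import urlencode


def search_url(base_url: str, handler: str, params: dict) -> str:
    """Build a search URL for the given handler (hypothetical helper).

    List values are expanded into repeated parameters (e.g. several fq).
    """
    query = urlencode(params, doseq=True)
    return f"{base_url.rstrip('/')}/rest/v2.0/search/{handler.strip('/')}?{query}"


# Placeholder host; q=*:* is sent percent-encoded.
print(search_url("https://datafaridomain/Datafari", "select", {"q": "*:*"}))
```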

This endpoint allows the use of only a few selected search handlers for security reasons. The handlers that can be called depend on the value of the special “action” URL parameter.

action=

handlers allowed

search

OR

unset

/select

/stats

/statsQuery

/noaggregator

/vector

/rrf

 

suggest

/suggest

/proposals

Configuration specific handlers for advanced autocomplete functionalities (entity recognition, …)

Warning! /stats and /statsQuery handlers are deprecated since Datafari 6.0, and should be removed in Datafari 6.1.
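A client may want to reject a handler before sending the request. The sketch below mirrors the action and handler table above as a lookup; it is a client-side illustration only, not the server's implementation. The names ALLOWED_HANDLERS and is_handler_allowed are hypothetical, and the extra argument stands for any deployment-specific handlers that have been allowed in the configuration.

```python
from typing import Iterable, Optional

# Handlers allowed for each value of the "action" URL parameter,
# copied from the table above.
ALLOWED_HANDLERS = {
    "search": {"/select", "/stats", "/statsQuery", "/noaggregator", "/vector", "/rrf"},
    "suggest": {"/suggest", "/proposals"},
}


def is_handler_allowed(handler: str, action: Optional[str] = None,
                       extra: Iterable[str] = ()) -> bool:
    """Return True if the handler may be called with the given action."""
    action = action or "search"  # an unset action behaves like "search"
    allowed = ALLOWED_HANDLERS.get(action, set()) | set(extra)
    return "/" + handler.strip("/") in allowed
```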

Keyword-based search (BM25)

The main handlers to keep in mind are /select and /suggest.

It is also possible to allow more handlers. To do so, add them to the userAllowedHandlers property in $DATAFARI_HOME/tomcat/conf/datafari.properties.

Example:

userAllowedHandlers=/newselect

The /select handler is used to perform search queries while the /suggest handler is used to get suggestions for autocomplete.

If you wonder how to format your queries (either for search or suggest), please refer to the Solr documentation as well as the rest of Datafari’s documentation for the fields available.

Proposals

The /proposals handler works just like /suggest. However, instead of providing autocomplete suggestions, it processes a search based on the query. It is meant to be used for autoproposals. The service only returns these fields:

  • title

  • exactContent

  • url

Vector search

Vector search must be enabled and configured before indexing (see https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/3920297985 )

The /vector handler is used to perform vector search. Instead of returning whole documents from the main Solr collection, this handler returns document snippets from the VectorMain collection. This handler accepts the following parameters:

  • q or queryrag (required): a keyword-based or natural-language query

  • topK: (Optional) The number of results returned by the vector distance calculation. Must be greater than start + rows (default: 100).

  • rows, start: (Optional) Pagination parameters. Refer to Solr documentation.

  • fl: (Optional) The list of fields that must be returned in the results. Refer to Solr documentation.

  • model: (Optional) The ID of the embeddings model used for query embeddings. By default, the Active Embeddings Model is used.

  • vectorField: (Optional) The Dense Vector Field (in Solr) used for vector search. If you override this parameter, make sure that the vectors contained in this field have been generated by the provided model. By default, the Active Vector Field is used.

Usual Solr parameters (including facets, fq…) can be used here, except for those associated with the eDisMax query parser.

Example:

GET {DATAFARI_BASE_URL}/rest/v2.0/search/vector?q=who%20are%20France%20Labs%20founders&rows=5&topK=10&fl=title,exactContent,url

In this example, a vector search will be processed using the query “who are France Labs founders”, returning the title, content and URL of the first 5 entries from the top 10 results.

The resulting JSON will look as follows:

{
  "response": {
    "docs": [
      {
        "folder_url": "https://tatetitotu.datafari.com/Datafari/rest/v2.0/url?url=https%3A%2F%2Fwww.francelabs.com%2Fen",
        "exactContent": ["Our company France Labs was established in 2011. Its founders come from SAP Research in Sophia-Antipolis, and they love open source and innovation. France Labs is the only startup that was accepted at the two incubators of Sophia Antipolis. We aim at creating the best enterprise search solution with Datafari, as well as being the recognised leader in France on Apache Lucene/Solr and Elasticsearch expertise.\n\nTEST DATAFARI \n or look at our services"],
        "title": ["About << France Labs: Open source enterprise search", "about.html"],
        "click_url": "https://tatetitotu.datafari.com/Datafari/rest/v2.0/url?url=https%3A%2F%2Fwww.francelabs.com%2Fen%2Fabout.html&id=a4f51fe2-a9b5-4e59-90fd-913233e94b14",
        "url": "https://www.francelabs.com/en/about.html "
      },
      {
        "folder_url": "https://tatetitotu.datafari.com/Datafari/rest/v2.0/url?url=https%3A%2F%2Fwww.francelabs.com%2Fen",
        "exactContent": ["# WebTimeMedias \n WebTimeMedias presents France Labs and its platinum partnership with Doculibre in Europe.\n\nApril 18th 2012\n\nThe article\n\n# L'Avenir Côte d'Azur \n L’Avenir Côte d’Azur did a short article to present France Labs.\n\nFeb. 24th 2012\n\nScreenshot of article\n\n# Nice Matin \n Nice Matin did a nice article about the kick off of France Labs.\n\nFeb. 7th 2012\n\nScreenshot of article\n\nKeep in touch\n\nNewsletter \n Subscribe to our newsletter to stay informed.\n\nCongratulations! You have successfully subscribed.\n\nSend\n\nLast Tweets\n\nPlease wait...\n\nContact\n\nAddress: Résidence du Grand Large - La Goelette, 2 rue de la Foux, 06800 Cagnes-sur-Mer \n Phone: +33 (0)9 72 43 72 85\n\nFollow us\n\n© Copyright 2024 France Labs.\n\nHome \n Sitemap \n Contact"],
        "title": ["Press review << France Labs: Open source enterprise search", "news_articles.html"],
        "click_url": "https://tatetitotu.datafari.com/Datafari/rest/v2.0/url?url=https%3A%2F%2Fwww.francelabs.com%2Fen%2Fnews_articles.html&id=a4f51fe2-a9b5-4e59-90fd-913233e94b14",
        "url": "https://www.francelabs.com/en/news_articles.html "
      }
    ],
    "numFound": 2,
    "start": 0,
    "numFoundExact": true
  },
  "responseHeader": {
    "zkConnected": true,
    "QTime": 41,
    "params": {
      "topK": "2",
      "org.springframework.web.util.UrlPathHelper.PATH": "/rest/v2.0/search/vector",
      "qt": "/vector",
      "stateVer": "VectorMain:7",
      "org.apache.tomcat.util.net.secure_protocol_version": "TLSv1.3",
      "AuthenticatedUserName": "topolino",
      "fl": "title,exactContent,url",
      "queryrag": "who are France Labs founders",
      "javax.servlet.request.cipher_suite": "TLS_AES_128_GCM_SHA256",
      "version": "2.2",
      "javax.servlet.request.ssl_session_id": "cd1586af3dc44a0f687b394a598637f95ae6d508c9bbcde751568937cd2e4204",
      "q": "who are France Labs founders",
      "org.springframework.web.servlet.HandlerMapping.bestMatchingPattern": "/rest/v2.0/search/*",
      "org.springframework.web.servlet.HandlerMapping.pathWithinHandlerMapping": "/rest/v2.0/search/vector",
      "model": "default_model",
      "id": "a4f51fe2-a9b5-4e59-90fd-913233e94b14",
      "vectorField": "vector_384",
      "wt": "json"
    },
    "status": 0
  }
}
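The constraint that topK must be greater than start + rows is easy to violate when paginating, so it can be worth checking on the client before sending the request. The following Python sketch builds the query string for the /vector handler. The function name is hypothetical, and the defaults rows=10 and start=0 are Solr's usual defaults, assumed here because this page does not state which defaults the handler applies.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode


def vector_search_query(q: str, rows: int = 10, start: int = 0,
                        top_k: int = 100,
                        fl: Optional[Sequence[str]] = None) -> str:
    """Return the query string for a /vector search (hypothetical helper)."""
    if top_k <= start + rows:
        # Documented constraint: topK must be greater than start + rows.
        raise ValueError("topK must be greater than start + rows")
    params = {"q": q, "rows": rows, "start": start, "topK": top_k}
    if fl:
        params["fl"] = ",".join(fl)
    # quote_via=quote encodes spaces as %20, as in the example request.
    return urlencode(params, quote_via=quote)


print(vector_search_query("who are France Labs founders", rows=5, top_k=10,
                          fl=["title", "exactContent", "url"]))
```

When paging deeper into the results, top_k has to grow along with start, otherwise the requested page falls outside the candidates returned by the vector distance calculation.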

Hybrid search (RRF)

Vector search must be enabled and configured before indexing (see https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/3920297985 )

The /rrf handler is used to perform two searches through the Datafari API: one keyword-based search (BM25, using the /select handler) and one vector search (using the /vector handler). Both searches target the VectorMain collection. The results of both searches are then merged using a “Reciprocal Rank Fusion” algorithm.

Instead of returning whole documents from the main Solr collection, this handler returns document snippets from the VectorMain collection. This handler accepts the following parameters:

  • q or queryrag (required): a keyword-based or natural-language query. If both parameters are provided, q is used for the BM25 search and queryrag is used for the vector search. If only one of them is provided, it is used by both searches.

  • topK: (Optional) The number of results returned by both initial searches. Must be greater than start + rows (default: 100).

  • rows, start: (Optional) Pagination parameters. Refer to Solr documentation.

  • fl: (Optional) The list of fields that must be returned in the results. Refer to Solr documentation.

  • model: (Optional) The ID of the embeddings model used for query embeddings. By default, the Active Embeddings Model is used.

  • vectorField: (Optional) The Dense Vector Field (in Solr) used for vector search. If you override this parameter, make sure that the vectors contained in this field have been generated by the provided model. By default, the Active Vector Field is used.

Usual Solr parameters (including facets, fq…) can be used here, except for those associated with the eDisMax query parser.

Example:

GET {DATAFARI_BASE_URL}/rest/v2.0/search/rrf?queryrag=what%20is%20enron%20address&q=enron%20address&fl=title%2Cid%2Cparent_doc&rows=4&start=0&topK=100

In this example:

  • A vector search will be processed using the query “what is enron address”

  • A BM25 search will be processed using the query “enron address”

  • Datafari merges the results from both searches, and returns the 4 most relevant ones.

The resulting JSON will look as follows:

{ "response": { "docs": [ { "id": "file://///localhost/enron/ElecIndexContractEnron2.doc_11", "parent_doc": "file://///localhost/enron/ElecIndexContractEnron2.doc", "title": [ "ElecIndexContractEnron2.doc", "LICENSE AGREEMENT" ] }, { "id": "file://///localhost/enron/Emissions%20Auction%20SiteText.doc_2", "parent_doc": "file://///localhost/enron/Emissions%20Auction%20SiteText.doc", "title": [ "Emissions Auction SiteText.doc", "[EnronOnlineEAuction Home Page]" ] }, { "id": "file://///localhost/enron/EGA%20Monthly%20Report%203-15.doc_0", "parent_doc": "file://///localhost/enron/EGA%20Monthly%20Report%203-15.doc", "title": [ "EGA Monthly Report 3-15.doc", "Enron Nigeria Power Holding Limited" ] }, { "id": "file://///localhost/enron/e-mail%20letter%20rpt%20enron.doc_4", "parent_doc": "file://///localhost/enron/e-mail%20letter%20rpt%20enron.doc", "title": [ "e-mail letter rpt enron.doc", "Hessler Associates, Inc" ] } ], "numFound": 200, "numFoundExact": true, "start": 0 } }
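To give an intuition of how Reciprocal Rank Fusion merges two ranked lists, here is a minimal Python sketch of the generic algorithm. It is not Datafari's implementation: this page does not document the exact formula or constants used, so the sketch assumes the commonly published form, where each document scores the sum of 1 / (k + rank) over the lists it appears in, with k = 60. The function name and the document IDs are illustrative.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    k = 60 is an assumed smoothing constant, not a documented Datafari value.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # illustrative IDs from the keyword search
vector_hits = ["b", "d", "a"]  # illustrative IDs from the vector search
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # ['b', 'a', 'd', 'c']
```

Documents found by both searches ("a" and "b") accumulate two contributions and therefore rank above documents found by only one of them.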

Note that we also provide {DATAFARI_BASE_URL}/rest/v2.0/search/noaggregator, which does exactly the same thing as the search API except that it performs no aggregation, to avoid infinite loops when using the aggregator mode.

Response format

The response format of this endpoint does not follow the standard response format of this API.

The payload of the response is the response from Solr (please refer to the Solr documentation to see what those responses look like, or experiment with the endpoint).
If Solr encountered an error, the endpoint will most of the time respond with a 200 status, the error being in the Solr response itself.

If an error occurred while Datafari was querying Solr, the response may be a 500 or other 5xx HTTP response, possibly with no JSON (and maybe an HTML page as the response body).

Each document in the response has two fields for the URL:

  1. doc.url: the original URL of the document

  2. doc.click_url: the URL to use as the href link to send the user to the document (it tracks user clicks to collect statistics in the search engine)

Starting from Datafari 6.1, there is also a specific URL to open folder links:

  1. doc.folder_url
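Because this endpoint returns the raw Solr response, a client has to look for errors inside the payload even when the HTTP status is 200. The sketch below assumes the standard Solr JSON error shape (a top-level "error" object with "code" and "msg"); the function name extract_links and the keys of the returned dictionaries are hypothetical.

```python
import json
from typing import Dict, List, Optional


def extract_links(body: str) -> List[Dict[str, Optional[str]]]:
    """Return the link fields of each document in a search response."""
    data = json.loads(body)
    if "error" in data:
        # Solr reported an error inside an otherwise successful HTTP response.
        err = data["error"]
        raise RuntimeError(f"Solr error {err.get('code')}: {err.get('msg')}")
    docs = data.get("response", {}).get("docs", [])
    return [
        {
            "url": doc.get("url"),
            # Prefer click_url for links so that clicks are tracked.
            "href": doc.get("click_url", doc.get("url")),
            "folder": doc.get("folder_url"),
        }
        for doc in docs
    ]
```

A non-JSON body, such as an HTML error page returned with a 5xx status, makes json.loads raise a ValueError, which the caller should handle as well.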

GET /rest/v2.0/results/export?*

This is the endpoint to perform an export of the result of a provided query. Here are the parameters to provide:

  • query: the main query to perform. ex:
    query=*:*

  • facetQuery: An array containing all the facet queries. ex:
    facetQuery[]={!key=From%20100KB%20To%2010MB}original_file_size:[102400 TO 10485760]&facetQuery[]={!key=More%20Than%2010MB}original_file_size:[10485760 TO *]

  • facetField: An array containing all the facet fields. ex:
    facetField[]=extension&facetField[]=language&facetField[]=source

  • fq: An array containing all the active filter queries. ex:
    fq[]={!tag=query}original_file_size:[102400 TO 10485760]&fq[]={!tag=author}(author:"agence de l'eau artois picardie")

  • sort: The sort method to apply to results, can be either "score desc", or "score asc". ex:
    sort=score desc

  • fl: The list of fields that must be returned in the results. The fields must be separated by a comma. ex:
    fl=title,last_modified,url

  • nbResults: The number of results that the export file must contain. ex:
    nbResults=500
    Be careful with this parameter: the higher the number, the longer the query takes and the bigger the export file.

  • type: The type of the export file to generate. Currently (December 2022) the only supported type is “excel”:
    type=excel

Here is an example of a GET request to the export API:

/rest/v2.0/results/export?query=*:*&facetQuery[]=original_file_size:[102400 TO 10485760]&facetQuery[]=original_file_size:[10485760 TO *]&facetField[]=extension&facetField[]=language&fq[]=original_file_size:[102400 TO 10485760]&sort=score desc&fl=title,last_modified,url&nbResults=10&type=excel

The response of this endpoint is a byte stream of the generated export file. In a standard web browser such as Chrome, it triggers a download.
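The array parameters (facetQuery[], facetField[], fq[]) are repeated once per value, and characters such as spaces, brackets and braces need percent-encoding outside a browser. The Python sketch below builds such an export URL with the standard library. The function name and argument names are hypothetical. Note that urlencode also encodes the brackets in the parameter names (facetField%5B%5D); this sketch assumes the server decodes them back to facetField[], as servlet containers normally do.

```python
from typing import List, Sequence, Tuple
from urllib.parse import quote, urlencode


def export_url(base_url: str, query: str, nb_results: int,
               fl: Sequence[str] = (), facet_queries: Sequence[str] = (),
               facet_fields: Sequence[str] = (), fqs: Sequence[str] = (),
               sort: str = "score desc") -> str:
    """Build a results/export URL (hypothetical helper)."""
    params: List[Tuple[str, object]] = [("query", query)]
    # Array parameters are repeated once per value.
    params += [("facetQuery[]", value) for value in facet_queries]
    params += [("facetField[]", value) for value in facet_fields]
    params += [("fq[]", value) for value in fqs]
    params.append(("sort", sort))
    if fl:
        params.append(("fl", ",".join(fl)))
    # "excel" is the only export type listed on this page.
    params += [("nbResults", nb_results), ("type", "excel")]
    return (f"{base_url.rstrip('/')}/rest/v2.0/results/export?"
            + urlencode(params, quote_via=quote))
```

The resulting URL can then be fetched with any HTTP client and the response body written to a file in binary mode, since the endpoint streams the export file itself rather than JSON.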

V1.0

Below is a set of API endpoints that are implemented or on the roadmap. Endpoints that are not implemented are clearly identified by the “NOT IMPLEMENTED” string in the response column. Those endpoints return a 501 HTTP response code when queried (the code for the NOT_IMPLEMENTED HTTP error). For implemented endpoints, the query body and the format of the content object of the response in case of success are provided.

The auth endpoint is particular and can't be called using AJAX: the user must be redirected to this endpoint, and Datafari will redirect the user back to the callback URL once the authentication has been performed successfully.

METHOD

URL

DESCRIPTION

QUERY BODY

RESPONSE

PROTECTED

GET

users/current

Get the information about the currently connected user

 

{ "name":{String}, "roles":[{Strings}], "lang": {String}, "uiConfig": {Obj} }

See https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/2625634305 for more information about the uiConfig object.

Authenticated User

PUT

users/current

Update user information, only the language and uiConfig can be modified.

{ "lang"?:{String}, "uiConfig"?: {Obj} }

See https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/2625634305 for more information about the uiConfig object.

Both arguments are optional. If neither is provided, nothing is done.

{ "name":{String}, "roles":[{Strings}], "lang": {String}, "uiConfig": {Obj} }

 

Authenticated User

GET

users/current/uiconfig

Get only the uiConfig for the current user

 

{ "uiConfig": {Obj} }

See https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/2625634305 for more information about the uiConfig object.

Authenticated User

PUT

users/current/uiconfig

Update only the uiConfig for the current user

{ "uiConfig": {Obj} }

See https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/2625634305 for more information about the uiConfig object.

 

{ "uiConfig": {Obj} }

Echoes the query

Authenticated User

GET

status/features/favorites

Provide the status of the favorite feature

 

{ "activated":{boolean as String} }

 

GET

users/current/alerts

Provide the list of alerts for the current user

 

{ "alerts":[ { "_id":{String}, "core":{String}, "frequency":{String}, "mail":{String}, "subject":{String}, "keyword":{String}, "filters":{String - optional}, "user": {String - username} }, ... ] }

Authenticated User

POST

users/current/alerts

Create a new alert for the current user

{ "core":{String - the solr core for the query}, "frequency":{String - }, "mail":{String}, "subject":{String}, "keyword":{String - optional, query keywords, defaults to *:*}, "filters":{String - optional, for facets etc., defaults to null}, }
{ "_id": {String}, "core":{String}, "frequency":{String - }, "mail":{String}, "subject":{String}, "keyword":{String}, "filters":{String}, }

Echoes the parameters with the created id

Authenticated User

PUT

users/current/alerts/{id}

Updates alert {id} for the current user

{ "core":{String - the solr core for the query}, "frequency":{String - }, "mail":{String}, "subject":{String}, "keyword":{String - optional, query keywords, defaults to *:*}, "filters":{String - optional, for facets etc., defaults to null}, }
{ "_id": {String}, "core":{String - the solr core for the query}, "frequency":{String - }, "mail":{String}, "subject":{String}, "keyword":{String - optional, query keywords, defaults to *:*}, "filters":{String - optional, for facets etc., defaults to null}, }

Response should be an echo of the query with an updated ID

Authenticated User

DELETE

users/current/alerts/{id}

Delete alert {id} from the current user alerts

 

{ "_id": {String}, "core":{String - the solr core for the query}, "frequency":{String - }, "mail":{String}, "subject":{String}, "keyword":{String - optional, query keywords, defaults to *:*}, "filters":{String - optional, for facets etc., defaults to null}, }

The deleted alert object is returned.

Authenticated User

GET

users/current/savedsearches

Retrieve the set of saved searches for the current user

 

{ "savedsearches":[ { "name":{String}, "search":{String}, }, ... ] }

 

POST

users/current/savedsearches

Creates a new saved search for the current user

{ "name":{String - the name}, "search":{String - the query}, }
{ "name":{String - the name}, "search":{String - the query}, }

Echoes the parameters

 

PUT

users/current/savedsearches/{savedSearchname}

Updates the saved search with name savedSearchname for the current user

{ "name":{String - the name}, "search":{String - the query}, }

The name property of the object must be the same as the name provided in the URL

{ "name":{String - the name}, "search":{String - the query}, }

Echoes the saved data

 

DELETE

users/current/savedsearches/{savedSearchname}

Delete the saved search with name savedSearchname for the current user.

 

{ "name":{String - the name}, "search":{String - the query}, }

Echoes the deleted saved search data

 

GET

users/current/favorites

Retrieve the list of favorites for the current user

 

{ "favorites":[ { "id": {String}, "title": {String} }, ... ] }

 

POST

users/current/favorites

Adds a new favorite for the current user

{ "id":{String}, "title":{String} }
{ "id":{String}, "title":{String} }

Echoes the information of the saved favorite

 

DELETE

users/current/favorites

Delete the favorite with the id provided in the body for the current user.

{ "id":{String} }
{ "id":{String}, "title":{String} }

Echoes the information of the deleted favorite

 

GET