SearchAggregator Configuration
Valid from Datafari v5.4 upwards
Note: We have a dedicated page for more technical information on the aggregator.
If you use Keycloak with the SearchAggregator, see this page: https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/1067712513
All Solr fields MUST BE DECLARED in all of the clusters: if you add a Solr field to your main cluster, make sure that the same field is also declared on all the other Solr clusters.
1. Configuration
1.1 Admin UI for general parameters
To configure the SearchAggregator, a dedicated UI is available in the admin section of Datafari:
1.1.1 Activation and password:
Enable/Disable results aggregation: Set to ‘On’ to dispatch search requests to external Datafari instances and aggregate the results. Set to ‘Off’ to disable request dispatching (SearchProxy behavior). NOTE: This button must only be activated on the main search server; it must be set to OFF on all the external servers.
Renew ‘search-aggregator’ client’s password: As specified in the first section of this documentation, the SearchAggregator uses a client on the Search API to get access tokens. The client name is ‘search-aggregator’ and is hardcoded in Datafari (it cannot be changed), but its secret password has to be generated at least once. This button generates a new secret password for the search-aggregator client, so it is meant to be used on the external Datafari servers. You will then insert this secret into the External Datafaris section of this admin UI page (1.1.3), presented further below.
1.1.2 Timeouts:
Timeout per request: The timeout applied to each request sent to an external Datafari instance, expressed in milliseconds. By default it is set to 30000 ms.
Global timeout: The timeout used by the ThreadExecutor described in the first section of this documentation. Once it expires, if some external Datafari instances have still not responded, the SearchAggregator builds the response from the instances that did. It is expressed in seconds; by default it is set to 60 s.
Once you have set the desired timeouts, click the ‘save’ button of this section.
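To make the interplay between the two timeouts concrete, here is a minimal Python sketch of the dispatch logic described above; it is not Datafari's actual Java implementation. It assumes one thread per external Datafari, simulates each HTTP call with a sleep (the hypothetical `query_external` function), and keeps only the answers that arrive before the global timeout. Sources that fail or exceed their per-request timeout are reported as unavailable.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def query_external(name, delay_s, per_request_timeout_s):
    """Hypothetical stand-in for one HTTP search call to an external Datafari.

    delay_s simulates how long the remote instance takes to answer.
    """
    if delay_s > per_request_timeout_s:
        # A real HTTP client would abort the call at this point.
        time.sleep(per_request_timeout_s)
        raise TimeoutError(f"{name} exceeded the per-request timeout")
    time.sleep(delay_s)
    return {"source": name}


def aggregate(sources, per_request_timeout_ms, global_timeout_s):
    """Dispatch one request per source and keep the answers received in time.

    sources maps an external Datafari name to its simulated response delay.
    Returns (answers, sorted names of unavailable sources).
    """
    per_request_s = per_request_timeout_ms / 1000  # UI value is in milliseconds
    executor = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {
        executor.submit(query_external, name, delay, per_request_s): name
        for name, delay in sources.items()
    }
    # Wait at most global_timeout_s for all requests together.
    done, _ = wait(futures, timeout=global_timeout_s)
    # Do not block on late requests; build the response from what arrived.
    executor.shutdown(wait=False)
    answers, unavailable = [], []
    for future, name in futures.items():
        if future in done and future.exception() is None:
            answers.append(future.result())
        else:
            unavailable.append(name)
    return answers, sorted(unavailable)
```

The sketch mirrors the units of the admin UI (milliseconds per request, seconds globally). In this model, keeping the global timeout above the per-request timeout (60 s vs 30000 ms by default) lets a single slow instance fail on its own timeout before the global deadline cuts off the whole aggregation.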
1.1.3 External Datafaris:
In this section of the configuration, you can add or modify the external Datafari instances that will receive requests dispatched by the SearchAggregator. To add a new external Datafari configuration, select “Add a new external Datafari configuration” in the dropdown list; to modify an existing one, select it in the dropdown list.
Here is the description of the parameters:
Datafari name: The name you want to use to identify the external Datafari instance you are configuring (must be unique)
Search API URL: The URL of the Search API of the external Datafari instance. You MUST point to the Search API endpoint that disables the aggregator. Otherwise, if the external Datafari has the aggregator enabled, you will face two main problems: first, the results of the external Datafari will contain results from other external Datafaris that should not be included; second, if the external Datafari is the current instance itself, the request will end in an infinite loop!
In Datafari v5.4 upwards, the endpoint of the Search API that never triggers the aggregator is /rest/v2.0/search/noaggregator, so you MUST point to the following URL: https://EXTERNAL_DATAFARI/Datafari/rest/v2.0/search/noaggregator
Token Request URL: The URL to call to get valid OAuth2 access tokens compatible with the Search API.
If the external Datafari instance does not use an Identity Provider like Keycloak, the default URL is http://EXTERNAL_DATAFARI_HOST:PORT/Datafari/oauth/token
If the external Datafari instance uses an Identity Provider like Keycloak, you need to specify the URL of the Identity Provider that delivers access tokens.
Password of ‘search-aggregator’ user: The secret password of the ‘search-aggregator’ client on the external Datafari instance (generated with the ‘Renew’ button on the external Datafari instance itself).
Enabled: Enable or disable the external Datafari to tell the SearchAggregator whether or not to dispatch requests to it. This can be useful if, for any reason, the external Datafari instance is down.
Once you are done with the parameters, click on the ‘save’ button of this section.
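Before saving, it can help to sanity-check the parameters against the rules above. The Python sketch below does that for entries shaped like the JSON stored in search-aggregator.properties (keys `label`, `search_api_url`, `token_request_url`, `search_aggregator_secret` and `enabled`, as they appear in the example file in 1.2.1). The function name and the checks are illustrative assumptions; Datafari itself may validate differently.

```python
from urllib.parse import urlparse

# Endpoint that never triggers the aggregator (Datafari v5.4 upwards).
NOAGGREGATOR_SUFFIX = "/rest/v2.0/search/noaggregator"


def check_external_datafaris(entries):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    seen_labels = set()
    for entry in entries:
        label = entry.get("label", "")
        if not label:
            problems.append("an entry has no Datafari name")
        elif label in seen_labels:
            problems.append(f"{label}: Datafari name is not unique")
        seen_labels.add(label)

        search_path = urlparse(entry.get("search_api_url", "")).path.rstrip("/")
        if not search_path.endswith(NOAGGREGATOR_SUFFIX):
            # An aggregating endpoint mixes in foreign results and can loop
            # forever if the URL targets this very instance.
            problems.append(f"{label}: Search API URL must end with {NOAGGREGATOR_SUFFIX}")

        token_url = urlparse(entry.get("token_request_url", ""))
        if token_url.scheme not in ("http", "https") or not token_url.netloc:
            problems.append(f"{label}: Token Request URL is not an http(s) URL")

        if not entry.get("search_aggregator_secret"):
            problems.append(f"{label}: 'search-aggregator' password is empty")
    return problems
```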
1.1.4 Default Datafari
In this section of the configuration, you can select the external Datafaris used by default, i.e. those queried by the default search performed when landing on the search page.
If none is selected, all available external Datafaris are used for the default search.
The dropdown only shows external Datafaris that are defined and enabled.
To add a Datafari to the list of defaults, select it in the dropdown and click the save button.
To remove a Datafari from the list of defaults, click the trash can icon next to it in the list of current defaults.
Note that if you remove or disable an external Datafari that is selected as a default, it is removed from this list.
The “Always use default” toggle switch allows you to choose between two behaviors for the default Datafaris when used together with user specific default Datafaris (explained below):
When set to off, user default Datafaris replace the global defaults set in this section of the configuration (e.g. if antoine has datafari1 in his defaults and datafariglobal is defined as a global default above, only datafari1 will be queried for the default query).
When set to on, user default Datafaris are added to the global defaults set in this section to constitute the set of defaults to be called (e.g. if antoine has datafari1 in his defaults and datafariglobal is defined as a global default above, both datafari1 and datafariglobal will be queried for the default query).
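The combination rules above can be summarized as a small function. This is a hypothetical Python sketch of the documented behavior (names such as `resolve_defaults` are made up, not Datafari's code); it assumes that a global default which has been removed or disabled is simply ignored, as described earlier in this section.

```python
def resolve_defaults(user, enabled_sources, global_defaults, user_defaults, always_use_default):
    """Return the external Datafaris queried by the default search for `user`.

    enabled_sources: names of the defined and enabled external Datafaris.
    global_defaults: names selected in the "Default Datafari" section.
    user_defaults: mapping username -> list of default names (from the CSV).
    always_use_default: state of the "Always use default" toggle.
    """
    # A global default that was removed or disabled no longer counts.
    global_set = [s for s in global_defaults if s in enabled_sources]
    user_set = user_defaults.get(user, [])
    if user_set and always_use_default:
        # Toggle on: user defaults are added to the global defaults.
        chosen = global_set + [s for s in user_set if s not in global_set]
    elif user_set:
        # Toggle off: user defaults replace the global defaults.
        chosen = list(user_set)
    else:
        chosen = global_set
    # No default at all: query every available external Datafari.
    return chosen or list(enabled_sources)
```

For the antoine example, the toggle off yields only datafari1, while the toggle on yields datafariglobal and datafari1.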
1.2 Configuring per user default Datafari and allowed remote Datafaris (requires terminal access to the Datafari web app server)
It is possible to configure per-user defaults and to restrict each user to a subset of the defined external Datafaris. This section explains how to do this.
1.2.1 User specific default Datafari
Configuration
To set per-user default Datafaris, you first need to build a CSV file with the following format:
username;externalDatafari1,externalDatafari2
username2;externalDatafari1
username3;externalDatafari2
Where:
The first column contains the username as extracted from the authentication system (AD, Keycloak, internal to Datafari, …)
The second column contains the names of the external Datafaris (as defined in the external Datafaris definition), separated by commas
If there is an error in the username and/or a default Datafari name, the feature will not behave as expected.
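As an illustration of the expected format, here is a minimal Python parser for this file (a hypothetical helper, not Datafari's code). It deliberately does not trim whitespace, so a stray space yields a name that no longer matches, which is the kind of mismatch the warnings about spaces refer to.

```python
def parse_user_sources(text):
    """Parse 'username;name1,name2' lines into {username: [names]}."""
    mapping = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # ignore blank lines
        user, sep, names = line.partition(";")
        if not sep or not user or not names:
            raise ValueError(f"line {lineno}: expected 'username;name1,name2'")
        # Whitespace is NOT stripped: ' dfA' and 'dfA' are different names.
        mapping[user] = names.split(",")
    return mapping
```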
Then, you must drop the file on the Datafari web server and make it readable by the user running the Datafari web app (the user datafari by default).
Once this is done, edit the file /opt/datafari/tomcat/conf/search-aggregator.properties (assuming a default installation). It should look something like the following:
#Sun Jun 28 15:00:19 UTC 2020
USERS_DEFAULT_SOURCE_FILE=
GLOBAL_TIMEOUT=60
EXTERNAL_DATAFARIS=[{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true},{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main bis","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true}]
USERS_ALLOWED_SOURCES_FILE=
ACTIVATED=true
DEFAULT_DATAFARI=
TIMEOUT_PER_REQUEST=30000
Be careful not to modify anything except the line “USERS_DEFAULT_SOURCE_FILE=”, where you need to set the path to the file containing the default Datafari definitions (formatted as described above).
After editing, the line should look something like the following:
USERS_DEFAULT_SOURCE_FILE=/opt/datafari/tomcat/conf/user-default.csv
Once you have saved this file, your per-user default Datafaris are set up and ready to work. There is no need to restart Datafari; it should work right away.
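If you prefer to script this change rather than edit the file by hand, a minimal Python sketch could look like the following. It rewrites only the `USERS_DEFAULT_SOURCE_FILE=` line and leaves the content of every other line, including the escaped EXTERNAL_DATAFARIS value, unchanged. The helper names are hypothetical, and the value is written verbatim, which is fine for a plain Unix path like the one above but not for values containing backslashes.

```python
from pathlib import Path


def set_property_line(properties_text, key, value):
    """Return properties_text with the 'key=' line set to 'key=value'."""
    prefix = key + "="
    lines = properties_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value
            found = True
    if not found:
        # Refuse to guess where the key belongs; it should already exist.
        raise KeyError(key)
    return "\n".join(lines) + "\n"


def update_properties_file(path, key, value):
    """Apply set_property_line to a file in place (run as the datafari user)."""
    file = Path(path)
    # java.util.Properties streams default to ISO-8859-1.
    text = file.read_text(encoding="iso-8859-1")
    file.write_text(set_property_line(text, key, value), encoding="iso-8859-1")
```

For a default installation, the call would be `update_properties_file("/opt/datafari/tomcat/conf/search-aggregator.properties", "USERS_DEFAULT_SOURCE_FILE", "/opt/datafari/tomcat/conf/user-default.csv")`.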
Troubleshooting
If the path to the CSV file is incorrect or the file is not readable, you should see an error message in the Datafari log.
If the format of the file is incorrect, all your users will most likely fall back to the global default external Datafaris (or search on all external Datafaris if no global default is set).
If a username is not set correctly, that user will fall back to the global default (or to all external Datafaris if no global default is set).
If the name of a user's default Datafari is not set properly, the user will probably see an error message when first landing on the Datafari search page (but should be able to search afterwards by selecting the desired external Datafaris in the aggregator facet).
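Most of these cases can be caught before users notice them. The hypothetical Python checker below reports an unreadable file, lines without exactly one `;`, names surrounded by spaces, and names that match no defined external Datafari. It cannot detect a misspelled username, since the list of valid users lives in your authentication system. Run it as the user running the Datafari web app (datafari by default) so that the readability check is meaningful; the UTF-8 file encoding is an assumption.

```python
import os


def lint_lines(lines, known_labels):
    """Check 'username;name1,name2' lines against the defined labels."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.count(";") != 1:
            problems.append(f"line {lineno}: expected exactly one ';'")
            continue
        user, names = line.split(";")
        if user != user.strip():
            problems.append(f"line {lineno}: spaces around username {user!r}")
        for name in names.split(","):
            if name != name.strip():
                problems.append(f"line {lineno}: spaces around {name!r}")
            elif name not in known_labels:
                problems.append(f"line {lineno}: unknown external Datafari {name!r}")
    return problems


def lint_user_csv(path, known_labels):
    """Lint a CSV file; os.access checks the permissions of the current user."""
    if not os.access(path, os.R_OK):
        return [f"{path}: file missing or not readable"]
    with open(path, encoding="utf-8") as handle:
        return lint_lines(handle, known_labels)
```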
BE EXTREMELY CAREFUL WITH THE SPACES (see warning above)
1.2.2 Restrict user to a subset of external Datafaris
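For each user, you can restrict the set of external Datafaris they can search in.
To do so, create a CSV file with the same format as the default Datafari file described in 1.2.1 (the username, a semicolon, then a comma-separated list of external Datafari names), for example:
username;externalDatafari1,externalDatafari2
username2;externalDatafari1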
Where:
The first column contains the username as extracted from the authentication system (AD, Keycloak, internal to Datafari, …)
The second column contains the names of the external Datafaris (as defined in the external Datafaris definition), separated by commas
Only the external Datafaris listed for a user are accessible to that user.
If a user is not present in this file, they can access all external Datafaris.
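The restriction rules above amount to a simple filter. Here is a minimal, hypothetical Python sketch of that logic (not Datafari's code), assuming the file has already been parsed into a username-to-names mapping.

```python
def allowed_sources(user, all_sources, restrictions):
    """Return the external Datafaris `user` may search.

    all_sources: names of the defined external Datafaris.
    restrictions: mapping username -> list of allowed names (from the CSV).
    """
    if user not in restrictions:
        # Users absent from the file are not restricted.
        return list(all_sources)
    allowed = set(restrictions[user])
    # A misspelled name (or one with a stray space) matches nothing,
    # so the intended source simply becomes unreachable for that user.
    return [s for s in all_sources if s in allowed]
```

For instance, if a user's only entry is `dfB ` (with a trailing space), that user ends up with no accessible source at all rather than with dfB.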
Then drop the file on the Datafari web app server and make it readable by the user running the Datafari web app (user datafari by default).
Once this is done, edit the file /opt/datafari/tomcat/conf/search-aggregator.properties (assuming a default installation). It should look like the example shown in 1.2.1.
Be careful not to modify anything except the line “USERS_ALLOWED_SOURCES_FILE=”, where you need to set the path to the file listing the Datafari servers available to each user (formatted as described above).
After editing, the line should have the form USERS_ALLOWED_SOURCES_FILE=<path to your CSV file>.
Once you have saved this file, your per-user restrictions on external Datafaris are set up and ready to work. There is no need to restart Datafari; it should work right away.
Troubleshooting
If the path to the CSV file is incorrect or the file is not readable, you should see an error message in the Datafari log.
If the format of the file is incorrect, most likely no restriction will be applied to any user, although, depending on the formatting error, users may instead have access to only one external source or to none at all.
If a username is not set correctly, no restriction will be applied to that user.
If the name of an external Datafari for a user is not set properly, the user won’t be able to access the misspelled external source.
BE EXTREMELY CAREFUL WITH THE SPACES (see warning above)