Info

Valid from Datafari Enterprise v4.6 and Datafari Community v5.4 upwards

Note: We have a dedicated page for more technical information on the aggregator.

Note

If you use Keycloak with the Search Aggregator, see this page: /wiki/spaces/DATAFARI/pages/1067712513

Note

All Solr fields MUST BE DECLARED in all of the clusters. If you add a Solr field to your main cluster, make sure that the same Solr field is also declared on all the other Solr clusters.

1. Configuration

1.1 Admin UI for general parameters

...

  • Enable/Disable results aggregation: Set to ‘On’ to dispatch search requests to external Datafari instances and aggregate their results. Set to ‘Off’ to disable request dispatching (SearchProxy behavior). NOTE: This button must only be activated on the main search server; it must be set to ‘Off’ on all the external servers.

  • Renew ‘search-aggregator’ client’s password: As specified in the first section of this documentation, the SearchAggregator uses a client on the Search API to get access tokens. The client name is ‘search-aggregator’ and is hardcoded in Datafari (it cannot be changed), but its secret password has to be generated at least once. This button generates a new secret password for the search-aggregator client. It is therefore meant to be used on the external Datafari servers. You will then enter this secret in the second part of this admin UI page (External Datafaris), presented further below.

Info

Use this button only if you do not use an Identity Provider such as Keycloak.

Note

For security reasons, the generated secret password is not stored in clear text anywhere! Once you click the ‘renew’ button, the password is displayed in clear text only once, so store it carefully; otherwise you will need to renew it again.

...

  • Datafari name: The name you want to use to identify the external Datafari instance you are configuring (must be unique)

  • Search API URL: The URL of the Search API of the external Datafari instance

  • Token Request URL: The URL used to request valid OAuth2 access tokens compatible with the Search API.
    If the external Datafari instance does not use an Identity Provider such as Keycloak, the default URL is http://EXTERNAL_DATAFARI_HOST:PORT/Datafari/oauth/token
    If the external Datafari instance uses an Identity Provider such as Keycloak, you need to specify the URL of the Identity Provider that delivers access tokens

  • Password of ‘search-aggregator’ user: The secret password of the ‘search-aggregator’ client on the external Datafari instance (generated with the ‘renew’ button on that external Datafari instance)

  • Enabled: Enable or disable the external Datafari to tell the SearchAggregator whether or not to dispatch requests to it. This is useful, for example, when the external Datafari instance is down.

Once you are done with the parameters, click on the ‘save’ button of this section.

Info

When you select an existing external Datafari configuration, a ‘delete’ button is available in addition to the ‘save’ button; use it to remove that external Datafari configuration.
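For illustration only, the token exchange configured above might look like the following Python sketch. It assumes that the Token Request URL implements the standard OAuth2 client-credentials grant (RFC 6749, section 4.4) with HTTP Basic client authentication; the function names are hypothetical and this is not the SearchAggregator's actual code.

```python
import base64
import json
import urllib.parse
import urllib.request

CLIENT_ID = "search-aggregator"  # client name hardcoded in Datafari


def build_token_request(token_url: str, client_secret: str) -> urllib.request.Request:
    """Build (without sending) an OAuth2 client-credentials token request.

    Assumption: standard client-credentials grant with HTTP Basic client
    authentication. Some servers expect the id and secret to be
    form-urlencoded before Base64 encoding (RFC 6749, section 2.3.1).
    """
    credentials = f"{CLIENT_ID}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    request = urllib.request.Request(token_url, data=body, method="POST")
    request.add_header("Authorization", "Basic " + base64.b64encode(credentials).decode("ascii"))
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


def fetch_access_token(request: urllib.request.Request, timeout_s: float = 30.0) -> str:
    """Send the request and return the 'access_token' field of the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)["access_token"]
```

The secret passed to build_token_request is the one generated with the ‘renew’ button on the external instance, for example fetch_access_token(build_token_request("http://EXTERNAL_DATAFARI_HOST:PORT/Datafari/oauth/token", secret)).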

...

  • Search API URL: You MUST target the search API endpoint that does not trigger the aggregator. Otherwise, if the external Datafari has the aggregator enabled, you will face two main problems: first, its results will include results from other external Datafaris that should not be included; second, if the external Datafari is the main Datafari itself, the request will end in an infinite loop!
    In Datafari v5.4 upwards, the search API endpoint that never triggers the aggregator is /rest/v2.0/search/noaggregator, so you MUST use the following URL: https://EXTERNAL_DATAFARI/Datafari/rest/v2.0/search/noaggregator

1.1.4 Default Datafari

In this section of the configuration, you can select the external Datafaris that will be used by default, i.e. to perform the default search when landing on the search page.
If none is selected, all available external Datafaris are used to perform the default search.

...

The dropdown only shows the external Datafaris that are defined and enabled.

To add a Datafari to the list of defaults, select it in the dropdown and click the save button.
To remove a Datafari from the list of defaults, click the trash can icon next to it in the list of current defaults.

Note that if you remove or disable an external Datafari that is selected as default, it is removed from the list.

The “Always use default” toggle switch lets you choose between two behaviors for the global default Datafaris when they are combined with user-specific default Datafaris (explained below):

  1. When set to off, user default Datafaris replace the global defaults set in this section of the configuration (e.g. if antoine has datafari1 in his defaults and datafariglobal is defined in the configuration above, only datafari1 is queried for the default query).

  2. When set to on, user default Datafaris are added to the global defaults set in this section to form the set of defaults to be queried (e.g. if antoine has datafari1 in his defaults and datafariglobal is defined in the configuration above, both datafari1 and datafariglobal are queried for the default query).

Info

If a user has no personalized defaults, the global default Datafaris are used instead; if there are none, all available external Datafaris are searched.
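The selection rules above can be summarized by the following Python sketch. It only models the documented behavior (the “Always use default” toggle and the fallback order); the function and parameter names are hypothetical and do not come from Datafari's code.

```python
from typing import Dict, List, Sequence


def resolve_default_datafaris(
    user: str,
    user_defaults: Dict[str, List[str]],
    global_defaults: Sequence[str],
    available: Sequence[str],
    always_use_default: bool,
) -> List[str]:
    """Return the external Datafaris queried for a user's default search."""
    personal = user_defaults.get(user, [])
    if personal:
        chosen = list(personal)
        if always_use_default:
            # Toggle on: the global defaults are added to the user's own defaults
            chosen += [name for name in global_defaults if name not in chosen]
        return chosen
    if global_defaults:
        # No personal defaults: fall back to the global defaults
        return list(global_defaults)
    # No defaults at all: search every available external Datafari
    return list(available)
```

With the example above, resolve_default_datafaris("antoine", {"antoine": ["datafari1"]}, ["datafariglobal"], available, False) yields only datafari1, and passing True yields both datafari1 and datafariglobal.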

1.2 Configuring per user default Datafari and allowed remote Datafaris (requires terminal access to the Datafari web app server)

It is possible to configure per-user defaults and to restrict each user to a subset of the defined external Datafaris. This section explains how to do this.

1.2.1 User specific default Datafari

Configuration

To set a default Datafari, you first need to build a CSV file with the following format:

Code Block
username;externalDatafari1,externalDatafari2
username2;externalDatafari1
username3;externalDatafari2

Where:

  • The first column contains the username as extracted from the authentication system (AD, Keycloak, internal to Datafari, …)

  • The second column contains the list of names of the external Datafaris, as defined in the external Datafaris definition, separated by commas

Note

DO NOT PUT SPACES BEFORE OR AFTER THE “;” OR THE “,” SEPARATORS

If there is an error in a username and/or in a default Datafari name, the feature will not behave as expected.
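Because a stray space or a mistyped name silently changes the behavior, it may help to check the file before deploying it. The Python sketch below is a hypothetical helper, not part of Datafari: it parses the username;source1,source2 format described above and reports spaces around separators as well as names missing from the set of external Datafari names you pass to it.

```python
from typing import Dict, List, Set, Tuple


def check_user_sources(text: str, known: Set[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Parse 'username;source1,source2' lines and list the problems found."""
    mapping: Dict[str, List[str]] = {}
    problems: List[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # this sketch skips blank lines
        if line != line.strip() or any(s in line for s in (" ;", "; ", " ,", ", ")):
            problems.append(f"line {lineno}: space around a separator or at the line edge")
        user, sep, sources = line.partition(";")
        if not sep or not user or not sources:
            problems.append(f"line {lineno}: expected username;source1,source2")
            continue
        names = sources.split(",")
        for name in names:
            if name not in known:
                problems.append(f"line {lineno}: unknown external Datafari {name!r}")
        mapping[user] = names
    return mapping, problems
```

The known set should hold the Datafari names exactly as configured in the admin UI. Since the file of section 1.2.2 uses the same format, the same check applies to it.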

Then place the file on the Datafari web app server and make it readable by the user running the Datafari web app (the datafari user by default).

Once this is done, edit the file /opt/datafari/tomcat/conf/search-aggregator.properties (assuming a default installation). It should look similar to the following:

Code Block
#Sun Jun 28 15:00:19 UTC 2020
USERS_DEFAULT_SOURCE_FILE=
GLOBAL_TIMEOUT=60
EXTERNAL_DATAFARIS=[{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true},{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main bis","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true}]
USERS_ALLOWED_SOURCES_FILE=
ACTIVATED=true
DEFAULT_DATAFARI=
TIMEOUT_PER_REQUEST=30000

Do not modify anything except the “USERS_DEFAULT_SOURCE_FILE=” line, where you set the path to the file containing the default Datafari definitions (formatted as described above).
After editing, the line should look similar to the following:

Code Block
USERS_DEFAULT_SOURCE_FILE=/opt/datafari/tomcat/conf/user-default.csv
Note

Note that the value has NO QUOTES! If you add quotes, you will get a FileNotFound exception.
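If you prefer to script this change, the following Python sketch rewrites only the targeted line and refuses quoted values. It is a hypothetical helper, not a Datafari tool; it assumes the key already exists in the file and that the path contains no characters needing escaping in a .properties file (backslashes, for instance).

```python
from pathlib import Path


def set_property(path: Path, key: str, value: str) -> None:
    """Set key=value in a .properties file, leaving every other line untouched."""
    if value[:1] in ("'", '"') or value[-1:] in ("'", '"'):
        raise ValueError("do not quote property values")
    # read_text uses universal newlines, so line endings come back as "\n"
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.startswith(key + "="):
            ending = "\n" if line.endswith("\n") else ""
            lines[index] = f"{key}={value}{ending}"
            break
    else:
        raise KeyError(f"{key} not found in {path}")
    path.write_text("".join(lines), encoding="utf-8")
```

For example: set_property(Path("/opt/datafari/tomcat/conf/search-aggregator.properties"), "USERS_DEFAULT_SOURCE_FILE", "/opt/datafari/tomcat/conf/user-default.csv").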

Once you have saved this file, the user-specific default Datafari feature is set up and ready to work. There is no need to restart Datafari; it should work right away.

Info

You can update the CSV file at any time without modifying anything else; changes are taken into account immediately.
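The EXTERNAL_DATAFARIS value in this file is JSON written with Java .properties escaping: a colon appears as \:, an equals sign as \=, and JSON's escaped slash \/ as \\/. The Python sketch below, a hypothetical check rather than a Datafari tool, unescapes the values and flags enabled entries whose search_api_url does not end with the /rest/v2.0/search/noaggregator endpoint that, as stated above, you MUST use from Datafari v5.4 upwards. It only handles simple key=value lines (no line continuations).

```python
import json
import re
from typing import Dict, List

NOAGGREGATOR_PATH = "/rest/v2.0/search/noaggregator"
_CONTROL_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}


def unescape_property_value(raw: str) -> str:
    """Undo java.util.Properties-style escapes: \\t \\n \\r \\f \\uXXXX and \\<char>."""
    def replace(match):
        token = match.group(1)
        if len(token) == 5:  # \uXXXX escape
            return chr(int(token[1:], 16))
        return _CONTROL_ESCAPES.get(token, token)
    return re.sub(r"\\(u[0-9a-fA-F]{4}|.)", replace, raw)


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")):
            continue
        key, _, value = stripped.partition("=")
        props[key.strip()] = unescape_property_value(value)
    return props


def check_external_datafaris(props: Dict[str, str]) -> List[str]:
    """Report enabled external Datafaris whose Search API URL is not the noaggregator one."""
    problems: List[str] = []
    for entry in json.loads(props.get("EXTERNAL_DATAFARIS") or "[]"):
        url = entry.get("search_api_url", "")
        if entry.get("enabled") and not url.rstrip("/").endswith(NOAGGREGATOR_PATH):
            problems.append(f"{entry.get('label')}: {url} does not target {NOAGGREGATOR_PATH}")
    return problems
```

Feed it the file content, for example check_external_datafaris(parse_properties(Path(...).read_text())), and fix any reported entry through the admin UI.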

Troubleshooting

  • If the path to the CSV file is incorrect or the file is not readable, you should see an error message in the Datafari log

  • If the format of the file is incorrect, it is very likely that all your users will fall back to the global default external Datafaris (or to searching all external Datafaris if no global default is set)

  • If a username is not set correctly, that user will fall back to the global default (or to all external Datafaris if no global default is set)

  • If the name of a user's default Datafari is not set properly, the user will probably see an error message when first landing on the Datafari search page (but should be able to perform searches afterwards by selecting the desired external Datafaris in the aggregator facet).

  • BE EXTREMELY CAREFUL WITH THE SPACES (see warning above)

1.2.2 Restrict user to a subset of external Datafaris

You can restrict, for each user, the set of external Datafaris they can search in.
To do so, you need to create a CSV file with the following format:

Code Block
username;externalDatafari1,externalDatafari2
username2;externalDatafari1
username3;externalDatafari2

Where:

  • The first column contains the username as extracted from the authentication system (AD, Keycloak, internal to Datafari, …)

  • The second column contains the names of the external Datafaris (as defined in the external Datafaris definition), separated by commas

Note

DO NOT PUT SPACES BEFORE OR AFTER THE “;” OR THE “,” SEPARATORS

Every external Datafari listed for a user is accessible to that user.
If a user is not present in this file, they can access all external Datafaris.

Info

Please note that this restriction does not override ACL checks: users will still not see any documents in the results that they do not have the right to access.
It is meant to reduce the number of calls to external sources when you know in advance that no data is available to a user in a given external Datafari.

Then place the file on the Datafari web app server and make it readable by the user running the Datafari web app (the datafari user by default).

Once this is done, edit the file /opt/datafari/tomcat/conf/search-aggregator.properties (assuming a default installation). It should look similar to the following:

Code Block
#Sun Jun 28 15:00:19 UTC 2020
USERS_DEFAULT_SOURCE_FILE=
GLOBAL_TIMEOUT=60
EXTERNAL_DATAFARIS=[{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true},{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main bis","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true}]
USERS_ALLOWED_SOURCES_FILE=
ACTIVATED=true
DEFAULT_DATAFARI=
TIMEOUT_PER_REQUEST=30000

Do not modify anything except the “USERS_ALLOWED_SOURCES_FILE=” line, where you set the path to the file containing the list of Datafari servers available to each user (formatted as described above).
After editing, the line should look similar to the following:

Code Block
USERS_ALLOWED_SOURCES_FILE=/opt/datafari/tomcat/conf/user-allowed-datafaris.csv
Note

Note that the value has NO QUOTES! If you add quotes, you will get a FileNotFound exception.

Once you have saved this file, the user restrictions to external Datafaris are set up and ready to work. There is no need to restart Datafari; it should work right away.

Info

You can update the CSV file at any time without modifying anything else; changes are taken into account immediately.

Troubleshooting

  • If the path to the CSV file is incorrect or the file is not readable, you should see an error message in the Datafari log

  • If the format of the file is incorrect, it is very likely that no restriction will be applied to any user, although, depending on the formatting error, other behaviors are possible, such as all users having access to only one external source or to none at all.

  • If a username is not set correctly, no restriction will be applied to that user

  • If the name of an external Datafari for a user is not set properly, that user won’t be able to access the external source whose name was mistyped.

  • BE EXTREMELY CAREFUL WITH THE SPACES (see warning above)

...

Info

Valid from Datafari Enterprise v4.6 and Datafari Community v5.0 up to 5.3

Note: We have a dedicated page for more technical information on the aggregator.

Note

If you use Keycloak with the Search Aggregator, see this page: /wiki/spaces/DATAFARI/pages/1067712513

Note

All Solr fields MUST BE DECLARED in all of the clusters. If you add a Solr field to your main cluster, make sure that the same Solr field is also declared on all the other Solr clusters.

1. Configuration

1.1 Admin UI for general parameters

A dedicated UI is available in the admin section of Datafari to configure the SearchAggregator.

1.1.1 Activation and password:

  • Enable/Disable results aggregation: Set to ‘On’ to dispatch search requests to external Datafari instances and aggregate their results. Set to ‘Off’ to disable request dispatching (SearchProxy behavior). NOTE: This button must only be activated on the main search server; it must be set to ‘Off’ on all the external servers.

  • Renew ‘search-aggregator’ client’s password: As specified in the first section of this documentation, the SearchAggregator uses a client on the Search API to get access tokens. The client name is ‘search-aggregator’ and is hardcoded in Datafari (it cannot be changed), but its secret password has to be generated at least once. This button generates a new secret password for the search-aggregator client. It is therefore meant to be used on the external Datafari servers. You will then enter this secret in the second part of this admin UI page (External Datafaris), presented further below.

Note

For security reasons, the generated secret password is not stored in clear text anywhere! Once you click the ‘renew’ button, the password is displayed in clear text only once, so store it carefully; otherwise you will need to renew it again.

1.1.2 Timeouts:

  • Timeout per request: The timeout applied to each request sent to an external Datafari instance, expressed in milliseconds. By default, it is set to 30000 ms.

  • Global timeout: The timeout used by the ThreadExecutor described in the first section of this documentation. If some external Datafari instances have still not responded when it expires, the SearchAggregator builds a response from the available ones. By default, it is set to 60 seconds.

Once you have set the desired timeouts, click the ‘save’ button of this section.
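The interplay of the two timeouts can be pictured with the Python sketch below: every external Datafari is queried in parallel with the per-request timeout (TIMEOUT_PER_REQUEST, in milliseconds), and whatever has answered when the global timeout (GLOBAL_TIMEOUT, in seconds) expires is aggregated. This is an illustration using concurrent.futures, not the Java ThreadExecutor code of the SearchAggregator; the fetcher callables are hypothetical stand-ins for the HTTP calls and are expected to honor the timeout they receive (for instance by passing it to urllib.request.urlopen).

```python
import concurrent.futures
from typing import Callable, Dict, List

Fetcher = Callable[[float], List[str]]  # takes a timeout in seconds, returns results


def aggregate(
    fetchers: Dict[str, Fetcher],
    timeout_per_request_ms: int = 30000,
    global_timeout_s: float = 60,
) -> Dict[str, List[str]]:
    """Query all sources in parallel and keep what arrived before the global timeout."""
    per_request_s = timeout_per_request_ms / 1000.0
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(fetchers)))
    futures = {pool.submit(fetch, per_request_s): name for name, fetch in fetchers.items()}
    done, _not_done = concurrent.futures.wait(futures, timeout=global_timeout_s)
    # Do not block on stragglers: Python cannot kill running threads, so
    # late requests finish in the background and their results are ignored.
    pool.shutdown(wait=False)
    results: Dict[str, List[str]] = {}
    for future in done:
        if future.exception() is None:  # failed instances are skipped
            results[futures[future]] = future.result()
    return results
```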

1.1.3 External Datafaris:

In this section of the configuration, you can add or modify the external Datafari instances that will receive requests dispatched by the SearchAggregator. To add a new external Datafari configuration, select “Add a new external Datafari configuration” in the dropdown list; to modify an existing one, select it in the dropdown list.

Here is the description of the parameters:

  • Datafari name: The name you want to use to identify the external Datafari instance you are configuring (must be unique)

  • Search API URL: The URL of the Search API of the external Datafari instance

  • Token Request URL: The URL used to request valid OAuth2 access tokens compatible with the Search API.
    If the external Datafari instance does not use an Identity Provider such as Keycloak, the default URL is http://EXTERNAL_DATAFARI_HOST:PORT/Datafari/oauth/token
    If the external Datafari instance uses an Identity Provider such as Keycloak, you need to specify the URL of the Identity Provider that delivers access tokens

  • Password of ‘search-aggregator’ user: The secret password of the ‘search-aggregator’ client on the external Datafari instance (generated with the ‘renew’ button on that external Datafari instance)

  • Enabled: Enable or disable the external Datafari to tell the SearchAggregator whether or not to dispatch requests to it. This is useful, for example, when the external Datafari instance is down.

Once you are done with the parameters, click on the ‘save’ button of this section.

Info

When you select an existing external Datafari configuration, a ‘delete’ button is available in addition to the ‘save’ button; use it to remove that external Datafari configuration.

1.1.4 Default Datafari

In this section of the configuration, you can select the external Datafaris that will be used by default, i.e. to perform the default search when landing on the search page.
If none is selected, all available external Datafaris are used to perform the default search.

The dropdown only shows the external Datafaris that are defined and enabled.

To add a Datafari to the list of defaults, select it in the dropdown and click the save button.
To remove a Datafari from the list of defaults, click the trash can icon next to it in the list of current defaults.

Note that if you remove or disable an external Datafari that is selected as default, it is removed from the list.

The “Always use default” toggle switch lets you choose between two behaviors for the global default Datafaris when they are combined with user-specific default Datafaris (explained below):

  1. When set to off, user default Datafaris replace the global defaults set in this section of the configuration (e.g. if antoine has datafari1 in his defaults and datafariglobal is defined in the configuration above, only datafari1 is queried for the default query).

  2. When set to on, user default Datafaris are added to the global defaults set in this section to form the set of defaults to be queried (e.g. if antoine has datafari1 in his defaults and datafariglobal is defined in the configuration above, both datafari1 and datafariglobal are queried for the default query).

Info

If a user has no personalized defaults, the global default Datafaris are used instead; if there are none, all available external Datafaris are searched.

1.2 Configuring per user default Datafari and allowed remote Datafaris (requires terminal access to the Datafari web app server)

It is possible to configure per-user defaults and to restrict each user to a subset of the defined external Datafaris. This section explains how to do this.

1.2.1 User specific default Datafari

Configuration

To set a default Datafari, you first need to build a CSV file with the following format:

Code Block
username;externalDatafari1,externalDatafari2
username2;externalDatafari1
username3;externalDatafari2

Where:

  • The first column contains the username as extracted from the authentication system (AD, Keycloak, internal to Datafari, …)

  • The second column contains the list of names of the external Datafaris, as defined in the external Datafaris definition, separated by commas

Note

DO NOT PUT SPACES BEFORE OR AFTER THE “;” OR THE “,” SEPARATORS

If there is an error in a username and/or in a default Datafari name, the feature will not behave as expected.

Then place the file on the Datafari web app server and make it readable by the user running the Datafari web app (the datafari user by default).

Once this is done, edit the file /opt/datafari/tomcat/conf/search-aggregator.properties (assuming a default installation). It should look similar to the following:

Code Block
#Sun Jun 28 15:00:19 UTC 2020
USERS_DEFAULT_SOURCE_FILE=
GLOBAL_TIMEOUT=60
EXTERNAL_DATAFARIS=[{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true},{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main bis","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true}]
USERS_ALLOWED_SOURCES_FILE=
ACTIVATED=true
DEFAULT_DATAFARI=
TIMEOUT_PER_REQUEST=30000

Do not modify anything except the “USERS_DEFAULT_SOURCE_FILE=” line, where you set the path to the file containing the default Datafari definitions (formatted as described above).
After editing, the line should look similar to the following:

Code Block
USERS_DEFAULT_SOURCE_FILE=/opt/datafari/tomcat/conf/user-default.csv
Note

Note that the value has NO QUOTES! If you add quotes, you will get a FileNotFound exception.

Once you have saved this file, the user-specific default Datafari feature is set up and ready to work. There is no need to restart Datafari; it should work right away.

Info

You can update the CSV file at any time without modifying anything else; changes are taken into account immediately.

Troubleshooting

  • If the path to the CSV file is incorrect or the file is not readable, you should see an error message in the Datafari log

  • If the format of the file is incorrect, it is very likely that all your users will fall back to the global default external Datafaris (or to searching all external Datafaris if no global default is set)

  • If a username is not set correctly, that user will fall back to the global default (or to all external Datafaris if no global default is set)

  • If the name of a user's default Datafari is not set properly, the user will probably see an error message when first landing on the Datafari search page (but should be able to perform searches afterwards by selecting the desired external Datafaris in the aggregator facet).

  • BE EXTREMELY CAREFUL WITH THE SPACES (see warning above)

1.2.2 Restrict user to a subset of external Datafaris

You can restrict, for each user, the set of external Datafaris they can search in.
To do so, you need to create a CSV file with the following format:

Code Block
username;externalDatafari1,externalDatafari2
username2;externalDatafari1
username3;externalDatafari2

Where:

  • The first column contains the username as extracted from the authentication system (AD, Keycloak, internal to Datafari, …)

  • The second column contains the names of the external Datafaris (as defined in the external Datafaris definition), separated by commas

Note

DO NOT PUT SPACES BEFORE OR AFTER THE “;” OR THE “,” SEPARATORS

Every external Datafari listed for a user is accessible to that user.
If a user is not present in this file, they can access all external Datafaris.

Info

Please note that this restriction does not override ACL checks: users will still not see any documents in the results that they do not have the right to access.
It is meant to reduce the number of calls to external sources when you know in advance that no data is available to a user in a given external Datafari.

Then place the file on the Datafari web app server and make it readable by the user running the Datafari web app (the datafari user by default).

Once this is done, edit the file /opt/datafari/tomcat/conf/search-aggregator.properties (assuming a default installation). It should look similar to the following:

Code Block
#Sun Jun 28 15:00:19 UTC 2020
USERS_DEFAULT_SOURCE_FILE=
GLOBAL_TIMEOUT=60
EXTERNAL_DATAFARIS=[{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true},{"token_request_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/oauth\\/token","search_aggregator_secret"\:"VL9zt+Kf-GSxuD7h&Kf3SXfPF6WBsaV.wpM.8cMi&RueW6n\=yAY-7+Kqu&acRhM7kj2g","label"\:"Datafari Main bis","search_api_url"\:"http\:\\/\\/localhost\:8080\\/Datafari\\/api\\/search","enabled"\:true}]
USERS_ALLOWED_SOURCES_FILE=
ACTIVATED=true
DEFAULT_DATAFARI=
TIMEOUT_PER_REQUEST=30000

Do not modify anything except the “USERS_ALLOWED_SOURCES_FILE=” line, where you set the path to the file containing the list of Datafari servers available to each user (formatted as described above).
After editing, the line should look similar to the following:

Code Block
USERS_ALLOWED_SOURCES_FILE=/opt/datafari/tomcat/conf/user-allowed-datafaris.csv
Note

Note that the value has NO QUOTES! If you add quotes, you will get a FileNotFound exception.

Once you have saved this file, the user restrictions to external Datafaris are set up and ready to work. There is no need to restart Datafari; it should work right away.

Info

You can update the CSV file at any time without modifying anything else; changes are taken into account immediately.

Troubleshooting

  • If the path to the CSV file is incorrect or the file is not readable, you should see an error message in the Datafari log

  • If the format of the file is incorrect, it is very likely that no restriction will be applied to any user, although, depending on the formatting error, other behaviors are possible, such as all users having access to only one external source or to none at all.

  • If a username is not set correctly, no restriction will be applied to that user

  • If the name of an external Datafari for a user is not set properly, that user won’t be able to access the external source whose name was mistyped.

  • BE EXTREMELY CAREFUL WITH THE SPACES (see warning above)