Info

Valid from Datafari v5.4 upwards

...

The requesting user is passed as a request parameter because that is the only safe way to do it. Normally, the user on whose behalf the Solr index is queried is deduced from the session, and only from the session. A user passed as a request parameter to the Search API is ignored for security reasons: otherwise, anybody could impersonate another user and access that user's data.
But when the SearchAggregator servlet sends requests to external Datafari instances, those instances cannot deduce the user from the session, because the SearchAggregator cannot authenticate itself to them as the requesting user. Doing so would require it to know the user's credentials and to use them for each request, which would be unsafe as well as technically very complex.
This is why the SearchAggregator is the only “entity” that the Search API allows to pass the requesting user as a request parameter. The SearchAggregator has dedicated client credentials on the Search API (client name: search-aggregator). It uses those credentials to retrieve an OAuth2 access token, which it must include in every request so that the Search API can identify it.
To summarise: when the SearchAggregator sends a request to an external Datafari Search API, it must include an access token that it previously obtained from the targeted external Datafari. The external Datafari instance can then recognize the SearchAggregator and takes the user passed as a request parameter into account instead of ignoring it.
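The rule summarised above can be sketched as a small decision function. This is an illustrative Python sketch, not Datafari's code: the function name `effective_user` and its three arguments are hypothetical, and it assumes the access token has already been validated and mapped to a client name. Only the client name search-aggregator comes from the text.

```python
from typing import Optional

# Client name taken from the text; everything else here is hypothetical.
TRUSTED_CLIENT = "search-aggregator"


def effective_user(session_user: Optional[str],
                   requested_user: Optional[str],
                   token_client: Optional[str]) -> Optional[str]:
    """Return the user whose access rights should apply to the search.

    session_user:   user deduced from the session, if any
    requested_user: user passed as a request parameter, if any
    token_client:   client identified by a validated access token, if any
    """
    if token_client == TRUSTED_CLIENT and requested_user:
        # Only the SearchAggregator may speak on behalf of another user.
        return requested_user
    # Any other caller: the request parameter is ignored.
    return session_user
```

A regular caller that adds a user parameter therefore still gets the session user, while a request carrying the SearchAggregator's token is evaluated for the user named in the parameter.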
The SearchAggregator uses a TokenManager to retrieve an access token for each external Datafari instance it queries and to renew those tokens when they expire.
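The retrieve-and-renew behaviour of such a token manager could look roughly like the following Python sketch. It is not the actual TokenManager: the class layout, the `fetch_token` callback (assumed to return a token value and its lifetime in seconds) and the `margin` safety window are assumptions made for illustration.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class TokenManager:
    """Caches one access token per external instance and renews it on expiry."""

    def __init__(self,
                 fetch_token: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 margin: float = 30.0) -> None:
        self._fetch_token = fetch_token  # hypothetical: returns (token, lifetime in seconds)
        self._clock = clock
        self._margin = margin            # renew slightly before the real expiry
        self._tokens: Dict[str, Tuple[str, float]] = {}
        self._lock = threading.Lock()    # requests run in parallel threads

    def get_token(self, instance_url: str) -> str:
        with self._lock:
            cached = self._tokens.get(instance_url)
            if cached is None or self._clock() >= cached[1] - self._margin:
                # No token yet, or it is about to expire: fetch a new one.
                value, lifetime = self._fetch_token(instance_url)
                cached = (value, self._clock() + lifetime)
                self._tokens[instance_url] = cached
            return cached[0]
```

Each external instance gets its own cache entry, so a token issued by one Datafari is never sent to another.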

To be fast and efficient, the SearchAggregator sends the requests to the different external Datafari instances in parallel. Each request is executed in its own thread, with a timeout after which the request is cancelled. All request threads are monitored by a ThreadExecutor that sets a second timeout, called the “global timeout”, after which all threads are killed whatever their status. This ensures that once the global timeout has elapsed, a response is always built from the responses available at that point.
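The two-level timeout described above can be illustrated with Python's `concurrent.futures`. This is a sketch under assumptions, not the servlet's implementation: `query_all` and the `fetchers` mapping are hypothetical names, each fetcher is assumed to enforce the per-request timeout itself, and because Python cannot kill a running thread, late requests are simply abandoned and their results ignored.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Any, Callable, Dict


def query_all(fetchers: Dict[str, Callable[[float], Any]],
              request_timeout: float,
              global_timeout: float) -> Dict[str, Any]:
    """Run every fetcher in its own thread; keep what finished in time."""
    if not fetchers:
        return {}
    executor = ThreadPoolExecutor(max_workers=len(fetchers))
    # One thread per instance; the fetcher receives the per-request timeout.
    futures = {executor.submit(fetch, request_timeout): name
               for name, fetch in fetchers.items()}
    # Global timeout: stop waiting, whatever the state of the requests.
    done, not_done = wait(futures, timeout=global_timeout)
    for future in not_done:
        future.cancel()  # only effective for requests that have not started
    executor.shutdown(wait=False)
    # Build the result from the responses available at this point.
    return {futures[f]: f.result() for f in done if f.exception() is None}
```

An instance that fails or answers too late is left out of the result, so the caller can still aggregate the responses that did arrive.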

...

Info

In aggregation mode, the further a user goes in the pagination, the slower the responses become.

We repeat it here as a warning, because it is important to understand this limitation.
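The mechanism behind this limitation is not restated here. A common cause in federated search, assumed for this Python sketch, is that no instance knows in advance which of its documents will land on a given merged page, so each instance has to return its top `start + rows` documents before the merge. The function names and the `score` field are illustrative assumptions.

```python
import heapq
from itertools import islice
from typing import Dict, List


def docs_to_fetch(instances: int, start: int, rows: int) -> int:
    """Documents to retrieve overall if every instance returns start + rows."""
    return instances * (start + rows)


def merged_page(result_lists: List[List[Dict]], start: int, rows: int) -> List[Dict]:
    """Merge per-instance lists (each sorted by descending score) and slice a page."""
    merged = heapq.merge(*result_lists, key=lambda doc: -doc["score"])
    return list(islice(merged, start, start + rows))
```

Under that assumption, the first page of 10 results across 3 instances requires 30 documents, while the page starting at offset 990 requires 3000.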

...

Info

Valid for Datafari Enterprise v4.6 and Datafari Community v5.0 up to 5.3

The SearchAggregator is a servlet that replaces the SearchProxy API [DEPRECATED]. The main difference is that it can dispatch the request to several external Datafari sites and aggregate their responses with the local one, while keeping the standard format described in the SearchProxy API [DEPRECATED].

...

1. Working details

When the SearchAggregator servlet receives a request, it behaves in one of two ways:

  • If results aggregation is enabled: the request is dispatched to the configured external Datafari instances (see section 2), and the returned results are aggregated with the local ones into a single response in the standard format

  • If results aggregation is disabled: the servlet acts like the SearchProxy API [DEPRECATED] and executes the request locally
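The two behaviours can be sketched as a single branch. This Python sketch is illustrative only: `handle_request`, `aggregate` and the simplified response shape (a `numFound` count and a `docs` list) are assumptions, and the merge shown here just concatenates documents because the real aggregation rules are not described in this section.

```python
from typing import Callable, Dict, List

Response = Dict[str, object]
Search = Callable[[str], Response]


def aggregate(responses: List[Response]) -> Response:
    """Hypothetical merge: sum the counts and concatenate the documents."""
    docs: List[object] = []
    for response in responses:
        docs.extend(response["docs"])
    return {"numFound": sum(r["numFound"] for r in responses), "docs": docs}


def handle_request(query: str,
                   aggregation_enabled: bool,
                   search_local: Search,
                   search_externals: List[Search]) -> Response:
    local = search_local(query)
    if not aggregation_enabled:
        # Aggregation disabled: the request is only executed locally.
        return local
    # Aggregation enabled: query the external instances and merge everything.
    externals = [search(query) for search in search_externals]
    return aggregate([local] + externals)
```

In both cases the caller receives one response of the same shape, which is what lets a client stay unaware of whether aggregation took place.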

...

https://54.36.146.228/Datafari/SearchAggregator/select?fl=title%2Curl%2Cid%2Cextension%2Cpreview_content%2Clast_modified%2Ccrawl_date%2Cauthor%2Coriginal_file_size%2Cemptied&facet=true&q=*%3A*&rows=10&facet.field={!ex%3Drepo_source}repo_source&facet.field={!ex%3Dextension}extension&facet.field={!ex%3Dentity_person}entity_person&facet.field={!ex%3Dentity_phone_present}entity_phone_present&facet.field={!ex%3Dentity_phone}entity_phone&facet.field={!ex%3Dentity_special_present}entity_special_present&facet.field={!ex%3Dlanguage}language&facet.field={!ex%3Dsource}source&facet.query={!key%3DMoins%20de%20un%20mois}last_modified%3A[NOW-1MONTH TO NOW]&facet.query={!key%3DMoins%20de%20un%20an}last_modified%3A[NOW-1YEAR TO NOW]&facet.query={!key%3DMoins%20de%20cinq%20ans}last_modified%3A[NOW-5YEARS TO NOW]&facet.query={!key%3DMoins%20de%20100ko}original_file_size%3A[0 TO 102400]&facet.query={!key%3DDe%20100ko%20%C3%A0%2010Mo}original_file_size%3A[102400 TO 10485760]&facet.query={!key%3DPlus%20de%2010Mo}original_file_size%3A[10485760 TO *]&id=d36f89ed-03fa-41ee-9aee-ff67c4d8a352&aggregator=&sort=score desc&q.op=AND&spellcheck.collateParam.q.op=AND&spellcheck=false&wt=json&json.wrf=jQuery34104197056299893184_1594988462839&_=1594988462840

https://54.36.146.228/Datafari/SearchAggregator/select?fl=title%2Curl%2Cid%2Cextension%2Cpreview_content%2Clast_modified%2Ccrawl_date%2Cauthor%2Coriginal_file_size%2Cemptied&facet=true&q=*%3A*&rows=10&facet.field={!ex%3Drepo_source}repo_source&facet.field={!ex%3Dextension}extension&facet.field={!ex%3Dentity_person}entity_person&facet.field={!ex%3Dentity_phone_present}entity_phone_present&facet.field={!ex%3Dentity_phone}entity_phone&facet.field={!ex%3Dentity_special_present}entity_special_present&facet.field={!ex%3Dlanguage}language&facet.field={!ex%3Dsource}source&facet.query={!key%3DMoins%20de%20un%20mois}last_modified%3A[NOW-1MONTH TO NOW]&facet.query={!key%3DMoins%20de%20un%20an}last_modified%3A[NOW-1YEAR TO NOW]&facet.query={!key%3DMoins%20de%20cinq%20ans}last_modified%3A[NOW-5YEARS TO NOW]&facet.query={!key%3DMoins%20de%20100ko}original_file_size%3A[0 TO 102400]&facet.query={!key%3DDe%20100ko%20%C3%A0%2010Mo}original_file_size%3A[102400 TO 10485760]&facet.query={!key%3DPlus%20de%2010Mo}original_file_size%3A[10485760 TO *]&id=d36f89ed-03fa-41ee-9aee-ff67c4d8a352&sort=score desc&q.op=AND&spellcheck.collateParam.q.op=AND&fq={!tag%3Drepo_source}repo_source%3A"New Enron"&aggregator=Centos1%2CLocal&spellcheck=false&wt=json&json.wrf=jQuery34104197056299893184_1594988462839&_=1594988462842

...