To implement security during search for database issued documents, two cases are possible:
Your documents ACLs are managed by an Active Directory or an LDAP
Your documents ACLs are managed directly by the database
For both cases, the documents ACLs must be returned by the database during the crawling phase. It means that you must have in your database, a table or a view, that is able to provide all the ACLs of a document based on its database id. This table/view must return several tuples for a given document ID, where each tuple represent an access token.
For example, assuming I have one document named “management_procedure.pdf” and that this document has “1” as ID and that the groups “managers” and “developers” have read access to it. I need to have the following table or view named “acl_view” in my database:
acl_id | document_id | access_token |
---|---|---|
1 | 1 | managers |
2 | 1 | developers |
Now if I perform the following query:
SELECT document_id, access_token WHERE document_id="1";
I will retrieve the 2 access tokens of the document in 2 result tuples !
It is very important to have such a structured table/view and that 1 result tuple corresponds to 1 access token for a given document ID because this is what is expected by the MCF JDBC connector during the crawl phase when we enable the security on documents.
Indeed, the job configuration of the JDBC connector expects an “access token query”
As you can see, the query has to be formatted like this template:
SELECT docidfield AS $(IDCOLUMN), aclfield AS $(TOKENCOLUMN) FROM acltable WHERE docidfield IN $(IDLIST)
As you can see, the query must use 3 specific parameters:
$(IDCOLUMN)
: it is used by the connector to identify the document ID in the results tuples. So, the document ID field must be returned in the SELECT statement and mapped to this parameter like in the query template above !$(TOKENCOLUMN)
: it is used by the connector to identify the access token in the results tuples. So, the access token field must be returned in the SELECT statement and mapped to this parameter like in the query template above !$(IDLIST)
: it is replaced by the connector to the list of document IDs that the query must match and return results tuples for. So, the WHERE clause of the query MUST use it as a filter like in the query template above !
If we apply this template to our example case, we should obtain the following “access token query”:
SELECT document_id AS $(IDCOLUMN), access_token AS $(TOKENCOLUMN) FROM acl_view WHERE document_id IN $(IDLIST)
Once the “Access token query” correctly set in the job configuration to fit your case, you need to enable the security in the “Security” tab of the job, otherwise, the access tokens will not be considered by the connector:
Now, your crawled documents with that job will be indexed with their access tokens list and during search in Datafari, the documents in the search results will only be displayed if the user that performs the search, has at least one access token on his side, matching the access tokens list of the documents.
This is were the real difference between the two cases presented in introduction happens !
1. Database documents ACLs managed by an Active Directory/LDAP
If the documents ACLs are managed by an Active Directory/LDAP then you need to /wiki/spaces/DATAFARI/pages/223313957. Be careful though, the authority connector needs to be linked to the same authority group as the JDBC repository connector used in the job ! If the AD/LDAP authority connector has been automatically created by the AD/LDAP configuration UI of Datafari, it will for sure not be in the same authority group, you will need to change it manually in the “Type” tab of the authority connector ! To better understand the reasons of this and the best practices concerning authority connectors please read this documentation: /wiki/spaces/DATAFARI/pages/2188115979
Next, ensure that the ACLs returned by the authority connector are of the same kind and can match the access tokens returned by the data table/view you used in the job configuration of the JDBC connector. If this is not the case, then you will need to modify your table/view so that they are compatible and that a user allowed to access to a specific database document has matching authority connector token(s) with the access tokens of the document.
2. Database documents ACLs managed by the Database itself
If the documents ACLs are managed internally by the database, then you will need to configure a JDBC authority connector that will connect to the database and retrieve the access tokens of users. To do that, follow these steps:
go to the MCD admin UI and navigate to Authorities => List Authority Connections and add a new connection
enter a name (mandatory) and a description (optionnal) in the “Name” tab
Select the “JDBC” connection type and the authority group corresponding to the authority group associated to the repository connector used in the crawl job of your database
Click on the “Continue” button and in the “Database Type” tab that will appear, select the database type corresponding to yours and select the “by label” access method.
NB: The access method serves to find the columns in the resultsets by name or by label, “by name” will search for original column name whereas the “by label” option will search for the column label. A column label is by default the original column name if there is no mapping (e.g ‘SELECT id AS doc_id’ is a mapping of the original column name ‘id’ to ‘doc_id’), if a mapping is defined, then the label will be the mapping name. In our case we definitely want to use labels instead of original column names !In the “Server” tab, fulfill the parameter as you did for the repository connector
Do the same in the “Credentials” tab
In the “Queries” tab you must define two queries:
- the “user ID query” that must return the database user ID based on the username provided by Datafari (which is samaccount@domain when an Active Directory is configured). Its query template is:SELECT idfield AS $(IDCOLUMN) FROM usertable WHERE login = $(USERNAME)
- the “authorization tokens query” that must return authorization tokens for a given user database ID, or a username (like for the ID query). Its query template is:
SELECT groupnamefield AS $(TOKENCOLUMN) FROM grouptable WHERE user_id = $(UID) OR login = $(USERNAME)
The “user ID query” is only mandatory if the table/view used in the authorization tokens query only works with database user IDs and that you need to perform a mapping between usernames (samaccount@domain) and database user IDs. If defined, it must use the $(IDCOLUMN)
parameter that is used by the connector to identify the user database ID, and the $(USERNAME)
parameter that is replaced by the connector by the username provided by Datafari during search.
The “authorization tokens query” must use the $(TOKENCOLUMN)
that is used by the connector to identify in a result set the auth token of the user, and either the $(UID)
or the $(USERNAME)
or both depending if the table/view used in the query is working with usernames (samaccount@domain) or database user IDs (the $(UID)
parameter is replaced by the user ID returned by the “user ID query”). Also, the “authorization tokens query” must return one result tuple per user authorization token. Here is an example of a compatible user_authorizations table/view for the “authorization tokens query”:
auth_id | username | auth_token |
---|---|---|
1 | mike@francelabs.com | managers |
2 | it_department |
With that kind of table/view, the “authorization tokens query” would be
SELECT auth_token AS $(TOKENCOLUMN) FROM user_authorizations WHERE username = $(USERNAME)
Once everything is configured like described in those steps, you can save the authority connector and test it using the "Test Active Directory Authority" tool provided by Datafari mentioned in that documentation /wiki/spaces/DATAFARI/pages/223313957
If the authority is working and properly configured with THE CORRECT AUTHORITY GROUP (the same configured for the repository connector used in the crawl job), your database documents searches are now secured.