...
When it’s done, you should see the RegexEntityConnector in the table, and a new “Regex Entity” tab. The next step is to configure those two connectors. We want them to feed metadata into the Solr index:
entity_person should contain the names (provided by SpacyConnector)
entity_phone should contain the phone numbers (provided by RegexEntityConnector)
entity_email should contain the email addresses (provided by RegexEntityConnector)
...
Model: You can leave this blank to use the default model, which is configured in the model.json file.
Endpoint: use the default one, “/split_detect_and_process/”.
Prefix: This defines how your metadata will be named. In our example, we will set it to “entity_” as recommended, so we can use the existing “entity_person” field.
...
See our documentation for the FastAPI Connector here:
Spacy Transformation Connector
...
Once your Spacy Connector is ready, go to the “Regex Entity” tab to configure the Regex Entity Connector. Its function is to store any regex matches in Solr metadata. In this situation, we want to extract phone numbers to “entity_phone”, and email addresses to “entity_email”.
To do so, we need to add two lines. For each line, set the parameters as follows.
...
Destination field:
Email addresses: entity_email
Phone numbers: entity_phone
Leave other fields empty.
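
To make the mapping concrete, here is a minimal Python sketch of what "store regex matches in a destination field" means. It is not the connector's code: the two patterns and the `extract_entities` helper are illustrative assumptions, and the regexes you actually enter in the connector may differ.

```python
import re

# Illustrative patterns only (assumptions, not the connector's built-in regexes).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d .-]{7,}\d")


def extract_entities(text):
    """Map each destination field name to the list of regex matches found in text."""
    return {
        "entity_email": EMAIL_RE.findall(text),
        "entity_phone": PHONE_RE.findall(text),
    }


sample = "Contact Jane at jane.doe@example.com or +33 1 23 45 67 89."
print(extract_entities(sample))
```

Each key corresponds to a "Destination field" value above, and each list holds the values that would end up in that multivalued Solr field.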
...
To create a metadata field in the FileShare collection, add the following to the file $DATAFARI_HOME/solr/solrcloud/FileShare/conf/customs_schema/custom_fields.incl.
You don’t need to add “entity_phone” or “entity_person”, as those already exist by default in Solr.
Code Block |
---|
{ "name":"entity_email", "type":"string", "stored":true, "multiValued":true } |
...
Code Block |
---|
{ "type": "QueryFacet", "title": "GDPR", "queries": [ "entity_phone:*", "entity_email:*", "entity_person:*" ], "labels": [ "Phone number", "Email address", "Person" ], "id": "gdpr_facet", "minShow": 5 }, { "type": "FieldFacet", "title": "People", "field": "entity_person", "op": "OR", "variant": "autocomplete", "minShow": 3, "maxShow": 15, "show": true }, { "type": "FieldFacet", "title": "Phone numbers", "field": "entity_phone", "op": "OR", "variant": "autocomplete", "minShow": 3, "maxShow": 15, "show": true }, { "type": "FieldFacet", "title": "Email", "field": "entity_email", "op": "OR", "variant": "autocomplete", "minShow": 3, "maxShow": 15, "show": true } |
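
To check that the three fields are actually populated after a crawl, you can run the same queries as the QueryFacet directly against Solr. The sketch below only builds the request URL using standard Solr parameters (`q`, `rows`, `facet`, `facet.query`). The base URL `http://localhost:8983/solr` and the helper name `build_gdpr_query_url` are assumptions; adapt them to your installation.

```python
from urllib.parse import urlencode


def build_gdpr_query_url(base_url, collection="FileShare"):
    """Build a Solr select URL that counts documents having each entity field."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),  # only the facet counts are needed, not the documents
        ("facet", "true"),
        ("facet.query", "entity_phone:*"),
        ("facet.query", "entity_email:*"),
        ("facet.query", "entity_person:*"),
    ]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Assumed base URL; replace it with the address of your own Solr instance.
url = build_gdpr_query_url("http://localhost:8983/solr")
print(url)
```

Opening the printed URL in a browser or with curl should return one count per `facet.query`; a count of zero suggests that the corresponding connector is not writing to that field.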
...