[DEPRECATED] Setting up a server to host Spacy for Named Entity Recognition

With the arrival of our generative AI functionalities (at the indexing and searching phases), the Spacy NER has been deprecated as of Datafari v6.0.

Valid from Datafari 5.3 up to, but not including, 6.0

Datafari can be set up to extract named entities through a webservice, using the [DEPRECATED] Spacy Transformation Connector. To do so, you must first set up a webservice that serves spaCy models and can be queried for entities. This documentation explains how.

Resources Needed

We tested the webservice on a machine with an 8-core CPU and 32 GB of RAM, running sentiment analysis with spacytextblob and keyword extraction with KeyBERT. The spacytextblob library used the following language models, depending on the language detected in each document:

  • "en": "en_core_web_trf",

  • "fr": "fr_dep_news_trf",

  • "de": "de_dep_news_trf",

  • "xx": "xx_ent_wiki_sm"
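
The list above amounts to a lookup from a detected language code to a model name. A minimal sketch of such a lookup follows. It assumes that the multilingual "xx" entry serves as the fallback for languages without a dedicated model; the function name `pick_model` is hypothetical and is not taken from the webservice's code.

```python
# Language-to-model mapping, as listed above.
MODELS = {
    "en": "en_core_web_trf",
    "fr": "fr_dep_news_trf",
    "de": "de_dep_news_trf",
    "xx": "xx_ent_wiki_sm",
}


def pick_model(lang_code: str) -> str:
    """Return the model name for a detected language code.

    Assumption: languages without a dedicated model fall back
    to the multilingual "xx" entry.
    """
    return MODELS.get(lang_code.lower(), MODELS["xx"])


print(pick_model("fr"))  # fr_dep_news_trf
print(pick_model("es"))  # xx_ent_wiki_sm (fallback)
```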

Resource consumption and requirements will vary depending on the tasks and models you use. We recommend reading the spaCy documentation for the models you plan to use to get an idea of their requirements, then testing the webservice before integrating it into an indexing pipeline to make sure it runs smoothly.

Some tasks and models may run better on a GPU when one is available.

Getting the Web-service

We developed a first version of a web-service that meets the requirements of the [DEPRECATED] Spacy Transformation Connector. It is available here: https://gitlab.datafari.com/sandboxespublic/spacy-webservice

The readme gives extensive information on how to install, configure and use the web-service.

You can also extend the capabilities of the web-service if you need to. It uses the Python FastAPI library, which makes it easy to build a web API.

Keep in mind that this is a work in progress. The current API does not support pools of models or document queues. As a result, if documents are sent faster than the models can process them, the web-service returns an error for some documents, and those documents will not have any entities attached to them.
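
When testing the web-service by hand before integrating it, one way to cope with this limitation is to retry a failed request after a short pause. The following is only a sketch under stated assumptions: `send_with_retry` and its parameters are hypothetical names, and `send` stands for whatever function you write to call the web-service (it is assumed to raise an exception when the service returns an error). It is not part of the web-service or of the connector.

```python
import time
from typing import Callable, Optional


def send_with_retry(
    send: Callable[[str], dict],
    text: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call send(text); if it raises, wait and try again.

    Returns the response of the first successful call, or None
    if all max_attempts calls failed.
    """
    for attempt in range(max_attempts):
        try:
            return send(text)
        except Exception:
            # Hypothetical policy: exponential backoff between attempts
            # (base_delay, then 2 * base_delay, and so on).
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return None
```

The `sleep` parameter exists only so that the waiting behavior can be replaced in tests. A document for which `send_with_retry` returns `None` ends up in the situation described above, with no entities attached.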

 
