...

A server dedicated to running spacy and the fastapi interface is advised, as most NER models require several gigabytes of memory to run smoothly. Each model has its own requirements and limitations, so test the model you plan to use beforehand to check its requirements, then make sure that the server that will run the NER model can meet them.
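One way to check a model's memory footprint before choosing a server is to load it in a test process and read the peak memory afterwards. The following is a minimal sketch, not part of the provided configuration: the helper names `peak_rss_mb` and `peak_after` are hypothetical, it relies on the Unix-only `resource` module, and the model name in the commented usage is only an example.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return the peak resident memory of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def peak_after(action) -> float:
    """Run action() and return the process peak memory (MiB) afterwards."""
    action()
    return peak_rss_mb()


# Example usage with spacy (assumes spacy and the model are installed;
# "en_core_web_sm" is only an illustrative model name):
#
#   import spacy
#
#   def load_and_run():
#       nlp = spacy.load("en_core_web_sm")
#       nlp("Ada Lovelace met Charles Babbage in London.")
#
#   print(f"Peak memory: {peak_after(load_and_run):.0f} MiB")
```

The value is a peak for the whole process, so run the check in a fresh interpreter and process a few representative documents to get a realistic figure.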

Besides that, the folder https://gitlab.datafari.com/sandboxes/spacy_ner_2021/-/tree/master/fastapi contains a set of basic configuration files to set up a fastapi server serving spacy models.

This basic configuration contains models that are able to extract person names, so we don’t need to change the configuration in our case.

...

You can find information on how to deploy a fastapi webservice hosting some spacy models here: Setting up a server to host Spacy for Named Entity Recognition

Info

If you want to develop your own web service for entity extraction, note that this connector expects your endpoints to work in a specific way. More information on this here:

2. Using the Transformation Connector

...

  • Click on the Spacy Fastapi tab at the top and fill in the required information:

...

  • Name of the model to be used as well as the prefix for the entities field (if you want to use one, which is strongly recommended)

...

  • : If a specific model name should be included in the query with the endpoint you are using, specify it here. Can be left blank if no model needs to be provided.

  • Endpoint: The endpoint on the entity extraction web service that you want your request to be sent to. Defaults to /process/ if none is provided.

  • Prefix: The prefix you want to use for the metadata that will be added to the document for the entities. It is strongly recommended to set one. Metadata will be named [prefix][entity_label]. Defaults to spacyEntities_.
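The defaults and the [prefix][entity_label] naming rule described above can be illustrated with a short sketch. This is not the connector's actual code: the function names, the `(text, label)` input shape and the `PER`/`LOC` labels are assumptions made for the example. Only the default values `/process/` and `spacyEntities_` come from the description above.

```python
from collections import defaultdict

DEFAULT_ENDPOINT = "/process/"
DEFAULT_PREFIX = "spacyEntities_"


def resolve_endpoint(endpoint: str = "") -> str:
    """Fall back to the default endpoint when none is provided."""
    return endpoint.strip() or DEFAULT_ENDPOINT


def build_entity_metadata(entities, prefix: str = DEFAULT_PREFIX) -> dict:
    """Group entity texts under metadata names built as prefix + label.

    `entities` is an iterable of (text, label) pairs (hypothetical shape).
    """
    metadata = defaultdict(list)
    for text, label in entities:
        metadata[f"{prefix}{label}"].append(text)
    return dict(metadata)


# Illustrative entities; the labels depend on the model actually used.
entities = [
    ("Ada Lovelace", "PER"),
    ("London", "LOC"),
    ("Charles Babbage", "PER"),
]
metadata = build_entity_metadata(entities)
# metadata == {
#     "spacyEntities_PER": ["Ada Lovelace", "Charles Babbage"],
#     "spacyEntities_LOC": ["London"],
# }
```

Setting a prefix keeps the entity metadata in its own namespace, which makes it less likely to collide with metadata the document already carries.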

3. Store the information in solr

...