API Documentation
To promote the integration of Word Sense Disambiguation (WSD) into real-world applications, AMuSE-WSD provides an interface to a full end-to-end, state-of-the-art multilingual pretrained model. This page describes how to use the API to obtain WSD information.
Disambiguate Text
URL
http://nlp.uniroma1.it/amuse-wsd/api/model
Method
POST
Request
Parameter | Type | Description |
---|---|---|
documents | List<Document> | A list of documents to disambiguate. |
Document
Attribute | Description |
---|---|
text | The document to disambiguate. |
lang | Language of the document. |
Example
Request
curl -X 'POST' \
  'http://nlp.uniroma1.it/amuse-wsd/api/model' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
    {"text":"The quick brown fox jumps over the lazy dog.", "lang":"EN"},
    {"text":"I walked along the river bank.", "lang":"EN"}
  ]'
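The same request can also be issued from a script. The following is a minimal Python sketch that uses only the standard library. The helper names `build_request` and `disambiguate` are illustrative and not part of AMuSE-WSD, and the response is returned as parsed JSON without assuming any particular field names.

```python
import json
import urllib.request

# Endpoint of the online API, as given in this documentation.
API_URL = "http://nlp.uniroma1.it/amuse-wsd/api/model"


def build_request(documents, url=API_URL):
    """Build a POST request whose JSON body is a list of documents."""
    for doc in documents:
        missing = {"text", "lang"} - set(doc)
        if missing:
            raise ValueError(f"document {doc!r} is missing {sorted(missing)}")
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"accept": "application/json", "Content-Type": "application/json"},
    )


def disambiguate(documents, url=API_URL, timeout=60):
    """Send the documents and return the parsed JSON response (hypothetical helper)."""
    with urllib.request.urlopen(build_request(documents, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


documents = [
    {"text": "The quick brown fox jumps over the lazy dog.", "lang": "EN"},
    {"text": "I walked along the river bank.", "lang": "EN"},
]
request = build_request(documents)  # nothing is sent until disambiguate() is called
print(request.get_method(), request.full_url)
```

Calling `disambiguate(documents)` performs the actual network request. Building the request separately makes it possible to inspect the body and headers first.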
Response
Supported Languages
Online
The Web interface of AMuSE-WSD supports the following 10 languages.
- AR: Arabic
- DE: German
- EN: English
- ES: Spanish
- FR: French
- IT: Italian
- NL: Dutch
- PT: Portuguese
- RU: Russian
- ZH: Chinese
Offline
The offline version of AMuSE-WSD supports the following 40 languages. Download AMuSE-WSD here.
- AF: Afrikaans
- AR: Arabic
- BG: Bulgarian
- CA: Catalan
- CS: Czech
- DA: Danish
- DE: German
- EL: Greek
- EN: English
- ES: Spanish
- ET: Estonian
- EU: Basque
- FA: Persian
- FI: Finnish
- FR: French
- GA: Irish
- HE: Hebrew
- HI: Hindi
- HR: Croatian
- HU: Hungarian
- ID: Indonesian
- IT: Italian
- JA: Japanese
- KO: Korean
- LT: Lithuanian
- LV: Latvian
- NL: Dutch
- NB: Norwegian
- PL: Polish
- PT: Portuguese
- RO: Romanian
- RU: Russian
- SK: Slovak
- SL: Slovenian
- SR: Serbian
- SV: Swedish
- TR: Turkish
- UK: Ukrainian
- VI: Vietnamese
- ZH: Chinese
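Since the online and offline versions support different language sets, a client may want to check a language code before sending a request. A small Python sketch, with the two sets copied from the lists above and a hypothetical helper named `availability`:

```python
# Language codes copied from the two lists in this section.
ONLINE = frozenset("AR DE EN ES FR IT NL PT RU ZH".split())
OFFLINE = frozenset(
    "AF AR BG CA CS DA DE EL EN ES ET EU FA FI FR GA HE HI HR HU "
    "ID IT JA KO LT LV NL NB PL PT RO RU SK SL SR SV TR UK VI ZH".split()
)


def availability(lang):
    """Report where a language code is supported (illustrative helper)."""
    code = lang.strip().upper()
    if code in ONLINE:
        return "online and offline"
    if code in OFFLINE:
        return "offline only"
    return "unsupported"


for code in ("EN", "JA", "XX"):
    print(code, availability(code))
```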
Using AMuSE-WSD Offline
Where to download?
Available Docker images
We release different types of Docker images depending on the underlying language model (e.g. BERT or XLM-RoBERTa) and depending on the hardware you want to use (i.e. CPU or GPU). In particular:
- amuse-large-[cpu|cuda] uses BERT-large and provides state-of-the-art results in English WSD.
- amuse-large-multilingual-[cpu|cuda] employs XLM-RoBERTa-large and thus offers the best results in multilingual WSD. However, it is also the most demanding in terms of hardware requirements.
- amuse-medium-multilingual-[cpu|cuda] adopts XLM-RoBERTa-base, which produces the same outputs as its larger counterpart in 98% of cases while taking half the time.
- amuse-small-multilingual-[cpu|cuda] uses the multilingual version of MiniLM, a language model distilled from XLM-RoBERTa-base. It is three times faster and three times smaller, while still achieving remarkable results.
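The naming scheme above can be captured in a small lookup. The Python sketch below is illustrative only: the function `image_name` is hypothetical, and the default version tag `1.0.0` is an assumption taken from the example commands later in this page.

```python
# (size, multilingual) combinations listed above, mapped to their base names.
VARIANTS = {
    ("large", False): "amuse-large",
    ("large", True): "amuse-large-multilingual",
    ("medium", True): "amuse-medium-multilingual",
    ("small", True): "amuse-small-multilingual",
}


def image_name(size, multilingual, device, version="1.0.0"):
    """Return the image reference for a given variant, e.g. for docker run."""
    if device not in ("cpu", "cuda"):
        raise ValueError("device must be 'cpu' or 'cuda'")
    try:
        base = VARIANTS[(size, multilingual)]
    except KeyError:
        raise ValueError(f"no image for size={size!r}, multilingual={multilingual!r}") from None
    return f"{base}-{device}:{version}"


print(image_name("large", True, "cpu"))
print(image_name("small", True, "cuda"))
```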
Requirements
In order to run AMuSE-WSD offline, you need to install Docker. Please refer to its official documentation to install it on your system.
How to launch your local instance of AMuSE-WSD
To run a local instance of AMuSE-WSD, users are required to perform a one-time setup to load one of the available images. Let’s say that we want to load amuse-large-multilingual-cpu:
#!/bin/bash
docker load -i amuse-large-multilingual-cpu-1.0.0.tar
After that, AMuSE-WSD can be started by running the following commands:
#!/bin/bash
PORT=12345
LANGUAGES="EN FR IT ZH"
# Quote $LANGUAGES so the space-separated list is passed as a single value.
docker run \
  --name amuse-large-multilingual \
  -p $PORT:80 \
  -e LANGUAGES="$LANGUAGES" \
  amuse-large-multilingual-cpu:1.0.0
This will run an amuse-large-multilingual instance on CPU on port number 12345 loading the preprocessing models for English, French, Italian and Chinese.
Running AMuSE-WSD on your GPU
If you need GPU support, you can load the cuda version of an image, for example amuse-large-multilingual-cuda.
Before using a cuda image, you have to load it, just like the CPU image:
#!/bin/bash
docker load -i amuse-large-multilingual-cuda-1.0.0.tar
Then you can launch AMuSE-WSD through the docker run command, with an additional flag, --gpus, for example:
#!/bin/bash
PORT=12345
LANGUAGES="EN FR IT ZH"
# Quote $LANGUAGES so the space-separated list is passed as a single value.
docker run \
  --name amuse-large-multilingual-cuda \
  --gpus all \
  -p $PORT:80 \
  -e LANGUAGES="$LANGUAGES" \
  amuse-large-multilingual-cuda:1.0.0
For more info about how to enable GPU support in Docker you can refer to the official documentation.
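When the container is launched from a script rather than a terminal, the command can be assembled as an argument list, which sidesteps shell word-splitting of the space-separated LANGUAGES value. A minimal Python sketch follows; the function `docker_run_command` is hypothetical and only mirrors the flags used in the commands above.

```python
import shlex


def docker_run_command(image, name, port=12345, languages=("EN",), gpu=False):
    """Assemble the docker run arguments shown above as a list."""
    cmd = ["docker", "run", "--name", name]
    if gpu:
        cmd += ["--gpus", "all"]  # only meaningful with a cuda image
    cmd += ["-p", f"{port}:80"]  # host port -> container port 80
    cmd += ["-e", "LANGUAGES=" + " ".join(languages)]
    cmd.append(image)
    return cmd


cmd = docker_run_command(
    "amuse-large-multilingual-cuda:1.0.0",
    "amuse-large-multilingual-cuda",
    languages=["EN", "FR", "IT", "ZH"],
    gpu=True,
)
print(shlex.join(cmd))  # shell-quoted form, for inspection or logging
```

The list can be passed directly to `subprocess.run(cmd, check=True)` to start the container.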
Usage
AMuSE-WSD exposes an endpoint named /api/model. The endpoint accepts POST requests with a JSON body containing a list of documents. For each document, two parameters must be specified:
- text: the text of the document
- lang: the language of the document
Each request returns a JSON response containing a list of objects, one for each input document. Each object in the response provides the tokenization, lemmatization, PoS-tagging and sense information of the corresponding input document.
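Because the response contains one object per input document, a client can pair each result with the document that produced it. The Python sketch below assumes that response objects come back in the same order as the inputs, which is not stated explicitly here. The helper `pair_with_inputs` is hypothetical, and the placeholder body stands in for a real response.

```python
import json


def pair_with_inputs(documents, response_body):
    """Pair each input document with the response object at the same position."""
    results = json.loads(response_body)
    if not isinstance(results, list) or len(results) != len(documents):
        raise ValueError("expected one response object per input document")
    return list(zip(documents, results))


documents = [{"text": "The quick brown fox jumps over the lazy dog.", "lang": "EN"}]
# Placeholder body: a real response carries tokenization, lemmatization,
# PoS-tagging and sense information, whose field names are not assumed here.
placeholder_body = "[{}]"
for doc, result in pair_with_inputs(documents, placeholder_body):
    print(doc["text"], "->", type(result).__name__)
```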
Let's try with a simple example. We want to disambiguate the words in the following sentence "The quick brown fox jumps over the lazy dog.":
curl -X 'POST' \
  'http://127.0.0.1:12345/api/model' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
    {"text":"The quick brown fox jumps over the lazy dog.", "lang":"EN"}
  ]'
If everything went right, the output should be similar to:
Environment Variables
There are a few environment variables that can be passed to the Docker container to customize the behaviour of your AMuSE-WSD instance:
LANGUAGES
A list of languages, separated by whitespace. This list indicates the preprocessing models to load with AMuSE-WSD. The WSD model of AMuSE-WSD supports up to 40 languages (see the list of supported languages), but each language requires loading its own preprocessing model. We suggest loading only the languages you need, so as to reduce the memory footprint of AMuSE-WSD.
Default value
LANGUAGES="EN"
Example
docker run --name {$CONTAINER_NAME} -e LANGUAGES="EN FR IT" {$IMAGE_NAME}
TIMEOUT
When loading many languages, AMuSE-WSD has to download multiple preprocessing models, which may take a long time. If your container crashes, try increasing the value of TIMEOUT.
Default value
TIMEOUT="500"
Example
Setting the TIMEOUT value to 1200.
docker run --name {$CONTAINER_NAME} -e TIMEOUT="1200" {$IMAGE_NAME}
MAX_WORKERS
Limits the number of simultaneously running processes (instances of AMuSE-WSD).
Default value
MAX_WORKERS=1
Example
Setting the number of AMuSE-WSD instances to 4.
docker run --name {$CONTAINER_NAME} -e MAX_WORKERS="4" {$IMAGE_NAME}
WORKERS_PER_CORE
AMuSE-WSD checks how many CPU cores are available on the server running your container, and sets the number of workers to the number of CPU cores multiplied by this value.
Default value
WORKERS_PER_CORE=1
Example
docker run --name {$CONTAINER_NAME} -e WORKERS_PER_CORE="3" {$IMAGE_NAME}
If you set the value to 3 in a server with 2 CPU cores, it will run 6 worker processes.
Fractional values can also be used, for example:
docker run --name {$CONTAINER_NAME} -e WORKERS_PER_CORE="0.5" {$IMAGE_NAME}
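The worker arithmetic can be sketched as follows. This is not the container's actual start-up code: the truncation of fractional results, the minimum of one worker, and the way MAX_WORKERS caps the result are all assumptions made for illustration, and `worker_count` is a hypothetical name.

```python
def worker_count(cores, workers_per_core=1.0, max_workers=None):
    """Estimate the number of worker processes (illustrative assumptions only)."""
    # Number of CPU cores multiplied by WORKERS_PER_CORE, as described above.
    workers = int(cores * workers_per_core)  # assumption: fractions are truncated
    workers = max(workers, 1)                # assumption: at least one worker
    if max_workers is not None:
        workers = min(workers, max_workers)  # assumption: MAX_WORKERS is an upper bound
    return workers


print(worker_count(cores=2, workers_per_core=3))    # the 2-core example above
print(worker_count(cores=8, workers_per_core=0.5))  # a fractional value
```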
LOG_LEVEL
The log level for Gunicorn.
Supported values
One of:
- debug
- info
- warning
- error
- critical
Default value
LOG_LEVEL="info"
Example
docker run --name {$CONTAINER_NAME} -e LOG_LEVEL="info" {$IMAGE_NAME}
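Several of these variables can be combined in a single docker run invocation. The Python sketch below turns a set of overrides into `-e` flags and rejects unknown names or unsupported log levels before the container is started. The function `env_flags` is hypothetical, and treating TIMEOUT as a non-negative integer is an assumption.

```python
# Variable names and log levels as documented in this section.
KNOWN_VARIABLES = ("LANGUAGES", "TIMEOUT", "MAX_WORKERS", "WORKERS_PER_CORE", "LOG_LEVEL")
LOG_LEVELS = ("debug", "info", "warning", "error", "critical")


def env_flags(**overrides):
    """Render overrides as a list of docker '-e NAME=value' arguments."""
    unknown = sorted(set(overrides) - set(KNOWN_VARIABLES))
    if unknown:
        raise ValueError(f"unknown variables: {unknown}")
    settings = {name: str(value) for name, value in overrides.items()}
    if settings.get("LOG_LEVEL", "info") not in LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {LOG_LEVELS}")
    if not settings.get("TIMEOUT", "500").isdigit():  # assumption: integer value
        raise ValueError("TIMEOUT must be a non-negative integer")
    flags = []
    for name, value in settings.items():
        flags += ["-e", f"{name}={value}"]
    return flags


print(env_flags(LANGUAGES="EN FR IT", TIMEOUT=1200, LOG_LEVEL="debug"))
```

Variables that are not overridden are simply omitted, so the container falls back to the default values listed above.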
FastAPI docs
FastAPI auto-generates interactive documentation for the API, which is available at:
http://127.0.0.1:PORT/docs
where PORT is the port number you specify when starting up AMuSE-WSD.
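Since loading the preprocessing models can take a while, a script may want to wait until the instance responds before sending requests. One possible approach, sketched in Python below, is to poll the docs page. Using /docs as a readiness signal is an assumption, the helper names are hypothetical, and the retry settings are arbitrary.

```python
import time
import urllib.request


def docs_url(port, host="127.0.0.1"):
    """URL of the interactive documentation for a local instance."""
    return f"http://{host}:{port}/docs"


def wait_until_ready(port, attempts=30, delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll the docs page until it answers with HTTP 200 or attempts run out."""
    url = docs_url(port)
    for _ in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts and HTTP errors all derive from OSError.
            pass
        sleep(delay)
    return False


print(docs_url(12345))
```

Calling `wait_until_ready(12345)` after docker run blocks until the instance is reachable, or returns False after the last attempt.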
Citation
If you use AMuSE-WSD, please cite AMuSE-WSD: An All-in-one Multilingual System for Easy Word Sense Disambiguation:
@inproceedings{orlando-etal-2021-amuse-wsd,
    title = {{AMuSE-WSD}: {A}n All-in-one Multilingual System for Easy {W}ord {S}ense {D}isambiguation},
    author = {Orlando, Riccardo and Conia, Simone and Brignone, Fabrizio and Cecconi, Francesco and Navigli, Roberto},
    booktitle = {Proceedings of EMNLP},
    year = {2021},
    month = {nov},
    address = {Punta Cana, Dominican Republic},
    url = {https://aclanthology.org/2021.emnlp-demo.34},
}
License
AMuSE-WSD is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Acknowledgements
This project is part of the Universal Semantic Annotator (USeA) project funded by the European Language Grid (ELG), Project Number 825627.