API Documentation

To promote the integration of Word Sense Disambiguation (WSD) into real-world applications, AMuSE-WSD provides an interface to a full end-to-end, state-of-the-art multilingual pretrained model. The sections below describe how to use the API to obtain WSD information.

Disambiguate Text

URL

http://nlp.uniroma1.it/amuse-wsd/api/model

Method

POST

Request

Parameter   Type             Description
documents   List<Document>   A list of documents to disambiguate.

Document

Attribute   Description
text        The document to disambiguate.
lang        Language code of the document (for example, EN); see Supported Languages.
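
As a minimal sketch of the request body described above, the following Python snippet assembles the JSON list of documents. The helper name build_payload is hypothetical and not part of AMuSE-WSD; it assumes, as in the curl example below, that the body is the bare list of documents.

```python
import json

def build_payload(texts, lang):
    """Return the JSON request body for a list of texts in one language.

    Hypothetical helper: each document carries the two attributes from the
    table above, `text` and `lang`.
    """
    documents = [{"text": text, "lang": lang} for text in texts]
    # ensure_ascii=False keeps non-English text readable in the payload.
    return json.dumps(documents, ensure_ascii=False)

body = build_payload(["I walked along the river bank."], "EN")
print(body)
```

The resulting string can be passed as the request body, for instance through the -d option of curl.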

Example

Request

curl -X 'POST' \
  'http://nlp.uniroma1.it/amuse-wsd/api/model' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {"text":"The quick brown fox jumps over the lazy dog.", "lang":"EN"},
  {"text":"I walked along the river bank.", "lang":"EN"}
]'

Response

[{"tokens":[{"index":0,"text":"The","pos":"DET","lemma":"the","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},{"index":1,"text":"quick","pos":"ADJ","lemma":"quick","bnSynsetId":"bn:00096664a","wnSynsetOffset":"wn:00032733a","nltkSynset":"agile.s.01"},{"index":2,"text":"brown","pos":"ADJ","lemma":"brown","bnSynsetId":"bn:00098942a","wnSynsetOffset":"wn:00372111a","nltkSynset":"brown.s.01"},{"index":3,"text":"fox","pos":"NOUN","lemma":"fox","bnSynsetId":"bn:00036129n","wnSynsetOffset":"wn:02118333n","nltkSynset":"fox.n.01"},{"index":4,"text":"jumps","pos":"VERB","lemma":"jump","bnSynsetId":"bn:00083833v","wnSynsetOffset":"wn:01963942v","nltkSynset":"jump.v.01"},{"index":5,"text":"over","pos":"ADP","lemma":"over","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},{"index":6,"text":"the","pos":"DET","lemma":"the","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},{"index":7,"text":"lazy","pos":"ADJ","lemma":"lazy","bnSynsetId":"bn:00105799a","wnSynsetOffset":"wn:00981304a","nltkSynset":"lazy.s.01"},{"index":8,"text":"dog","pos":"NOUN","lemma":"dog","bnSynsetId":"bn:00015267n","wnSynsetOffset":"wn:02084071n","nltkSynset":"dog.n.01"},{"index":9,"text":".","pos":"PUNCT","lemma":".","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"}]},{"tokens":[{"index":0,"text":"I","pos":"PRON","lemma":"I","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},{"index":1,"text":"walked","pos":"VERB","lemma":"walk","bnSynsetId":"bn:00095597v","wnSynsetOffset":"wn:01904930v","nltkSynset":"walk.v.01"},{"index":2,"text":"along","pos":"ADP","lemma":"along","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},{"index":3,"text":"the","pos":"DET","lemma":"the","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},{"index":4,"text":"river","pos":"NOUN","lemma":"river","bnSynsetId":"bn:00067948n","wnSynsetOffset":"wn:09411430n","nltkSynset":"river.n.01"},{"index":5,"text":"bank","pos":"NOUN","lemma":"bank","bnSynsetId":"bn:00008363n","wnSynsetOffset":"wn:09213565n","nltkSynset":"bank.n.01"},{"index":6,"text":".","pos":"PUNCT","lemma":".","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"}]}]
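
To illustrate how such a response might be consumed, here is a small Python sketch that keeps only the tokens that received a sense. The function name sense_tagged_tokens is hypothetical; the sketch relies on the observation that, in the sample response above, tokens without a sense carry the placeholder "O".

```python
import json

def sense_tagged_tokens(document):
    """Yield (text, lemma, pos, BabelNet id) for tokens that received a sense.

    Tokens without a sense carry the placeholder "O" in the sample response.
    """
    for token in document["tokens"]:
        if token["bnSynsetId"] != "O":
            yield token["text"], token["lemma"], token["pos"], token["bnSynsetId"]

# A two-token excerpt of the sample response shown above.
sample = json.loads(
    '[{"tokens":['
    '{"index":3,"text":"the","pos":"DET","lemma":"the",'
    '"bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},'
    '{"index":5,"text":"bank","pos":"NOUN","lemma":"bank",'
    '"bnSynsetId":"bn:00008363n","wnSynsetOffset":"wn:09213565n",'
    '"nltkSynset":"bank.n.01"}]}]'
)
for document in sample:
    for row in sense_tagged_tokens(document):
        print(row)
```

The response list follows the order of the input documents, so the n-th element can be paired with the n-th document that was sent.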

Supported Languages

Online

The Web interface of AMuSE-WSD supports the following 10 languages.

  • AR: Arabic
  • DE: German
  • EN: English
  • ES: Spanish
  • FR: French
  • IT: Italian
  • NL: Dutch
  • PT: Portuguese
  • RU: Russian
  • ZH: Chinese
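
Before sending documents to the online service, it may help to check their language codes against the list above. The following is only a sketch: the names ONLINE_LANGUAGES and unsupported_documents are hypothetical, and it assumes that codes are written in upper case as in the request example.

```python
ONLINE_LANGUAGES = {"AR", "DE", "EN", "ES", "FR", "IT", "NL", "PT", "RU", "ZH"}

def unsupported_documents(documents):
    """Return the indices of documents whose `lang` is not in the online list."""
    return [
        index
        for index, document in enumerate(documents)
        if document["lang"] not in ONLINE_LANGUAGES
    ]

documents = [
    {"text": "I walked along the river bank.", "lang": "EN"},
    {"text": "Jag gick längs floden.", "lang": "SV"},  # Swedish: offline list only
]
print(unsupported_documents(documents))  # prints [1]
```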

Offline

The offline version of AMuSE-WSD supports the following 40 languages. Download AMuSE-WSD here.

  • AF: Afrikaans
  • AR: Arabic
  • BG: Bulgarian
  • CA: Catalan
  • CS: Czech
  • DA: Danish
  • DE: German
  • EL: Greek
  • EN: English
  • ES: Spanish
  • ET: Estonian
  • EU: Basque
  • FA: Persian
  • FI: Finnish
  • FR: French
  • GA: Irish
  • HE: Hebrew
  • HI: Hindi
  • HR: Croatian
  • HU: Hungarian
  • ID: Indonesian
  • IT: Italian
  • JA: Japanese
  • KO: Korean
  • LT: Lithuanian
  • LV: Latvian
  • NL: Dutch
  • NB: Norwegian
  • PL: Polish
  • PT: Portuguese
  • RO: Romanian
  • RU: Russian
  • SK: Slovak
  • SL: Slovenian
  • SR: Serbian
  • SV: Swedish
  • TR: Turkish
  • UK: Ukrainian
  • VI: Vietnamese
  • ZH: Chinese
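
The LANGUAGES setting described under Environment Variables takes these codes separated by whitespace. As a hedged sketch, the Python helper below (languages_value, a hypothetical name) checks a selection against the list above and joins it into that form.

```python
OFFLINE_LANGUAGES = frozenset(
    "AF AR BG CA CS DA DE EL EN ES ET EU FA FI FR GA HE HI HR HU "
    "ID IT JA KO LT LV NL NB PL PT RO RU SK SL SR SV TR UK VI ZH".split()
)

def languages_value(codes):
    """Join language codes with spaces after checking them against the list."""
    unknown = sorted(set(codes) - OFFLINE_LANGUAGES)
    if unknown:
        raise ValueError(f"not in the offline language list: {unknown}")
    return " ".join(codes)

print(languages_value(["EN", "FR", "IT", "ZH"]))  # prints EN FR IT ZH
```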

Using AMuSE-WSD Offline

Where to download?

Download AMuSE-WSD here.

Available Docker images

We release different types of Docker images depending on the underlying language model (e.g. BERT or XLM-RoBERTa) and depending on the hardware you want to use (i.e. CPU or GPU). In particular:

  • amuse-large-[cpu|cuda] uses BERT-large and provides state-of-the-art results in English WSD.
  • amuse-large-multilingual-[cpu|cuda] employs XLM-RoBERTa-large and thus offers the best results in multilingual WSD. However, it is also the most demanding in terms of hardware requirements.
  • amuse-medium-multilingual-[cpu|cuda] adopts XLM-RoBERTa-base, whose outputs match those of its larger counterpart in 98% of cases while taking half the time.
  • amuse-small-multilingual-[cpu|cuda] uses the multilingual version of MiniLM, a language model distilled from XLM-RoBERTa-base. It is three times faster and three times smaller, while still achieving remarkable results.
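
The image names above follow a regular pattern, which the Python sketch below spells out. The function image_name is hypothetical and only mirrors the four entries in the list; it is not an official naming API.

```python
def image_name(size, multilingual, device):
    """Compose an image name following the pattern in the list above."""
    if size not in {"small", "medium", "large"}:
        raise ValueError(f"unknown size: {size}")
    if device not in {"cpu", "cuda"}:
        raise ValueError(f"unknown device: {device}")
    if not multilingual and size != "large":
        # Only the large model is listed in an English-only variant.
        raise ValueError("the English-only image is listed for size 'large' only")
    parts = ["amuse", size] + (["multilingual"] if multilingual else []) + [device]
    return "-".join(parts)

print(image_name("large", multilingual=True, device="cpu"))
print(image_name("large", multilingual=False, device="cuda"))
```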

Requirements

In order to run AMuSE-WSD offline, you need to install Docker. Please refer to its official documentation to install it on your system.

How to launch your local instance of AMuSE-WSD

To run a local instance of AMuSE-WSD, you first need to perform a one-time setup that loads one of the available images. For example, to load amuse-large-multilingual-cpu:

#!/bin/bash

docker load -i amuse-large-multilingual-cpu-1.0.0.tar

After that, AMuSE-WSD can be started by running the following commands:

#!/bin/bash

PORT=12345
LANGUAGES="EN FR IT ZH"

# Quote the variables so the space-separated language list
# reaches Docker as a single value.
docker run \
  --name amuse-large-multilingual \
  -p "$PORT":80 \
  -e LANGUAGES="$LANGUAGES" \
  amuse-large-multilingual-cpu:1.0.0

This runs an amuse-large-multilingual instance on CPU, listening on port 12345, and loads the preprocessing models for English, French, Italian and Chinese.

Running AMuSE-WSD on your GPU

If you need GPU support, use the CUDA version of an image, for example amuse-large-multilingual-cuda.

As with the CPU image, you have to load the image before using it:

#!/bin/bash

docker load -i amuse-large-multilingual-cuda-1.0.0.tar

Then you can launch AMuSE-WSD through the docker run command, with an additional flag, --gpus, for example:

#!/bin/bash

PORT=12345
LANGUAGES="EN FR IT ZH"

# Quote the variables so the space-separated language list
# reaches Docker as a single value.
docker run \
  --name amuse-large-multilingual-cuda \
  --gpus all \
  -p "$PORT":80 \
  -e LANGUAGES="$LANGUAGES" \
  amuse-large-multilingual-cuda:1.0.0

For more information on enabling GPU support in Docker, refer to the official Docker documentation.
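
When AMuSE-WSD is started from a script rather than a terminal, the same docker run invocation can be assembled as an argument list, which keeps the space-separated language list in a single argument. The Python sketch below is an illustration only: docker_run_command is a hypothetical helper, and reusing the image name as the container name is an assumption made for brevity.

```python
import shlex

def docker_run_command(image, tag="1.0.0", port=12345, languages=("EN",), gpu=False):
    """Return the `docker run` argument list for a loaded AMuSE-WSD image."""
    # Assumption: the container is named after the image.
    command = ["docker", "run", "--name", image]
    if gpu:
        command += ["--gpus", "all"]
    command += [
        "-p", f"{port}:80",
        "-e", "LANGUAGES=" + " ".join(languages),
        f"{image}:{tag}",
    ]
    return command

command = docker_run_command(
    "amuse-large-multilingual-cuda", languages=["EN", "FR", "IT", "ZH"], gpu=True
)
print(shlex.join(command))
# To start the container: subprocess.run(command, check=True)
```

Passing the list to subprocess.run avoids shell word splitting altogether, so no extra quoting of the LANGUAGES value is needed.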

Usage

AMuSE-WSD exposes an endpoint named /api/model. The endpoint accepts POST requests with a JSON body containing a list of documents. For each document, two parameters must be specified:

  • text: the text of the document
  • lang: the language of the document

Each request returns a JSON response containing a list of objects, one for each input document. Each object in the response provides the tokenization, lemmatization, PoS-tagging and sense information of the corresponding input document.

Let's try a simple example: disambiguating the words in the sentence "The quick brown fox jumps over the lazy dog.". The command assumes the instance is listening on port 12345, as in the docker run examples above:

curl -X 'POST' \
  'http://127.0.0.1:12345/api/model' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '[
  {"text":"The quick brown fox jumps over the lazy dog.", "lang":"EN"}
]'

If everything went right, the output should be similar to:

[{"tokens":[{"index":0,"text":"The","pos":"DET","lemma":"the","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},{"index":1,"text":"quick","pos":"ADJ","lemma":"quick","bnSynsetId":"bn:00096664a","wnSynsetOffset":"32733a","nltkSynset":"agile.s.01"},{"index":2,"text":"brown","pos":"ADJ","lemma":"brown","bnSynsetId":"bn:00098942a","wnSynsetOffset":"372111a","nltkSynset":"brown.s.01"},{"index":3,"text":"fox","pos":"NOUN","lemma":"fox","bnSynsetId":"bn:00036129n","wnSynsetOffset":"2118333n","nltkSynset":"fox.n.01"},{"index":4,"text":"jumps","pos":"VERB","lemma":"jump","bnSynsetId":"bn:00083833v","wnSynsetOffset":"1963942v","nltkSynset":"jump.v.01"},{"index":5,"text":"over","pos":"ADP","lemma":"over","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},{"index":6,"text":"the","pos":"DET","lemma":"the","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"},{"index":7,"text":"lazy","pos":"ADJ","lemma":"lazy","bnSynsetId":"bn:00105799a","wnSynsetOffset":"981304a","nltkSynset":"lazy.s.01"},{"index":8,"text":"dog","pos":"NOUN","lemma":"dog","bnSynsetId":"bn:00015267n","wnSynsetOffset":"2084071n","nltkSynset":"dog.n.01"},{"index":9,"text":".","pos":"PUNCT","lemma":".","bnSynsetId":"O","wnSynsetOffset":"O","nltkSynset":"O"}]}]
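
The same request can be issued from Python with the standard library alone. This is a sketch under stated assumptions: the helper names build_request and disambiguate are hypothetical, and the base URL assumes a local instance published on port 12345 as in the docker run examples.

```python
import json
import urllib.request

def build_request(documents, base_url="http://127.0.0.1:12345"):
    """Prepare a POST request for the /api/model endpoint."""
    return urllib.request.Request(
        base_url + "/api/model",
        data=json.dumps(documents).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )

def disambiguate(documents, base_url="http://127.0.0.1:12345"):
    """Send the documents to a running instance and return the parsed JSON."""
    with urllib.request.urlopen(build_request(documents, base_url)) as response:
        return json.load(response)

request = build_request(
    [{"text": "The quick brown fox jumps over the lazy dog.", "lang": "EN"}]
)
print(request.get_method(), request.full_url)
```

Calling disambiguate with the same list requires a running instance; building the request separately makes it possible to inspect the URL and body without one.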

Environment Variables

There are a few environment variables that can be passed to the Docker container to customize the behaviour of your AMuSE-WSD instance:

LANGUAGES

A list of languages, separated by whitespace. This list indicates the preprocessing models to load with AMuSE-WSD. The WSD model of AMuSE-WSD supports up to 40 languages (see the list of supported languages), but each language requires loading its own preprocessing model. We suggest loading only the languages you need, so as to reduce the memory footprint of AMuSE-WSD.

Default value
LANGUAGES="EN"
Example
docker run --name ${CONTAINER_NAME} -e LANGUAGES="EN FR IT" ${IMAGE_NAME}

TIMEOUT

When loading many languages, AMuSE-WSD has to download multiple preprocessing models, which may take a long time. If your container crashes, try increasing the value of TIMEOUT.

Default value
TIMEOUT="500"
Example

Setting the TIMEOUT value to 1200.

docker run --name ${CONTAINER_NAME} -e TIMEOUT="1200" ${IMAGE_NAME}

MAX_WORKERS

Limits the number of simultaneously running processes (instances of AMuSE-WSD).

Default value
MAX_WORKERS=1
Example

Setting the number of AMuSE-WSD instances to 4.

docker run --name ${CONTAINER_NAME} -e MAX_WORKERS="4" ${IMAGE_NAME}

WORKERS_PER_CORE

AMuSE-WSD checks how many CPU cores are available on the server running your container and sets the number of workers to the number of CPU cores multiplied by this value.

Default value
WORKERS_PER_CORE=1
Example
docker run --name ${CONTAINER_NAME} -e WORKERS_PER_CORE="3" ${IMAGE_NAME}

If you set the value to 3 on a server with 2 CPU cores, it will run 6 worker processes.

Fractional values can be passed as well:

docker run --name ${CONTAINER_NAME} -e WORKERS_PER_CORE="0.5" ${IMAGE_NAME}
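
The arithmetic behind WORKERS_PER_CORE can be sketched in a few lines of Python. The text above does not say how fractional results are rounded or how this setting interacts with MAX_WORKERS, so the sketch makes three assumptions: the product is rounded down, at least one worker is always started, and MAX_WORKERS (when given) acts as an upper bound. The function worker_count is hypothetical.

```python
def worker_count(cpu_cores, workers_per_core=1.0, max_workers=None):
    """Estimate the number of worker processes under the stated assumptions."""
    # Assumption: round down, but never go below one worker.
    workers = max(int(cpu_cores * workers_per_core), 1)
    # Assumption: MAX_WORKERS caps the result after the multiplication.
    if max_workers is not None:
        workers = min(workers, max_workers)
    return workers

print(worker_count(2, workers_per_core=3))    # 6, the case described above
print(worker_count(8, workers_per_core=0.5))  # 4
```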

LOG_LEVEL

The log level for Gunicorn.

Supported values

One of:

debug
info
warning
error
critical
					
Default value
LOG_LEVEL="info"
Example
docker run --name ${CONTAINER_NAME} -e LOG_LEVEL="info" ${IMAGE_NAME}
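
To combine several of the settings above in one place, a script could generate the -e arguments for docker run. The Python sketch below is hypothetical (env_flags is not part of AMuSE-WSD); its defaults repeat the default values documented in this section, and it only validates LOG_LEVEL against the supported values listed above.

```python
LOG_LEVELS = ("debug", "info", "warning", "error", "critical")

def env_flags(languages=("EN",), timeout=500, max_workers=1,
              workers_per_core=1, log_level="info"):
    """Return the `-e NAME=value` arguments for `docker run`."""
    if log_level not in LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {LOG_LEVELS}")
    settings = {
        "LANGUAGES": " ".join(languages),
        "TIMEOUT": timeout,
        "MAX_WORKERS": max_workers,
        "WORKERS_PER_CORE": workers_per_core,
        "LOG_LEVEL": log_level,
    }
    flags = []
    for name, value in settings.items():
        flags += ["-e", f"{name}={value}"]
    return flags

print(env_flags(languages=["EN", "FR", "IT"], timeout=1200))
```

The returned list is meant to be spliced into a docker run argument list before the image name.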

FastAPI docs

AMuSE-WSD serves auto-generated interactive documentation for the API (thanks to FastAPI) at:

http://127.0.0.1:PORT/docs

where PORT is the port number you specify when starting up AMuSE-WSD.

Citation

If you use AMuSE-WSD, please cite AMuSE-WSD: An All-in-one Multilingual System for Easy Word Sense Disambiguation:

@inproceedings{orlando-etal-2021-amuse-wsd,
  title = {{AMuSE-WSD}: {A}n All-in-one Multilingual System for Easy {W}ord {S}ense {D}isambiguation},
  author  = {Orlando, Riccardo and Conia, Simone and Brignone, Fabrizio and Cecconi, Francesco and Navigli, Roberto},
  booktitle = {Proceedings of EMNLP},
  year = {2021},
  month = {nov},
  address = {Punta Cana, Dominican Republic},
  url = {https://aclanthology.org/2021.emnlp-demo.34},
}

Acknowledgements

This project is part of the Universal Semantic Annotator (USeA) project funded by the European Language Grid (ELG), Project Number 825627.