Contextualized sense embeddings at your fingertips!
All of our vectors lie in a space comparable with that of BERT contextualized word embeddings, thus allowing a word occurrence to be easily linked to its meaning by applying a simple nearest neighbour approach.
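The nearest neighbour linking described above can be pictured with a minimal sketch. It assumes that a contextualized vector for the target word and one vector per candidate sense are already available in the same space; the toy 4-dimensional vectors, the sense labels and the function names `cosine` and `nearest_sense` are hypothetical and only serve the illustration. Loading the released embeddings and extracting BERT vectors are outside its scope.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_sense(context_vec, sense_vecs):
    """Return the key of the sense vector closest to context_vec (1-NN by cosine)."""
    return max(sense_vecs, key=lambda key: cosine(context_vec, sense_vecs[key]))


# Hypothetical sense inventory for the word "bank" (toy values, not real embeddings).
sense_vecs = {
    "bank%financial": [0.9, 0.1, 0.0, 0.2],
    "bank%river": [0.1, 0.8, 0.3, 0.0],
}

# Hypothetical contextualized vector for "bank" in "we sat on the bank of the river".
context_vec = [0.2, 0.7, 0.4, 0.1]

print(nearest_sense(context_vec, sense_vecs))
```

In practice the candidate set would be restricted to the senses that the knowledge base lists for the target lemma, so that the search only runs over a handful of vectors per word occurrence.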
ARES and SensEmBERT are supported by the ERC Consolidator Grant MOUSSE No. 726487 under the European Union’s Horizon 2020 research and innovation programme.
Contextualized word embeddings have been employed effectively across several tasks in Natural Language Processing, as they have proved to carry useful semantic information. However, it is still hard to link them to structured sources of knowledge. In this paper we present ARES (context-AwaRe Embeddings of Senses), a semi-supervised approach to producing sense embeddings for the lexical meanings within a lexical knowledge base that lie in a space that is comparable to that of contextualized word vectors. ARES representations enable a simple 1 Nearest-Neighbour algorithm to outperform state-of-the-art models, not only in the English Word Sense Disambiguation task, but also in the multilingual one, whilst training on sense-annotated data in English only. We further assess the quality of our embeddings in the Word-in-Context task, where, when used as an external source of knowledge, they consistently improve the performance of a neural model, leading it to compete with other more complex architectures.
@inproceedings{scarlini-etal-2020-ares,
title={{With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation}},
author={Scarlini, Bianca and Pasini, Tommaso and Navigli, Roberto},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
publisher={Association for Computational Linguistics},
year={2020}
}
ARES is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Contextual representations of words derived by neural language models have proven to effectively encode the subtle distinctions that might occur between different meanings of the same word. However, these representations are not tied to a semantic network, hence they leave the word meanings implicit and thereby neglect the information that can be derived from the knowledge base itself. In this paper, we propose SensEmBERT, a knowledge-based approach that brings together the expressive power of language modelling and the vast amount of knowledge contained in a semantic network to produce high-quality latent semantic representations of word meanings in multiple languages. Our vectors lie in a space comparable with that of contextualized word embeddings, thus allowing a word occurrence to be easily linked to its meaning by applying a simple nearest neighbour approach. We show that, whilst not relying on manual semantic annotations, SensEmBERT is able to either achieve or surpass state-of-the-art results attained by most of the supervised neural approaches on the English Word Sense Disambiguation task. When scaling to other languages, our representations prove to be equally effective as their English counterpart and outperform the existing state of the art on all the Word Sense Disambiguation multilingual datasets.
@inproceedings{scarlini-etal-2020-sensembert,
title={{SensEmBERT: Context-Enhanced Sense Embeddings for Multilingual Word Sense Disambiguation}},
author={Scarlini, Bianca and Pasini, Tommaso and Navigli, Roberto},
booktitle={Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence},
publisher={Association for the Advancement of Artificial Intelligence},
pages={8758--8765},
year={2020}
}
SensEmBERT is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Bianca Scarlini
PhD Student @ Sapienza
scarlini[at]di.uniroma1.it
Tommaso Pasini
Postdoc @ Sapienza
pasini[at]di.uniroma1.it
Roberto Navigli
Full Professor @ Sapienza
navigli[at]di.uniroma1.it