Publications
- FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (Findings of ACL 2024)
Recent advancements in text summarization, particularly with the advent of Large Language Models (LLMs), have shown remarkable performance. However, a notable challenge persists as a substantial number of automatically-generated summaries exhibit factual inconsistencies, such as hallucinations. In response to this issue, various approaches for the evaluation of consistency for summarization have emerged. Yet, these newly-introduced metrics face several limitations, including lack of interpretability, focus on short document summaries (e.g., news articles), and computational impracticality, especially for LLM-based metrics. To address these shortcomings, we propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE), a more interpretable and efficient factuality-oriented metric. FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary. Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation. Moreover, we extend our evaluation to a more challenging setting by conducting a human annotation process of long-form summarization. In the hope of fostering research in summarization factuality evaluation, we release the code of our metric and our factuality annotations of long-form summarization at https://github.com/Babelscape/FENICE.
BibTex
@inproceedings{scire-etal-2024-fenice, title = "{FENICE}: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction", author = "Scir{\`e}, Alessandro and Ghonim, Karim and Navigli, Roberto", editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek", booktitle = "Findings of the Association for Computational Linguistics: ACL 2024", month = aug, year = "2024", address = "Bangkok, Thailand", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.findings-acl.841", doi = "10.18653/v1/2024.findings-acl.841", pages = "14148--14161", abstract = "Recent advancements in text summarization, particularly with the advent of Large Language Models (LLMs), have shown remarkable performance. However, a notable challenge persists as a substantial number of automatically-generated summaries exhibit factual inconsistencies, such as hallucinations. In response to this issue, various approaches for the evaluation of consistency for summarization have emerged. Yet, these newly-introduced metrics face several limitations, including lack of interpretability, focus on short document summaries (e.g., news articles), and computational impracticality, especially for LLM-based metrics. To address these shortcomings, we propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE), a more interpretable and efficient factuality-oriented metric. FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary. Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation. Moreover, we extend our evaluation to a more challenging setting by conducting a human annotation process of long-form summarization. In the hope of fostering research in summarization factuality evaluation, we release the code of our metric and our factuality annotations of long-form summarization at anonymizedurl.", }
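The claim-level NLI alignment described in the FENICE abstract above can be made concrete with off-the-shelf components. The following is a minimal sketch, not the released FENICE implementation (see the repository linked above): it assumes claims have already been extracted from the summary and scores each one against document passages with a generic MNLI model, averaging the best entailment probability per claim.

```python
# Minimal sketch of NLI-based claim verification (illustrative; not the official FENICE code).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # any NLI model exposing an ENTAILMENT label would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
ENTAIL = {label.upper(): idx for idx, label in model.config.id2label.items()}["ENTAILMENT"]

def entailment_prob(premise: str, hypothesis: str) -> float:
    """Probability that the premise (document passage) entails the hypothesis (claim)."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, ENTAIL].item()

def factuality_score(passages: list[str], claims: list[str]) -> float:
    """Average, over claims, of the best entailment score across source passages."""
    return sum(max(entailment_prob(p, c) for p in passages) for c in claims) / len(claims)

passages = ["The company reported a 10% increase in revenue in 2023."]
claims = ["Revenue grew by 10% in 2023.", "The company was founded in 2023."]
print(factuality_score(passages, claims))  # the unsupported second claim lowers the score
```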
- Word Sense Linking: Disambiguating Outside the Sandbox (Findings of ACL 2024)
Word Sense Disambiguation (WSD) is the task of associating a word in a given context with its most suitable meaning among a set of possible candidates. While the task has recently witnessed renewed interest, with systems achieving performances above the estimated inter-annotator agreement, at the time of writing it still struggles to find downstream applications. We argue that one of the reasons behind this is the difficulty of applying WSD to plain text. Indeed, in the standard formulation, models work under the assumptions that a) all the spans to disambiguate have already been identified, and b) all the possible candidate senses of each span are provided, both of which are requirements that are far from trivial. In this work, we present a new task called Word Sense Linking (WSL) where, given an input text and a reference sense inventory, systems have to both identify which spans to disambiguate and then link them to their most suitable meaning. We put forward a transformer-based architecture for the task and thoroughly evaluate both its performance and those of state-of-the-art WSD systems scaled to WSL, iteratively relaxing the assumptions of WSD. We hope that our work will foster easier integration of lexical semantics into downstream applications.
BibTex
@inproceedings{bejgu-etal-2024-word, title = "Word Sense Linking: Disambiguating Outside the Sandbox", author = "Bejgu, Andrei and Barba, Edoardo and Procopio, Luigi and Fern{\'a}ndez-Castro, Alberte and Navigli, Roberto", editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek", booktitle = "Findings of the Association for Computational Linguistics: ACL 2024", month = aug, year = "2024", address = "Bangkok, Thailand", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.findings-acl.851", doi = "10.18653/v1/2024.findings-acl.851", pages = "14332--14347", abstract = "Word Sense Disambiguation (WSD) is the task of associating a word in a given context with its most suitable meaning among a set of possible candidates. While the task has recently witnessed renewed interest, with systems achieving performances above the estimated inter-annotator agreement, at the time of writing it still struggles to find downstream applications. We argue that one of the reasons behind this is the difficulty of applying WSD to plain text. Indeed, in the standard formulation, models work under the assumptions that a) all the spans to disambiguate have already been identified, and b) all the possible candidate senses of each span are provided, both of which are requirements that are far from trivial. In this work, we present a new task called Word Sense Linking (WSL) where, given an input text and a reference sense inventory, systems have to both identify which spans to disambiguate and then link them to their most suitable meaning.We put forward a transformer-based architecture for the task and thoroughly evaluate both its performance and those of state-of-the-art WSD systems scaled to WSL, iteratively relaxing the assumptions of WSD. We hope that our work will foster easier integration of lexical semantics into downstream applications.", }
- Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In! (ACL 2024)
Annually, at the Conference of Machine Translation (WMT), the Metrics Shared Task organizers conduct the meta-evaluation of Machine Translation (MT) metrics, ranking them according to their correlation with human judgments. Their results guide researchers toward enhancing the next generation of metrics and MT systems. With the recent introduction of neural metrics, the field has witnessed notable advancements. Nevertheless, the inherent opacity of these metrics has posed substantial challenges to the meta-evaluation process. This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings. To do this, we introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process’s accuracy, robustness, and fairness. By employing sentinel metrics, we aim to validate our findings, and shed light on and monitor the potential biases or inconsistencies in the rankings. We discover that the present meta-evaluation framework favors two categories of metrics: i) those explicitly trained to mimic human quality assessments, and ii) continuous metrics. Finally, we raise concerns regarding the evaluation capabilities of state-of-the-art metrics, emphasizing that they might be basing their assessments on spurious correlations found in their training data.
BibTex
@inproceedings{perrella-etal-2024-guardians, title = "Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!", author = "Perrella, Stefano and Proietti, Lorenzo and Scir{\`e}, Alessandro and Barba, Edoardo and Navigli, Roberto", editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek", booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = aug, year = "2024", address = "Bangkok, Thailand", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.acl-long.856", doi = "10.18653/v1/2024.acl-long.856", pages = "16216--16244", abstract = "Annually, at the Conference of Machine Translation (WMT), the Metrics Shared Task organizers conduct the meta-evaluation of Machine Translation (MT) metrics, ranking them according to their correlation with human judgments. Their results guide researchers toward enhancing the next generation of metrics and MT systems. With the recent introduction of neural metrics, the field has witnessed notable advancements. Nevertheless, the inherent opacity of these metrics has posed substantial challenges to the meta-evaluation process. This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings. To do this, we introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process{'}s accuracy, robustness, and fairness. By employing sentinel metrics, we aim to validate our findings, and shed light on and monitor the potential biases or inconsistencies in the rankings. We discover that the present meta-evaluation framework favors two categories of metrics: i) those explicitly trained to mimic human quality assessments, and ii) continuous metrics. Finally, we raise concerns regarding the evaluation capabilities of state-of-the-art metrics, emphasizing that they might be basing their assessments on spurious correlations found in their training data.", }
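Since the meta-evaluation described above ranks metrics by their correlation with human judgments, a sentinel metric is one whose expected ranking position is known in advance, so that any deviation exposes problems in the ranking procedure itself. The toy example below is only a hedged illustration of that idea on made-up scores, not the WMT protocol or the paper's actual sentinel metrics: a random-score sentinel should end up clearly below any genuine metric.

```python
# Toy segment-level meta-evaluation with a random-score sentinel (illustrative data only).
import random
from scipy.stats import kendalltau

human = [0.9, 0.4, 0.7, 0.2, 0.8, 0.5]       # hypothetical human quality judgments
metric = [0.85, 0.5, 0.65, 0.3, 0.75, 0.55]  # hypothetical scores from a real metric
random.seed(0)
sentinel = [random.random() for _ in human]  # sentinel: scores unrelated to quality

for name, scores in [("real metric", metric), ("random sentinel", sentinel)]:
    tau, _ = kendalltau(human, scores)
    print(f"{name}: Kendall tau = {tau:.3f}")
# A sound meta-evaluation should rank the random sentinel well below genuine metrics.
```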
- ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (Findings of ACL 2024)
Entity Linking (EL) and Relation Extraction (RE) are fundamental tasks in Natural Language Processing, serving as critical components in a wide range of applications. In this paper, we propose ReLiK, a Retriever-Reader architecture for both EL and RE, where, given an input text, the Retriever module undertakes the identification of candidate entities or relations that could potentially appear within the text. Subsequently, the Reader module is tasked to discern the pertinent retrieved entities or relations and establish their alignment with the corresponding textual spans. Notably, we put forward an innovative input representation that incorporates the candidate entities or relations alongside the text, making it possible to link entities or extract relations in a single forward pass and to fully leverage pre-trained language models' contextualization capabilities, in contrast with previous Retriever-Reader-based methods, which require a forward pass for each candidate. Our formulation of EL and RE achieves state-of-the-art performance in both in-domain and out-of-domain benchmarks while using academic budget training and with up to 40x faster inference compared to competitors. Finally, we show how our architecture can be used seamlessly for closed Information Extraction (cIE), i.e., EL + RE, setting a new state of the art by employing a shared Reader that simultaneously extracts entities and relations.
BibTex
@inproceedings{orlando-etal-2024-relik, title = "{R}e{L}i{K}: Retrieve and {L}in{K}, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget", author = "Orlando, Riccardo and Huguet Cabot, Pere-Llu{\'\i}s and Barba, Edoardo and Navigli, Roberto", editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek", booktitle = "Findings of the Association for Computational Linguistics: ACL 2024", month = aug, year = "2024", address = "Bangkok, Thailand", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.findings-acl.839", doi = "10.18653/v1/2024.findings-acl.839", pages = "14114--14132", abstract = "Entity Linking (EL) and Relation Extraction (RE) are fundamental tasks in Natural Language Processing, serving as critical components in a wide range of applications. In this paper, we propose ReLiK, a Retriever-Reader architecture for both EL and RE, where, given an input text, the Retriever module undertakes the identification of candidate entities or relations that could potentially appear within the text. Subsequently, the Reader module is tasked to discern the pertinent retrieved entities or relations and establish their alignment with the corresponding textual spans. Notably, we put forward an innovative input representation that incorporates the candidate entities or relations alongside the text, making it possible to link entities or extract relations in a single forward pass and to fully leverage pre-trained language models contextualization capabilities, in contrast with previous Retriever-Reader-based methods, which require a forward pass for each candidate. Our formulation of EL and RE achieves state-of-the-art performance in both in-domain and out-of-domain benchmarks while using academic budget training and with up to 40x inference speed compared to competitors. Finally, we show how our architecture can be used seamlessly for Information Extraction (cIE), i.e. EL + RE, and setting a new state of the art by employing a shared Reader that simultaneously extracts entities and relations.", }
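For readers who want to try ReLiK directly, the repository linked in the abstract distributes a pip-installable package. The snippet below is a usage sketch under the assumption that the package exposes a `Relik` class with a `from_pretrained` loader and that the checkpoint identifier shown exists; consult the repository for the actual interface and model names.

```python
# Hypothetical usage sketch of the released ReLiK package; class, method, and
# checkpoint names are assumptions to be checked against the repository's README.
from relik import Relik

relik = Relik.from_pretrained("sapienzanlp/relik-entity-linking-large")  # assumed checkpoint id
output = relik("Michael Jordan was one of the best players in the NBA.")
print(output)  # entity spans linked to the knowledge base (relations with an RE checkpoint)
```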
- Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends (ACL 2024)
Large autoregressive generative models have emerged as the cornerstone for achieving the highest performance across several Natural Language Processing tasks. However, the urge to attain superior results has, at times, led to the premature replacement of carefully designed task-specific approaches without exhaustive experimentation. The Coreference Resolution task is no exception; all recent state-of-the-art solutions adopt large generative autoregressive models that outperform encoder-based discriminative systems. In this work, we challenge this recent trend by introducing Maverick, a carefully designed – yet simple – pipeline, which enables running a state-of-the-art Coreference Resolution system within the constraints of an academic budget, outperforming models with up to 13 billion parameters with as few as 500 million parameters. Maverick achieves state-of-the-art performance on the CoNLL-2012 benchmark, training with up to 0.006x the memory resources and obtaining a 170x faster inference compared to previous state-of-the-art systems. We extensively validate the robustness of the Maverick framework with an array of diverse experiments, reporting improvements over prior systems in data-scarce, long-document, and out-of-domain settings. We release our code and models for research purposes at https://github.com/SapienzaNLP/maverick-coref.
BibTex
@inproceedings{martinelli-etal-2024-maverick, title = "Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends", author = "Martinelli, Giuliano and Barba, Edoardo and Navigli, Roberto", editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek", booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = aug, year = "2024", address = "Bangkok, Thailand", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.acl-long.722", doi = "10.18653/v1/2024.acl-long.722", pages = "13380--13394", abstract = "Large autoregressive generative models have emerged as the cornerstone for achieving the highest performance across several Natural Language Processing tasks. However, the urge to attain superior results has, at times, led to the premature replacement of carefully designed task-specific approaches without exhaustive experimentation. The Coreference Resolution task is no exception; all recent state-of-the-art solutions adopt large generative autoregressive models that outperform encoder-based discriminative systems. In this work, we challenge this recent trend by introducing Maverick, a carefully designed {--} yet simple {--} pipeline, which enables running a state-of-the-art Coreference Resolution system within the constraints of an academic budget, outperforming models with up to 13 billion parameters with as few as 500 million parameters. Maverick achieves state-of-the-art performance on the CoNLL-2012 benchmark, training with up to 0.006x the memory resources and obtaining a 170x faster inference compared to previous state-of-the-art systems. We extensively validate the robustness of the Maverick framework with an array of diverse experiments, reporting improvements over prior systems in data-scarce, long-document, and out-of-domain settings. We release our code and models for research purposes at https://github.com/SapienzaNLP/maverick-coref.", }
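Similarly to ReLiK above, Maverick is released as a package, so a coreference pipeline can in principle be run in a few lines. The snippet below is only an assumed usage sketch: the package name, default checkpoint behaviour, and `predict` signature are not confirmed here and should be verified against the repository linked in the abstract.

```python
# Assumed usage sketch of the released maverick-coref package; verify names and
# signatures against the repository before relying on them.
from maverick import Maverick

model = Maverick()  # assumed to download a default pretrained checkpoint
text = "Barack Obama visited Paris. He met the French president there."
clusters = model.predict(text)
print(clusters)  # predicted coreference clusters over mention spans
```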
- Mitigating Data Scarcity in Semantic Parsing across Languages with the Multilingual Semantic Layer and its Dataset (Findings of ACL 2024)
Data scarcity is a prevalent challenge in the era of Large Language Models (LLMs). The insatiable hunger of LLMs for large corpora becomes even more pronounced when dealing with non-English and low-resource languages. The issue is particularly exacerbated in Semantic Parsing (SP), i.e. the task of converting text into a formal representation. The complexity of semantic formalisms makes training human annotators and subsequent data annotation unfeasible on a large scale, especially across languages. To mitigate this, we first introduce the Multilingual Semantic Layer (MSL), a conceptual evolution of previous formalisms, which decouples from disambiguation and external inventories and simplifies the task. MSL provides the necessary tools to encode the meaning across languages, paving the way for developing a high-quality semantic parsing dataset across different languages in a semi-automatic strategy. Subsequently, we manually refine a portion of this dataset and fine-tune GPT-3.5 to propagate these refinements across the dataset. Then, we manually annotate 1,100 sentences in eleven languages, including low-resource ones. Finally, we assess our dataset’s quality, showcasing the performance gap reduction across languages in Semantic Parsing.
BibTex
@inproceedings{martinez-lorenzo-etal-2024-mitigating, title = "Mitigating Data Scarcity in Semantic Parsing across Languages with the Multilingual Semantic Layer and its Dataset", author = "Martinez Lorenzo, Abelardo Carlos and Huguet Cabot, Pere-Llu{\'\i}s and Ghonim, Karim and Xu, Lu and Choi, Hee-Soo and Fern{\'a}ndez-Castro, Alberte and Navigli, Roberto", editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek", booktitle = "Findings of the Association for Computational Linguistics: ACL 2024", month = aug, year = "2024", address = "Bangkok, Thailand", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.findings-acl.836", doi = "10.18653/v1/2024.findings-acl.836", pages = "14056--14080", abstract = "Data scarcity is a prevalent challenge in the era of Large Language Models (LLMs). The insatiable hunger of LLMs for large corpora becomes even more pronounced when dealing with non-English and low-resource languages. The issue is particularly exacerbated in Semantic Parsing (SP), i.e. the task of converting text into a formal representation. The complexity of semantic formalisms makes training human annotators and subsequent data annotation unfeasible on a large scale, especially across languages. To mitigate this, we first introduce the Multilingual Semantic Layer (MSL), a conceptual evolution of previous formalisms, which decouples from disambiguation and external inventories and simplifies the task. MSL provides the necessary tools to encode the meaning across languages, paving the way for developing a high-quality semantic parsing dataset across different languages in a semi-automatic strategy. Subsequently, we manually refine a portion of this dataset and fine-tune GPT-3.5 to propagate these refinements across the dataset. Then, we manually annotate 1,100 sentences in eleven languages, including low-resource ones. Finally, we assess our dataset{'}s quality, showcasing the performance gap reduction across languages in Semantic Parsing.", }
- NounAtlas: Filling the Gap in Nominal Semantic Role Labeling (ACL 2024)
Despite significant advances in Semantic Role Labeling (SRL), much work in this field has been carried out with a focus on verbal predicates, with the research on nominal SRL lagging behind. In many contexts, however, nominal predicates are often as informative as verbal ones, thus needing proper treatment. In this paper we aim to fill this gap and make nominal SRL a first-class citizen. We introduce a novel approach to create the first large-scale, high-quality inventory of nominal predicates and organize them into semantically-coherent frames. Although automatically created, NounAtlas – our frame inventory – is subsequently fully validated. We then put forward a technique to generate silver training data for nominal SRL and show that a state-of-the-art SRL model can achieve good performance. Interestingly, thanks to our design choices which enable seamless integration of our predicate inventory with its verbal counterpart, we can mix verbal and nominal data and perform robust SRL on both types of predicates.
BibTex
@inproceedings{navigli-etal-2024-nounatlas, title = "{N}oun{A}tlas: Filling the Gap in Nominal Semantic Role Labeling", author = "Navigli, Roberto and Lo Pinto, Marco and Silvestri, Pasquale and Rotondi, Dennis and Ciciliano, Simone and Scir{\`e}, Alessandro", editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek", booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = aug, year = "2024", address = "Bangkok, Thailand", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.acl-long.857", doi = "10.18653/v1/2024.acl-long.857", pages = "16245--16258", abstract = "Despite significant advances in Semantic Role Labeling (SRL), much work in this field has been carried out with a focus on verbal predicates, with the research on nominal SRL lagging behind. In many contexts, however, nominal predicates are often as informative as verbal ones, thus needing proper treatment. In this paper we aim to fill this gap and make nominal SRL a first-class citizen. We introduce a novel approach to create the first large-scale, high-quality inventory of nominal predicates and organize them into semantically-coherent frames. Although automatically created, NounAtlas {--} our frame inventory {--} is subsequently fully validated. We then put forward a technique to generate silver training data for nominal SRL and show that a state-of-the-art SRL model can achieve good performance. Interestingly, thanks to our design choices which enable seamless integration of our predicate inventory with its verbal counterpart, we can mix verbal and nominal data and perform robust SRL on both types of predicates.", }
- CroCoAlign: A Context-Aware, Cross-Lingual and Fully-Neural Sentence Alignment System for Long Texts (EACL 2024)
Sentence alignment – establishing links between corresponding sentences in two related documents – is an important NLP task with several downstream applications, such as machine translation (MT). Despite the fact that existing sentence alignment systems have achieved promising results, their effectiveness is based on auxiliary information such as document metadata or machine-generated translations, as well as hyperparameter-sensitive techniques. Moreover, these systems often overlook the crucial role that context plays in the alignment process. In this paper, we address the aforementioned issues and propose CroCoAlign: the first context-aware, end-to-end and fully-neural architecture for sentence alignment. Our system maps source and target sentences in long documents by contextualizing their sentence embeddings with respect to the other sentences in the document. We extensively evaluate CroCoAlign on a multilingual dataset consisting of 20 language pairs derived from the Opus project, and demonstrate that our model achieves state-of-the-art performance. To ensure reproducibility, we release our code and model checkpoints at https://github.com/Babelscape/CroCoAlign.
BibTex
@inproceedings{molfese-etal-2024-neuralign, title = "Neuralign: A Context-Aware, Cross-Lingual and Fully-Neural Sentence Alignment System for Long Texts", author = "Molfese, Francesco and Bejgu, Andrei and Tedeschi, Simone and Conia, Simone and Navigli, Roberto", editor = "Graham, Yvette and Purver, Matthew", booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)", month = mar, year = "2024", address = "St. Julian{'}s, Malta", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.eacl-long.135", pages = "2209--2220", abstract = "Sentence alignment {--} establishing links between corresponding sentences in two related documents {--} is an important NLP task with several downstream applications, such as machine translation (MT).Despite the fact that existing sentence alignment systems have achieved promising results, their effectiveness is based on auxiliary information such as document metadata or machine-generated translations, as well as hyperparameter-sensitive techniques. Moreover, these systems often overlook the crucial role that context plays in the alignment process.In this paper, we address the aforementioned issues and propose Neuralign: the first context-aware, end-to-end and fully-neural architecture for sentence alignment. Our system maps source and target sentences in long documents by contextualizing their sentence embeddings with respect to the other sentences in the document. We extensively evaluate Neuralign on a multilingual dataset consisting of 20 language pairs derived from the Opus project, and demonstrate that our model achieves state-of-the-art performance. To ensure reproducibility, we release our code and model checkpoints at https://github.com/Babelscape/Neuralign.", }
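To make the sentence-alignment task concrete, the sketch below implements a deliberately simple context-free baseline: embed source and target sentences independently with a multilingual encoder and match each source sentence to its most similar target. This is not the CroCoAlign architecture, whose point is precisely to contextualize each embedding with respect to the rest of the document; the encoder name and greedy matching are illustrative choices.

```python
# Context-free sentence-alignment baseline (illustrative; NOT the CroCoAlign model).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # illustrative encoder
src = ["The cat sat on the mat.", "It was raining outside."]
tgt = ["Fuori pioveva.", "Il gatto era seduto sul tappeto."]

similarity = util.cos_sim(model.encode(src), model.encode(tgt))
for i, row in enumerate(similarity):
    j = int(row.argmax())  # greedy matching; real aligners enforce document-level consistency
    print(f"{src[i]!r} -> {tgt[j]!r} (cosine = {row[j].item():.2f})")
```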
- Dissecting Biases in Relation Extraction: A Cross-Dataset Analysis on People's Gender and Origin (GeBNLP 2024)
Relation Extraction (RE) is at the core of many Natural Language Understanding tasks, including knowledge-base population and Question Answering. However, any Natural Language Processing system is exposed to biases, and the analysis of these has not received much attention in RE. We propose a new method for inspecting bias in the RE pipeline, which is completely transparent in terms of interpretability. Specifically, in this work we analyze biases related to gender and place of birth. Our methodology includes (i) obtaining semantic triplets (subject, object, semantic relation) involving ‘person’ entities from RE resources, (ii) collecting meta-information (‘gender’ and ‘place of birth’) using Entity Linking technologies, and then (iii) analyzing the distribution of triplets across different groups (e.g., men versus women). We investigate bias at two levels: in the training data of three commonly used RE datasets (SREDFM, CrossRE, NYT), and in the predictions of a state-of-the-art RE approach (ReLiK). To enable cross-dataset analysis, we introduce a taxonomy of relation types mapping the label sets of different RE datasets to a unified label space. Our findings reveal that bias is a compounded issue affecting underrepresented groups within data and predictions for RE.
BibTex
@inproceedings{stranisci-etal-2024-dissecting, title = "Dissecting Biases in Relation Extraction: A Cross-Dataset Analysis on People{'}s Gender and Origin", author = "Stranisci, Marco and Huguet Cabot, Pere-Llu{\'\i}s and Bassignana, Elisa and Navigli, Roberto", editor = "Fale{\'n}ska, Agnieszka and Basta, Christine and Costa-juss{\`a}, Marta and Goldfarb-Tarrant, Seraphina and Nozza, Debora", booktitle = "Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)", month = aug, year = "2024", address = "Bangkok, Thailand", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.gebnlp-1.12", doi = "10.18653/v1/2024.gebnlp-1.12", pages = "190--202", abstract = "Relation Extraction (RE) is at the core of many Natural Language Understanding tasks, including knowledge-base population and Question Answering. However, any Natural Language Processing system is exposed to biases, and the analysis of these has not received much attention in RE. We propose a new method for inspecting bias in the RE pipeline, which is completely transparent in terms of interpretability. Specifically, in this work we analyze biases related to gender and place of birth. Our methodology includes (i) obtaining semantic triplets (subject, object, semantic relation) involving {`}person{'} entities from RE resources, (ii) collecting meta-information ({`}gender{'} and {`}place of birth{'}) using Entity Linking technologies, and then (iii) analyze the distribution of triplets across different groups (e.g., men versus women). We investigate bias at two levels: In the training data of three commonly used RE datasets (SREDFM, CrossRE, NYT), and in the predictions of a state-of-the-art RE approach (ReLiK). To enable cross-dataset analysis, we introduce a taxonomy of relation types mapping the label sets of different RE datasets to a unified label space. Our findings reveal that bias is a compounded issue affecting underrepresented groups within data and predictions for RE.", }
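Step (iii) of the methodology, comparing how relations are distributed across demographic groups, reduces to simple counting once triplets and meta-information are available. The snippet below illustrates only that final step on invented toy records; the field names and values are not taken from the paper's data.

```python
# Toy illustration of step (iii): relation distributions per group (invented example data).
from collections import Counter, defaultdict

triplets = [
    {"subject": "Ada Lovelace", "relation": "field of work", "gender": "female"},
    {"subject": "Alan Turing", "relation": "field of work", "gender": "male"},
    {"subject": "Alan Turing", "relation": "award received", "gender": "male"},
    {"subject": "Grace Hopper", "relation": "military rank", "gender": "female"},
]

counts = defaultdict(Counter)
for t in triplets:
    counts[t["gender"]][t["relation"]] += 1

for group, relation_counts in counts.items():
    total = sum(relation_counts.values())
    for relation, n in relation_counts.most_common():
        print(f"{group:>6} | {relation:<15} | {n / total:.2f}")
```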
- Analyzing Homonymy Disambiguation Capabilities of Pretrained Language Models (LREC-COLING 2024)
Word Sense Disambiguation (WSD) is a key task in Natural Language Processing (NLP), aiming to assign the correct meaning (sense) to a word in context. However, traditional WSD systems rely on WordNet as the underlying sense inventory, often differentiating meticulously between subtle nuances of word meanings, which may lead to excessive complexity and reduced practicality of WSD systems in today’s NLP. Indeed, current Pretrained Language Models (PLMs) do seem to be able to perform disambiguation, but it is not clear to what extent, or to what level of granularity, they actually operate. In this paper, we address these points and, firstly, introduce a new large-scale resource that leverages homonymy relations to systematically cluster WordNet senses, effectively reducing the granularity of word senses to a very coarse-grained level; secondly, we use this resource to train Homonymy Disambiguation systems and investigate whether PLMs are inherently able to differentiate coarse-grained word senses. Our findings demonstrate that, while state-of-the-art models still struggle to choose the correct fine-grained meaning of a word in context, Homonymy Disambiguation systems are able to differentiate homonyms with up to 95% accuracy scores even without fine-tuning the underlying PLM. We release our data and code at https://github.com/SapienzaNLP/homonymy-wsd.
BibTex
@inproceedings{proietti-etal-2024-analyzing-homonymy, title = "Analyzing Homonymy Disambiguation Capabilities of Pretrained Language Models", author = "Proietti, Lorenzo and Perrella, Stefano and Tedeschi, Simone and Vulpis, Giulia and Lavalle, Leonardo and Sanchietti, Andrea and Ferrari, Andrea and Navigli, Roberto", editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen", booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)", month = may, year = "2024", address = "Torino, Italy", publisher = "ELRA and ICCL", url = "https://aclanthology.org/2024.lrec-main.83", pages = "924--938", abstract = "Word Sense Disambiguation (WSD) is a key task in Natural Language Processing (NLP), aiming to assign the correct meaning (sense) to a word in context. However, traditional WSD systems rely on WordNet as the underlying sense inventory, often differentiating meticulously between subtle nuances of word meanings, which may lead to excessive complexity and reduced practicality of WSD systems in today{'}s NLP. Indeed, current Pretrained Language Models (PLMs) do seem to be able to perform disambiguation, but it is not clear to what extent, or to what level of granularity, they actually operate. In this paper, we address these points and, firstly, introduce a new large-scale resource that leverages homonymy relations to systematically cluster WordNet senses, effectively reducing the granularity of word senses to a very coarse-grained level; secondly, we use this resource to train Homonymy Disambiguation systems and investigate whether PLMs are inherently able to differentiate coarse-grained word senses. Our findings demonstrate that, while state-of-the-art models still struggle to choose the correct fine-grained meaning of a word in context, Homonymy Disambiguation systems are able to differentiate homonyms with up to 95{\%} accuracy scores even without fine-tuning the underlying PLM. We release our data and code at https://github.com/SapienzaNLP/homonymy-wsd.", }
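The gap between fine-grained WSD and homonym-level disambiguation is easy to see by inspecting the raw sense inventory. The snippet below merely lists the fine-grained WordNet noun senses of a classic homonymous word with NLTK; the homonymy-based clustering of those senses is what the released resource provides and is not reproduced here.

```python
# Listing fine-grained WordNet senses with NLTK (the homonym clusters themselves
# come from the released resource, not from this snippet).
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bank", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())
# WordNet distinguishes around ten noun senses of "bank"; at the homonym level they
# collapse into a few coarse groups (e.g., financial institution vs. river bank).
```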
- Language Pivoting from Parallel Corpora for Word Sense Disambiguation of Historical Languages: A Case Study on Latin (LREC-COLING 2024)
Word Sense Disambiguation (WSD) is an important task in NLP, which serves the purpose of automatically disambiguating a polysemous word with its most likely sense in context. Recent studies have advanced the state of the art in this task, but most of the work has been carried out on contemporary English or other modern languages, leaving challenges posed by low-resource languages and diachronic change open. Although the problem with low-resource languages has recently been mitigated by using existing multilingual resources to propagate otherwise expensive annotations from English to other languages, such techniques have hitherto not been applied to historical languages such as Latin. In this work, we make the following two major contributions. First, we test such a strategy on a historical language and propose a new approach in this framework which makes use of existing bilingual corpora instead of native English datasets. Second, we fine-tune a Latin WSD model on the data produced and achieve state-of-the-art results on a standard benchmark for the task. Finally, we release the dataset generated with our approach, which is the largest dataset for Latin WSD to date. This work opens the door to further research, as our approach can be used for different historical and, generally, under-resourced languages.
BibTex
@inproceedings{ghinassi-etal-2024-language-pivoting, title = "Language Pivoting from Parallel Corpora for Word Sense Disambiguation of Historical Languages: A Case Study on {L}atin", author = "Ghinassi, Iacopo and Tedeschi, Simone and Marongiu, Paola and Navigli, Roberto and McGillivray, Barbara", editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen", booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)", month = may, year = "2024", address = "Torino, Italy", publisher = "ELRA and ICCL", url = "https://aclanthology.org/2024.lrec-main.880", pages = "10073--10084", abstract = "Word Sense Disambiguation (WSD) is an important task in NLP, which serves the purpose of automatically disambiguating a polysemous word with its most likely sense in context. Recent studies have advanced the state of the art in this task, but most of the work has been carried out on contemporary English or other modern languages, leaving challenges posed by low-resource languages and diachronic change open. Although the problem with low-resource languages has recently been mitigated by using existing multilingual resources to propagate otherwise expensive annotations from English to other languages, such techniques have hitherto not been applied to historical languages such as Latin. In this work, we make the following two major contributions. First, we test such a strategy on a historical language and propose a new approach in this framework which makes use of existing bilingual corpora instead of native English datasets. Second, we fine-tune a Latin WSD model on the data produced and achieve state-of-the-art results on a standard benchmark for the task. Finally, we release the dataset generated with our approach, which is the largest dataset for Latin WSD to date. This work opens the door to further research, as our approach can be used for different historical and, generally, under-resourced languages.", }
- CNER: Concept and Named Entity Recognition (NAACL 2024)
Named entities – typically expressed via proper nouns – play a key role in Natural Language Processing, as their identification and comprehension are crucial in tasks such as Relation Extraction, Coreference Resolution and Question Answering, among others. Tasks like these also often entail dealing with concepts – typically represented by common nouns – which, however, have not received as much attention. Indeed, the potential of their identification and understanding remains underexplored, as does the benefit of a synergistic formulation with named entities. To fill this gap, we introduce Concept and Named Entity Recognition (CNER), a new unified task that handles concepts and entities mentioned in unstructured texts seamlessly. We put forward a comprehensive set of categories that can be used to model concepts and named entities jointly, and propose new approaches for the creation of CNER datasets. We evaluate the benefits of performing CNER as a unified task extensively, showing that a CNER model gains up to +5.4 and +8 macro F1 points when compared to specialized named entity and concept recognition systems, respectively. Finally, to encourage the development of CNER systems, we release our datasets and models at https://github.com/Babelscape/cner.
BibTex
@inproceedings{martinelli-etal-2024-cner, title = "{CNER}: Concept and Named Entity Recognition", author = "Martinelli, Giuliano and Molfese, Francesco and Tedeschi, Simone and Fern{\'a}ndez-Castro, Alberte and Navigli, Roberto", editor = "Duh, Kevin and Gomez, Helena and Bethard, Steven", booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)", month = jun, year = "2024", address = "Mexico City, Mexico", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.naacl-long.461", pages = "8329--8344", abstract = "Named entities {--} typically expressed via proper nouns {--} play a key role in Natural Language Processing, as their identification and comprehension are crucial in tasks such as Relation Extraction, Coreference Resolution and Question Answering, among others. Tasks like these also often entail dealing with concepts {--} typically represented by common nouns {--} which, however, have not received as much attention. Indeed, the potential of their identification and understanding remains underexplored, as does the benefit of a synergistic formulation with named entities. To fill this gap, we introduce Concept and Named Entity Recognition (CNER), a new unified task that handles concepts and entities mentioned in unstructured texts seamlessly. We put forward a comprehensive set of categories that can be used to model concepts and named entities jointly, and propose new approaches for the creation of CNER datasets. We evaluate the benefits of performing CNER as a unified task extensively, showing that a CNER model gains up to +5.4 and +8 macro F1 points when compared to specialized named entity and concept recognition systems, respectively. Finally, to encourage the development of CNER systems, we release our datasets and models at https://github.com/Babelscape/cner.", }
- MOSAICo: a Multilingual Open-text Semantically Annotated Interlinked Corpus (NAACL 2024)
Several Natural Language Understanding (NLU) tasks focus on linking text to explicit knowledge, including Word Sense Disambiguation, Semantic Role Labeling, Semantic Parsing, and Relation Extraction. In addition to the importance of connecting raw text with explicit knowledge bases, the integration of such carefully curated knowledge into deep learning models has been shown to be beneficial across a diverse range of applications, including Language Modeling and Machine Translation. Nevertheless, the scarcity of semantically-annotated corpora across various tasks and languages limits the potential advantages significantly. To address this issue, we put forward MOSAICo, the first endeavor aimed at equipping the research community with the key ingredients to model explicit semantic knowledge at a large scale, providing hundreds of millions of silver yet high-quality annotations for four NLU tasks across five languages. We describe the creation process of MOSAICo, demonstrate its quality and variety, and analyze the interplay between different types of semantic information. MOSAICo, available at https://github.com/SapienzaNLP/mosaico, aims to drop the requirement of closed, licensed datasets and represents a step towards a level playing field across languages and tasks in NLU.
BibTex
@inproceedings{conia-etal-2024-mosaico, title = "{MOSAIC}o: a Multilingual Open-text Semantically Annotated Interlinked Corpus", author = "Conia, Simone and Barba, Edoardo and Martinez Lorenzo, Abelardo Carlos and Huguet Cabot, Pere-Llu{\'\i}s and Orlando, Riccardo and Procopio, Luigi and Navigli, Roberto", editor = "Duh, Kevin and Gomez, Helena and Bethard, Steven", booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)", month = jun, year = "2024", address = "Mexico City, Mexico", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2024.naacl-long.442", pages = "7983--7997", abstract = "Several Natural Language Understanding (NLU) tasks focus on linking text to explicit knowledge, including Word Sense Disambiguation, Semantic Role Labeling, Semantic Parsing, and Relation Extraction. In addition to the importance of connecting raw text with explicit knowledge bases, the integration of such carefully curated knowledge into deep learning models has been shown to be beneficial across a diverse range of applications, including Language Modeling and Machine Translation. Nevertheless, the scarcity of semantically-annotated corpora across various tasks and languages limits the potential advantages significantly. To address this issue, we put forward MOSAICo, the first endeavor aimed at equipping the research community with the key ingredients to model explicit semantic knowledge at a large scale, providing hundreds of millions of silver yet high-quality annotations for four NLU tasks across five languages. We describe the creation process of MOSAICo, demonstrate its quality and variety, and analyze the interplay between different types of semantic information. MOSAICo, available at https://github.com/SapienzaNLP/mosaico, aims to drop the requirement of closed, licensed datasets and represents a step towards a level playing field across languages and tasks in NLU.", }
- LexicoMatic: Automatic Creation of Multilingual Lexical-Semantic Dictionaries (2023)
Lexical-semantic resources such as wordnets and multilingual dictionaries often suffer from significant coverage issues, especially in languages other than English. While improving their coverage manually is a prohibitively expensive undertaking, current approaches to the automatic creation of such resources fail to investigate the latest advances achieved in relevant fields, such as cross-lingual annotation projection. In this work, we address these shortcomings and propose LexicoMatic, a novel resource-independent approach to the automatic construction and expansion of multilingual semantic dictionaries, in which we formulate the task as an annotation projection problem. In addition, we tackle the lack of a comprehensive multilingual evaluation framework and put forward a new entirely manually-curated benchmark featuring 9 languages. We evaluate LexicoMatic with an extensive array of experiments and demonstrate the effectiveness of our approach, achieving a new state of the art across all languages under consideration. We release our novel evaluation benchmark at: https://github.com/SapienzaNLP/lexicomatic.
BibTex
@inproceedings{martelli-etal-2023-lexicomatic, title = "LexicoMatic: Automatic Creation of Multilingual Lexical-Semantic Dictionaries", author = "Martelli, Federico and Procopio, Luigi and Barba, Edoardo and Navigli, Roberto", month = nov, year = "2023" }
- Echoes from Alexandria: A Large Resource for Multilingual Book Summarization (Findings of ACL 2023)
In recent years, research in text summarization has mainly focused on the news domain, where texts are typically short and have strong layout features. The task of full-book summarization presents additional challenges which are hard to tackle with current resources, due to their limited size and availability in English only. To overcome these limitations, we present “Echoes from Alexandria”, or in shortened form, “Echoes”, a large resource for multilingual book summarization. Echoes features three novel datasets: i) Echo-Wiki, for multilingual book summarization, ii) Echo-XSum, for extremely-compressive multilingual book summarization, and iii) Echo-FairySum, for extractive book summarization. To the best of our knowledge, Echoes – with its thousands of books and summaries – is the largest resource, and the first to be multilingual, featuring 5 languages and 25 language pairs. In addition to Echoes, we also introduce a new extractive-then-abstractive baseline, and, supported by our experimental results and manual analysis of the summaries generated, we argue that this baseline is more suitable for book summarization than purely-abstractive approaches. We release our resource and software at https://github.com/Babelscape/echoes-from-alexandria in the hope of fostering innovative research in multilingual book summarization.
BibTex
@inproceedings{scire-etal-2023-echoes, title = "Echoes from Alexandria: A Large Resource for Multilingual Book Summarization", author = "Scir{\`e}, Alessandro and Conia, Simone and Ciciliano, Simone and Navigli, Roberto", editor = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki", booktitle = "Findings of the Association for Computational Linguistics: ACL 2023", month = jul, year = "2023", address = "Toronto, Canada", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.findings-acl.54", doi = "10.18653/v1/2023.findings-acl.54", pages = "853--867", abstract = "In recent years, research in text summarization has mainly focused on the news domain, where texts are typically short and have strong layout features. The task of full-book summarization presents additional challenges which are hard to tackle with current resources, due to their limited size and availability in English only. To overcome these limitations, we present {``}Echoes from Alexandria{''}, or in shortened form, {``}Echoes{''}, a large resource for multilingual book summarization. Echoes featuresthree novel datasets: i) Echo-Wiki, for multilingual book summarization, ii) Echo-XSum, for extremely-compressive multilingual book summarization, and iii) Echo-FairySum, for extractive book summarization. To the best of our knowledge, Echoes {--} with its thousands of books and summaries {--} is the largest resource, and the first to be multilingual, featuring 5 languages and 25 language pairs. In addition to Echoes, we also introduce a new extractive-then-abstractive baseline, and, supported by our experimental results and manual analysis of the summaries generated, we argue that this baseline is more suitable for book summarization than purely-abstractive approaches. We release our resource and software at \url{https://github.com/Babelscape/echoes-from-alexandria} in the hope of fostering innovative research in multilingual booksummarization.", }
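The extractive-then-abstractive baseline mentioned above can be pictured as a two-stage pipeline: first pick a budget of salient sentences from the book, then rewrite them with an abstractive model. The sketch below is a heavily simplified stand-in for that idea, not the paper's baseline; the centroid-similarity extractor and the summarization checkpoint are illustrative choices.

```python
# Simplified extractive-then-abstractive pipeline (illustrative; not the paper's baseline).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

def extract_salient(sentences: list[str], budget: int = 20) -> list[str]:
    """Keep the sentences closest to the document centroid, as a crude salience proxy."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    centroid = np.asarray(tfidf.mean(axis=0))
    scores = cosine_similarity(tfidf, centroid).ravel()
    keep = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:budget])
    return [sentences[i] for i in keep]  # preserve the original sentence order

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")  # illustrative model

def summarize_book(sentences: list[str]) -> str:
    extract = " ".join(extract_salient(sentences))
    return summarizer(extract, truncation=True)[0]["summary_text"]
```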
- DMLM: Descriptive Masked Language Modeling (Findings of ACL 2023)
Over the last few years, Masked Language Modeling (MLM) pre-training has resulted in remarkable advancements in many Natural Language Understanding (NLU) tasks, which sparked an interest in researching alternatives and extensions to the MLM objective. In this paper, we tackle the absence of explicit semantic grounding in MLM and propose Descriptive Masked Language Modeling (DMLM), a knowledge-enhanced reading comprehension objective, where the model is required to predict the most likely word in a context, being provided with the word’s definition. For instance, given the sentence “I was going to the _”, if we provided as definition “financial institution”, the model would have to predict the word “bank”; if, instead, we provided “sandy seashore”, the model should predict “beach”. Our evaluation highlights the effectiveness of DMLM in comparison with standard MLM, showing improvements on a number of well-established NLU benchmarks, as well as other semantics-focused tasks, e.g., Semantic Role Labeling. Furthermore, we demonstrate how it is possible to take full advantage of DMLM to embed explicit semantics in downstream tasks, explore several properties of DMLM-based contextual representations and suggest a number of future directions to investigate.
BibTex
@inproceedings{barba-etal-2023-dmlm, title = "{DMLM}: Descriptive Masked Language Modeling", author = "Barba, Edoardo and Campolungo, Niccol{\`o} and Navigli, Roberto", editor = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki", booktitle = "Findings of the Association for Computational Linguistics: ACL 2023", month = jul, year = "2023", address = "Toronto, Canada", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.findings-acl.808", doi = "10.18653/v1/2023.findings-acl.808", pages = "12770--12788", abstract = "Over the last few years, Masked Language Modeling (MLM) pre-training has resulted in remarkable advancements in many Natural Language Understanding (NLU) tasks, which sparked an interest in researching alternatives and extensions to the MLM objective. In this paper, we tackle the absence of explicit semantic grounding in MLM and propose Descriptive Masked Language Modeling (DMLM), a knowledge-enhanced reading comprehension objective, where the model is required to predict the most likely word in a context, being provided with the word{'}s definition. For instance, given the sentence {``}I was going to the {\_}{''}, if we provided as definition {``}financial institution{''}, the model would have to predict the word {``}bank{''}; if, instead, we provided {``}sandy seashore{''}, the model should predict {``}beach{''}. Our evaluation highlights the effectiveness of DMLM in comparison with standard MLM, showing improvements on a number of well-established NLU benchmarks, as well as other semantics-focused tasks, e.g., Semantic Role Labeling. Furthermore, we demonstrate how it is possible to take full advantage of DMLM to embed explicit semantics in downstream tasks, explore several properties of DMLM-based contextual representations and suggest a number of future directions to investigate.", }
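The DMLM objective is used at pre-training time, but its core intuition, namely that a definition should steer the choice of the masked word, can be probed with any off-the-shelf masked language model. The snippet below replays the abstract's own example through a standard fill-mask pipeline; prepending the definition as plain text is an illustrative choice, not the paper's input format, and the model is not DMLM-trained.

```python
# Probing the DMLM intuition with a plain masked LM (not a DMLM-trained model).
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")
mask = fill.tokenizer.mask_token

for definition in ["financial institution", "sandy seashore"]:
    # Prepending the definition as plain text is an illustrative choice only.
    text = f"Definition: {definition}. I was going to the {mask}."
    predictions = fill(text, top_k=3)
    print(definition, "->", [p["token_str"].strip() for p in predictions])
```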
- AMRs Assemble! Learning to Ensemble with Autoregressive Models for AMR Parsing (ACL 2023)
In this paper, we examine the current state-of-the-art in AMR parsing, which relies on ensemble strategies by merging multiple graph predictions. Our analysis reveals that the present models often violate AMR structural constraints. To address this issue, we develop a validation method, and show how ensemble models can exploit SMATCH metric weaknesses to obtain higher scores, but sometimes result in corrupted graphs. Additionally, we highlight the demanding need to compute the SMATCH score among all possible predictions. To overcome these challenges, we propose two novel ensemble strategies based on Transformer models, improving robustness to structural constraints, while also reducing the computational time. Our methods provide new insights for enhancing AMR parsers and metrics. Our code is available at https://www.github.com/babelscape/AMRs-Assemble.
BibTex
@inproceedings{martinez-lorenzo-etal-2023-amrs, title = "{AMR}s Assemble! Learning to Ensemble with Autoregressive Models for {AMR} Parsing", author = "Mart{\'\i}nez Lorenzo, Abelardo Carlos and Huguet Cabot, Pere Llu{\'\i}s and Navigli, Roberto", editor = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki", booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)", month = jul, year = "2023", address = "Toronto, Canada", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.acl-short.137", doi = "10.18653/v1/2023.acl-short.137", pages = "1595--1605", abstract = "In this paper, we examine the current state-of-the-art in AMR parsing, which relies on ensemble strategies by merging multiple graph predictions. Our analysis reveals that the present models often violate AMR structural constraints. To address this issue, we develop a validation method, and show how ensemble models can exploit SMATCH metric weaknesses to obtain higher scores, but sometimes result in corrupted graphs. Additionally, we highlight the demanding need to compute the SMATCH score among all possible predictions. To overcome these challenges, we propose two novel ensemble strategies based on Transformer models, improving robustness to structural constraints, while also reducing the computational time. Our methods provide new insights for enhancing AMR parsers and metrics. Our code is available at [\url{https://www.github.com/babelscape/AMRs-Assemble}](\url{https://www.github.com/babelscape/AMRs-Assemble}).", }
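A concrete version of the structural check discussed above is to verify that each predicted linearization still decodes into a well-formed AMR graph. The snippet below uses the penman library for this basic validity test; it is a generic check, not the validation method developed in the paper.

```python
# Generic well-formedness check for AMR predictions with the penman library
# (a basic validity test, not the paper's validation method).
import penman

predictions = [
    "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))",  # well-formed
    "(w / want-01 :ARG0 (b / boy :ARG1 (g / go-02 :ARG0 b))",   # unbalanced parentheses
]

for amr in predictions:
    try:
        graph = penman.decode(amr)
        print("valid graph with", len(graph.triples), "triples")
    except Exception as exc:  # penman raises a decoding error on malformed input
        print("corrupted graph:", exc)
```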
- Incorporating Graph Information in Transformer-based AMR Parsing (Findings of ACL 2023)
Abstract Meaning Representation (AMR) is a Semantic Parsing formalism that aims at providing a semantic graph abstraction representing a given text. Current approaches are based on autoregressive language models such as BART or T5, fine-tuned through Teacher Forcing to obtain a linearized version of the AMR graph from a sentence. In this paper, we present LeakDistill, a model and method that explores a modification to the Transformer architecture, using structural adapters to explicitly incorporate graph information into the learned representations and improve AMR parsing performance. Our experiments show how, by employing word-to-node alignment to embed graph structural information into the encoder at training time, we can obtain state-of-the-art AMR parsing through self-knowledge distillation, even without the use of additional data. We release the code at http://www.github.com/sapienzanlp/LeakDistill.
BibTex
@inproceedings{vasylenko-etal-2023-incorporating, title = "Incorporating Graph Information in Transformer-based {AMR} Parsing", author = "Vasylenko, Pavlo and Huguet Cabot, Pere Llu{\'\i}s and Mart{\'\i}nez Lorenzo, Abelardo Carlos and Navigli, Roberto", editor = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki", booktitle = "Findings of the Association for Computational Linguistics: ACL 2023", month = jul, year = "2023", address = "Toronto, Canada", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.findings-acl.125", doi = "10.18653/v1/2023.findings-acl.125", pages = "1995--2011", abstract = "Abstract Meaning Representation (AMR) is a Semantic Parsing formalism that aims at providing a semantic graph abstraction representing a given text. Current approaches are based on autoregressive language models such as BART or T5, fine-tuned through Teacher Forcing to obtain a linearized version of the AMR graph from a sentence. In this paper, we present LeakDistill, a model and method that explores a modification to the Transformer architecture, using structural adapters to explicitly incorporate graph information into the learned representations and improve AMR parsing performance. Our experiments show how, by employing word-to-node alignment to embed graph structural information into the encoder at training time, we can obtain state-of-the-art AMR parsing through self-knowledge distillation, even without the use of additional data. We release the code at [\url{http://www.github.com/sapienzanlp/LeakDistill}](\url{http://www.github.com/sapienzanlp/LeakDistill}).", }
- Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities (Findings of ACL 2023)
Although we have witnessed impressive progress in Semantic Role Labeling (SRL), most of the research in the area is carried out assuming that the majority of predicates are verbs. Conversely, predicates can also be expressed using other parts of speech, e.g., nouns and adjectives. However, non-verbal predicates appear in the benchmarks we commonly use to measure progress in SRL less frequently than in some real-world settings – newspaper headlines, dialogues, and tweets, among others. In this paper, we put forward a new PropBank dataset which boasts wide coverage of multiple predicate types. Thanks to it, we demonstrate empirically that standard benchmarks do not provide an accurate picture of the current situation in SRL and that state-of-the-art systems are still incapable of transferring knowledge across different predicate types. Having observed these issues, we also present a novel, manually-annotated challenge set designed to give equal importance to verbal, nominal, and adjectival predicate-argument structures. We use this dataset to investigate whether we can leverage different linguistic resources to promote knowledge transfer. In conclusion, we claim that SRL is far from “solved”, and its integration with other semantic tasks might enable significant improvements in the future, especially for the long tail of non-verbal predicates, thereby facilitating further research on SRL for non-verbal predicates. We release our software and datasets at https://github.com/sapienzanlp/exploring-srl.
BibTex
@inproceedings{orlando-etal-2023-exploring, title = "Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities", author = "Orlando, Riccardo and Conia, Simone and Navigli, Roberto", editor = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki", booktitle = "Findings of the Association for Computational Linguistics: ACL 2023", month = jul, year = "2023", address = "Toronto, Canada", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.findings-acl.783", doi = "10.18653/v1/2023.findings-acl.783", pages = "12378--12388", abstract = "Although we have witnessed impressive progress in Semantic Role Labeling (SRL), most of the research in the area is carried out assuming that the majority of predicates are verbs. Conversely, predicates can also be expressed using other parts of speech, e.g., nouns and adjectives. However, non-verbal predicates appear in the benchmarks we commonly use to measure progress in SRL less frequently than in some real-world settings {--} newspaper headlines, dialogues, and tweets, among others. In this paper, we put forward a new PropBank dataset which boasts wide coverage of multiple predicate types. Thanks to it, we demonstrate empirically that standard benchmarks do not provide an accurate picture of the current situation in SRL and that state-of-the-art systems are still incapable of transferring knowledge across different predicate types. Having observed these issues, we also present a novel, manually-annotated challenge set designed to give equal importance to verbal, nominal, and adjectival predicate-argument structures. We use such dataset to investigate whether we can leverage different linguistic resources to promote knowledge transfer. In conclusion, we claim that SRL is far from {``}solved{''}, and its integration with other semantic tasks might enable significant improvements in the future, especially for the long tail of non-verbal predicates, thereby facilitating further research on SRL for non-verbal predicates. We release our software and datasets at \url{https://github.com/sapienzanlp/exploring-srl}.", }
-
In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocative idea that certain tasks have been solved. In this position paper, we take a critical look at these claims and ask whether PLMs truly have superhuman abilities and what the current benchmarks are really evaluating. We show that these benchmarks have serious limitations affecting the comparison between humans and PLMs and provide recommendations for fairer and more transparent benchmarks.
BibTex
@inproceedings{tedeschi-etal-2023-whats, title = "What{'}s the Meaning of Superhuman Performance in Today{'}s {NLU}?", author = "Tedeschi, Simone and Bos, Johan and Declerck, Thierry and Haji{\v{c}}, Jan and Hershcovich, Daniel and Hovy, Eduard and Koller, Alexander and Krek, Simon and Schockaert, Steven and Sennrich, Rico and Shutova, Ekaterina and Navigli, Roberto", editor = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki", booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = jul, year = "2023", address = "Toronto, Canada", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.acl-long.697", doi = "10.18653/v1/2023.acl-long.697", pages = "12471--12491", abstract = "In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocative idea that certain tasks have been solved. In this position paper, we take a critical look at these claims and ask whether PLMs truly have superhuman abilities and what the current benchmarks are really evaluating. We show that these benchmarks have serious limitations affecting the comparison between humans and PLMs and provide recommendations for fairer and more transparent benchmarks.", }
-
Relation Extraction (RE) is a task that identifies relationships between entities in a text, enabling the acquisition of relational facts and bridging the gap between natural language and structured knowledge. However, current RE models often rely on small datasets with low coverage of relation types, particularly when working with languages other than English. In this paper, we address the above issue and provide two new resources that enable the training and evaluation of multilingual RE systems. First, we present SREDFM, an automatically annotated dataset covering 18 languages, 400 relation types, 13 entity types, totaling more than 40 million triplet instances. Second, we propose REDFM, a smaller, human-revised dataset for seven languages that allows for the evaluation of multilingual RE systems. To demonstrate the utility of these novel datasets, we experiment with the first end-to-end multilingual RE model, mREBEL, that extracts triplets, including entity types, in multiple languages. We release our resources and model checkpoints at [https://www.github.com/babelscape/rebel](https://www.github.com/babelscape/rebel).
BibTex
@inproceedings{huguet-cabot-etal-2023-red, title = "{RED}$^{\textrm{FM}}$: a Filtered and Multilingual Relation Extraction Dataset", author = "Huguet Cabot, Pere-Llu{\'\i}s and Tedeschi, Simone and Ngonga Ngomo, Axel-Cyrille and Navigli, Roberto", editor = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki", booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = jul, year = "2023", address = "Toronto, Canada", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.acl-long.237", doi = "10.18653/v1/2023.acl-long.237", pages = "4326--4343", abstract = "Relation Extraction (RE) is a task that identifies relationships between entities in a text, enabling the acquisition of relational facts and bridging the gap between natural language and structured knowledge. However, current RE models often rely on small datasets with low coverage of relation types, particularly when working with languages other than English.In this paper, we address the above issue and provide two new resources that enable the training and evaluation of multilingual RE systems. First, we present SRED$^{\textrm{FM}}$, an automatically annotated dataset covering 18 languages, 400 relation types, 13 entity types, totaling more than 40 million triplet instances. Second, we propose RED$^{\textrm{FM}}$, a smaller, human-revised dataset for seven languages that allows for the evaluation of multilingual RE systems. To demonstrate the utility of these novel datasets, we experiment with the first end-to-end multilingual RE model, mREBEL, that extracts triplets, including entity types, in multiple languages. We release our resources and model checkpoints at [\url{https://www.github.com/babelscape/rebel}](\url{https://www.github.com/babelscape/rebel}).", }
-
This paper introduces a novel aligner for Abstract Meaning Representation (AMR) graphs that can scale cross-lingually, and is thus capable of aligning units and spans in sentences of different languages. Our approach leverages modern Transformer-based parsers, which inherently encode alignment information in their cross-attention weights, allowing us to extract this information during parsing. This eliminates the need for English-specific rules or the Expectation Maximization (EM) algorithm that have been used in previous approaches. In addition, we propose a guided supervised method using alignment to further enhance the performance of our aligner. We achieve state-of-the-art results in the benchmarks for AMR alignment and demonstrate our aligner’s ability to obtain them across multiple languages. Our code will be available at [https://www.github.com/babelscape/AMR-alignment](https://www.github.com/babelscape/AMR-alignment).
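As a rough illustration of reading alignments off cross-attention, the sketch below averages a parser's cross-attention weights over layers and heads and aligns each graph node to its highest-scoring source position. The tensor layout, averaging scheme and node-to-token map are assumptions of this sketch, not the paper's exact procedure.

```python
import numpy as np

def align_from_cross_attention(cross_attn, node_to_decoder_positions):
    """cross_attn: (layers, heads, tgt_len, src_len) cross-attention weights
    collected while the parser decodes the linearized graph.
    Returns a map from graph node to its best-matching source token position."""
    attn = cross_attn.mean(axis=(0, 1))               # collapse layers and heads
    alignment = {}
    for node, positions in node_to_decoder_positions.items():
        scores = attn[positions].mean(axis=0)         # aggregate the node's subword tokens
        alignment[node] = int(scores.argmax())
    return alignment

# toy example: 2 layers, 2 heads, 5 decoded graph tokens, 4 source tokens
rng = np.random.default_rng(0)
attn = rng.random((2, 2, 5, 4))
nodes = {"want-01": [1], "boy": [2, 3]}               # hypothetical node -> decoder positions
print(align_from_cross_attention(attn, nodes))
```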
BibTex
@inproceedings{martinez-lorenzo-etal-2023-cross, title = "Cross-lingual {AMR} Aligner: Paying Attention to Cross-Attention", author = "Mart{\'\i}nez Lorenzo, Abelardo Carlos and Huguet Cabot, Pere Llu{\'\i}s and Navigli, Roberto", editor = "Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki", booktitle = "Findings of the Association for Computational Linguistics: ACL 2023", month = jul, year = "2023", address = "Toronto, Canada", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.findings-acl.109", doi = "10.18653/v1/2023.findings-acl.109", pages = "1726--1742", abstract = "This paper introduces a novel aligner for Abstract Meaning Representation (AMR) graphs that can scale cross-lingually, and is thus capable of aligning units and spans in sentences of different languages. Our approach leverages modern Transformer-based parsers, which inherently encode alignment information in their cross-attention weights, allowing us to extract this information during parsing. This eliminates the need for English-specific rules or the Expectation Maximization (EM) algorithm that have been used in previous approaches. In addition, we propose a guided supervised method using alignment to further enhance the performance of our aligner. We achieve state-of-the-art results in the benchmarks for AMR alignment and demonstrate our aligner{'}s ability to obtain them across multiple languages. Our code will be available at [\url{https://www.github.com/babelscape/AMR-alignment}](\url{https://www.github.com/babelscape/AMR-alignment}).", }
-
Word alignment plays a crucial role in several NLP tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually-curated datasets, which are not always available, especially in mid- and low-resource languages. In order to address this limitation, we propose XL-WA, a novel entirely manually-curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize on unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA.
-
Local models have recently attained astounding performances in Entity Disambiguation (ED), with generative and extractive formulations being the most promising research directions. However, previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title. Although certainly effective, this strategy presents a few critical issues, especially when titles are not sufficiently informative or distinguishable from one another. In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it. We thoroughly evaluate our approach against standard benchmarks in ED and find extractive formulations to be particularly well-suited to these representations: we report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns. We release our code, data and model checkpoints at https://github.com/SapienzaNLP/extend.
BibTex
@misc{procopio2022entity, title={Entity Disambiguation with Entity Definitions}, author={Luigi Procopio and Simone Conia and Edoardo Barba and Roberto Navigli}, year={2022}, eprint={2210.05648}, archivePrefix={arXiv}, primaryClass={cs.CL} }
-
Lexical ambiguity is a significant and pervasive challenge in Neural Machine Translation (NMT), with many state-of-the-art (SOTA) NMT systems struggling to handle polysemous words (Campolungo et al., 2022). The same holds for the NMT pretraining paradigm of denoising synthetic "code-switched" text (Pan et al., 2021; Iyer et al., 2023), where word senses are ignored in the noising stage -- leading to harmful sense biases in the pretraining data that are subsequently inherited by the resulting models. In this work, we introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT) - an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases. Our experiments show significant improvements in overall translation quality. Then, we show the robustness of our approach to scale to various challenging data and resource-scarce scenarios and, finally, report fine-grained accuracy improvements on the DiBiMT disambiguation benchmark. Our studies yield interesting and novel insights into the merits and challenges of integrating word sense information and structured knowledge in multilingual pretraining for NMT.
-
Architectures that model language and vision together have received much attention in recent years. Nonetheless, most tasks in this field focus on end-to-end applications without providing insights on whether it is the underlying semantics of visual objects or words that is captured. In this paper we draw on the established Definition Modeling paradigm and enhance it by grounding, for the first time, textual definitions to visual representations. We name this new task Visual Definition Modeling and put forward DEMETER and DIONYSUS, two benchmarks where, given an image as context, models have to generate a textual definition for a target being either 1) a word that describes the image, or 2) an object patch therein. To measure the difficulty of our tasks we finetuned six different baselines and analyzed their performances, which show that a text-only encoder-decoder model is more effective than models pretrained for handling inputs of both modalities concurrently. This demonstrates the complexity of our benchmarks and encourages more research on text generation conditioned on multimodal inputs. The datasets for both benchmarks are available at \anonymousurl as well as the code to reproduce our models.
BibTex
@inproceedings{inproceedings, author = {Scarlini, Bianca and Pasini, Tommaso and Navigli, Roberto}, year = {2022}, month = {02}, pages = {}, title = {Visual Definition Modeling: Challenging Vision & Language Models to Define Words and Objects} }
-
Enabling computers to comprehend the intent of human actions by processing language is one of the fundamental goals of Natural Language Understanding. An emerging task in this context is that of free-form event process typing, which aims at understanding the overall goal of a protagonist in terms of an action and an object, given a sequence of events. This task was initially treated as a learning-to-rank problem by exploiting the similarity between processes and action/object textual definitions. However, this approach appears to be overly complex, binds the output types to a fixed inventory for possible word definitions and, moreover, leaves space for further enhancements as regards performance. In this paper, we advance the field by reformulating the free-form event process typing task as a sequence generation problem and put forward STEPS, an end-to-end approach for producing user intent in terms of actions and objects only, dispensing with the need for their definitions. In addition to this, we eliminate several dataset constraints set by previous works, while at the same time significantly outperforming them. We release the data and software at https://github.com/SapienzaNLP/steps.
BibTex
@inproceedings{inproceedings, author = {Pepe, Sveva and Barba, Edoardo and Blloshmi, Rexhina and Navigli, Roberto}, year = {2022}, month = {02}, pages = {}, title = {STEPS: Semantic Typing of Event Processes with a Sequence-to-Sequence Approach} }
-
Conceptual representations of meaning have long been the general focus of Artificial Intelligence (AI) towards the fundamental goal of machine understanding, with innumerable efforts made in Knowledge Representation, Speech and Natural Language Processing, Computer Vision, inter alia. Even today, at the core of Natural Language Understanding lies the task of Semantic Parsing, the objective of which is to convert natural sentences into machine-readable representations. Through this paper, we aim to revamp the historical dream of AI, by putting forward a novel, all-embracing, fully semantic meaning representation, that goes beyond the many existing formalisms. Indeed, we tackle their key limits by fully abstracting text into meaning and introducing language-independent concepts and semantic relations, in order to obtain an interlingual representation. Our proposal aims to overcome the language barrier, and connect not only texts across languages, but also images, videos, speech and sound, and logical formulas, across many fields of AI.
BibTex
@inproceedings{inproceedings, author = {Navigli, Roberto and Blloshmi, Rexhina and Martinez Lorenzo, Abelardo}, year = {2022}, month = {02}, pages = {}, title = {BabelNet Meaning Representation: A Fully Semantic Formalism to Overcome Language Barriers} }
-
Lexical ambiguity poses one of the greatest challenges in the field of Machine Translation. Over the last few decades, multiple efforts have been undertaken to investigate incorrect translations caused by the polysemous nature of words. Within this body of research, some studies have posited that models pick up semantic biases existing in the training data, thus producing translation errors. In this paper, we present DiBiMT, the first entirely manually-curated evaluation benchmark which enables an extensive study of semantic biases in Machine Translation of nominal and verbal words in five different language combinations, namely, English and one or other of the following languages: Chinese, German, Italian, Russian and Spanish. Furthermore, we test state-of-the-art Machine Translation systems, both commercial and non-commercial ones, against our new test bed and provide a thorough statistical and linguistic analysis of the results. We release DiBiMT at https://nlp.uniroma1.it/dibimt as a closed benchmark with a public leaderboard.
BibTex
@inproceedings{campolungo-etal-2022-dibimt, title = "{D}i{B}i{MT}: A Novel Benchmark for Measuring {W}ord {S}ense {D}isambiguation Biases in {M}achine {T}ranslation", author = "Campolungo, Niccol{\`o} and Martelli, Federico and Saina, Francesco and Navigli, Roberto", booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = may, year = "2022", address = "Dublin, Ireland", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.acl-long.298", pages = "4331--4352", abstract = "Lexical ambiguity poses one of the greatest challenges in the field of Machine Translation. Over the last few decades, multiple efforts have been undertaken to investigate incorrect translations caused by the polysemous nature of words. Within this body of research, some studies have posited that models pick up semantic biases existing in the training data, thus producing translation errors. In this paper, we present DiBiMT, the first entirely manually-curated evaluation benchmark which enables an extensive study of semantic biases in Machine Translation of nominal and verbal words in five different language combinations, namely, English and one or other of the following languages: Chinese, German, Italian, Russian and Spanish. Furthermore, we test state-of-the-art Machine Translation systems, both commercial and non-commercial ones, against our new test bed and provide a thorough statistical and linguistic analysis of the results. We release DiBiMT at https://nlp.uniroma1.it/dibimt as a closed benchmark with a public leaderboard.", }
-
In the field of sentiment analysis, several studies have highlighted that a single sentence may express multiple, sometimes contrasting, sentiments and emotions, each with its own experiencer, target and/or cause. To this end, over the past few years researchers have started to collect and annotate data manually, in order to investigate the capabilities of automatic systems not only to distinguish between emotions, but also to capture their semantic constituents. However, currently available gold datasets are heterogeneous in size, domain, format, splits, emotion categories and role labels, making comparisons across different works difficult and hampering progress in the area. In this paper, we tackle this issue and present a unified evaluation framework focused on Semantic Role Labeling for Emotions (SRL4E), in which we unify several datasets tagged with emotions and semantic roles by using a common labeling scheme. We use SRL4E as a benchmark to evaluate how modern pretrained language models perform and analyze where we currently stand in this task, hoping to provide the tools to facilitate studies in this complex area.
BibTex
@inproceedings{campagnano-etal-2022-srl4e, title = "{SRL4E} {--} {S}emantic {R}ole {L}abeling for {E}motions: {A} Unified Evaluation Framework", author = "Campagnano, Cesare and Conia, Simone and Navigli, Roberto", booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = may, year = "2022", address = "Dublin, Ireland", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.acl-long.314", pages = "4586--4601", abstract = "In the field of sentiment analysis, several studies have highlighted that a single sentence may express multiple, sometimes contrasting, sentiments and emotions, each with its own experiencer, target and/or cause. To this end, over the past few years researchers have started to collect and annotate data manually, in order to investigate the capabilities of automatic systems not only to distinguish between emotions, but also to capture their semantic constituents. However, currently available gold datasets are heterogeneous in size, domain, format, splits, emotion categories and role labels, making comparisons across different works difficult and hampering progress in the area. In this paper, we tackle this issue and present a unified evaluation framework focused on Semantic Role Labeling for Emotions (SRL4E), in which we unify several datasets tagged with emotions and semantic roles by using a common labeling scheme. We use SRL4E as a benchmark to evaluate how modern pretrained language models perform and analyze where we currently stand in this task, hoping to provide the tools to facilitate studies in this complex area.", }
-
Thanks to the effectiveness and wide availability of modern pretrained language models (PLMs), recently proposed approaches have achieved remarkable results in dependency- and span-based, multilingual and cross-lingual Semantic Role Labeling (SRL). These results have prompted researchers to investigate the inner workings of modern PLMs with the aim of understanding how, where, and to what extent they encode information about SRL. In this paper, we follow this line of research and probe for predicate argument structures in PLMs. Our study shows that PLMs do encode semantic structures directly into the contextualized representation of a predicate, and also provides insights into the correlation between predicate senses and their structures, the degree of transferability between nominal and verbal structures, and how such structures are encoded across languages. Finally, we look at the practical implications of such insights and demonstrate the benefits of embedding predicate argument structure information into an SRL model.
BibTex
@inproceedings{conia-navigli-2022-probing, title = "Probing for Predicate Argument Structures in Pretrained Language Models", author = "Conia, Simone and Navigli, Roberto", booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = may, year = "2022", address = "Dublin, Ireland", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.acl-long.316", pages = "4622--4632", abstract = "Thanks to the effectiveness and wide availability of modern pretrained language models (PLMs), recently proposed approaches have achieved remarkable results in dependency- and span-based, multilingual and cross-lingual Semantic Role Labeling (SRL). These results have prompted researchers to investigate the inner workings of modern PLMs with the aim of understanding how, where, and to what extent they encode information about SRL. In this paper, we follow this line of research and probe for predicate argument structures in PLMs. Our study shows that PLMs do encode semantic structures directly into the contextualized representation of a predicate, and also provides insights into the correlation between predicate senses and their structures, the degree of transferability between nominal and verbal structures, and how such structures are encoded across languages. Finally, we look at the practical implications of such insights and demonstrate the benefits of embedding predicate argument structure information into an SRL model.", }
-
A language-independent representation of meaning is one of the most coveted dreams in Natural Language Understanding. With this goal in mind, several formalisms have been proposed as frameworks for meaning representation in Semantic Parsing. And yet, the dependencies these formalisms share with respect to language-specific repositories of knowledge make the objective of closing the gap between high- and low-resourced languages hard to accomplish. In this paper, we present the BabelNet Meaning Representation (BMR), an interlingual formalism that abstracts away from language-specific constraints by taking advantage of the multilingual semantic resources of BabelNet and VerbAtlas. We describe the rationale behind the creation of BMR and put forward BMR 1.0, a dataset labeled entirely according to the new formalism. Moreover, we show how BMR is able to outperform previous formalisms thanks to its fully-semantic framing, which enables top-notch multilingual parsing and generation. We release the code at https://github.com/SapienzaNLP/bmr.
BibTex
@inproceedings{martinez-lorenzo-etal-2022-fully, title = "{F}ully-{S}emantic {P}arsing and {G}eneration: the {B}abel{N}et {M}eaning {R}epresentation", author = "Mart{\'\i}nez Lorenzo, Abelardo Carlos and Maru, Marco and Navigli, Roberto", booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = may, year = "2022", address = "Dublin, Ireland", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.acl-long.121", pages = "1727--1741", abstract = "A language-independent representation of meaning is one of the most coveted dreams in Natural Language Understanding. With this goal in mind, several formalisms have been proposed as frameworks for meaning representation in Semantic Parsing. And yet, the dependencies these formalisms share with respect to language-specific repositories of knowledge make the objective of closing the gap between high- and low-resourced languages hard to accomplish. In this paper, we present the BabelNet Meaning Representation (BMR), an interlingual formalism that abstracts away from language-specific constraints by taking advantage of the multilingual semantic resources of BabelNet and VerbAtlas. We describe the rationale behind the creation of BMR and put forward BMR 1.0, a dataset labeled entirely according to the new formalism. Moreover, we show how BMR is able to outperform previous formalisms thanks to its fully-semantic framing, which enables top-notch multilingual parsing and generation. We release the code at https://github.com/SapienzaNLP/bmr.", }
-
With state-of-the-art systems having finally attained estimated human performance, Word Sense Disambiguation (WSD) has now joined the array of Natural Language Processing tasks that have seemingly been solved, thanks to the vast amounts of knowledge encoded into Transformer-based pre-trained language models. And yet, if we look below the surface of raw figures, it is easy to realize that current approaches still make trivial mistakes that a human would never make. In this work, we provide evidence showing why the F1 score metric should not simply be taken at face value and present an exhaustive analysis of the errors that seven of the most representative state-of-the-art systems for English all-words WSD make on traditional evaluation benchmarks. In addition, we produce and release a collection of test sets featuring (a) an amended version of the standard evaluation benchmark that fixes its lexical and semantic inaccuracies, (b) 42D, a challenge set devised to assess the resilience of systems with respect to least frequent word senses and senses not seen at training time, and (c) hardEN, a challenge set made up solely of instances which none of the investigated state-of-the-art systems can solve. We make all of the test sets and model predictions available to the research community at https://github.com/SapienzaNLP/wsd-hard-benchmark.
BibTex
@inproceedings{maru-etal-2022-nibbling, title = "{N}ibbling at the Hard Core of {W}ord {S}ense {D}isambiguation", author = "Maru, Marco and Conia, Simone and Bevilacqua, Michele and Navigli, Roberto", booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = may, year = "2022", address = "Dublin, Ireland", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.acl-long.324", pages = "4724--4737", abstract = "With state-of-the-art systems having finally attained estimated human performance, Word Sense Disambiguation (WSD) has now joined the array of Natural Language Processing tasks that have seemingly been solved, thanks to the vast amounts of knowledge encoded into Transformer-based pre-trained language models. And yet, if we look below the surface of raw figures, it is easy to realize that current approaches still make trivial mistakes that a human would never make. In this work, we provide evidence showing why the F1 score metric should not simply be taken at face value and present an exhaustive analysis of the errors that seven of the most representative state-of-the-art systems for English all-words WSD make on traditional evaluation benchmarks.In addition, we produce and release a collection of test sets featuring (a) an amended version of the standard evaluation benchmark that fixes its lexical and semantic inaccuracies, (b) 42D, a challenge set devised to assess the resilience of systems with respect to least frequent word senses and senses not seen at training time, and (c) hardEN, a challenge set made up solely of instances which none of the investigated state-of-the-art systems can solve. We make all of the test sets and model predictions available to the research community at https://github.com/SapienzaNLP/wsd-hard-benchmark.", }
-
Local models for Entity Disambiguation (ED) have today become extremely powerful, in most part thanks to the advent of large pre-trained language models. However, despite their significant performance achievements, most of these approaches frame ED through classification formulations that have intrinsic limitations, both computationally and from a modeling perspective. In contrast with this trend, here we propose ExtEnD, a novel local formulation for ED where we frame this task as a text extraction problem, and present two Transformer-based architectures that implement it. Based on experiments in and out of domain, and training over two different data regimes, we find our approach surpasses all its competitors in terms of both data efficiency and raw performance. ExtEnD outperforms its alternatives by as few as 6 F1 points on the more constrained of the two data regimes and, when moving to the other higher-resourced regime, sets a new state of the art on 4 out of 6 benchmarks under consideration, with average improvements of 0.7 F1 points overall and 1.1 F1 points out of domain. In addition, to gain better insights from our results, we also perform a fine-grained evaluation of our performances on different classes of label frequency, along with an ablation study of our architectural choices and an error analysis. We release our code and models for research purposes at https://github.com/SapienzaNLP/extend.
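A minimal sketch of the extractive framing: the mention's context and the candidate entities are laid out as one text, and the predicted entity is the candidate whose span the model extracts. The marker tokens, separator and the pretend prediction below are assumptions, not ExtEnD's actual input format or model.

```python
from dataclasses import dataclass

@dataclass
class CandidateSpan:
    title: str
    start: int  # character offsets of the candidate inside the built input
    end: int

def build_extraction_input(context, mention, candidates):
    """Lay out the ED instance as a single text from which a span is extracted:
    the context (with the mention marked) followed by the candidate titles."""
    text = context.replace(mention, f"<m> {mention} </m>") + " <candidates> "
    spans = []
    for title in candidates:
        start = len(text)
        text += title
        spans.append(CandidateSpan(title, start, len(text)))
        text += " | "
    return text, spans

def candidate_for_span(pred_start, pred_end, spans):
    """Map a predicted character span back to the candidate it falls into."""
    for cand in spans:
        if cand.start <= pred_start and pred_end <= cand.end:
            return cand.title
    return None

text, spans = build_extraction_input(
    "Jordan played for the Bulls.", "Jordan",
    ["Michael Jordan", "Jordan (country)", "Jordan (river)"],
)
print(text)
# pretend the model extracted the first candidate's span
print(candidate_for_span(spans[0].start, spans[0].end, spans))  # Michael Jordan
```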
BibTex
@inproceedings{barba-etal-2022-extend, title = "{E}xt{E}n{D}: Extractive Entity Disambiguation", author = "Barba, Edoardo and Procopio, Luigi and Navigli, Roberto", booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = may, year = "2022", address = "Dublin, Ireland", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.acl-long.177", pages = "2478--2488", abstract = "Local models for Entity Disambiguation (ED) have today become extremely powerful, in most part thanks to the advent of large pre-trained language models. However, despite their significant performance achievements, most of these approaches frame ED through classification formulations that have intrinsic limitations, both computationally and from a modeling perspective. In contrast with this trend, here we propose ExtEnD, a novel local formulation for ED where we frame this task as a text extraction problem, and present two Transformer-based architectures that implement it. Based on experiments in and out of domain, and training over two different data regimes, we find our approach surpasses all its competitors in terms of both data efficiency and raw performance. ExtEnD outperforms its alternatives by as few as 6 F1 points on the more constrained of the two data regimes and, when moving to the other higher-resourced regime, sets a new state of the art on 4 out of 4 benchmarks under consideration, with average improvements of 0.7 F1 points overall and 1.1 F1 points out of domain. In addition, to gain better insights from our results, we also perform a fine-grained evaluation of our performances on different classes of label frequency, along with an ablation study of our architectural choices and an error analysis. We release our code and models for research purposes at https://github.com/SapienzaNLP/extend.", }
-
One of the common traits of past and present approaches for Semantic Role Labeling (SRL) is that they rely upon discrete labels drawn from a predefined linguistic inventory to classify predicate senses and their arguments. However, we argue this need not be the case. In this paper, we present an approach that leverages Definition Modeling to introduce a generalized formulation of SRL as the task of describing predicate-argument structures using natural language definitions instead of discrete labels. Our novel formulation takes a first step towards placing interpretability and flexibility foremost, and yet our experiments and analyses on PropBank-style and FrameNet-style, dependency-based and span-based SRL also demonstrate that a flexible model with an interpretable output does not necessarily come at the expense of performance. We release our software for research purposes at https://github.com/SapienzaNLP.
BibTex
@misc{https://doi.org/10.48550/arxiv.2212.01094, doi = {10.48550/ARXIV.2212.01094}, url = {https://arxiv.org/abs/2212.01094}, author = {Conia, Simone and Barba, Edoardo and Scirè, Alessandro and Navigli, Roberto}, keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences}, title = {Semantic Role Labeling Meets Definition Modeling: Using Natural Language to Describe Predicate-Argument Structures}, publisher = {arXiv}, year = {2022}, copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International} }
-
Starting from last year, WMT human evaluation has been performed within the Multidimensional Quality Metrics (MQM) framework, where human annotators are asked to identify error spans in translations, alongside an error category and a severity. In this paper, we describe our submission to the WMT 2022 Metrics Shared Task, where we propose using the same paradigm for automatic evaluation: we present the MaTESe metrics, which reframe machine translation evaluation as a sequence tagging problem. Our submission also includes a reference-free metric, denominated MaTESe-QE. Despite the paucity of the openly available MQM data, our metrics obtain promising results, showing high levels of correlation with human judgements, while also enabling an evaluation that is interpretable. Moreover, MaTESe-QE can also be employed in settings where it is infeasible to curate reference translations manually.
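The sequence-tagging view can be made concrete with a small sketch that turns tagged error spans into an MQM-style segment score; the severity weights and the penalty cap below are illustrative assumptions rather than the exact scoring used by MaTESe.

```python
# hypothetical severity penalties loosely inspired by MQM weighting;
# the actual weights and scoring used in the paper may differ
PENALTIES = {"minor": 1.0, "major": 5.0}

def mqm_style_score(tagged_spans, max_penalty=25.0):
    """tagged_spans: list of (span_text, category, severity) triples produced by
    a sequence-tagging evaluation model. Returns a score in [-max_penalty, 0]."""
    penalty = sum(PENALTIES[severity] for _, _, severity in tagged_spans)
    return -min(penalty, max_penalty)

spans = [
    ("the the", "fluency", "minor"),
    ("bank of the river", "accuracy/mistranslation", "major"),
]
print(mqm_style_score(spans))  # -6.0
```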
BibTex
@InProceedings{perrella-EtAl:2022:WMT, author = {Perrella, Stefano and Proietti, Lorenzo and Scirè, Alessandro and Campolungo, Niccolò and Navigli, Roberto}, title = {MaTESe: Machine Translation Evaluation as a Sequence Tagging Problem}, booktitle = {Proceedings of the Seventh Conference on Machine Translation}, month = {December}, year = {2022}, address = {Abu Dhabi}, publisher = {Association for Computational Linguistics}, pages = {569--577}, abstract = {Starting from last year, WMT human evaluation has been performed within the Multidimensional Quality Metrics (MQM) framework, where human annotators are asked to identify error spans in translations, alongside an error category and a severity. In this paper, we describe our submission to the WMT 2022 Metrics Shared Task, where we propose using the same paradigm for automatic evaluation: we present the MaTESe metrics, which reframe machine translation evaluation as a sequence tagging problem. Our submission also includes a reference-free metric, denominated MaTESe-QE. Despite the paucity of the openly available MQM data, our metrics obtain promising results, showing high levels of correlation with human judgements, while also enabling an evaluation that is interpretable. Moreover, MaTESe-QE can also be employed in settings where it is infeasible to curate reference translations manually.}, url = {https://aclanthology.org/2022.wmt-1.51} }
-
We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augmented dataset and kNN-based methods, EUREKA was able to achieve state-of-the-art results on the public leaderboard of the Euphemism Detection Shared Task, ranking first with a macro F1 score of 0.881. Our code is available at https://github.com/sedrickkeh/EUREKA.
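As a sketch of the kNN-based component, the snippet below classifies a sentence embedding by majority vote among its nearest labelled neighbours under cosine similarity; the embedding dimensionality and the binary vote are assumptions made purely for illustration.

```python
import numpy as np

def knn_classify(query_vec, train_vecs, train_labels, k=5):
    """Label a sentence embedding by majority vote among its k nearest
    labelled neighbours (cosine similarity)."""
    train = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = train @ q
    top = np.argsort(-sims)[:k]
    votes = train_labels[top]
    return int(np.round(votes.mean()))  # binary: euphemistic vs. literal

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 384))        # stand-in for sentence embeddings
y = rng.integers(0, 2, size=100)
print(knn_classify(rng.normal(size=384), X, y, k=7))
```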
BibTex
@unknown{unknown, author = {Keh, Sedrick and Bharadwaj, Rohit and Liu, Emmy and Tedeschi, Simone and Gangal, Varun and Navigli, Roberto}, year = {2022}, month = {10}, pages = {}, title = {EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation}, doi = {10.48550/arXiv.2210.12846} }
-
Recent studies have shed some light on a common pitfall of Neural Machine Translation (NMT) models, stemming from their struggle to disambiguate polysemous words without lapsing into their most frequently occurring senses in the training corpus. In this paper, we first provide a novel approach for automatically creating high-precision sense-annotated parallel corpora, and then put forward a specifically tailored fine-tuning strategy for exploiting these sense annotations during training without introducing any additional requirement at inference time. The use of explicit senses proved to be beneficial to reduce the disambiguation bias of a baseline NMT model, while, at the same time, leading our system to attain higher BLEU scores than its vanilla counterpart in 3 language pairs.
BibTex
@inproceedings{campolungo-etal-2022-reducing, title = "Reducing Disambiguation Biases in {NMT} by Leveraging Explicit Word Sense Information", author = "Campolungo, Niccol{\`o} and Pasini, Tommaso and Emelin, Denis and Navigli, Roberto", booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", month = jul, year = "2022", address = "Seattle, United States", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.naacl-main.355", pages = "4824--4838", abstract = "Recent studies have shed some light on a common pitfall of Neural Machine Translation (NMT) models, stemming from their struggle to disambiguate polysemous words without lapsing into their most frequently occurring senses in the training corpus.In this paper, we first provide a novel approach for automatically creating high-precision sense-annotated parallel corpora, and then put forward a specifically tailored fine-tuning strategy for exploiting these sense annotations during training without introducing any additional requirement at inference time.The use of explicit senses proved to be beneficial to reduce the disambiguation bias of a baseline NMT model, while, at the same time, leading our system to attain higher BLEU scores than its vanilla counterpart in 3 language pairs.", }
-
Idioms are phrases which present a figurative meaning that cannot be (completely) derived by looking at the meaning of their individual components. Identifying and understanding idioms in context is a crucial goal and a key challenge in a wide range of Natural Language Understanding tasks. Although efforts have been undertaken in this direction, the automatic identification and understanding of idioms is still a largely under-investigated area, especially when operating in a multilingual scenario. In this paper, we address such limitations and put forward several new contributions: we propose a novel multilingual Transformer-based system for the identification of idioms; we produce a high-quality automatically-created training dataset in 10 languages, along with a novel manually-curated evaluation benchmark; finally, we carry out a thorough performance analysis and release our evaluation suite at https://github.com/Babelscape/ID10M.
BibTex
@inproceedings{tedeschi-etal-2022-id10m, title = "{ID}10{M}: Idiom Identification in 10 Languages", author = "Tedeschi, Simone and Martelli, Federico and Navigli, Roberto", booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022", month = jul, year = "2022", address = "Seattle, United States", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.findings-naacl.208", pages = "2715--2726", abstract = "Idioms are phrases which present a figurative meaning that cannot be (completely) derived by looking at the meaning of their individual components.Identifying and understanding idioms in context is a crucial goal and a key challenge in a wide range of Natural Language Understanding tasks. Although efforts have been undertaken in this direction, the automatic identification and understanding of idioms is still a largely under-investigated area, especially when operating in a multilingual scenario. In this paper, we address such limitations and put forward several new contributions: we propose a novel multilingual Transformer-based system for the identification of idioms; we produce a high-quality automatically-created training dataset in 10 languages, along with a novel manually-curated evaluation benchmark; finally, we carry out a thorough performance analysis and release our evaluation suite at https://github.com/Babelscape/ID10M.", }
-
Named Entity Recognition (NER) is the task of identifying named entities in texts and classifying them through specific semantic categories, a process which is crucial for a wide range of NLP applications. Current datasets for NER focus mainly on coarse-grained entity types, tend to consider a single textual genre and to cover a narrow set of languages, thus limiting the general applicability of NER systems. In this work, we design a new methodology for automatically producing NER annotations, and address the aforementioned limitations by introducing a novel dataset that covers 10 languages, 15 NER categories and 2 textual genres. We also introduce a manually-annotated test set, and extensively evaluate the quality of our novel dataset on both this new test set and standard benchmarks for NER. In addition, in our dataset, we include: i) disambiguation information to enable the development of multilingual entity linking systems, and ii) image URLs to encourage the creation of multimodal systems. We release our dataset at https://github.com/Babelscape/multinerd.
BibTex
@inproceedings{tedeschi-navigli-2022-multinerd, title = "{M}ulti{NERD}: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)", author = "Tedeschi, Simone and Navigli, Roberto", booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022", month = jul, year = "2022", address = "Seattle, United States", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.findings-naacl.60", pages = "801--812", abstract = "Named Entity Recognition (NER) is the task of identifying named entities in texts and classifying them through specific semantic categories, a process which is crucial for a wide range of NLP applications. Current datasets for NER focus mainly on coarse-grained entity types, tend to consider a single textual genre and to cover a narrow set of languages, thus limiting the general applicability of NER systems.In this work, we design a new methodology for automatically producing NER annotations, and address the aforementioned limitations by introducing a novel dataset that covers 10 languages, 15 NER categories and 2 textual genres.We also introduce a manually-annotated test set, and extensively evaluate the quality of our novel dataset on both this new test set and standard benchmarks for NER.In addition, in our dataset, we include: i) disambiguation information to enable the development of multilingual entity linking systems, and ii) image URLs to encourage the creation of multimodal systems.We release our dataset at https://github.com/Babelscape/multinerd.", }
-
Transformer-based architectures brought a breeze of change to Word Sense Disambiguation (WSD), improving models’ performances by a large margin. The fast development of new approaches has been further encouraged by a well-framed evaluation suite for English, which has made it possible to keep track of and fairly compare their performances. However, other languages remained mostly unexplored, as testing data are available for a few languages only and the evaluation setting is rather matted. In this paper, we untangle this situation by proposing XL-WSD, a cross-lingual evaluation benchmark for the WSD task featuring sense-annotated development and test sets in 18 languages from six different linguistic families, together with language-specific silver training data. We leverage XL-WSD datasets to conduct an extensive evaluation of neural and knowledge-based approaches, including the most recent multilingual language models. Results show that the zero-shot knowledge transfer across languages is a promising research direction within the WSD field, especially when considering low-resourced languages where large pretrained multilingual models still perform poorly.
BibTex
@inproceedings{pasini-etal-xl-wsd-2021, title={ {XL-WSD}: An Extra-Large and Cross-Lingual Evaluation Framework for Word Sense Disambiguation.}, author={Pasini, Tommaso and Raganato, Alessandro and Navigli, Roberto}, booktitle={Proc. of AAAI}, year={2021} }
-
In Text-to-AMR parsing, current state-of-the-art semantic parsers use cumbersome pipelines integrating several different modules or components, and exploit graph recategorization, i.e., a set of content-specific heuristics that are developed on the basis of the training set. However, the generalizability of graph recategorization in an out-of-distribution setting is unclear. In contrast, state-of-the-art AMR-to-Text generation, which can be seen as the inverse to parsing, is based on simpler seq2seq approaches. In this paper, we cast Text-to-AMR and AMR-to-Text as a symmetric transduction task and show that by devising a careful graph linearization and extending a pretrained encoder-decoder model, it is possible to obtain state-of-the-art performances in both tasks using the very same seq2seq approach, i.e., SPRING (Symmetric PaRsIng aNd Generation). Our model does not require complex pipelines, nor heuristics built on heavy assumptions. In fact, we drop the need for graph recategorization, showing that this technique is actually harmful outside of the standard benchmark. Finally, we outperform the previous state of the art on the English AMR 2.0 dataset by a large margin: on Text-to-AMR we obtain an improvement of 3.6 Smatch points, while on AMR-to-Text we outperform the state of the art by 11.2 BLEU points. We release the software at github.com/SapienzaNLP/spring.
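To illustrate what a careful graph linearization might look like, here is a simplified depth-first linearizer that emits pointer tokens for re-entrant nodes; the bracketing and pointer scheme are a stand-in for the special-token linearization actually used, not the SPRING implementation.

```python
def linearize_amr(graph, root):
    """Depth-first linearization of an AMR graph into a token sequence that a
    seq2seq model can be trained to produce."""
    tokens, visited = [], set()

    def visit(node):
        tokens.append("(")
        tokens.append(graph[node]["concept"])
        visited.add(node)
        for role, child in graph[node]["edges"]:
            tokens.append(role)
            if child in visited:          # re-entrancy: emit a pointer, do not recurse
                tokens.append(f"<pointer:{child}>")
            else:
                visit(child)
        tokens.append(")")

    visit(root)
    return tokens

# "The boy wants to go": the boy is ARG0 of want-01 and re-enters as ARG0 of go-02
graph = {
    "w": {"concept": "want-01", "edges": [(":ARG0", "b"), (":ARG1", "g")]},
    "b": {"concept": "boy", "edges": []},
    "g": {"concept": "go-02", "edges": [(":ARG0", "b")]},
}
print(" ".join(linearize_amr(graph, "w")))
```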
BibTex
@inproceedings{bevilacqua-etal-2021-spring, title={One {SPRING} to Rule Them Both: {S}ymmetric {AMR} Semantic Parsing and Generation without a Complex Pipeline}, author={Bevilacqua, Michele and Blloshmi, Rexhina and Navigli, Roberto}, booktitle={Proc. of AAAI}, year={2021} }
-
Recent studies treat Word Sense Disambiguation (WSD) as a single-label classification problem in which one is asked to choose only the best-fitting sense for a target word, given its context. However, gold data labelled by expert annotators suggest that maximizing the probability of a single sense may not be the most suitable training objective for WSD, especially if the sense inventory of choice is fine-grained. In this paper, we approach WSD as a multi-label classification problem in which multiple senses can be assigned to each target word. Not only does our simple method bear a closer resemblance to how human annotators disambiguate text, but it can also be seamlessly extended to exploit structured knowledge from semantic networks to achieve state-of-the-art results in English all-words WSD.
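A minimal sketch of the multi-label formulation: each sense receives an independent sigmoid score, a binary cross-entropy loss accepts several gold senses at once, and prediction thresholds the probabilities over the target word's candidate set. Shapes, masking and the threshold are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def multilabel_wsd_loss(sense_logits, gold_mask, candidate_mask):
    """sense_logits: (batch, num_senses) scores over the full sense inventory.
    gold_mask marks every acceptable sense (possibly more than one per word);
    candidate_mask restricts the loss to the target word's candidate senses."""
    loss = F.binary_cross_entropy_with_logits(sense_logits, gold_mask, reduction="none")
    return (loss * candidate_mask).sum() / candidate_mask.sum()

def predict(sense_logits, candidate_mask, threshold=0.5):
    probs = torch.sigmoid(sense_logits).masked_fill(candidate_mask == 0, 0.0)
    return (probs > threshold).nonzero(as_tuple=True)[1]   # indices of predicted senses

logits = torch.randn(1, 10)
gold = torch.zeros(1, 10); gold[0, 3] = gold[0, 7] = 1.0    # two acceptable senses
cand = torch.zeros(1, 10); cand[0, [2, 3, 7, 9]] = 1.0      # the word's candidate senses
print(multilabel_wsd_loss(logits, gold, cand), predict(logits, cand))
```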
BibTex
@inproceedings{conia-navigli-2021-multilabel-wsd, title = "Framing Word Sense Disambiguation as a Multi-Label Problem for Model-Agnostic Knowledge Integration", author = "Conia, Simone and Navigli, Roberto", booktitle = "Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume", month = apr, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2021.eacl-main.286", pages = "3269--3275", }
-
Computational modelling of political discourse tasks has become an increasingly important area of research in the field of natural language processing. Populist rhetoric has risen across the political sphere in recent years; however, due to its complex nature, computational approaches to it have been scarce. In this paper, we present the new Us vs. Them dataset, consisting of 6861 Reddit comments annotated for populist attitudes and the first large-scale computational models of this phenomenon. We investigate the relationship between populist mindsets and social groups, as well as a range of emotions typically associated with these. We set a baseline for two tasks associated with populist attitudes and present a set of multi-task learning models that leverage and demonstrate the importance of emotion and group identification as auxiliary tasks.
BibTex
@inproceedings{huguet-cabot-etal-2021-us, title = "Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions", author = "Huguet Cabot, Pere-Llu{\'\i}s and Abadi, David and Fischer, Agneta and Shutova, Ekaterina", booktitle = "Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume", month = apr, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2021.eacl-main.165", pages = "1921--1945", }
-
Neural Word Sense Disambiguation (WSD) has recently been shown to benefit from the incorporation of pre-existing knowledge, such as that coming from the WordNet graph. However, state-of-the-art approaches have been successful in exploiting only the local structure of the graph, with only close neighbors of a given synset influencing the prediction. In this work, we improve a classification model by recomputing logits as a function of both the vanilla independently produced logits and the global WordNet graph. We achieve this by incorporating an online neural approximated PageRank, which enables us to refine edge weights as well. This method exploits the global graph structure while keeping space requirements linear in the number of edges. We obtain strong improvements, matching the current state of the art.
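As a crude, non-neural stand-in for the idea of mixing local logits with global graph information, the sketch below runs plain personalized PageRank by power iteration over a toy synset graph, restarting at the distribution induced by the vanilla logits, and blends the result back into the scores. The paper instead uses an online neural approximation and also refines edge weights, so this is only an illustration of the general recipe.

```python
import numpy as np

def personalized_pagerank(adj, personalization, alpha=0.85, iters=20):
    """Power iteration over a row-normalised synset adjacency matrix,
    restarting at the distribution induced by the local WSD logits."""
    out_deg = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)
    p = personalization.copy()
    for _ in range(iters):
        p = (1 - alpha) * personalization + alpha * trans.T @ p
    return p

def rerank_logits(logits, adj, mix=0.5):
    """Blend the vanilla sense logits with (log-)PageRank mass from the graph."""
    prior = np.exp(logits) / np.exp(logits).sum()
    pr = personalized_pagerank(adj, prior)
    return mix * logits + (1 - mix) * np.log(pr + 1e-12)

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # toy synset graph
logits = np.array([2.0, 0.5, -1.0])
print(rerank_logits(logits, adj).round(3))
```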
BibTex
@inproceedings{el-sheikh-etal-2021-integrating, title = "Integrating Personalized {P}age{R}ank into Neural Word Sense Disambiguation", author = "El Sheikh, Ahmed and Bevilacqua, Michele and Navigli, Roberto", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-main.715", pages = "9092--9098", }
-
The lexical substitution task aims at generating a list of suitable replacements for a target word in context, ideally keeping the meaning of the modified text unchanged. While its usage has increased in recent years, the paucity of annotated data prevents the finetuning of neural models on the task, hindering the full fruition of recently introduced powerful architectures such as language models. Furthermore, lexical substitution is usually evaluated in a framework that is strictly bound to a limited vocabulary, making it impossible to credit appropriate, but out-of-vocabulary, substitutes. To address these issues, we propose GeneSis (Generating Substitutes in contexts), the first generative approach to lexical substitution. Thanks to a seq2seq model, we generate substitutes for a word according to the context it appears in, attaining state-of-the-art results on different benchmarks. Moreover, our approach allows silver data to be produced for further improving the performances of lexical substitution systems. Along with an extensive analysis of GeneSis results, we also present a human evaluation of the generated substitutes in order to assess their quality. We release the fine-tuned models, the generated datasets, and the code to reproduce the experiments at https://github.com/SapienzaNLP/genesis.
BibTex
@inproceedings{lacerra-etal-2021-genesis, title = "{G}ene{S}is: {A} {G}enerative {A}pproach to {S}ubstitutes in {C}ontext", author = "Lacerra, Caterina and Tripodi, Rocco and Navigli, Roberto", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-main.844", pages = "10810--10823", }
-
With the advent of contextualized embeddings, attention towards neural ranking approaches for Information Retrieval increased considerably. However, two aspects have remained largely neglected: i) queries usually consist of few keywords only, which increases ambiguity and makes their contextualization harder, and ii) performing neural ranking on non-English documents is still cumbersome due to shortage of labeled datasets. In this paper we present SIR (Sense-enhanced Information Retrieval) to mitigate both problems by leveraging word sense information. At the core of our approach lies a novel multilingual query expansion mechanism based on Word Sense Disambiguation that provides sense definitions as additional semantic information for the query. Importantly, we use senses as a bridge across languages, thus allowing our model to perform considerably better than its supervised and unsupervised alternatives across French, German, Italian and Spanish languages on several CLEF benchmarks, while being trained on English Robust04 data only. We release SIR at https://github.com/SapienzaNLP/sir.
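The query-expansion mechanism can be sketched as follows: each disambiguated query word contributes its sense definition to the query text before retrieval scoring. The `disambiguate` callable and the gloss inventory are stand-ins for a WSD system and a resource such as BabelNet, and are assumptions of this sketch rather than the paper's interface.

```python
def expand_query(query_tokens, disambiguate, gloss_lookup):
    """Append the gloss of each disambiguated content word to the query, so the
    retrieval model scores documents against the expanded, sense-aware text."""
    glosses = []
    for token in query_tokens:
        sense = disambiguate(token, query_tokens)   # None if no sense is assigned
        if sense is not None:
            glosses.append(gloss_lookup[sense])
    return " ".join(query_tokens) + " " + " ".join(glosses)

# toy stand-ins for the WSD model and the sense inventory
toy_inventory = {"bank.n.01": "a financial institution that accepts deposits"}
toy_wsd = lambda tok, ctx: "bank.n.01" if tok == "bank" else None
print(expand_query(["bank", "opening", "hours"], toy_wsd, toy_inventory))
```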
BibTex
@inproceedings{blloshmi-etal-2021-ir, title = "{IR} like a {SIR}: {S}ense-enhanced {I}nformation {R}etrieval for {M}ultiple {L}anguages", author = "Blloshmi, Rexhina and Pasini, Tommaso and Campolungo, Niccol{\`o} and Banerjee, Somnath and Navigli, Roberto and Pasi, Gabriella", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-main.79", pages = "1030--1041", }
-
Supervised systems have nowadays become the standard recipe for Word Sense Disambiguation (WSD), with Transformer-based language models as their primary ingredient. However, while these systems have certainly attained unprecedented performances, virtually all of them operate under the constraining assumption that, given a context, each word can be disambiguated individually with no account of the other sense choices. To address this limitation and drop this assumption, we propose CONtinuous SEnse Comprehension (ConSeC), a novel approach to WSD: leveraging a recent re-framing of this task as a text extraction problem, we adapt it to our formulation and introduce a feedback loop strategy that allows the disambiguation of a target word to be conditioned not only on its context but also on the explicit senses assigned to nearby words. We evaluate ConSeC and examine how its components lead it to surpass all its competitors and set a new state of the art on English WSD. We also explore how ConSeC fares in the cross-lingual setting, focusing on 8 languages with various degrees of resource availability, and report significant improvements over prior systems. We release our code at https://github.com/SapienzaNLP/consec.
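The following is a minimal sketch of a feedback-loop disambiguation strategy in the spirit of the one described above: each word is re-disambiguated while seeing the senses already assigned to the other words. The scoring function is a placeholder, whereas ConSeC itself relies on a text-extraction model conditioned on nearby sense choices.

```python
def disambiguate_with_feedback(words, candidates, score, rounds=2):
    """Iteratively assign senses, letting each word's prediction be conditioned
    on the senses already chosen for the other words in the sentence.

    candidates: dict word -> list of candidate senses
    score:      callable (word, sense, context_senses) -> float  (placeholder model)
    """
    assignment = {w: None for w in words}
    for _ in range(rounds):
        for w in words:
            context = {k: v for k, v in assignment.items() if k != w and v is not None}
            assignment[w] = max(candidates[w], key=lambda s: score(w, s, context))
    return assignment

# Toy usage with a dummy scorer that rewards senses sharing a POS tag with the context senses.
words = ["bank", "deposit"]
candidates = {"bank": ["bank.n.01", "bank.n.09"], "deposit": ["deposit.n.04"]}
def dummy_score(word, sense, context_senses):
    return sum(sense.split(".")[1] == c.split(".")[1] for c in context_senses.values())
print(disambiguate_with_feedback(words, candidates, dummy_score))
```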
BibTex
@inproceedings{barba-etal-2021-consec, title = "{C}on{S}e{C}: Word Sense Disambiguation as Continuous Sense Comprehension", author = "Barba, Edoardo and Procopio, Luigi and Navigli, Roberto", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-main.112", pages = "1492--1503", }
-
Multilingual and cross-lingual Semantic Role Labeling (SRL) have recently garnered increasing attention as multilingual text representation techniques have become more effective and widely available. While recent work has attained growing success, results on gold multilingual benchmarks are still not easily comparable across languages, making it difficult to grasp where we stand. For example, in CoNLL-2009, the standard benchmark for multilingual SRL, language-to-language comparisons are affected by the fact that each language has its own dataset which differs from the others in size, domains, sets of labels and annotation guidelines. In this paper, we address this issue and propose UniteD-SRL, a new benchmark for multilingual and cross-lingual, span- and dependency-based SRL. UniteD-SRL provides expert-curated parallel annotations using a common predicate-argument structure inventory, allowing direct comparisons across languages and encouraging studies on cross-lingual transfer in SRL. We release UniteD-SRL v1.0 at https://github.com/SapienzaNLP/united-srl.
BibTex
@inproceedings{tripodi-etal-2021-united-srl, title = "{UniteD-SRL}: {A} Unified Dataset for Span- and Dependency-Based Multilingual and Cross-Lingual {S}emantic {R}ole {L}abeling", author = "Tripodi, Rocco and Conia, Simone and Navigli, Roberto", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021", month = nov, year = "2021", address = "Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.findings-emnlp.197", pages = "2293--2305" }
-
Extracting relation triplets from raw text is a crucial task in Information Extraction, enabling multiple applications such as populating or validating knowledge bases, fact-checking, and other downstream tasks. However, it usually involves multiple-step pipelines that propagate errors or are limited to a small number of relation types. To overcome these issues, we propose the use of autoregressive seq2seq models. Such models have previously been shown to perform well not only in language generation, but also in NLU tasks such as Entity Linking, thanks to their framing as seq2seq tasks. In this paper, we show how Relation Extraction can be simplified by expressing triplets as a sequence of text and we present REBEL, a seq2seq model based on BART that performs end-to-end relation extraction for more than 200 different relation types. We show our model’s flexibility by fine-tuning it on an array of Relation Extraction and Relation Classification benchmarks, with it attaining state-of-the-art performance in most of them.
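As an illustration of how triplets can be expressed as a sequence of text, the sketch below parses one plausible linearization into (head, relation, tail) tuples; the special tokens shown are assumptions made for this example, and the released REBEL checkpoint defines its own output format.

```python
def parse_linearized_triplets(text):
    """Parse a linearized relation string of the (assumed) form
    '<triplet> head <subj> tail <obj> relation ...' into (head, relation, tail) tuples."""
    triplets = []
    for chunk in text.split("<triplet>"):
        chunk = chunk.strip()
        if not chunk:
            continue
        head, _, rest = chunk.partition("<subj>")
        for pair in rest.split("<subj>"):
            tail, _, relation = pair.partition("<obj>")
            if tail.strip() and relation.strip():
                triplets.append((head.strip(), relation.strip(), tail.strip()))
    return triplets

# Toy decoded sequence with two triplets sharing the same head entity.
decoded = ("<triplet> Rome <subj> Italy <obj> capital of "
           "<subj> Lazio <obj> located in")
print(parse_linearized_triplets(decoded))
# [('Rome', 'capital of', 'Italy'), ('Rome', 'located in', 'Lazio')]
```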
BibTex
@inproceedings{huguet-cabot-navigli-2021-rebel-relation, title = "{REBEL}: Relation Extraction By End-to-end Language generation", author = "Huguet Cabot, Pere-Llu{\'\i}s and Navigli, Roberto", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021", month = nov, year = "2021", address = "Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.findings-emnlp.204", pages = "2370--2381", }
-
Multilingual Named Entity Recognition (NER) is a key intermediate task which is needed in many areas of NLP. In this paper, we address the well-known issue of data scarcity in NER, especially relevant when moving to a multilingual scenario, and go beyond current approaches to the creation of multilingual silver data for the task. We exploit the texts of Wikipedia and introduce a new methodology based on the effective combination of knowledge-based approaches and neural models, together with a novel domain adaptation technique, to produce high-quality training corpora for NER. We evaluate our datasets extensively on standard benchmarks for NER, yielding substantial improvements up to 6 span-based F1-score points over previous state-of-the-art systems for data creation.
BibTex
@inproceedings{tedeschi-etal-2021-wikineural-combined, title = "{W}iki{NE}u{R}al: {C}ombined Neural and Knowledge-based Silver Data Creation for Multilingual {NER}", author = "Tedeschi, Simone and Maiorca, Valentino and Campolungo, Niccol{\`o} and Cecconi, Francesco and Navigli, Roberto", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021", month = nov, year = "2021", address = "Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.findings-emnlp.215", pages = "2521--2533", }
-
Entity Linking (EL) systems have achieved impressive results on standard benchmarks mainly thanks to the contextualized representations provided by recent pretrained language models. However, such systems still require massive amounts of data – millions of labeled examples – to perform at their best, with training times that often exceed several days, especially when limited computational resources are available. In this paper, we look at how Named Entity Recognition (NER) can be exploited to narrow the gap between EL systems trained on high and low amounts of labeled data. More specifically, we show how and to what extent an EL system can benefit from NER to enhance its entity representations, improve candidate selection, select more effective negative samples and enforce hard and soft constraints on its output entities. We release our software – code and model checkpoints – at https://github.com/Babelscape/ner4el.
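One of the uses of NER described above, restricting an EL system's candidate set to entities compatible with the mention's predicted NER class, can be sketched as follows; the entity identifiers and types are made up for illustration and do not come from the released NER4EL code.

```python
def filter_candidates_by_ner(mention_type, candidates, entity_types):
    """Keep only candidate entities whose coarse type matches the NER class
    predicted for the mention; fall back to all candidates if nothing matches."""
    compatible = [c for c in candidates if entity_types.get(c) == mention_type]
    return compatible or candidates

# Toy usage: the mention "Paris" was tagged as a location by the NER model.
candidates = ["Paris_(France)", "Paris_Hilton", "Paris_(mythology)"]
entity_types = {"Paris_(France)": "LOC", "Paris_Hilton": "PER", "Paris_(mythology)": "PER"}
print(filter_candidates_by_ner("LOC", candidates, entity_types))
# ['Paris_(France)']
```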
BibTex
@inproceedings{tedeschi-etal-2021-named-entity, title = "{N}amed {E}ntity {R}ecognition for {E}ntity {L}inking: {W}hat Works and What{'}s Next", author = "Tedeschi, Simone and Conia, Simone and Cecconi, Francesco and Navigli, Roberto", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021", month = nov, year = "2021", address = "Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.findings-emnlp.220", pages = "2584--2596", }
-
In this paper we present SPRING Online Services, a Web interface and RESTful APIs for our state-of-the-art AMR parsing and generation system, SPRING (Symmetric PaRsIng aNd Generation). The Web interface has been developed to be easily used by the Natural Language Processing community, as well as by the general public. It provides, among other things, a highly interactive visualization platform and a feedback mechanism to obtain user suggestions for further improvements of the system’s output. Moreover, our RESTful APIs enable easy integration of SPRING in downstream applications where AMR structures are needed. Finally, we make SPRING Online Services freely available at http://nlp.uniroma1.it/spring and, in addition, we release extra model checkpoints to be used with the original SPRING Python code.
BibTex
@inproceedings{blloshmi-etal-2021-spring, title = "{SPRING} {G}oes {O}nline: {E}nd-to-{E}nd {AMR} {P}arsing and {G}eneration", author = "Blloshmi, Rexhina and Bevilacqua, Michele and Fabiano, Edoardo and Caruso, Valentina and Navigli, Roberto", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-demo.16", pages = "134--142", }
-
Over the past few years, Word Sense Disambiguation (WSD) has received renewed interest: recently proposed systems have shown the remarkable effectiveness of deep learning techniques in this task, especially when aided by modern pretrained language models. Unfortunately, such systems are still not available as ready-to-use end-to-end packages, making it difficult for researchers to take advantage of their performance. The only alternative for a user interested in applying WSD to downstream tasks is to rely on currently available end-to-end WSD systems, which, however, still rely on graph-based heuristics or non-neural machine learning algorithms. In this paper, we fill this gap and propose AMuSE-WSD, the first end-to-end system to offer high-quality sense information in 40 languages through a state-of-the-art neural model for WSD. We hope that AMuSE-WSD will provide a stepping stone for the integration of meaning into real-world applications and encourage further studies in lexical semantics. AMuSE-WSD is available online at http://nlp.uniroma1.it/amuse-wsd.
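A hedged example of querying the service programmatically is given below; the endpoint path and the request/response fields are assumptions, so the documentation at http://nlp.uniroma1.it/amuse-wsd should be consulted for the actual API contract.

```python
import requests

# Endpoint and payload schema are assumptions, not the official specification.
AMUSE_URL = "http://nlp.uniroma1.it/amuse-wsd/api/model"  # hypothetical path

payload = [{"text": "The bank raised interest rates.", "lang": "EN"}]
response = requests.post(AMUSE_URL, json=payload, timeout=30)
response.raise_for_status()

for sentence in response.json():
    for token in sentence.get("tokens", []):
        # Each token is expected to carry its surface form and the predicted sense id.
        print(token.get("text"), "->", token.get("bnSynsetId"))
```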
BibTex
@inproceedings{orlando-etal-2021-amuse, title = "{AMuSE-WSD}: {A}n All-in-one Multilingual System for Easy {W}ord {S}ense {D}isambiguation", author = "Orlando, Riccardo and Conia, Simone and Brignone, Fabrizio and Cecconi, Francesco and Navigli, Roberto", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-demo.34", pages = "298--307", }
-
Notwithstanding the growing interest in cross-lingual techniques for Natural Language Processing, there has been a surprisingly small number of efforts aimed at the development of easy-to-use tools for cross-lingual Semantic Role Labeling. In this paper, we fill this gap and present InVeRo-XL, an off-the-shelf state-of-the-art system capable of annotating text with predicate sense and semantic role labels from 7 predicate-argument structure inventories in more than 40 languages. We hope that our system – with its easy-to-use RESTful API and Web interface – will become a valuable tool for the research community, encouraging the integration of sentence-level semantics into cross-lingual downstream tasks. InVeRo-XL is available online at http://nlp.uniroma1.it/invero.
BibTex
@inproceedings{conia-etal-2021-invero, title = "{InVeRo-XL}: {M}aking Cross-Lingual {S}emantic {R}ole {L}abeling Accessible with Intelligible Verbs and Roles", author = "Conia, Simone and Orlando, Riccardo and Brignone, Fabrizio and Cecconi, Francesco and Navigli, Roberto", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-demo.36", pages = "319--328", }
-
Recently, generative approaches have been used effectively to provide definitions of words in their context. However, the opposite, i.e., generating a usage example given one or more words along with their definitions, has not yet been investigated. In this work, we introduce the novel task of Exemplification Modeling (ExMod), along with a sequence-to-sequence architecture and a training procedure for it. Starting from a set of (word, definition) pairs, our approach is capable of automatically generating high-quality sentences which express the requested semantics. As a result, we can drive the creation of sense-tagged data which cover the full range of meanings in any inventory of interest, and their interactions within sentences. Human annotators agree that the sentences generated are as fluent and semantically-coherent with the input definitions as the sentences in manually-annotated corpora. Indeed, when employed as training data for Word Sense Disambiguation, our examples enable the current state of the art to be outperformed, and higher results to be achieved than when using gold-standard datasets only. We release the pretrained model, the dataset and the software at https://github.com/SapienzaNLP/exmod.
BibTex
@inproceedings{barba-etal-2021-exmod, title = {Exemplification Modeling: Can You Give Me an Example, Please?}, author = {Barba, Edoardo and Procopio, Luigi and Lacerra, Caterina and Pasini, Tommaso and Navigli, Roberto}, booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Zhi-Hua Zhou}, pages = {3779--3785}, year = {2021}, month = {8}, note = {Main Track}, doi = {10.24963/ijcai.2021/520}, url = {https://doi.org/10.24963/ijcai.2021/520}, }
-
Despite the recent great success of the sequence-to-sequence paradigm in Natural Language Processing, the majority of current studies in Semantic Role Labeling (SRL) still frame the problem as a sequence labeling task. In this paper we go against the flow and propose GSRL (Generating Senses and RoLes), the first sequence-to-sequence model for end-to-end SRL. Our approach benefits from recently-proposed decoder-side pretraining techniques to generate both sense and role labels for all the predicates in an input sentence at once, in an end-to-end fashion. Evaluated on standard gold benchmarks, GSRL achieves state-of-the-art results in both dependency- and span-based English SRL, proving empirically that our simple generation-based model can learn to produce complex predicate-argument structures. Finally, we propose a framework for evaluating the robustness of an SRL model in a variety of synthetic low-resource scenarios which can aid human annotators in the creation of better, more diverse, and more challenging gold datasets. We release GSRL at github.com/SapienzaNLP/gsrl.
BibTex
@inproceedings{blloshmi-etal-2021-gsrl, title = {{G}enerating {S}enses and {R}o{L}es: An End-to-End Model for Dependency- and Span-based {S}emantic {R}ole {L}abeling}, author = {Blloshmi, Rexhina and Conia, Simone and Tripodi, Rocco and Navigli, Roberto}, booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Zhi-Hua Zhou}, pages = {3786--3793}, year = {2021}, month = {8}, note = {Main Track}, doi = {10.24963/ijcai.2021/521}, url = {https://doi.org/10.24963/ijcai.2021/521}, }
-
The lexical substitution task aims at finding suitable replacements for words in context. It has proved to be useful in several areas, such as word sense induction and text simplification, as well as in more practical applications such as writing-assistant tools. However, the paucity of annotated data has forced researchers to apply mainly unsupervised approaches, limiting the applicability of large pre-trained models and thus hampering the potential benefits of supervised approaches to the task. In this paper, we mitigate this issue by proposing ALaSca, a novel approach to automatically creating large-scale datasets for English lexical substitution. ALaSca allows examples to be produced for potentially any word in a language vocabulary and to cover most of the meanings it lists. Thanks to this, we can unleash the full potential of neural architectures and finetune them on the lexical substitution task. Indeed, when using our data, a transformer-based model performs substantially better than when using manually annotated data only. We release ALaSca at https://sapienzanlp.github.io/alasca/.
BibTex
@inproceedings{lacerra-etal-2021-alasca, title = {{ALaSca}: an Automated approach for Large-Scale Lexical Substitution}, author = {Lacerra, Caterina and Pasini, Tommaso and Tripodi, Rocco and Navigli, Roberto}, booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Zhi-Hua Zhou}, pages = {3836--3842}, year = {2021}, month = {8}, note = {Main Track}, doi = {10.24963/ijcai.2021/528}, url = {https://doi.org/10.24963/ijcai.2021/528}, }
-
Word Sense Disambiguation (WSD), i.e., the task of assigning senses to words in context, has seen a surge of interest with the advent of neural models and a considerable increase in performance up to 80% F1 in English. However, when considering other languages, the availability of training data is limited, which hampers scaling WSD to many languages. To address this issue, we put forward MultiMirror, a sense projection approach for multilingual WSD based on a novel neural discriminative model for word alignment: given as input a pair of parallel sentences, our model -- trained with a low number of instances -- is capable of jointly aligning, at the same time, all source and target tokens with each other, surpassing its competitors across several language combinations. We demonstrate that projecting senses from English by leveraging the alignments produced by our model leads a simple mBERT-powered classifier to achieve a new state of the art on established WSD datasets in French, German, Italian, Spanish and Japanese. We release our software and all our datasets at https://github.com/SapienzaNLP/multimirror.
BibTex
@inproceedings{procopio-etal-2021-multimirror, title = {{MultiMirror}: Neural Cross-lingual Word Alignment for Multilingual {W}ord {S}ense {D}isambiguation}, author = {Procopio, Luigi and Barba, Edoardo and Martelli, Federico and Navigli, Roberto}, booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Zhi-Hua Zhou}, pages = {3915--3921}, year = {2021}, month = {8}, note = {Main Track}, doi = {10.24963/ijcai.2021/539}, url = {https://doi.org/10.24963/ijcai.2021/539}, }
-
Word Sense Disambiguation (WSD) aims at making explicit the semantics of a word in context by identifying the most suitable meaning from a predefined sense inventory. Recent breakthroughs in representation learning have fueled intensive WSD research, resulting in considerable performance improvements, breaching the 80% glass ceiling set by the inter-annotator agreement. In this survey, we provide an extensive overview of current advances in WSD, describing the state of the art in terms of i) resources for the task, i.e., sense inventories and reference datasets for training and testing, as well as ii) automatic disambiguation approaches, detailing their peculiarities, strengths and weaknesses. Finally, we highlight the current limitations of the task itself, but also point out recent trends that could help expand the scope and applicability of WSD, setting up new promising directions for the future.
BibTex
@inproceedings{bevilacqua-etal-2021-wsd-survey, title = {Recent Trends in {W}ord {S}ense {D}isambiguation: A Survey}, author = {Bevilacqua, Michele and Pasini, Tommaso and Raganato, Alessandro and Navigli, Roberto}, booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Zhi-Hua Zhou}, pages = {4330--4338}, year = {2021}, month = {8}, note = {Survey Track}, doi = {10.24963/ijcai.2021/593}, url = {https://doi.org/10.24963/ijcai.2021/593}, }
-
The intelligent manipulation of symbolic knowledge has been a long-sought goal of AI. However, when it comes to Natural Language Processing (NLP), symbols have to be mapped to words and phrases, which are not only ambiguous but also language-specific: multilinguality is indeed a desirable property for NLP systems, and one which enables the generalization of tasks where multiple languages need to be dealt with, without translating text. In this paper we survey BabelNet, a popular wide-coverage lexical-semantic knowledge resource obtained by merging heterogeneous sources into a unified semantic network that helps to scale tasks and applications to hundreds of languages. Over its ten years of existence, thanks to its promise to interconnect languages and resources in structured form, BabelNet has been employed in countless ways and directions. We first introduce the BabelNet model, its components and statistics, and then overview its successful use in a wide range of tasks in NLP as well as in other fields of AI.
BibTex
@inproceedings{navigli-etal-2021-babelnet-survey, title = {Ten Years of {BabelNet}: A Survey}, author = {Navigli, Roberto and Bevilacqua, Michele and Conia, Simone and Montagnini, Dario and Cecconi, Francesco}, booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Zhi-Hua Zhou}, pages = {4559--4567}, year = {2021}, month = {8}, note = {Survey Track}, doi = {10.24963/ijcai.2021/620}, url = {https://doi.org/10.24963/ijcai.2021/620}, }
-
While cross-lingual techniques are finding increasing success in a wide range of Natural Language Processing tasks, their application to Semantic Role Labeling (SRL) has been strongly limited by the fact that each language adopts its own linguistic formalism, from PropBank for English to AnCora for Spanish and PDT-Vallex for Czech, inter alia. In this work, we address this issue and present a unified model to perform cross-lingual SRL over heterogeneous linguistic resources. Our model implicitly learns a high-quality mapping for different formalisms across diverse languages without resorting to word alignment and/or translation techniques. We find that not only is our cross-lingual system competitive with the current state of the art, but it is also robust to low-data scenarios. Most interestingly, our unified model is able to annotate a sentence in a single forward pass with all the inventories it was trained with, providing a tool for the analysis and comparison of linguistic theories across different languages. We release our code and model at https://github.com/SapienzaNLP/unify-srl.
BibTex
@inproceedings{conia-etal-2021-unifying-srl, title = "Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources", author = "Conia, Simone and Bacciu, Andrea and Navigli, Roberto", booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", month = jun, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2021.naacl-main.31", pages = "338--351", }
-
Graph-based semantic parsing aims to represent textual meaning through directed graphs. As one of the most promising general-purpose meaning representations, these structures and their parsing have gained a significant interest momentum during recent years, with several diverse formalisms being proposed. Yet, owing to this very heterogeneity, most of the research effort has focused mainly on solutions specific to a given formalism. In this work, instead, we reframe semantic parsing towards multiple formalisms as Multilingual Neural Machine Translation (MNMT), and propose SGL, a many-to-many seq2seq architecture trained with an MNMT objective. Backed by several experiments, we show that this framework is indeed effective once the learning procedure is enhanced with large parallel corpora coming from Machine Translation: we report competitive performances on AMR and UCCA parsing, especially once paired with pre-trained architectures. Furthermore, we find that models trained under this configuration scale remarkably well to tasks such as cross-lingual AMR parsing: SGL outperforms all its competitors by a large margin without even explicitly seeing non-English to AMR examples at training time and, once these examples are included as well, sets an unprecedented state of the art in this task. We release our code and our models for research purposes at https://github.com/SapienzaNLP/sgl.
BibTex
@inproceedings{procopio-etal-2021-sgl, title = "{SGL}: Speaking the Graph Languages of Semantic Parsing via Multilingual Translation", author = "Procopio, Luigi and Tripodi, Rocco and Navigli, Roberto", booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", month = jun, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2021.naacl-main.30", pages = "325--337", }
-
Word Sense Disambiguation (WSD) is a historical NLP task aimed at linking words in contexts to discrete sense inventories and it is usually cast as a multi-label classification task. Recently, several neural approaches have employed sense definitions to better represent word meanings. Yet, these approaches do not observe the input sentence and the sense definition candidates all at once, thus potentially reducing the model performance and generalization power. We cope with this issue by reframing WSD as a span extraction problem — which we called Extractive Sense Comprehension (ESC) — and propose ESCHER, a transformer-based neural architecture for this new formulation. By means of an extensive array of experiments, we show that ESC unleashes the full potential of our model, leading it to outdo all of its competitors and to set a new state of the art on the English WSD task. In the few-shot scenario, ESCHER proves to exploit training data efficiently, attaining the same performance as its closest competitor while relying on almost three times fewer annotations. Furthermore, ESCHER can nimbly combine data annotated with senses from different lexical resources, achieving performances that were previously out of everyone’s reach. The model along with data is available at https://github.com/SapienzaNLP/esc.
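A minimal sketch of the extractive framing follows: the context is concatenated with all candidate glosses and the sense whose gloss span scores highest is returned. The span scorer below is a dummy stand-in for the transformer used by ESCHER, and the input format is illustrative only.

```python
def disambiguate_extractively(context, target, glosses, span_score):
    """Frame WSD as span extraction: concatenate the context with every candidate
    gloss and return the sense whose gloss span receives the highest score.

    glosses:    dict sense_id -> definition string
    span_score: placeholder callable (input_text, gloss_start, gloss_end) -> float
    """
    parts = [f"{context} </s> {target}:"]
    spans = {}
    offset = len(parts[0]) + 1
    for sense_id, gloss in glosses.items():
        spans[sense_id] = (offset, offset + len(gloss))
        parts.append(gloss)
        offset += len(gloss) + 1
    full_input = " ".join(parts)
    return max(spans, key=lambda s: span_score(full_input, *spans[s]))

# Toy usage with a dummy scorer that prefers the gloss sharing more words with the context.
context = "He sat on the bank of the river."
glosses = {"bank.n.01": "sloping land beside a body of water",
           "bank.n.02": "a financial institution that accepts deposits"}
def dummy_score(text, start, end):
    return len(set(text[start:end].split()) & set(context.lower().split()))
print(disambiguate_extractively(context, "bank", glosses, dummy_score))  # -> 'bank.n.01'
```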
BibTex
@inproceedings{barba-etal-2021-esc, title = "{ESC}: Redesigning {WSD} with Extractive Sense Comprehension", author = "Barba, Edoardo and Pasini, Tommaso and Navigli, Roberto", booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", month = jun, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2021.naacl-main.371", pages = "4661--4672"}
-
In this paper, we introduce the first SemEval task on Multilingual and Cross-Lingual Word-in-Context (MCL-WiC) disambiguation. This task allows the largely under-investigated inherent ability of systems to discriminate between word senses within and across languages to be evaluated, dropping the requirement of a fixed sense inventory. Framed as a binary classification, our task is divided into two parts. In the multilingual sub-task, participating systems are required to determine whether two target words, each occurring in a different context within the same language, express the same meaning or not. Instead, in the cross-lingual part, systems are asked to perform the task in a cross-lingual scenario, in which the two target words and their corresponding contexts are provided in two different languages. We illustrate our task, as well as the construction of our manually-created dataset including five languages, namely Arabic, Chinese, English, French and Russian, and the results of the participating systems. Datasets and results are available at: https://github.com/SapienzaNLP/mcl-wic.
BibTex
@inproceedings{martelli-etal-2021-mclwic, title = "{S}em{E}val-2021 {T}ask 2: {M}ultilingual and {C}ross-lingual {W}ord-in-{C}ontext {D}isambiguation ({MCL}-{W}i{C})", author = "Martelli, Federico and Kalach, Najla and Tola, Gabriele and Navigli, Roberto", booktitle = "Proceedings of the Fifteenth Workshop on Semantic Evaluation (SemEval-2021)", year = "2021" }
-
Contextual representations of words derived by neural language models have proven to effectively encode the subtle distinctions that might occur between different meanings of the same word. However, these representations are not tied to a semantic network, hence they leave the word meanings implicit and thereby neglect the information that can be derived from the knowledge base itself. In this paper, we propose SensEmBERT, a knowledge-based approach that brings together the expressive power of language modelling and the vast amount of knowledge contained in a semantic network to produce high-quality latent semantic representations of word meanings in multiple languages. Our vectors lie in a space comparable with that of contextualized word embeddings, thus allowing a word occurrence to be easily linked to its meaning by applying a simple nearest neighbour approach. We show that, whilst not relying on manual semantic annotations, SensEmBERT is able to either achieve or surpass state-of-the-art results attained by most of the supervised neural approaches on the English Word Sense Disambiguation task. When scaling to other languages, our representations prove to be as effective as their English counterpart and outperform the existing state of the art on all the Word Sense Disambiguation multilingual datasets. The embeddings are released in five different languages at http://sensembert.org.
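The nearest-neighbour linking step can be sketched as follows, with random vectors standing in for the actual contextualized and SensEmBERT sense embeddings.

```python
import numpy as np

def nearest_sense(context_vector, sense_vectors):
    """Link a contextualized word occurrence to the sense whose embedding is the
    nearest neighbour under cosine similarity.

    sense_vectors: dict sense_id -> np.ndarray living in a space comparable with
                   that of the contextualized vector (placeholder values below).
    """
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(sense_vectors, key=lambda s: cosine(context_vector, sense_vectors[s]))

# Toy usage with random stand-ins for the real embeddings.
rng = np.random.default_rng(0)
context_vector = rng.normal(size=16)
sense_vectors = {"bank.n.01": rng.normal(size=16),
                 "bank.n.02": context_vector + 0.01 * rng.normal(size=16)}
print(nearest_sense(context_vector, sense_vectors))  # -> 'bank.n.02'
```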
BibTex
@inproceedings{scarlini2020sensembert, title={SensEmBERT: Context-Enhanced Sense Embeddings for Multilingual Word Sense Disambiguation}, author={Scarlini, Bianca and Pasini, Tommaso and Navigli, Roberto}, booktitle={Proc. of AAAI}, year={2020} }
-
Word Sense Disambiguation (WSD) is the task of associating a word in context with one of its meanings. While many works in the past have focused on raising the state of the art, none has even come close to achieving an F-score in the 80% ballpark when using WordNet as its sense inventory. We contend that one of the main reasons for this failure is the excessively fine granularity of this inventory, resulting in senses that are hard to differentiate between, even for an experienced human annotator. In this paper we cope with this long-standing problem by introducing Coarse Sense Inventory (CSI), obtained by linking WordNet concepts to a new set of 45 labels. The results show that the coarse granularity of CSI leads a WSD model to achieve 85.9% F1, while maintaining a high expressive power. Our set of labels also exhibits ease of use in tagging and a descriptiveness that other coarse inventories lack, as demonstrated in two annotation tasks which we performed. Moreover, a few-shot evaluation proves that the class-based nature of CSI allows the model to generalise over unseen or under-represented words.
BibTex
@inproceedings{lacerra2020csi, title = {CSI: A coarse sense inventory for 85\% word sense disambiguation}, author = {Lacerra, Caterina and Bevilacqua, Michele and Pasini, Tommaso and Navigli, Roberto}, booktitle = {Proc. of AAAI}, year = {2020} }
-
Neural architectures are the current state of the art in Word Sense Disambiguation (WSD). However, they make limited use of the vast amount of relational information encoded in Lexical Knowledge Bases (LKB). We present Enhanced WSD Integrating Synset Embeddings and Relations (EWISER), a neural supervised architecture that is able to tap into this wealth of knowledge by embedding information from the LKB graph within the neural architecture, and to exploit pretrained synset embeddings, enabling the network to predict synsets that are not in the training set. As a result, we set a new state of the art on almost all the evaluation settings considered, also breaking through, for the first time, the 80% ceiling on the concatenation of all the standard all-words English WSD evaluation benchmarks. On multilingual all-words WSD, we report state-of-the-art results by training on nothing but English.
BibTex
@inproceedings{bevilacqua-navigli-2020-breaking, title = "Breaking Through the 80{\%} Glass Ceiling: {R}aising the State of the Art in Word Sense Disambiguation by Incorporating Knowledge Graph Information", author = "Bevilacqua, Michele and Navigli, Roberto", booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics", month = jul, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.acl-main.255", pages = "2854--2864", abstract = "Neural architectures are the current state of the art in Word Sense Disambiguation (WSD). However, they make limited use of the vast amount of relational information encoded in Lexical Knowledge Bases (LKB). We present Enhanced WSD Integrating Synset Embeddings and Relations (EWISER), a neural supervised architecture that is able to tap into this wealth of knowledge by embedding information from the LKB graph within the neural architecture, and to exploit pretrained synset embeddings, enabling the network to predict synsets that are not in the training set. As a result, we set a new state of the art on almost all the evaluation settings considered, also breaking through, for the first time, the 80{\%} ceiling on the concatenation of all the standard all-words English WSD evaluation benchmarks. On multilingual all-words WSD, we report state-of-the-art results by training on nothing but English.", }
-
Thanks to the wealth of high-quality annotated images available in popular repositories such as ImageNet, multimodal language-vision research is in full bloom. However, events, feelings and many other kinds of concepts which can be visually grounded are not well represented in current datasets. Nevertheless, we would expect a wide-coverage language understanding system to be able to classify images depicting recess and remorse, not just cats, dogs and bridges. We fill this gap by presenting BabelPic, a hand-labeled dataset built by cleaning the image-synset association found within the BabelNet Lexical Knowledge Base (LKB). BabelPic explicitly targets non-concrete concepts, thus providing refreshing new data for the community. We also show that pre-trained language-vision systems can be used to further expand the resource by exploiting natural language knowledge available in the LKB. BabelPic is available for download at http://babelpic.org.
BibTex
@inproceedings{calabrese-etal-2020-fatality, title = "Fatality Killed the Cat or: {B}abel{P}ic, a Multimodal Dataset for Non-Concrete Concepts", author = "Calabrese, Agostina and Bevilacqua, Michele and Navigli, Roberto", booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics", month = jul, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.acl-main.425", pages = "4680--4686", abstract = "Thanks to the wealth of high-quality annotated images available in popular repositories such as ImageNet, multimodal language-vision research is in full bloom. However, events, feelings and many other kinds of concepts which can be visually grounded are not well represented in current datasets. Nevertheless, we would expect a wide-coverage language understanding system to be able to classify images depicting recess and remorse, not just cats, dogs and bridges. We fill this gap by presenting BabelPic, a hand-labeled dataset built by cleaning the image-synset association found within the BabelNet Lexical Knowledge Base (LKB). BabelPic explicitly targets non-concrete concepts, thus providing refreshing new data for the community. We also show that pre-trained language-vision systems can be used to further expand the resource by exploiting natural language knowledge available in the LKB. BabelPic is available for download at http://babelpic.org.", }
-
Knowing the Most Frequent Sense (MFS) of a word has been proved to help Word Sense Disambiguation (WSD) models significantly. However, the scarcity of sense-annotated data makes it difficult to induce a reliable and high-coverage distribution of the meanings in a language vocabulary. To address this issue, in this paper we present CluBERT, an automatic and multilingual approach for inducing the distributions of word senses from a corpus of raw sentences. Our experiments show that CluBERT learns distributions over English senses that are of higher quality than those extracted by alternative approaches. When used to induce the MFS of a lemma, CluBERT attains state-of-the-art results on the English Word Sense Disambiguation tasks and helps to improve the disambiguation performance of two off-the-shelf WSD models. Moreover, our distributions also prove to be effective in other languages, beating all their alternatives for computing the MFS on the multilingual WSD tasks. We release our sense distributions in five different languages at https://github.com/SapienzaNLP/clubert.
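Once such distributions are available, the MFS of a lemma is simply the argmax of its induced distribution, as in the toy sketch below; the lemma, senses and probabilities are made up for illustration.

```python
def most_frequent_sense(lemma, sense_distributions, fallback=None):
    """Return the Most Frequent Sense of a lemma according to an induced
    sense distribution (dict lemma -> dict sense -> probability)."""
    distribution = sense_distributions.get(lemma)
    if not distribution:
        return fallback
    return max(distribution, key=distribution.get)

# Toy distribution in the spirit of the released ones (values are invented).
sense_distributions = {"plant": {"plant.n.01": 0.62, "plant.n.02": 0.31, "plant.v.01": 0.07}}
print(most_frequent_sense("plant", sense_distributions))  # -> 'plant.n.01'
```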
BibTex
@inproceedings{pasini-etal-2020-clubert, title = "{C}lu{BERT}: {A} Cluster-Based Approach for Learning Sense Distributions in Multiple Languages", author = "Pasini, Tommaso and Scozzafava, Federico and Scarlini, Bianca", booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics", month = jul, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.acl-main.369", doi = "10.18653/v1/2020.acl-main.369", pages = "4008--4018" }
-
Exploiting syntagmatic information is an encouraging research focus to be pursued in an effort to close the gap between knowledge-based and supervised Word Sense Disambiguation (WSD) performance. We follow this direction in our next-generation knowledge-based WSD system, SyntagRank, which we make available via a Web interface and a RESTful API. SyntagRank leverages the disambiguated pairs of co-occurring words included in SyntagNet, a lexical-semantic combination resource, to perform state-of-the-art knowledge-based WSD in a multilingual setting. Our service provides both a user-friendly interface, available at http://syntagnet.org/, and a RESTful endpoint to query the system programmatically (accessible at http://api.syntagnet.org/).
BibTex
@inproceedings{scozzafava-etal-2020-personalized, title = "Personalized {P}age{R}ank with Syntagmatic Information for Multilingual Word Sense Disambiguation", author = "Scozzafava, Federico and Maru, Marco and Brignone, Fabrizio and Torrisi, Giovanni and Navigli, Roberto", booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations", month = jul, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.acl-demos.6", doi = "10.18653/v1/2020.acl-demos.6", pages = "37--46" }
-
Word Sense Disambiguation (WSD) is the task of associating the correct meaning with a word in a given context. WSD provides explicit semantic information that is beneficial to several downstream applications, such as question answering, semantic parsing and hypernym extraction. Unfortunately, WSD suffers from the well-known knowledge acquisition bottleneck problem: it is very expensive, in terms of both time and money, to acquire semantic annotations for a large number of sentences. To address this blocking issue we present Train-O-Matic, a knowledge-based and language-independent approach that is able to provide millions of training instances annotated automatically with word meanings. The approach is fully automatic, i.e., no human intervention is required, and the only type of human knowledge used is a task-independent WordNet-like resource. Moreover, as the sense distribution in the training set is pivotal to boosting the performance of WSD systems, we also present two unsupervised and language-independent methods that automatically induce a sense distribution when given a simple corpus of sentences. We show that, when the learned distributions are taken into account for generating the training sets, the performance of supervised methods is further enhanced. Experiments have proven that Train-O-Matic on its own, and also coupled with word sense distribution learning methods, leads a supervised system to achieve state-of-the-art performance consistently across gold standard datasets and languages. Importantly, we show how our sense distribution learning techniques aid Train-O-Matic to scale well over domains, without any extra human effort. To encourage future research, we release all the training sets in 5 different languages and the sense distributions for each domain of SemEval-13 and SemEval-15 at http://trainomatic.org.
BibTex
@article{PASINI2020103215, title = "Train-O-Matic: Supervised Word Sense Disambiguation with no (manual) effort", journal = "Artificial Intelligence", volume = "279", pages = "103215", year = "2020", issn = "0004-3702", doi = "https://doi.org/10.1016/j.artint.2019.103215", url = "http://www.sciencedirect.com/science/article/pii/S0004370218307021", author = "Tommaso Pasini and Roberto Navigli", keywords = "Word Sense Disambiguation, Corpus Generation, Word Sense Distribution learning, Multilinguality", abstract = "Word Sense Disambiguation (WSD) is the task of associating the correct meaning with a word in a given context. WSD provides explicit semantic information that is beneficial to several downstream applications, such as question answering, semantic parsing and hypernym extraction. Unfortunately, WSD suffers from the well-known knowledge acquisition bottleneck problem: it is very expensive, in terms of both time and money, to acquire semantic annotations for a large number of sentences. To address this blocking issue we present Train-O-Matic, a knowledge-based and language-independent approach that is able to provide millions of training instances annotated automatically with word meanings. The approach is fully automatic, i.e., no human intervention is required, and the only type of human knowledge used is a task-independent WordNet-like resource. Moreover, as the sense distribution in the training set is pivotal to boosting the performance of WSD systems, we also present two unsupervised and language-independent methods that automatically induce a sense distribution when given a simple corpus of sentences. We show that, when the learned distributions are taken into account for generating the training sets, the performance of supervised methods is further enhanced. Experiments have proven that Train-O-Matic on its own, and also coupled with word sense distribution learning methods, lead a supervised system to achieve state-of-the-art performance consistently across gold standard datasets and languages. Importantly, we show how our sense distribution learning techniques aid Train-O-Matic to scale well over domains, without any extra human effort. To encourage future research, we release all the training sets in 5 different languages and the sense distributions for each domain of SemEval-13 and SemEval-15 at http://trainomatic.org." }
-
To date, the most successful word, word sense, and concept modelling techniques have used large corpora and knowledge resources to produce dense vector representations that capture semantic similarities in a relatively low-dimensional space. Most current approaches, however, suffer from a monolingual bias, with their strength depending on the amount of data available across languages. In this paper we address this issue and propose Conception, a novel technique for building language-independent vector representations of concepts which places multilinguality at its core while retaining explicit relationships between concepts. Our approach results in high-coverage representations that outperform the state of the art in multilingual and cross-lingual Semantic Word Similarity and Word Sense Disambiguation, proving particularly robust on low-resource languages. Conception – its software and the complete set of representations – is available at https://github.com/SapienzaNLP/conception.
BibTex
@inproceedings{conia-navigli-2020-conception, title = "Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations", author = "Conia, Simone and Navigli, Roberto", booktitle = "Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020)", month = dec, year = "2020", address = "Barcelona, Spain (Online)", publisher = "International Committee on Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.coling-main.291", pages = "3268--3284" }
-
Recent research indicates that taking advantage of complex syntactic features leads to favorable results in Semantic Role Labeling. Nonetheless, an analysis of the latest state-of-the-art multilingual systems reveals the difficulty of bridging the wide gap in performance between high-resource (e.g., English) and low-resource (e.g., German) settings. To overcome this issue, we propose a fully language-agnostic model that does away with morphological and syntactic features to achieve robustness across languages. Our approach outperforms the state of the art in all the languages of the CoNLL-2009 benchmark dataset, especially whenever a scarce amount of training data is available. Our objective is not to reject approaches that rely on syntax, but rather to set a strong and consistent language-independent baseline for future innovations in Semantic Role Labeling. We release our model code and checkpoints at https://github.com/SapienzaNLP/multi-srl.
BibTex
@inproceedings{conia-navigli-2020-bridging, title = "Bridging the Gap in Multilingual Semantic Role Labeling: a Language-Agnostic Approach", author = "Conia, Simone and Navigli, Roberto", booktitle = "Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020)", month = dec, year = "2020", address = "Barcelona, Spain (Online)", publisher = "International Committee on Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.coling-main.120", pages = "1396--1410" }
-
Contextualized word embeddings have been employed effectively across several tasks in Natural Language Processing, as they have proved to carry useful semantic information. However, it is still hard to link them to structured sources of knowledge. In this paper we present ARES (context-AwaRe Embeddings of Senses), a semi-supervised approach to producing sense embeddings for the lexical meanings within a lexical knowledge base that lie in a space that is comparable to that of contextualized word vectors. ARES representations enable a simple 1-Nearest-Neighbour algorithm to outperform state-of-the-art models, not only in the English Word Sense Disambiguation task, but also in the multilingual one, whilst training on sense-annotated data in English only. We further assess the quality of our embeddings in the Word-in-Context task, where, when used as an external source of knowledge, they consistently improve the performance of a neural model, leading it to compete with other more complex architectures. ARES embeddings for all WordNet concepts and the automatically-extracted contexts used for creating the sense representations are freely available at http://sensembert.org/ares.
BibTex
@inproceedings{scarlini-etal-2020-ares, title={{With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation}}, author={Scarlini, Bianca and Pasini, Tommaso and Navigli, Roberto}, booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing}, publisher={Association for Computational Linguistics}, year={2020} }
-
Mainstream computational lexical semantics embraces the assumption that word senses can be represented as discrete items of a predefined inventory. In this paper we show this need not be the case, and propose a unified model that is able to produce contextually appropriate definitions. In our model, Generationary, we employ a novel span-based encoding scheme which we use to fine-tune an English pre-trained Encoder-Decoder system to generate glosses. We show that, even though we drop the need of choosing from a predefined sense inventory, our model can be employed effectively: not only does Generationary outperform previous approaches in the generative task of Definition Modeling in many settings, but it also matches or surpasses the state of the art in discriminative tasks such as Word Sense Disambiguation and Word-in-Context. Finally, we show that Generationary benefits from training on data from multiple inventories, with strong gains on various zero-shot benchmarks, including a novel dataset of definitions for free adjective-noun phrases. The software and reproduction materials are available at http://generationary.org.
BibTex
@inproceedings{bevilacqua-etal-2020-generationary, title = "Generationary or: {``}How We Went beyond Word Sense Inventories and Learned to Gloss{''}", author = "Bevilacqua, Michele and Maru, Marco and Navigli, Roberto", booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.emnlp-main.585", pages = "7207--7221", }
-
Abstract Meaning Representation (AMR) is a popular formalism of natural language that represents the meaning of a sentence as a semantic graph. It is agnostic about how to derive meanings from strings and for this reason it lends itself well to the encoding of semantics across languages. However, cross-lingual AMR parsing is a hard task, because training data are scarce in languages other than English and the existing English AMR parsers are not directly suited to being used in a cross-lingual setting. In this work we tackle these two problems so as to enable cross-lingual AMR parsing: we explore different transfer learning techniques for producing automatic AMR annotations across languages and develop a cross-lingual AMR parser, XL-AMR. This can be trained on the produced data and does not rely on AMR aligners or source-copy mechanisms as is commonly the case in English AMR parsing. The results of XL-AMR significantly surpass those previously reported in Chinese, German, Italian and Spanish. Finally we provide a qualitative analysis which sheds light on the suitability of AMR across languages. We release XL-AMR at github.com/SapienzaNLP/xl-amr.
BibTex
@inproceedings{blloshmi-etal-2020-xl, title = "{XL}-{AMR}: Enabling Cross-Lingual {AMR} Parsing with Transfer Learning Techniques", author = "Blloshmi, Rexhina and Tripodi, Rocco and Navigli, Roberto", booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.emnlp-main.195", doi = "10.18653/v1/2020.emnlp-main.195", pages = "2487--2500" }
-
The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques. However, most existing evaluation benchmarks for assessing this criterion are tied to sense inventories (usually WordNet), restricting their usage to a small subset of knowledge-based representation techniques. The Word-in-Context dataset (WiC) addresses the dependence on sense inventories by reformulating the standard disambiguation task as a binary classification problem, but it is limited to the English language. We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages from varied language families and with different degrees of resource availability, opening room for evaluation scenarios such as zero-shot cross-lingual transfer. We perform a series of experiments to determine the reliability of the datasets and to set performance baselines for several recent contextualized multilingual models. Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance in the task of distinguishing different meanings of a word, even for distant languages. XL-WiC is available at https://pilehvar.github.io/xlwic/.
BibTex
@inproceedings{raganato-etal-2020-xl, title = "{XL}-{W}i{C}: A Multilingual Benchmark for Evaluating Semantic Contextualization", author = "Raganato, Alessandro and Pasini, Tommaso and Camacho-Collados, Jose and Pilehvar, Mohammad Taher", booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.emnlp-main.584", doi = "10.18653/v1/2020.emnlp-main.584", pages = "7193--7206" }
-
Semantic Role Labeling (SRL) is deeply dependent on complex linguistic resources and sophisticated neural models, which makes the task difficult to approach for non-experts. To address this issue we present a new platform named Intelligible Verbs and Roles (InVeRo). This platform provides access to a new verb resource, VerbAtlas, and a state-of-the-art pretrained implementation of a neural, span-based architecture for SRL. Both the resource and the system provide human-readable verb sense and semantic role information, with an easy-to-use Web interface and RESTful APIs available at http://nlp.uniroma1.it/invero.
BibTex
@inproceedings{conia-etal-2020-invero, title = "{I}n{V}e{R}o: Making {S}emantic {R}ole {L}abeling Accessible with Intelligible Verbs and Roles", author = "Conia, Simone and Brignone, Fabrizio and Zanfardino, Davide and Navigli, Roberto", booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP 2020)", month = oct, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.emnlp-demos.11", doi = "10.18653/v1/2020.emnlp-demos.11", pages = "77--84" }
-
Word Sense Disambiguation (WSD) is the task of identifying the meaning of a word in a given context. It lies at the base of Natural Language Processing as it provides semantic information for words. In the last decade, great strides have been made in this field and much effort has been devoted to mitigate the knowledge acquisition bottleneck problem, i.e., the problem of semantically annotating texts at a large scale and in different languages. This issue is ubiquitous in WSD as it hinders the creation of both multilingual knowledge bases and manually-curated training sets. In this work, we first introduce the reader to the task of WSD through a short historical digression and then take stock of the advancements made to alleviate the knowledge acquisition bottleneck problem. Specifically, we survey the literature on manual, semi-automatic and automatic approaches to create English and multilingual corpora tagged with sense annotations and present a clear overview of supervised models for WSD. Finally, we provide our view of the future directions that we foresee for the field.
BibTex
@inproceedings{ijcai2020-687, title = {The Knowledge Acquisition Bottleneck Problem in Multilingual {W}ord {S}ense {D}isambiguation}, author = {Pasini, Tommaso}, booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Christian Bessiere}, pages = {4936--4942}, year = {2020}, month = {7}, note = {Survey track}, doi = {10.24963/ijcai.2020/687}, url = {https://doi.org/10.24963/ijcai.2020/687}, }
-
The knowledge acquisition bottleneck strongly affects the creation of multilingual sense-annotated data, hence limiting the power of supervised systems when applied to multilingual Word Sense Disambiguation. In this paper, we propose a semi-supervised approach based upon a novel label propagation scheme which, by jointly leveraging contextualized word embeddings and the multilingual information enclosed in a knowledge base, projects sense labels from a high-resource language, i.e., English, to lower-resourced ones. Backed by several experiments, we provide empirical evidence that our automatically created datasets are of higher quality than those generated by other competitors and lead a supervised model to state-of-the-art performance in all multilingual Word Sense Disambiguation tasks. We make our datasets available for research purposes at https://github.com/SapienzaNLP/mulan.
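As a rough illustration of the label propagation idea (not the paper's exact algorithm), the sketch below assigns each untagged target-language occurrence the sense of its most similar English occurrence, provided the similarity clears a threshold and the sense is admissible for the target lemma in a multilingual inventory. The data structures, names, and threshold are all hypothetical.

```python
# Simplified cross-lingual sense-label propagation: a target-language occurrence
# inherits the sense of its nearest sense-tagged English occurrence when (i) the
# cosine similarity clears a threshold and (ii) the sense is a valid candidate
# for the target lemma. Everything below is illustrative only.
import numpy as np

def propagate_labels(src_vecs, src_senses, tgt_vecs, tgt_candidates, threshold=0.7):
    """
    src_vecs:       (n_src, d) contextualized embeddings of sense-tagged English occurrences
    src_senses:     list of n_src sense ids aligned with src_vecs
    tgt_vecs:       (n_tgt, d) embeddings of untagged target-language occurrences
    tgt_candidates: list of n_tgt sets of admissible sense ids (from the inventory)
    returns:        list of n_tgt sense ids (None when no label is propagated)
    """
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sims = tgt @ src.T                          # cosine similarities, shape (n_tgt, n_src)
    labels = []
    for i, row in enumerate(sims):
        for j in np.argsort(-row):              # most similar English occurrence first
            if row[j] < threshold:
                labels.append(None)
                break
            if src_senses[j] in tgt_candidates[i]:
                labels.append(src_senses[j])
                break
        else:
            labels.append(None)
    return labels

# toy demo with 3-dimensional vectors and made-up sense ids
en = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
it = np.array([[0.9, 0.1, 0.0]])
print(propagate_labels(en, ["bank%finance", "bank%river"], it,
                       [{"bank%finance", "bank%river"}]))
```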
BibTex
@inproceedings{ijcai2020-0531, title = {{MuLaN}: {Mu}ltilingual {L}abel propagatio{N} for {W}ord {S}ense {D}isambiguation}, author = {Barba, Edoardo and Procopio, Luigi and Campolungo, Niccolò and Pasini, Tommaso and Navigli, Roberto}, booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Christian Bessiere}, pages = {3837--3844}, year = {2020}, month = {7}, note = {Main track}, doi = {10.24963/ijcai.2020/531}, url = {https://doi.org/10.24963/ijcai.2020/531}, }
-
The problem of grounding language in vision is increasingly attracting scholarly efforts. As of now, however, most of the approaches have been limited to word embeddings, which are not capable of handling polysemous words. This is mainly due to the limited coverage of the available semantically-annotated datasets, hence forcing research to rely on alternative technologies (i.e., image search engines). To address this issue, we introduce EViLBERT, an approach which is able to perform image classification over an open set of concepts, both concrete and non-concrete. Our approach is based on the recently introduced Vision-Language Pretraining (VLP) model, and builds upon a manually-annotated dataset of concept-image pairs. We use our technique to clean up the image-to-concept mapping that is provided within a multilingual knowledge base, resulting in over 258,000 images associated with 42,500 concepts. We show that our VLP-based model can be used to create multimodal sense embeddings starting from our automatically-created dataset. In turn, we also show that these multimodal embeddings improve the performance of a Word Sense Disambiguation architecture over a strong unimodal baseline. We release code, dataset and embeddings at http://babelpic.org.
BibTex
@inproceedings{ijcai2020-67, title = {{EViLBERT}: {L}earning Task-Agnostic Multimodal Sense Embeddings}, author = {Calabrese, Agostina and Bevilacqua, Michele and Navigli, Roberto}, booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Christian Bessiere}, pages = {481--487}, year = {2020}, month = {7}, note = {Main track}, doi = {10.24963/ijcai.2020/67}, url = {https://doi.org/10.24963/ijcai.2020/67}, }
-
The well-known problem of knowledge acquisition is one of the biggest issues in Word Sense Disambiguation (WSD), where annotated data are still scarce in English and almost absent in other languages. In this paper we formulate the assumption of One Sense per Wikipedia Category and present OneSeC, a language-independent method for the automatic extraction of hundreds of thousands of sentences in which a target word is tagged with its meaning. Our automatically-generated data consistently lead a supervised WSD model to state-of-the-art performance when compared with other automatic and semi-automatic methods. Moreover, our approach outperforms its competitors in multilingual and domain-specific settings, where it beats the existing state of the art in all languages and most domains. All the training data are available for research purposes at http://trainomatic.org/onesec.
BibTex
@inproceedings{scarlini-etal-2019-just, title = "Just {``}{O}ne{S}e{C}{''} for Producing Multilingual Sense-Annotated Data", author = "Scarlini, Bianca and Pasini, Tommaso and Navigli, Roberto", booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics", month = jul, year = "2019", address = "Florence, Italy", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/P19-1069", doi = "10.18653/v1/P19-1069", pages = "699--709", abstract = "The well-known problem of knowledge acquisition is one of the biggest issues in Word Sense Disambiguation (WSD), where annotated data are still scarce in English and almost absent in other languages. In this paper we formulate the assumption of One Sense per Wikipedia Category and present OneSeC, a language-independent method for the automatic extraction of hundreds of thousands of sentences in which a target word is tagged with its meaning. Our automatically-generated data consistently lead a supervised WSD model to state-of-the-art performance when compared with other automatic and semi-automatic methods. Moreover, our approach outperforms its competitors on multilingual and domain-specific settings, where it beats the existing state of the art on all languages and most domains. All the training data are available for research purposes at http://trainomatic.org/onesec.", }
-
While word embeddings are now a de facto standard representation of words in most NLP tasks, attention has recently been shifting towards vector representations which capture the different meanings, i.e., senses, of words. In this paper we explore the capabilities of a bidirectional LSTM model to learn representations of word senses from semantically annotated corpora. We show that using an architecture that is aware of word order, like an LSTM, enables us to create better representations. We assess our proposed model on various standard benchmarks for evaluating semantic representations, reaching state-of-the-art performance on the SemEval-2014 word-to-sense similarity task. We release the code and the resulting word and sense embeddings at http://lcl.uniroma1.it/LSTMEmbed.
BibTex
@inproceedings{iacobacci2019lstmembed, title={{LSTMEmbed}: Learning Word and Sense Representations from a Large Semantically Annotated Corpus with Long Short-Term Memories}, author={Iacobacci, Ignacio and Navigli, Roberto}, booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics}, pages={1685--1695}, year={2019} }
-
Game-theoretic models, thanks to their intrinsic ability to exploit contextual information, have been shown to be particularly well suited for the Word Sense Disambiguation task. They represent ambiguous words as the players of a non-cooperative game and their senses as the strategies that the players can select in order to play the games. The interaction among the players is modeled with a weighted graph, and the payoff as an embedding similarity function that the players try to maximize. The impact of the word and sense embedding representations in the framework has been tested and analyzed extensively: experiments on standard benchmarks show state-of-the-art performance, and different tests hint at the usefulness of using disambiguation to obtain contextualized word representations.
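For readers unfamiliar with such game dynamics, the toy sketch below runs a replicator-style update in which every word keeps a probability distribution over its candidate senses and repeatedly shifts mass towards senses with higher graph-weighted similarity to its neighbours' current choices. The matrices, weights, and stopping rule are invented for illustration and do not reproduce the paper's exact formulation.

```python
# Toy game dynamics for WSD: words are players, senses are strategies, the payoff
# of a sense is its graph-weighted similarity with the neighbours' mixed strategies,
# and probability mass is shifted replicator-style towards higher-payoff senses.
# All numbers below are made up for illustration.
import numpy as np

def play_wsd_game(W, S, senses, x, iterations=50):
    """
    W:      (n, n) non-negative word-word graph weights
    S:      (m, m) non-negative sense-sense similarity matrix over all senses
    senses: senses[i] = list of global sense indices available to word i
    x:      x[i] = probability vector over senses[i] (mixed strategy of word i)
    """
    n = len(x)
    for _ in range(iterations):
        new_x = []
        for i in range(n):
            payoff = np.zeros_like(x[i])
            for j in range(n):
                if i == j or W[i, j] == 0:
                    continue
                # expected similarity of each sense of word i against word j's strategy
                payoff += W[i, j] * (S[np.ix_(senses[i], senses[j])] @ x[j])
            updated = x[i] * payoff                                  # replicator-style update
            new_x.append(updated / updated.sum() if updated.sum() > 0 else x[i])
        x = new_x
    return [senses[i][int(np.argmax(x[i]))] for i in range(n)]      # winning sense per word

# toy instance: two connected words, each with two candidate senses (global ids 0-3)
W = np.array([[0.0, 1.0], [1.0, 0.0]])
S = np.array([[1.0, 0.2, 0.9, 0.1],
              [0.2, 1.0, 0.1, 0.3],
              [0.9, 0.1, 1.0, 0.2],
              [0.1, 0.3, 0.2, 1.0]])
print(play_wsd_game(W, S, senses=[[0, 1], [2, 3]], x=[np.full(2, 0.5), np.full(2, 0.5)]))
```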
BibTex
@inproceedings{tripodi-navigli-2019-game, title = "Game Theory Meets Embeddings: a Unified Framework for Word Sense Disambiguation", author = "Tripodi, Rocco and Navigli, Roberto", booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)", month = nov, year = "2019", address = "Hong Kong, China", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/D19-1009", doi = "10.18653/v1/D19-1009", pages = "88--99", abstract = "Game-theoretic models, thanks to their intrinsic ability to exploit contextual information, have shown to be particularly suited for the Word Sense Disambiguation task. They represent ambiguous words as the players of a non cooperative game and their senses as the strategies that the players can select in order to play the games. The interaction among the players is modeled with a weighted graph and the payoff as an embedding similarity function, that the players try to maximize. The impact of the word and sense embedding representations in the framework has been tested and analyzed extensively: experiments on standard benchmarks show state-of-art performances and different tests hint at the usefulness of using disambiguation to obtain contextualized word representations.", }
-
We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org.
BibTex
@inproceedings{di-fabio-etal-2019-verbatlas, title = "{V}erb{A}tlas: a Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling", author = "Di Fabio, Andrea and Conia, Simone and Navigli, Roberto", booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)", month = nov, year = "2019", address = "Hong Kong, China", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/D19-1058", doi = "10.18653/v1/D19-1058", pages = "627--637", abstract = "We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org.", }
-
Current research in knowledge-based Word Sense Disambiguation (WSD) indicates that performances depend heavily on the Lexical Knowledge Base (LKB) employed. This paper introduces SyntagNet, a novel resource consisting of manually disambiguated lexical-semantic combinations. By capturing sense distinctions evoked by syntagmatic relations, SyntagNet enables knowledge-based WSD systems to establish a new state of the art which challenges the hitherto unrivaled performances attained by supervised approaches. To the best of our knowledge, SyntagNet is the first large-scale manually-curated resource of this kind made available to the community (at http://syntagnet.org).
BibTex
@inproceedings{maru-etal-2019-syntagnet, title = "{S}yntag{N}et: Challenging Supervised Word Sense Disambiguation with Lexical-Semantic Combinations", author = "Maru, Marco and Scozzafava, Federico and Martelli, Federico and Navigli, Roberto", booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)", month = nov, year = "2019", address = "Hong Kong, China", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/D19-1359", doi = "10.18653/v1/D19-1359", pages = "3534--3540", abstract = "Current research in knowledge-based Word Sense Disambiguation (WSD) indicates that performances depend heavily on the Lexical Knowledge Base (LKB) employed. This paper introduces SyntagNet, a novel resource consisting of manually disambiguated lexical-semantic combinations. By capturing sense distinctions evoked by syntagmatic relations, SyntagNet enables knowledge-based WSD systems to establish a new state of the art which challenges the hitherto unrivaled performances attained by supervised approaches. To the best of our knowledge, SyntagNet is the first large-scale manually-curated resource of this kind made available to the community (at http://syntagnet.org).", }
-
Accurate semantic representation models are essential in text mining applications. For a successful application of the text mining process, the adopted text representation must preserve the patterns of interest to be discovered. Although competitive results for automatic text classification may be achieved with a traditional bag-of-words representation, such a model cannot provide satisfactory classification performance in hard settings where richer text representations are required. In this paper, we present an approach to represent document collections based on embedded representations of words and word senses. We bring together the power of word sense disambiguation and the semantic richness of word- and word-sense embedded vectors to construct embedded representations of document collections. Our approach results in semantically enhanced and low-dimensional representations. We overcome the lack of interpretability of embedded vectors, which is a drawback of this kind of representation, by using word sense embedded vectors. Moreover, the experimental evaluation indicates that the proposed representations yield stable classifiers with strong quantitative results, especially in semantically complex classification scenarios.
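A minimal sketch of the general recipe described above, assuming precomputed word and sense embedding lookup tables and a WSD system that has already tagged the document: the document vector is the concatenation of the averaged word vectors and the averaged sense vectors. The aggregation strategy and all names are illustrative simplifications, not the exact configurations compared in the article.

```python
# Minimal knowledge-enhanced document embedding: average the word vectors of a
# document, average the sense vectors of its disambiguated words, concatenate.
# Lookup tables and the choice of concatenation are illustrative only.
import numpy as np

def document_embedding(tokens, senses, word_vectors, sense_vectors, dim=300):
    """
    tokens:        list of tokens in the document
    senses:        list of sense ids produced by a WSD system (None if untagged)
    word_vectors:  dict mapping a token to an np.ndarray of size dim
    sense_vectors: dict mapping a sense id to an np.ndarray of size dim
    """
    w = [word_vectors[t] for t in tokens if t in word_vectors]
    s = [sense_vectors[z] for z in senses if z is not None and z in sense_vectors]
    w_avg = np.mean(w, axis=0) if w else np.zeros(dim)
    s_avg = np.mean(s, axis=0) if s else np.zeros(dim)
    return np.concatenate([w_avg, s_avg])   # a 2*dim-dimensional document vector
```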
BibTex
@article{sinoara2019knowledge, title={Knowledge-enhanced document embeddings for text classification}, author={Sinoara, Roberta A and Camacho-Collados, Jose and Rossi, Rafael G and Navigli, Roberto and Rezende, Solange O}, journal={Knowledge-Based Systems}, volume={163}, pages={955--971}, year={2019}, publisher={Elsevier} }
-
Definitional knowledge has proved to be essential in various Natural Language Processing tasks and applications, especially when information at the level of word senses is exploited. However, the few sense-annotated corpora of textual definitions available to date are of limited size: this is mainly due to the expensive and time-consuming process of annotating a wide variety of word senses and entity mentions at a reasonably high scale. In this paper we present SENSEDEFS, a large-scale high-quality corpus of disambiguated definitions (or glosses) in multiple languages, comprising sense annotations of both concepts and named entities from a wide-coverage unified sense inventory. Our approach for the construction and disambiguation of this corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system: first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation; then we refine the disambiguation output with a distributional approach based on semantic similarity. As a result, we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we publicly release it to the research community. We assess the quality of SENSEDEFS's sense annotations both intrinsically and extrinsically on Open Information Extraction and Sense Clustering tasks.
BibTex
@article{camacho2019s, title={{SenseDefs}: a multilingual corpus of semantically annotated textual definitions}, author={Camacho-Collados, Jose and Delli Bovi, Claudio and Raganato, Alessandro and Navigli, Roberto}, journal={Language Resources and Evaluation}, volume={53}, number={2}, pages={251--278}, year={2019}, publisher={Springer} }
-
Over the last two decades, determining the similarity between words as well as between their meanings, that is, word senses, has been proven to be of vital importance in the field of Natural Language Processing. This paper provides the reader with an introduction to the tasks of computing word and sense similarity. These consist in computing the degree of semantic likeness between words and senses, respectively. First, we distinguish between two major approaches: knowledge-based approaches and distributional approaches. Second, we detail the representations and measures employed for computing similarity. We then illustrate the evaluation settings available in the literature and, finally, discuss suggestions for future research.
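As a concrete reference point for the two families surveyed, the snippet below computes a knowledge-based similarity over the WordNet graph (via NLTK) and a distributional cosine similarity between placeholder vectors; the example words and vectors are ours, not the paper's.

```python
# Two textbook instantiations of the similarity families discussed above:
# a knowledge-based measure over the WordNet graph and a distributional
# cosine similarity between (placeholder) word vectors.
import numpy as np
from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

# knowledge-based: graph-based similarity between two senses in WordNet
car, bus = wn.synset("car.n.01"), wn.synset("bus.n.01")
print("path similarity:", car.path_similarity(bus))
print("Wu-Palmer similarity:", car.wup_similarity(bus))

# distributional: cosine similarity between two word vectors
# (toy 3-dimensional vectors stand in for real pretrained embeddings)
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

v_car, v_bus = np.array([0.8, 0.1, 0.3]), np.array([0.7, 0.2, 0.4])
print("cosine similarity:", cosine(v_car, v_bus))
```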
BibTex
@article{navigli2019overview, title={An overview of word and sense similarity}, author={Navigli, Roberto and Martelli, Federico}, journal={Natural Language Engineering}, volume={25}, number={6}, pages={693--714}, year={2019}, publisher={Cambridge University Press} }
-
While contextualized embeddings have produced performance breakthroughs in many Natural Language Processing (NLP) tasks, Word Sense Disambiguation (WSD) has not benefited from them yet. In this paper, we introduce QBERT, a Transformer-based architecture for contextualized embeddings which makes use of a co-attentive layer to produce more deeply bidirectional representations, better-fitting for the WSD task. As a result, we are able to train a WSD system that beats the state of the art on the concatenation of all evaluation datasets by over 3 points, also outperforming a comparable model using ELMo.
BibTex
@inproceedings{bevilacqua-navigli-2019-quasi, title = "Quasi Bidirectional Encoder Representations from Transformers for Word Sense Disambiguation", author = "Bevilacqua, Michele and Navigli, Roberto", booktitle = "Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)", month = sep, year = "2019", address = "Varna, Bulgaria", publisher = "INCOMA Ltd.", url = "https://www.aclweb.org/anthology/R19-1015", doi = "10.26615/978-954-452-056-4_015", pages = "122--131", abstract = "While contextualized embeddings have produced performance breakthroughs in many Natural Language Processing (NLP) tasks, Word Sense Disambiguation (WSD) has not benefited from them yet. In this paper, we introduce QBERT, a Transformer-based architecture for contextualized embeddings which makes use of a co-attentive layer to produce more deeply bidirectional representations, better-fitting for the WSD task. As a result, we are able to train a WSD system that beats the state of the art on the concatenation of all evaluation datasets by over 3 points, also outperforming a comparable model using ELMo.", }
-
Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at http://trainomatic.org.
BibTex
@InProceedings{PasiniNavigli:2018, author = {Pasini, Tommaso and Navigli, Roberto}, title = {Two Knowledge-based Methods for High-Performance Sense Distribution Learning}, booktitle = {Proc. of the 32nd {AAAI} {C}onference on {A}rtificial {I}ntelligence}, year = {2018}, address = {New Orleans, {USA}}, }
-
In this paper I look at Natural Language Understanding, an area of Natural Language Processing aimed at making sense of text, through the lens of a visionary future: what do we expect a machine should be able to understand, and what are the key dimensions that require the attention of researchers to make this dream come true?
BibTex
@inproceedings{navigli2018natural, title={Natural Language Understanding: Instructions for (Present and Future) Use.}, author={Navigli, Roberto}, booktitle={IJCAI}, pages={5697--5702}, year={2018} }
-
We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages, for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be effectively used as training sets for supervised WSD systems, surpassing the state of the art for low-resourced languages and providing competitive results for English, where manually annotated training sets are accessible. The data is available at trainomatic.org.
BibTex
@inproceedings{pasini-etal-2018-huge, title = "Huge Automatically Extracted Training Sets for Multilingual {W}ord {S}ense {D}isambiguation", author = "Pasini, Tommaso and Elia, Francesco and Navigli, Roberto", booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)", month = may, year = "2018", address = "Miyazaki, Japan", publisher = "European Language Resources Association (ELRA)", url = "https://www.aclweb.org/anthology/L18-1268", }
-
This paper describes the SemEval 2018 Shared Task on Hypernym Discovery. We put forward this task as a complementary benchmark for modeling hypernymy, a problem which has traditionally been cast as a binary classification task, taking a pair of candidate words as input. Instead, our reformulated task is defined as follows: given an input term, retrieve (or discover) its suitable hypernyms from a target corpus. We proposed five different subtasks covering three languages (English, Spanish, and Italian), and two specific domains of knowledge in English (Medical and Music). Participants were allowed to compete in any or all of the subtasks. Overall, 11 teams participated, submitting a total of 39 different systems across all subtasks. Data, results and further information about the task can be found at https://competitions.codalab.org/competitions/17119.
BibTex
@inproceedings{camacho-collados-etal-2018-semeval, title = "{S}em{E}val-2018 Task 9: Hypernym Discovery", author = "Camacho-Collados, Jose and Delli Bovi, Claudio and Espinosa-Anke, Luis and Oramas, Sergio and Pasini, Tommaso and Santus, Enrico and Shwartz, Vered and Navigli, Roberto and Saggion, Horacio", booktitle = "Proceedings of The 12th International Workshop on Semantic Evaluation", month = jun, year = "2018", address = "New Orleans, Louisiana", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/S18-1115", doi = "10.18653/v1/S18-1115", pages = "712--724", abstract = "This paper describes the SemEval 2018 Shared Task on Hypernym Discovery. We put forward this task as a complementary benchmark for modeling hypernymy, a problem which has traditionally been cast as a binary classification task, taking a pair of candidate words as input. Instead, our reformulated task is defined as follows: given an input term, retrieve (or discover) its suitable hypernyms from a target corpus. We proposed five different subtasks covering three languages (English, Spanish, and Italian), and two specific domains of knowledge in English (Medical and Music). Participants were allowed to compete in any or all of the subtasks. Overall, a total of 11 teams participated, with a total of 39 different systems submitted through all subtasks. Data, results and further information about the task can be found at \url{https://competitions.codalab.org/competitions/17119}.", }
-
The exponential growth of the Web is resulting in vast amounts of online content. However, the information expressed therein is not within easy reach: what we typically browse is only an infinitesimal part of the Web. And even if we had the time to read the entire Web, we could not understand it, as most of it is written in languages we do not speak. Rather than time, a key problem for a machine is language comprehension, that is, enabling a machine to transform sentences, i.e., sequences of characters, into machine-readable semantic representations linked to existing meaning inventories such as computational lexicons and knowledge bases. In this paper we present two interrelated projects funded by the European Research Council (ERC) aimed at addressing and overcoming the current limits of lexical semantics: MultiJEDI and MOUSSE. We also present the results of Babelscape, a Sapienza spin-off company with the goal of making the project outcomes sustainable in the long term.
BibTex
@inproceedings{BasileNavigli:18, title = {From MultiJEDI to MOUSSE: Two ERC Projects for Innovating Multilingual Disambiguation and Semantic Parsing of Text}, author = {Basile, Valerio and Navigli, Roberto}, booktitle = {Proc. of The Web Conference 2018}, address = {Lyon, France}, year = {2018}, }