By Chadi Helwe, Tom Calamai, Pierre-Henri Paris, Chloé Clavel, and Fabian M. Suchanek.
- We highlight the critical importance of detecting fallacies in the context of disinformation, fake news, and propaganda.
- Our study tackles the challenge of subjectivity in fallacy detection by introducing a novel taxonomy, an annotation scheme tailored for subjective NLP tasks, and a new evaluation method designed to handle subjectivity.
- We introduce MAFALDA (Multi-level Annotated FALlacy DAtaset), a comprehensive dataset compiled from existing fallacy datasets, and evaluate several language models for their ability to detect and classify fallacies.
- We propose a new taxonomy that consolidates and unifies existing collections of fallacies from related works.
- Our taxonomy is structured into three levels:
  - Level 0 labels a text as fallacious or not.
  - Level 1 groups fallacies into three broad categories: Appeal to Emotion, Fallacies of Logic, and Fallacies of Credibility.
  - Level 2 identifies 23 specific fallacies, such as straw man and appeal to fear.
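
The three levels above can be pictured as a small lookup structure in which a fine-grained label is projected to a coarser level. The following is a minimal sketch, not the MAFALDA codebase: `TAXONOMY`, `NO_FALLACY`, and `project_label` are hypothetical names, only the two Level 2 fallacies named in this summary are listed, and placing "straw man" under Fallacies of Logic is an assumption made for illustration.

```python
from typing import Optional

# Level 1 group -> Level 2 fallacies.
# Only the two fallacies named in this summary appear here; the full taxonomy
# has 23. The group assigned to "straw man" is an assumption.
TAXONOMY = {
    "Appeal to Emotion": ["appeal to fear"],
    "Fallacies of Logic": ["straw man"],
    "Fallacies of Credibility": [],
}

NO_FALLACY = "no fallacy"  # hypothetical label for non-fallacious text


def project_label(level2_label: Optional[str], level: int) -> str:
    """Map a Level 2 label (or None for no fallacy) to Level 0, 1, or 2."""
    if level not in (0, 1, 2):
        raise ValueError(f"unknown level: {level}")
    if level2_label is None:
        return NO_FALLACY
    # Find the Level 1 group that contains this Level 2 fallacy.
    for group, members in TAXONOMY.items():
        if level2_label in members:
            if level == 0:
                return "fallacious"
            return group if level == 1 else level2_label
    raise KeyError(f"unknown fallacy: {level2_label}")
```

With this layout, a single Level 2 annotation is enough to derive the labels at the two coarser levels, which is one way to evaluate a model at every granularity from the same annotations.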
- Our approach introduces an innovative annotation scheme that allows multiple correct annotations for the same text. We also develop a new evaluation metric that recognizes these alternative annotations.
- This scheme and metric are particularly suited for the subjective nature of fallacy detection and classification.
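
To illustrate the idea of accepting alternative annotations, the sketch below scores a prediction as correct when it matches any of the labels that annotators accepted for that text. This is a deliberately simplified, hypothetical illustration and not the metric defined in the paper; the function name `alternative_aware_accuracy` and the example labels are assumptions.

```python
from typing import List, Set


def alternative_aware_accuracy(
    gold_alternatives: List[Set[str]], predictions: List[str]
) -> float:
    """Fraction of predictions that match any accepted gold label.

    gold_alternatives[i] holds every label considered correct for text i.
    Simplified illustration only; not the metric from the paper.
    """
    if len(gold_alternatives) != len(predictions):
        raise ValueError("gold and predictions must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        1
        for accepted, predicted in zip(gold_alternatives, predictions)
        if predicted in accepted
    )
    return hits / len(predictions)


# Hypothetical example: the first text has two equally acceptable labels.
gold = [{"straw man", "appeal to fear"}, {"appeal to fear"}, {"no fallacy"}]
preds = ["appeal to fear", "straw man", "no fallacy"]
score = alternative_aware_accuracy(gold, preds)  # 2 of 3 predictions are accepted
```

A metric that compared against a single gold label would penalize the first prediction whenever the annotator happened to pick the other alternative; accepting a set of labels avoids that.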
- We conduct experiments to evaluate the performance of current state-of-the-art language models on our MAFALDA benchmark.
- The results provide insights into the models' strengths and weaknesses in handling fallacious reasoning.
- Our study addresses the long-standing issue of subjectivity in fallacy annotation, a significant hurdle in previous research.
- The methodology used in MAFALDA allows for a nuanced analysis of model performance across different levels of granularity and a variety of fallacy types.
This summary is based on the MAFALDA paper. For details, please refer to our full paper.