MAFALDA: A Benchmark and Comprehensive Study of Fallacy Detection and Classification

By Chadi Helwe, Tom Calamai, Pierre-Henri Paris, Chloé Clavel, and Fabian M. Suchanek.


  • We highlight the critical importance of detecting fallacies in the context of disinformation, fake news, and propaganda.
  • Our study tackles the challenge of subjectivity in fallacy detection by introducing a novel taxonomy, an annotation scheme tailored for subjective NLP tasks, and a new evaluation method designed to handle subjectivity.
  • We introduce MAFALDA (Multi-level Annotated FALlacy DAtaset), a comprehensive dataset compiled from existing fallacy datasets, and evaluate several language models for their ability to detect and classify fallacies.

Taxonomy of Fallacies

  • We propose a new taxonomy that consolidates and unifies existing collections of fallacies from related works.
  • Our taxonomy is structured into three levels:
    • Level 0 focuses on labeling text as fallacious or not.
    • Level 1 categorizes fallacies into broad groups: Appeal to Emotion, Fallacies of Logic, and Fallacies of Credibility.
    • Level 2 identifies 23 fine-grained fallacy types, such as straw man and appeal to fear.
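The three-level structure above can be sketched as a simple lookup. This is an illustrative toy only: the Level 1 groups are from the paper, but the Level 2 membership shown here covers just the two example fallacies named above, not the full set of 23.

```python
# Hypothetical sketch of the three-level taxonomy as a nested mapping.
# Level 1 group names come from the paper; the Level 2 entries shown
# are only the examples mentioned above, not the complete list of 23.
TAXONOMY = {
    "Appeal to Emotion": {"appeal to fear"},
    "Fallacies of Logic": {"straw man"},
    "Fallacies of Credibility": set(),
}

def level0(level2_label):
    """Level 0: is the text fallacious at all? (None means no fallacy)."""
    return level2_label is not None

def level1(level2_label):
    """Map a Level 2 fallacy type to its Level 1 group, if known."""
    for group, members in TAXONOMY.items():
        if level2_label in members:
            return group
    return None
```

A Level 2 prediction thus determines the coarser Level 1 and Level 0 labels automatically, which is what lets a single annotation be scored at all three granularities.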

Annotation Scheme and Evaluation Metric

  • Our approach introduces an annotation scheme that allows multiple correct annotations for the same text, together with a new evaluation metric that accepts any of these alternative annotations as correct.
  • This scheme and metric are particularly suited for the subjective nature of fallacy detection and classification.
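To make the idea concrete, here is a toy scoring function in the spirit of that metric. It is NOT the paper's exact formula: it simply scores a predicted label set against the best-matching alternative among the acceptable gold annotations, so a prediction matching any alternative gets full credit.

```python
# Toy illustration (not the paper's exact metric): a prediction is scored
# against the BEST of several acceptable gold annotations, so disagreement
# between annotators does not unfairly penalize a valid prediction.
def span_score(predicted, alternatives):
    """
    predicted: set of fallacy labels proposed for a text span.
    alternatives: list of label sets, each an acceptable gold annotation.
    Returns the best Jaccard overlap with any alternative (0.0 to 1.0).
    """
    return max(
        (len(predicted & gold) / max(len(predicted | gold), 1)
         for gold in alternatives),
        default=0.0,
    )
```

For example, if one annotator labeled a span "straw man" and another labeled it "appeal to fear", a model predicting either label would score 1.0 under this scheme.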

Experiments and Results

  • We evaluate current state-of-the-art language models on the MAFALDA benchmark.
  • These experiments reveal the models' strengths and weaknesses in detecting and classifying fallacious reasoning.

Significance of Our Study

  • Our study addresses the long-standing issue of subjectivity in fallacy annotation, a significant hurdle in previous research.
  • The methodology used in MAFALDA allows for a nuanced analysis of model performance across different levels of granularity and a variety of fallacy types.

This summary is based on the MAFALDA academic paper. For full details and analysis, we recommend referring to the complete paper.