MAFALDA is a new benchmark for fallacy classification that consolidates previous datasets under a unified taxonomy. It provides manual annotations with accompanying explanations, an annotation scheme tailored to subjective NLP tasks, and a novel evaluation method that accounts for this subjectivity. We evaluate both language models and humans on MAFALDA to assess their fallacy detection and classification capabilities.