Given the rise of widespread misinformation, detection systems must balance high accuracy with safety alignment. We argue that accuracy in isolation fails to meet the stringent demands of real-world deployment.
Accuracy Achieved
Evaluated on the WELFake dataset (a large-scale corpus of 72,131 news articles used for training robust detection models) using hybrid architectures such as BERT-CNN (combining BERT's contextual language understanding with CNNs for spotting key patterns) and CNN-LSTM (pairing CNNs' keyword detection with LSTMs' modeling of sentence flow over time).
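As a rough illustration of the CNN-LSTM idea, the sketch below runs a convolution over token embeddings (local patterns) and feeds the pooled features to an LSTM (sequence flow). All layer sizes, vocabulary size, and the module name `CNNLSTM` are illustrative assumptions, not the authors' actual configuration:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Toy CNN-LSTM text classifier: conv over embeddings, then an LSTM."""

    def __init__(self, vocab_size=5000, embed_dim=64, conv_ch=32, lstm_hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, conv_ch, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(2)
        self.lstm = nn.LSTM(conv_ch, lstm_hidden, batch_first=True)
        self.fc = nn.Linear(lstm_hidden, 1)

    def forward(self, x):                  # x: (batch, seq_len) token ids
        e = self.embed(x).transpose(1, 2)  # (batch, embed_dim, seq_len)
        c = torch.relu(self.conv(e))       # local n-gram-like features
        p = self.pool(c).transpose(1, 2)   # (batch, seq_len // 2, conv_ch)
        _, (h, _) = self.lstm(p)           # final hidden state summarizes sequence
        return torch.sigmoid(self.fc(h[-1])).squeeze(-1)  # P(fake) per article

model = CNNLSTM()
tokens = torch.randint(0, 5000, (4, 100))  # a batch of 4 tokenized articles
probs = model(tokens)                      # one probability per article
```

A BERT-CNN variant would replace the embedding layer with contextual BERT representations before the convolution; the overall shape of the pipeline stays the same.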
Explainability
We moved beyond "black boxes" by integrating SHAP (SHapley Additive exPlanations), a method that highlights exactly which words or phrases caused the AI to flag an article as fake, making the model's reasoning transparent.
Comprehensive Testing
We compared multiple Machine Learning models (classical approaches like Random Forest and SVM that are highly interpretable and fast) and Deep Learning models (complex CNN- and BERT-based architectures that offer higher accuracy but are harder to explain) to find the best balance between raw power and safety.
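A comparison like this can be set up with a shared featurization and interchangeable classifiers. The sketch below fits two classical baselines on a handful of made-up headlines (not WELFake data) purely to show the pattern; real evaluation would use held-out splits of the actual dataset:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy headlines, labeled 1 = fake, 0 = real
texts = [
    "SHOCKING miracle cure doctors don't want you to know",
    "You won't believe this one weird trick to get rich",
    "Aliens secretly control the world government, insider claims",
    "Scientists confirm vaccine causes instant superpowers",
    "Central bank raises interest rates by a quarter point",
    "City council approves new budget for road repairs",
    "Local team wins championship after overtime victory",
    "Researchers publish peer-reviewed study on crop yields",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

results = {}
for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("rf", RandomForestClassifier(n_estimators=50, random_state=0))]:
    pipe = make_pipeline(TfidfVectorizer(), clf)  # same features, swappable model
    pipe.fit(texts, labels)
    results[name] = pipe.score(texts, labels)     # accuracy on the toy set
```

Deep Learning entries slot into the same comparison loop as long as they expose fit/score-style interfaces, which keeps the accuracy-versus-interpretability trade-off easy to tabulate.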
The rise of large-scale misinformation, amplified by LLM-generated content, demands detection systems that are both accurate and safety-aligned. This work evaluates classical, ensemble, and Deep Learning models on the WELFake dataset and shows that hybrid CNN-LSTM and BERT-based architectures surpass 98% accuracy, outperforming traditional baselines.
To ensure transparency and auditability, we integrate SHAP for unified, model-agnostic explanations that reveal key lexical and contextual drivers of predictions. Framed within an AI Safety perspective, the study highlights challenges such as distribution shift, adversarial manipulation, bias, and uncertainty calibration, arguing that accuracy alone is insufficient for real-world deployment.
The results provide a robust, interpretable, and safety-centric blueprint for trustworthy misinformation detection systems.
Access our full research findings, methodology, and model performance results.