✨ TL;DR
This paper presents a structured analysis of the adversarial robustness literature and introduces Auto-ART, an open-source framework that standardizes robustness evaluation across 50+ attacks, detects gradient masking, and maps evaluation results to regulatory frameworks. It addresses fragmentation in the field by combining meta-scientific synthesis with practical engineering tools.
Adversarial robustness evaluation is critical for trustworthy ML deployment, yet the field suffers from fragmented evaluation protocols, inconsistent methodologies, and undetected gradient masking: a phenomenon in which a defense appears robust only because its gradients are obscured, so gradient-based attacks fail even though the model is not genuinely robust. This fragmentation makes it difficult to compare results across papers and to separate defenses that are truly effective from those that only look robust.
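As a rough illustration of how gradient masking can be spotted from attack results alone, the sketch below checks three warning signs commonly used in the evaluation literature: a single-step attack beating an iterative one, a black-box attack beating a white-box one, and an unbounded attack failing to reach 100% success. This is not Auto-ART's detector; the function name, the dictionary keys, the tolerance, and all numbers are hypothetical.

```python
def masking_symptoms(robust_acc, tol=0.01):
    """Return gradient-masking warning signs found in a set of attack results.

    `robust_acc` maps a (hypothetical) attack label to robust accuracy in [0, 1];
    a lower value means the attack was stronger.
    """
    symptoms = []
    # An iterative attack (PGD) should be at least as strong as one step (FGSM).
    if robust_acc["pgd"] > robust_acc["fgsm"] + tol:
        symptoms.append("single-step attack beats iterative attack")
    # A white-box attack has strictly more information than a black-box one.
    if robust_acc["black_box"] + tol < robust_acc["pgd"]:
        symptoms.append("black-box attack beats white-box attack")
    # With no perturbation budget, every input should be breakable.
    if robust_acc["unbounded"] > tol:
        symptoms.append("unbounded attack does not reach 100% success")
    return symptoms


# Illustrative numbers only; they are not taken from the paper.
healthy = {"fgsm": 0.62, "pgd": 0.51, "black_box": 0.70, "unbounded": 0.0}
suspect = {"fgsm": 0.40, "pgd": 0.85, "black_box": 0.30, "unbounded": 0.25}

print(masking_symptoms(healthy))
print(masking_symptoms(suspect))
```

A defense that triggers any of these signs is not necessarily weak, but its reported robustness should be re-checked with attacks that do not rely on its gradients.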
The paper employs two complementary strategies. First, it conducts a structured literature synthesis, applying seven analysis protocols to nine peer-reviewed sources from 2020-2026 to identify points of consensus and unresolved challenges in the field. Second, it introduces Auto-ART, an open-source framework that addresses the gaps the synthesis identifies by providing 50+ attacks, 28 defense modules, a Robustness Diagnostic Index (RDI) for efficient ranking, and automated gradient-masking detection. The framework supports multi-norm evaluation (l1/l2/linf/semantic/spatial) and maps results to regulatory standards including NIST AI RMF, OWASP LLM Top 10, and EU AI Act.
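To make the norm-based threat models concrete, the sketch below measures one perturbation under the l1, l2, and linf norms and checks it against a per-norm budget. It covers only the three norm-bounded cases (semantic and spatial threat models need other distance measures), and the function names, inputs, and budgets are hypothetical rather than part of Auto-ART.

```python
import math


def perturbation_norms(x, x_adv):
    """Size of the perturbation x_adv - x under the l1, l2 and linf norms."""
    delta = [a - b for a, b in zip(x_adv, x)]
    return {
        "l1": sum(abs(d) for d in delta),             # total absolute change
        "l2": math.sqrt(sum(d * d for d in delta)),   # Euclidean length
        "linf": max(abs(d) for d in delta),           # largest single change
    }


def within_budget(norms, budgets):
    """For each threat model, report whether the perturbation fits its budget."""
    return {name: norms[name] <= eps for name, eps in budgets.items()}


# Illustrative inputs only: a 4-feature example and one perturbed copy.
x = [0.0, 0.5, 1.0, 0.25]
x_adv = [0.0, 0.75, 0.5, 0.25]
budgets = {"l1": 1.0, "l2": 0.5, "linf": 0.5}

norms = perturbation_norms(x, x_adv)
print(norms)
print(within_budget(norms, budgets))
```

The same perturbation can be admissible under one norm and inadmissible under another, which is why a result reported for a single norm says little about the others.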
What the paper shows.
Empirical validation on RobustBench demonstrates that Auto-ART's gradient-masking pre-screening achieves 92% accuracy in identifying flagged cases. The Robustness Diagnostic Index shows high correlation with full AutoAttack evaluation, providing an efficient alternative for preliminary assessment. Multi-norm evaluation across state-of-the-art models reveals a 23.5 percentage point gap between average-case and worst-case robustness, highlighting the importance of comprehensive threat model coverage. The framework successfully integrates 50+ attacks and 28 defense modules while maintaining compliance mapping to major regulatory frameworks.
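One way to read an average-case versus worst-case gap is sketched below, assuming that average-case robustness is the mean of per-threat-model robust accuracies and that worst-case robustness counts a sample only if it survives every threat model. The paper's exact definitions may differ; the function name and the toy data are hypothetical and do not reproduce the reported 23.5-point figure.

```python
def robustness_gap(results):
    """Return (average_case, worst_case, gap in percentage points).

    `results` maps a threat-model name to per-sample flags, where 1 means the
    model stayed correct on that sample under that threat model.
    """
    # Average case: mean of the per-threat-model robust accuracies.
    per_threat = {name: sum(flags) / len(flags) for name, flags in results.items()}
    average_case = sum(per_threat.values()) / len(per_threat)

    # Worst case: a sample counts only if it survives every threat model.
    n_samples = len(next(iter(results.values())))
    survived_all = sum(
        all(flags[i] for flags in results.values()) for i in range(n_samples)
    )
    worst_case = survived_all / n_samples

    return average_case, worst_case, 100 * (average_case - worst_case)


# Toy data for 8 samples; not taken from the paper.
results = {
    "linf": [1, 1, 1, 1, 0, 0, 1, 1],
    "l2":   [1, 1, 0, 1, 1, 0, 1, 0],
    "l1":   [1, 0, 1, 1, 1, 0, 0, 1],
}

average_case, worst_case, gap_pp = robustness_gap(results)
print(f"average-case: {average_case:.3f}")
print(f"worst-case:   {worst_case:.3f}")
print(f"gap:          {gap_pp:.1f} percentage points")
```

Under these definitions the worst-case figure can never exceed the lowest per-threat-model accuracy, so a large gap indicates that different threat models break different samples.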
The paper does not explicitly discuss limitations, though several are implicit. The structured synthesis covers only 2020-2026 literature and may miss earlier foundational work. The framework's effectiveness depends on the quality and representativeness of the 50+ attacks and 28 defense modules it includes. Gradient-masking detection reaches 92% accuracy, so some cases will still be misclassified. The correlation between RDI and full AutoAttack is high but not perfect, which suggests the diagnostic index loses some information. Finally, the paper does not thoroughly address how well its findings generalize to model architectures, domains, and threat models beyond those evaluated on RobustBench.
✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.