Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing

Abhijit Talluri

✨ TL;DR

This paper provides a comprehensive structured analysis of adversarial robustness literature and introduces Auto-ART, an open-source framework that standardizes robustness evaluation across 50+ attacks, detects gradient masking, and maps compliance to regulatory frameworks. It addresses fragmentation in the field by combining meta-scientific synthesis with practical engineering tools.

01 · Problem

Adversarial robustness evaluation is critical for trustworthy ML deployment, yet the field suffers from fragmented evaluation protocols, inconsistent methodologies, and undetected gradient masking, a phenomenon in which a defense appears robust only because its gradients are obscured, not because the model actually resists attack. This fragmentation makes it difficult to compare results across papers and to tell genuinely effective defenses from misleadingly robust ones.

02 · Approach

The paper employs two complementary strategies. First, it conducts a structured literature synthesis analyzing nine peer-reviewed corpus sources from 2020-2026 through seven complementary analysis protocols to identify consensus and unresolved challenges in the field. Second, it introduces Auto-ART, an open-source framework that operationalizes identified gaps by providing 50+ attacks, 28 defense modules, a Robustness Diagnostic Index (RDI) for efficient ranking, and automated gradient-masking detection. The framework supports multi-norm evaluation (l1/l2/linf/semantic/spatial) and maps results to regulatory standards including NIST AI RMF, OWASP LLM Top 10, and EU AI Act.
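As a rough illustration of what a gradient-masking pre-screen can look for, the sketch below encodes two sanity checks that are widely used in the robustness literature: a gradient-free attack should not outperform a gradient-based one, and a gradient-based attack with a very large budget should drive accuracy down to chance. This is not Auto-ART's detector, whose internals this summary does not describe; `AttackReport`, `masking_warnings`, the tolerance, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttackReport:
    """Robust accuracies (fractions in [0, 1]) for one defended model.

    Every field is a hypothetical input, measured by whatever attack
    suite the evaluator already runs.
    """
    white_box_acc: float  # accuracy under a gradient-based attack, e.g. PGD
    black_box_acc: float  # accuracy under a gradient-free or transfer attack
    large_eps_acc: float  # accuracy under a gradient-based attack, very large budget
    chance_acc: float     # accuracy of random guessing, e.g. 0.1 for 10 classes


def masking_warnings(report: AttackReport, tol: float = 0.01) -> List[str]:
    """Return human-readable warnings that hint at gradient masking."""
    warnings = []
    # An attack without gradient access should not beat one that has it.
    if report.black_box_acc + tol < report.white_box_acc:
        warnings.append("black-box attack stronger than white-box attack")
    # With a large enough budget, a working attack should reach chance level.
    if report.large_eps_acc > report.chance_acc + tol:
        warnings.append("accuracy stays above chance at a large budget")
    return warnings


# Hypothetical numbers for two defenses.
healthy = AttackReport(white_box_acc=0.45, black_box_acc=0.60,
                       large_eps_acc=0.00, chance_acc=0.10)
suspicious = AttackReport(white_box_acc=0.80, black_box_acc=0.35,
                          large_eps_acc=0.55, chance_acc=0.10)

for name, report in [("healthy", healthy), ("suspicious", suspicious)]:
    print(name, masking_warnings(report))
```

A warning here is only a prompt for closer inspection; a defense can pass both checks and still mask gradients in ways these heuristics do not cover.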

03 · Key insights

What the paper shows.

01 Gradient masking is prevalent and detectable: Auto-ART's pre-screening identifies gradient masking in 92% of flagged cases, indicating that the phenomenon is both common and reliably detectable.

02 Multi-norm evaluation reveals significant robustness gaps: Evaluating across multiple threat models exposes a 23.5 percentage point gap between average and worst-case robustness on state-of-the-art models.

03 RDI provides efficient ranking: The Robustness Diagnostic Index correlates highly with full AutoAttack results, enabling faster preliminary robustness assessment.

04 Regulatory compliance integration is essential: Mapping adversarial robustness evaluation to NIST, OWASP, and EU AI Act standards bridges the gap between academic research and real-world deployment requirements.
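To make the average versus worst-case distinction concrete, here is a minimal sketch under one common reading of "worst case": a sample counts as robust only if it survives every threat model (the union of the norms). The summary does not say how the paper defines the two quantities, so this definition, the function names, and the toy per-sample flags are all assumptions.

```python
from typing import Dict, List, Tuple


def robust_accuracy(flags: List[bool]) -> float:
    """Fraction of samples that stayed correctly classified under attack."""
    return sum(flags) / len(flags)


def average_and_worst_case(per_norm: Dict[str, List[bool]]) -> Tuple[float, float]:
    """Return (mean of per-norm robust accuracies, union robust accuracy)."""
    per_norm_acc = [robust_accuracy(flags) for flags in per_norm.values()]
    average = sum(per_norm_acc) / len(per_norm_acc)
    # Worst case: a sample is robust only if it survives every threat model.
    survives_all = [all(sample) for sample in zip(*per_norm.values())]
    return average, robust_accuracy(survives_all)


# Hypothetical per-sample outcomes (True = still correct) for ten samples.
T, F = True, False
outcomes = {
    "linf": [T, T, F, T, F, T, T, F, T, T],
    "l2":   [T, F, T, T, F, T, T, T, F, T],
    "l1":   [F, T, T, T, F, T, F, T, T, T],
}

average, worst = average_and_worst_case(outcomes)
gap_points = 100 * (average - worst)
print(f"average={average:.2f} worst={worst:.2f} gap={gap_points:.1f} points")
```

In this toy data each norm looks equally strong in isolation, yet different samples fail under different norms, which is why the union figure falls well below the average.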

04 · Results

Empirical validation on RobustBench demonstrates that Auto-ART's gradient-masking pre-screening achieves 92% accuracy in identifying flagged cases. The Robustness Diagnostic Index shows high correlation with full AutoAttack evaluation, providing an efficient alternative for preliminary assessment. Multi-norm evaluation across state-of-the-art models reveals a 23.5 percentage point gap between average-case and worst-case robustness, highlighting the importance of comprehensive threat model coverage. The framework successfully integrates 50+ attacks and 28 defense modules while maintaining compliance mapping to major regulatory frameworks.
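One way to check a claim that a cheap index ranks models like a full evaluation is a rank correlation. The summary does not state which correlation measure the paper reports, so the choice of Spearman's coefficient below is an assumption, and the scores are made-up placeholders rather than RDI or AutoAttack values.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five models (placeholders, not paper results).
cheap_index = [0.31, 0.52, 0.47, 0.60, 0.25]
full_eval = [0.28, 0.49, 0.50, 0.58, 0.20]

rho = spearman(cheap_index, full_eval)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure fits the ranking use case because it ignores the scale of the two scores and only asks whether they order the models the same way.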

05 · Limitations

The paper does not explicitly discuss limitations, though several are implicit: the structured synthesis covers only 2020-2026 literature, potentially missing earlier foundational work; the framework's effectiveness depends on the quality and representativeness of the 50+ attacks and 28 defense modules included; gradient-masking detection, while achieving 92% accuracy, is not perfect; and the correlation between RDI and full AutoAttack, while high, suggests some loss of information in the diagnostic index approach. The generalizability of findings across different model architectures, domains, and threat models beyond those evaluated on RobustBench is not thoroughly addressed.

✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.

