Evaluating Assurance Cases as Text-Attributed Graphs for Structure and Provenance Analysis

Fariz Ikhwantri; Dusica Marijan

✨ TL;DR

This paper proposes a graph neural network framework for analyzing assurance cases (structured safety argument documents). It covers two tasks: link prediction between argument elements, and detection of LLM-generated versus human-authored cases. The work shows that LLM-generated assurance cases have distinct structural patterns and highlights limitations in GNN explanation methods.

01 · Problem

Assurance cases are critical documents in regulated domains for demonstrating compliance with safety and industry standards. However, there is limited automated analysis of their structure and provenance. As large language models become more prevalent, there is a need to detect whether assurance cases are human-authored or machine-generated, as LLM-generated cases may introduce biases. Additionally, understanding the connections between argument elements (claims, evidence, warrants) is important for validating case quality and completeness.

02 · Approach

The authors represent assurance cases as text-attributed graphs where nodes represent argument elements and edges represent logical connections. They apply graph neural networks to two tasks: (1) link prediction using graph-based learning to identify missing or implicit connections between argument elements, and (2) graph classification to distinguish human-authored from LLM-generated assurance cases. They compiled a publicly available dataset of assurance cases represented as graphs and evaluated GNN performance in both supervised and semi-supervised settings across different domains.
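The representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `AssuranceCaseGraph`, the node identifiers, the element texts, and the uniform negative-sampling scheme are all assumptions made for illustration. It shows one way to hold an assurance case as a text-attributed graph and to draw unlinked node pairs as negative examples for link prediction.

```python
from dataclasses import dataclass, field
from itertools import permutations
import random


@dataclass
class AssuranceCaseGraph:
    """Hypothetical container: a graph whose nodes carry argument-element text."""
    node_text: dict = field(default_factory=dict)  # node id -> element text
    node_type: dict = field(default_factory=dict)  # node id -> "claim", "warrant", "evidence"
    edges: set = field(default_factory=set)        # (parent, child) logical connections
    label: str = "human"                           # graph-level provenance label

    def add_node(self, node_id, kind, text):
        self.node_type[node_id] = kind
        self.node_text[node_id] = text

    def add_edge(self, parent, child):
        self.edges.add((parent, child))

    def negative_edges(self, k, seed=0):
        """Sample k ordered node pairs that are NOT linked (assumed sampling scheme)."""
        candidates = sorted(
            pair for pair in permutations(self.node_text, 2) if pair not in self.edges
        )
        return random.Random(seed).sample(candidates, k)


# Toy case with invented content, for illustration only.
case = AssuranceCaseGraph(label="human")
case.add_node("C1", "claim", "The braking system is acceptably safe.")
case.add_node("W1", "warrant", "Each identified hazard is addressed separately.")
case.add_node("C2", "claim", "Hazard H1 is mitigated.")
case.add_node("E1", "evidence", "Brake bench test report.")
case.add_edge("C1", "W1")
case.add_edge("W1", "C2")
case.add_edge("C2", "E1")

positives = sorted(case.edges)          # existing links are the positive examples
negatives = case.negative_edges(k=3)    # sampled non-links are the negative examples
```

In a GNN pipeline, the node texts would typically be embedded by a text encoder and used as node features, while the positive and negative pairs would supervise a link scorer; those steps are omitted here.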

03 · Key insights

What the paper shows.

01 Graph neural networks achieve strong link prediction performance (ROC-AUC 0.760) on real assurance cases and generalize well across domains and semi-supervised settings.

02 GNNs effectively distinguish human-authored from LLM-generated assurance cases with an F1 score of 0.94, enabling provenance detection.

03 LLM-generated assurance cases exhibit different hierarchical linking patterns compared to human-authored cases, suggesting systematic structural differences.

04 Existing GNN explanation methods show only moderate faithfulness, indicating a significant gap between predicted reasoning and actual argument structure.
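To make "hierarchical linking patterns" concrete, the sketch below computes a few simple structural statistics for one case. The choice of statistics (maximum depth, mean branching factor, leaf count), the function name `structural_features`, and the toy edge list are assumptions for illustration; the summary does not state which structural measures the paper compares. The code assumes the case is a tree rooted at a single top-level claim.

```python
from collections import defaultdict


def structural_features(edges, root):
    """Return (max_depth, mean_branching, num_leaves) for a tree-shaped case."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Depth-first traversal from the root, recording each node's depth.
    depth = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                stack.append(child)

    internal = [n for n in depth if children.get(n)]
    num_leaves = len(depth) - len(internal)
    mean_branching = (
        sum(len(children[n]) for n in internal) / len(internal) if internal else 0.0
    )
    return max(depth.values()), mean_branching, num_leaves


# Invented toy hierarchy: one top claim, two sub-claims, three evidence leaves.
toy_edges = [("C1", "C2"), ("C1", "C3"), ("C2", "E1"), ("C3", "E2"), ("C3", "E3")]
max_depth, mean_branching, num_leaves = structural_features(toy_edges, root="C1")
```

Comparing the distributions of such statistics between human-authored and LLM-generated cases is one plausible way to surface systematic structural differences of the kind reported above.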

04 · Results

The framework demonstrates strong performance on both tasks. For link prediction, GNNs achieve a ROC-AUC of 0.760 on real assurance cases, with good generalization across domains and semi-supervised settings. For provenance analysis, the model achieves an F1 score of 0.94 in distinguishing human-authored from LLM-generated cases. Analysis reveals that LLM-generated cases have distinct hierarchical linking patterns compared to human-authored cases. However, explanation methods applied to the GNNs show only moderate faithfulness, suggesting limitations in interpreting model decisions.
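For readers unfamiliar with the two reported metrics, the sketch below computes ROC-AUC and F1 from their standard definitions in plain Python. The labels, scores, and predictions are invented toy values, not data from the paper, and the helper names `roc_auc` and `f1_score` are local to this example.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, preds, positive=1):
    """Harmonic mean of precision and recall for the chosen positive class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == positive and p == positive)
    fp = sum(1 for y, p in zip(labels, preds) if y != positive and p == positive)
    fn = sum(1 for y, p in zip(labels, preds) if y == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Link prediction: 1 = true link, 0 = sampled non-link; scores come from a link scorer.
link_labels = [1, 1, 1, 0, 0, 0]
link_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(link_labels, link_scores)

# Provenance: 1 = LLM-generated, 0 = human-authored (an assumed label encoding).
true_provenance = [1, 1, 0, 0, 1]
pred_provenance = [1, 0, 0, 1, 1]
f1 = f1_score(true_provenance, pred_provenance)
```

ROC-AUC is threshold-free and depends only on how the scorer ranks links against non-links, whereas F1 depends on the hard class decisions, which is why the two tasks are naturally reported with different metrics.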

05 · Limitations

The paper does not provide detailed information about the dataset's size or composition. The moderate faithfulness of GNN explanation methods indicates that, while the models make accurate predictions, understanding why they make those predictions remains challenging. The work focuses on structural and provenance analysis and may not capture all semantic aspects of assurance case quality. Generalization to regulated domains beyond those in the dataset is not fully explored.

✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.
