✨ TL;DR
This paper proposes a graph neural network (GNN) framework for analyzing assurance cases (structured safety argument documents). It addresses two tasks: predicting links between argument elements and detecting whether a case is LLM-generated or human-authored. The work finds that LLM-generated assurance cases have distinct structural patterns, and it highlights limitations in GNN explanation methods.
Assurance cases are critical documents in regulated domains for demonstrating compliance with safety and industry standards. However, there is limited automated analysis of their structure and provenance. As large language models become more prevalent, there is a need to detect whether assurance cases are human-authored or machine-generated, as LLM-generated cases may introduce biases. Additionally, understanding the connections between argument elements (claims, evidence, warrants) is important for validating case quality and completeness.
The authors represent assurance cases as text-attributed graphs in which nodes are argument elements and edges are logical connections. They apply graph neural networks to two tasks: (1) link prediction, to identify missing or implicit connections between argument elements, and (2) graph classification, to distinguish human-authored from LLM-generated assurance cases. They compile a publicly available dataset of assurance cases represented as graphs and evaluate GNN performance in both supervised and semi-supervised settings across different domains.
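The text-attributed graph representation and the link prediction task can be illustrated with a minimal sketch. This is not the authors' model: the toy case, the bag-of-words features, the single untrained mean-aggregation step, and the names `text_features`, `mean_aggregate`, and `link_score` are all assumptions chosen for illustration. A real GNN would use learned weights and richer text embeddings.

```python
import math

# Hypothetical toy assurance case: node id -> (element type, text).
nodes = {
    0: ("claim", "system is acceptably safe"),
    1: ("claim", "braking hazard is mitigated"),
    2: ("evidence", "braking test report"),
    3: ("evidence", "code review log"),
}
# Known support links; treated as undirected for message passing.
edges = [(0, 1), (1, 2)]


def text_features(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    words = text.split()
    return [float(words.count(w)) for w in vocab]


def mean_aggregate(features, edges):
    """One message-passing step: average a node's vector with its neighbours'."""
    neighbours = {n: {n} for n in features}  # self-loop keeps the node's own text
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    dim = len(next(iter(features.values())))
    return {
        n: [sum(features[m][i] for m in nbrs) / len(nbrs) for i in range(dim)]
        for n, nbrs in neighbours.items()
    }


def link_score(h, u, v):
    """Sigmoid of the dot product of two node embeddings, a value in (0, 1)."""
    dot = sum(a * b for a, b in zip(h[u], h[v]))
    return 1.0 / (1.0 + math.exp(-dot))


vocab = sorted({w for _, text in nodes.values() for w in text.split()})
x = {n: text_features(text, vocab) for n, (_, text) in nodes.items()}
h = mean_aggregate(x, edges)

# Score an existing link (1, 2) and an unlinked candidate pair (1, 3).
print(link_score(h, 1, 2), link_score(h, 1, 3))
```

In this sketch the pair (1, 2) scores above 0.5 because the two nodes share vocabulary and neighbourhood, while the isolated node 3 shares nothing with node 1 and scores exactly 0.5. Graph classification would pool the node embeddings in `h` into one vector per case before applying a classifier.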
What the paper shows.
The framework performs well on both tasks. For link prediction, GNNs achieve a ROC-AUC of 0.760 on real assurance cases and generalize well across domains and in semi-supervised settings. For provenance analysis, the model achieves an F1 score of 0.94 in distinguishing human-authored from LLM-generated cases. The analysis shows that LLM-generated cases have hierarchical linking patterns distinct from those of human-authored cases. However, explanation methods applied to the GNNs show only moderate faithfulness, which limits how well the models' decisions can be interpreted.
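For readers less familiar with the two reported metrics, the sketch below computes ROC-AUC and F1 from their standard definitions. The labels, scores, and predictions are made-up toy values, not results from the paper, and the function names are illustrative.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(labels, preds):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy link prediction: 1 = true link, 0 = non-link, with model scores.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])

# Toy provenance labels: 1 = LLM-generated, 0 = human-authored.
f1 = f1_score([1, 1, 1, 0], [1, 1, 0, 0])

print(auc, f1)
```

ROC-AUC is threshold-free and measures how well scores rank true links above non-links, whereas F1 depends on a fixed decision threshold, which is why the two numbers in the summary are not directly comparable.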
The paper does not provide detailed information about the size or composition of the dataset. The moderate faithfulness of the GNN explanation methods means that, although the models predict accurately, it remains hard to understand why they make those predictions. The work focuses on structural and provenance analysis and may not capture all semantic aspects of assurance case quality. Generalization to regulated domains beyond those in the dataset is not fully explored.
✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.