✨ TL;DR
This paper proposes CS-ARM-BN, a meta-learning method that uses negative control samples to adapt deep learning models to new experimental batches in biomedical imaging, closing the domain gap caused by batch effects. On drug mechanism-of-action classification, the approach reaches 0.935±0.018 accuracy on new batches, up from 0.862±0.060 for a standard ResNet and close to the training-domain level of 0.939±0.005.
Batch effects (systematic technical variations unrelated to biological signal) are a critical problem in biomedical imaging and severely limit the practical deployment of deep learning models. A model trained on one set of experimental batches loses substantial accuracy on new batches from different experimental conditions or labs: on the JUMP-CP dataset, accuracy falls from 0.939 to 0.862. According to the paper, no existing method has closed this domain gap for deep learning systems despite years of research, which holds back their use in real-world clinical and research settings.
The authors propose Control-Stabilized Adaptive Risk Minimization via Batch Normalization (CS-ARM-BN), a meta-learning adaptation method that leverages negative control samples: unperturbed reference images that are routinely included in every experimental batch by design. These control samples serve as stable in-context anchors for adaptation, allowing the model to recalibrate to new batch conditions without requiring labeled data from the target domain. Adaptation operates through the model's batch normalization statistics.
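A minimal sketch of the idea described above, not the authors' implementation. It assumes that adaptation amounts to normalizing each batch's features with a mean and variance estimated only from that batch's negative controls. The function name `normalize_with_controls`, the array shapes, and the synthetic affine batch effect (a shift and a scale) are all hypothetical choices made for illustration.

```python
import numpy as np

def normalize_with_controls(features, control_features, eps=1e-5):
    """Normalize `features` using per-feature statistics of the control samples.

    Hypothetical helper: statistics come only from the negative controls
    of the same experimental batch, never from the treated samples.
    """
    mu = control_features.mean(axis=0)   # per-feature mean of controls
    var = control_features.var(axis=0)   # per-feature variance of controls
    return (features - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Synthetic features (assumption): 256 controls and 64 treated samples, 8 features.
controls_a = rng.normal(size=(256, 8))
treated_a = rng.normal(loc=0.5, size=(64, 8))

# Simulate a second experimental batch as an affine distortion of the first.
shift, scale = 3.0, 2.0
controls_b = scale * controls_a + shift
treated_b = scale * treated_a + shift

norm_a = normalize_with_controls(treated_a, controls_a)
norm_b = normalize_with_controls(treated_b, controls_b)

# Raw features differ strongly between batches; control-normalized ones nearly agree.
print(np.abs(treated_a - treated_b).max())
print(np.abs(norm_a - norm_b).max())
```

In this toy setting the batch effect cancels almost exactly because it is affine and the controls see the same distortion as the treated samples. Real batch effects need not be affine, which is presumably where the meta-learned model comes in.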
What the paper shows.
On the JUMP-CP dataset for Mechanism-of-Action classification, CS-ARM-BN achieves 0.935±0.018 accuracy on new experimental batches, compared to 0.862±0.060 for standard ResNets. Foundation models are reported to fail on this task even when combined with Typical Variation Normalization. Relative to the training-domain accuracy of 0.939±0.005, the domain gap shrinks from 7.7 percentage points to 0.4 percentage points. The method is particularly effective when batches exhibit strong domain shifts, such as data generated in different laboratories.
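The gap figures can be checked directly from the reported mean accuracies. The short calculation below uses only the point estimates and ignores the reported ± spreads. The variable names and the "fraction of gap closed" summary are illustrative additions, not quantities stated in the paper.

```python
# Reported mean accuracies (point estimates only; the ± spreads are ignored).
train_acc = 0.939      # training-domain accuracy
baseline_acc = 0.862   # standard ResNet on new batches
adapted_acc = 0.935    # CS-ARM-BN on new batches

# Domain gap in percentage points, before and after adaptation.
gap_before = (train_acc - baseline_acc) * 100
gap_after = (train_acc - adapted_acc) * 100

# Share of the original gap removed by adaptation.
fraction_closed = 1 - gap_after / gap_before

print(f"gap before: {gap_before:.1f} pp")            # gap before: 7.7 pp
print(f"gap after:  {gap_after:.1f} pp")             # gap after:  0.4 pp
print(f"fraction closed: {fraction_closed:.1%}")     # fraction closed: 94.8%
```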
The evaluation covers a single task (Mechanism-of-Action classification) on one large-scale dataset (JUMP-CP), so it is unclear how far the results generalize to other biomedical imaging tasks and modalities. The method assumes negative control samples are available in every batch, which may not hold for all experimental designs or historical datasets. The paper does not provide a detailed computational cost analysis or compare adaptation efficiency against alternatives. Ablation studies on the specific components of CS-ARM-BN (batch normalization choice, meta-learning algorithm details) are not discussed.
✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.