✨ TL;DR
COMPASS is a data-centric framework that adapts large language models to target languages through parameter-efficient fine-tuning combined with semantic-aware sampling of multilingual data. It aims to minimize negative cross-lingual interference while maximizing positive transfer, and it includes a continual learning variant that handles shifts in the production data distribution.
Large language models exhibit significant performance disparities across languages, and naive multilingual fine-tuning often degrades performance because of negative cross-lingual interference. Existing approaches typically select auxiliary data by linguistic similarity, which ignores semantic gaps between the training data and the target usage distribution. This motivates data selection strategies that can identify which multilingual data will provide positive transfer without causing interference.
COMPASS uses a distribution-aware sampling strategy that leverages multilingual embeddings and clustering to identify semantic gaps between existing training data and target usage distributions. The framework trains lightweight, language-specific adapters via parameter-efficient fine-tuning (PEFT) on carefully selected auxiliary multilingual data, prioritizing samples from under-represented semantic clusters. The method is extended into COMPASS-ECDA, a continual learning framework that monitors for data distribution shifts in production environments and dynamically updates adapters to prevent model staleness while preserving existing knowledge.
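The gap-driven sampling and shift monitoring described above can be illustrated with a small sketch. Everything in it is an assumption rather than the paper's implementation: embeddings are plain float vectors (in practice they would come from a multilingual encoder), clustering is plain k-means, the auxiliary-data budget is split in proportion to each cluster's positive gap between its share of target usage and its share of training data, and the shift check is a total variation distance compared against a hypothetical `threshold`. All function names (`kmeans`, `cluster_dist`, `gap_sample`, `shift_detected`) are hypothetical.

```python
import math
import random
from collections import Counter

Vector = list[float]


def _sqdist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two embedding vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centroids: list[Vector]) -> int:
    # Index of the closest centroid (ties go to the lowest index).
    return min(range(len(centroids)), key=lambda c: _sqdist(point, centroids[c]))


def kmeans(points: list[Vector], k: int, iters: int = 20, seed: int = 0) -> list[Vector]:
    # Plain Lloyd's algorithm, initialized with k points drawn from the data.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: list[list[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def cluster_dist(points: list[Vector], centroids: list[Vector]) -> list[float]:
    # Fraction of points assigned to each cluster.
    counts = Counter(nearest(p, centroids) for p in points)
    return [counts[c] / len(points) for c in range(len(centroids))]


def gap_sample(train: list[Vector], target: list[Vector], aux: list[Vector],
               k: int, budget: int, seed: int = 0) -> list[int]:
    # Cluster training and target-usage embeddings together, then spend the
    # auxiliary budget on clusters where target usage exceeds training coverage.
    centroids = kmeans(train + target, k, seed=seed)
    target_d = cluster_dist(target, centroids)
    train_d = cluster_dist(train, centroids)
    gaps = [max(t - s, 0.0) for t, s in zip(target_d, train_d)]
    total = sum(gaps)
    if total == 0:
        return []  # no under-represented cluster, nothing to add
    pools: dict[int, list[int]] = {c: [] for c in range(k)}
    for i, p in enumerate(aux):
        pools[nearest(p, centroids)].append(i)
    rng = random.Random(seed)
    chosen: list[int] = []
    for c in range(k):
        # Floor the quota, so the total may fall slightly under budget;
        # also cap it by how many auxiliary samples the cluster actually has.
        quota = min(math.floor(budget * gaps[c] / total), len(pools[c]))
        chosen.extend(rng.sample(pools[c], quota))
    return sorted(chosen)


def shift_detected(reference: list[float], current: list[float],
                   threshold: float = 0.2) -> bool:
    # Total variation distance between two cluster distributions; exceeding
    # the threshold is treated as a shift that should trigger an adapter update.
    return 0.5 * sum(abs(r - c) for r, c in zip(reference, current)) > threshold
```

In this toy setup, training data covering only one semantic region while target usage is split across two would send the whole budget to auxiliary samples from the uncovered region. A real pipeline would then train the language-specific adapter on the selected samples and re-run the sampler whenever `shift_detected` fires on production traffic; both steps are outside this sketch.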
What the paper shows.
COMPASS consistently outperforms baseline methods across three model architectures (Phi-4-Mini, Llama-3.1-8B, and Qwen2.5-7B) and several challenging multilingual benchmarks, including Global-MMLU, MMLU-ProX, and unseen long-context tasks (OneRuler). The authors present the framework as an effective, efficient, and sustainable way to develop and maintain high-performing multilingual models in dynamic environments.
Limitations.
The paper does not explicitly detail computational costs or training-time comparisons with baseline methods. The evaluation covers specific model architectures and benchmarks, so generalization to other model families or to languages beyond those tested is unclear. The continual learning variant (COMPASS-ECDA) requires monitoring mechanisms in production, which may introduce operational overhead that is not fully discussed. The paper does not provide ablation studies isolating the contribution of individual components, such as the clustering strategy versus the semantic sampling approach.
✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.