CHASM: Unveiling Covert Advertisements on Chinese Social Media

Jingyi Zheng; Tianyi Hu; Yule Liu; Zhen Sun; Zongmin Zhang; Zifan Peng; Wenhan Dong; Xinlei He

✨ TL;DR

This paper introduces CHASM, a dataset of 4,992 annotated instances from Chinese social media for detecting covert advertisements disguised as regular posts. The study reveals that current multimodal large language models fail to reliably identify these deceptive advertisements, highlighting a critical gap in content moderation capabilities.

01 · Problem

Covert advertisements on social media platforms pose a significant ethical and legal threat by disguising promotional content as authentic user posts to deceive consumers into making purchases. Current benchmarks and evaluation frameworks for large language models in social media moderation completely overlook this emerging threat, leaving platforms vulnerable to sophisticated deceptive marketing practices. This gap is particularly acute on platforms like Rednote, where product experience sharing posts can closely resemble covert advertisements, making detection challenging.

02 · Approach

The authors created CHASM, a high-quality, manually curated dataset of 4,992 instances collected from the Chinese social media platform Rednote. The dataset was compiled under strict privacy protection and quality control protocols with careful anonymization. The authors evaluated multiple multimodal large language models (MLLMs) under zero-shot and in-context learning settings, then conducted fine-tuning experiments on open-source MLLMs to assess performance improvements and identify persistent challenges in covert advertisement detection.
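The difference between the zero-shot and in-context learning settings can be illustrated with a minimal prompt-construction sketch. Everything here is an assumption made for illustration: the `Post` fields, the instruction wording, and the `ad` / `not_ad` label names are hypothetical and are not taken from the paper, and images are left out even though the evaluated models are multimodal.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Post:
    # Hypothetical fields; the actual CHASM schema may differ.
    title: str
    body: str
    comments: Sequence[str] = ()
    label: Optional[str] = None  # "ad" or "not_ad"; only set on demonstrations


# Hypothetical instruction text, not the prompt used by the authors.
INSTRUCTION = (
    "Decide whether the following social media post is a covert advertisement. "
    "Answer with exactly one word: ad or not_ad."
)


def render_post(post: Post) -> str:
    """Render the textual parts of a post, including its comments."""
    comments = "\n".join(f"- {c}" for c in post.comments) or "(none)"
    return f"Title: {post.title}\nBody: {post.body}\nComments:\n{comments}"


def build_prompt(target: Post, demonstrations: Sequence[Post] = ()) -> str:
    """Build a zero-shot prompt (no demonstrations) or an in-context prompt."""
    parts = [INSTRUCTION]
    for i, demo in enumerate(demonstrations, start=1):
        if demo.label is None:
            raise ValueError("demonstrations must be labeled")
        parts.append(f"Example {i}:\n{render_post(demo)}\nAnswer: {demo.label}")
    parts.append(f"Post to classify:\n{render_post(target)}\nAnswer:")
    return "\n\n".join(parts)
```

Calling `build_prompt(target)` gives the zero-shot prompt, while `build_prompt(target, [demo_1, demo_2])` prepends labeled demonstrations. Keeping both settings in one function means the only thing that varies between them is the demonstration list.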

03 · Key insights

What the paper shows.

01 No current MLLMs are sufficiently reliable for detecting covert advertisements in zero-shot or in-context learning settings

02 Fine-tuning open-source MLLMs on the CHASM dataset yields noticeable performance improvements, suggesting the dataset enables model adaptation

03 Detecting subtle cues in comments and distinguishing between visual and textual structures remain significant unresolved challenges

04 Product experience sharing posts that closely resemble covert advertisements create a particularly challenging detection scenario

04 · Results

Evaluation of current MLLMs revealed inadequate performance in detecting covert advertisements across both zero-shot and in-context learning settings. Fine-tuning experiments on open-source MLLMs showed noticeable performance gains when trained on the CHASM dataset, indicating the dataset's utility for model improvement. However, the models continue to struggle with subtle contextual cues in comments and distinguishing between visual and textual structural differences that characterize covert advertisements.
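This summary does not say which metrics the paper reports. As a hedged illustration only, detection quality on a binary task like this one is commonly summarized with precision, recall, and F1 on the advertisement class. The function name and the `ad` / `not_ad` labels below are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(
    gold: Sequence[str], pred: Sequence[str], positive: str = "ad"
) -> Dict[str, float]:
    """Precision, recall, and F1 for the positive class of a binary task."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}
```

Reporting precision and recall separately is useful here because genuine product experience posts flagged as advertisements lower precision, whereas covert advertisements that go undetected lower recall.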

05 · Limitations

The study is limited to the Chinese social media platform Rednote, potentially restricting generalizability to other platforms and languages. The paper does not provide specific quantitative performance metrics for the evaluated models, making it difficult to assess the magnitude of performance gaps. The dataset's focus on product experience sharing posts may not capture the full diversity of covert advertisement strategies. Additionally, the paper acknowledges but does not fully resolve challenges in detecting subtle cues and visual-textual distinctions, indicating incomplete solutions to the core problem.

✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.
