Efficient Multi-Cohort Inference for Long-Term Effects and Lifetime Value in A/B Testing with User Learning

Dario Simionato; Andrea Tonon; Mingxue Wang; Weiguo Wang; Tong Gui; Xiaoyue Li

✨ TL;DR

This paper proposes a method to estimate long-term treatment effects and lifetime value changes in A/B tests for streaming platforms where user churn is costly. It uses inverse-variance weighted estimation across multiple cohorts and parametric decay modeling to capture both steady-state impact and cumulative user value within short experiments.

01 · Problem

A/B tests on streaming platforms typically evaluate outcomes within a limited experimental horizon, so they miss how a treatment affects long-term user retention and lifetime value. Short-term metrics may look favorable and the long-term effect may be neutral, yet the intervention can still generate lower total value than control because of user churn. Existing approaches do not capture the steady-state treatment effect and the cumulative value impact at the same time, which can lead to incorrect product decisions when either short-term or long-term metrics are used alone.

02 · Approach

The method combines multiple cohorts using an inverse-variance weighted estimator to efficiently estimate time-varying treatment effects with reduced variance. The estimated treatment trajectory is then fit to a parametric decay model to recover both the asymptotic treatment effect and cumulative value generated over time. This framework enables simultaneous evaluation of long-term treatment effects (LTE) and residual expected remaining lifetime value change (ΔERLV) within a single short multi-cohort A/B test under user learning dynamics.
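As a rough illustration of the first step, the sketch below pools per-cohort effect estimates with textbook inverse-variance weights. It is not the paper's estimator: the function name `ivw_combine`, the cohort numbers, and the assumption that cohort estimates are independent are all hypothetical and chosen only to show why pooling lowers variance.

```python
from math import sqrt


def ivw_combine(estimates, variances):
    """Pool per-cohort effect estimates using inverse-variance weights.

    Assumes the cohort estimates are independent (an assumption of this
    sketch, not a claim about the paper's setup).
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need equally sized, non-empty inputs")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_var = 1.0 / total  # variance of the pooled estimate under independence
    return pooled, pooled_var


# Hypothetical effect estimates for three cohorts at the same time-since-exposure.
cohort_effects = [0.10, 0.14, 0.08]
cohort_variances = [0.004, 0.001, 0.002]

pooled, pooled_var = ivw_combine(cohort_effects, cohort_variances)
print(f"pooled effect = {pooled:.4f}, std. error = {sqrt(pooled_var):.4f}")
```

Under the independence assumption, the pooled variance is never larger than the smallest cohort variance, which is the sense in which combining cohorts reduces variance relative to any single cohort.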

03 · Key insights

What the paper shows.

01 Inverse-variance weighting across multiple cohorts reduces variance in long-term treatment effect estimation compared to standard single-cohort approaches.

02 Parametric decay modeling of treatment trajectories enables recovery of asymptotic effects and cumulative value from short experimental windows.

03 Simultaneous evaluation of steady-state impact and residual lifetime value prevents incorrect decisions that would result from considering only short-term or long-term metrics independently.

04 User learning dynamics in streaming platforms require specialized treatment of time-varying effects to accurately predict long-term retention and churn impact.
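The decay-modeling idea in the second insight can be sketched as follows. The summary does not state the paper's functional form, so this example assumes an exponential decay toward a plateau, effect(t) = a + b * exp(-k * t), where `a` plays the role of the asymptotic effect. The function name, the rate grid, and the synthetic data are hypothetical, and the `transient_area` quantity is only an illustrative cumulative measure, not the paper's ΔERLV definition.

```python
from math import exp


def fit_exponential_decay(times, effects, rates):
    """Fit effects ~ a + b * exp(-k * t) by grid search over k.

    For a fixed k the model is linear in (a, b), so both coefficients
    come from ordinary least squares in closed form. Returns the
    (a, b, k) triple with the smallest sum of squared errors.
    """
    n = len(times)
    y_mean = sum(effects) / n
    best = None
    for k in rates:
        x = [exp(-k * t) for t in times]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate regressor, skip this rate
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, effects))
        b = sxy / sxx
        a = y_mean - b * x_mean
        sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, effects))
        if best is None or sse < best[0]:
            best = (sse, a, b, k)
    if best is None:
        raise ValueError("no usable decay rate in the grid")
    return best[1], best[2], best[3]


# Synthetic, noise-free trajectory over a hypothetical 14-period experiment.
times = list(range(1, 15))
true_a, true_b, true_k = 0.05, 0.20, 0.30
effects = [true_a + true_b * exp(-true_k * t) for t in times]

rates = [i / 100 for i in range(1, 101)]  # candidate decay rates 0.01 .. 1.00
a, b, k = fit_exponential_decay(times, effects, rates)

# Area under the transient part b * exp(-k * t) from t = 0 to infinity.
transient_area = b / k
print(f"asymptotic effect a = {a:.4f}, decay rate k = {k:.2f}, transient area = {transient_area:.4f}")
```

With real data the trajectory would be noisy, and each point could be weighted by the inverse of its estimated variance. The grid search is used here only to keep the sketch dependency-free; a nonlinear least-squares routine would be the more usual choice.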

04 · Results

The empirical results demonstrate improved precision in estimating both long-term treatment effects and residual expected remaining lifetime value compared to baseline approaches. The framework successfully identifies scenarios where relying exclusively on short-term metrics or long-term engagement predictions would lead to incorrect product decisions, validating the importance of jointly considering steady-state impact and cumulative user value within a unified evaluation framework.

05 · Limitations

The paper does not explicitly discuss limitations, but several constraints are implicit. The focus on streaming platforms may limit generalizability to other domains. The parametric decay model assumes a particular functional form for the treatment trajectory that may not hold universally. The approach requires sufficient multi-cohort data and assumes that user learning follows predictable patterns. Finally, performance depends on accurate modeling of long-term user behavior, which may be difficult to validate within short experimental windows.

✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.
