diff --git a/docs/_quarto.yml b/docs/_quarto.yml index dfdd1f2e7..6bb93e348 100644 --- a/docs/_quarto.yml +++ b/docs/_quarto.yml @@ -31,6 +31,10 @@ website: file: changelog.qmd - text: "Contributing" file: contributing.qmd + - text: Tutorials + menu: + - text: "Variance Reduction in AB Tests" + file: panel_variance_reduction.ipynb - text: Learn more menu: - text: "Regression Tables and Summary Statistics" diff --git a/docs/panel_variance_reduction.ipynb b/docs/panel_variance_reduction.ipynb new file mode 100644 index 000000000..604fde94f --- /dev/null +++ b/docs/panel_variance_reduction.ipynb @@ -0,0 +1,1838 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7671c77a", + "metadata": {}, + "source": [ + "## Panel Estimators for AB Tests with Repeated Observations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3b206e77", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " " + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "from IPython.display import display\n", + "from tqdm import tqdm\n", + "\n", + "import pyfixest as pf\n", + "from pyfixest.utils.dgps import get_sharkfin" + ] + }, + { + "cell_type": "markdown", + "id": "0e15e974", + "metadata": {}, + "source": [ + "In this tutorial, we show how to use pyfixest to reduce the variance of your estimators in AB tests with repeated observations.\n", + "\n", + "For example, we may be a streaming platform and we want to estimate the effect of a new feature on the amount of time users spend watching videos. To do so, \n", + "we randomly assign the treatment to a subset of our users. For 15 days prior to the experiment, we track the desired outcome (minutes watched) for each user. If\n", + "users are not seen on the platform on a given day, the number of minutes watched is 0. Our experiment runs for 15 days. All in all, we have 30 days of data for each \n", + "user. \n", + "\n", + "To get started, we simulate a panel data set of 100_000 users with the 30 days of data described above: 15 pre-treatment and 15 post-treatment days per user. " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "337fdce4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
userdaytreatminutes_watchedever_treated
00008.1144880
10107.6330580
20206.7123320
30306.2308200
40406.0044890
\n", + "
" + ], + "text/plain": [ + " user day treat minutes_watched ever_treated\n", + "0 0 0 0 8.114488 0\n", + "1 0 1 0 7.633058 0\n", + "2 0 2 0 6.712332 0\n", + "3 0 3 0 6.230820 0\n", + "4 0 4 0 6.004489 0" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def get_data(num_units: int, sigma_unit: int) -> pd.DataFrame:\n", + " \"\"\"\n", + " Random example data set.\n", + " num_units: int\n", + " The number of users\n", + " sigma_unit: int\n", + " The standard deviation of the user-level idiosyncratic error term.\n", + " \"\"\"\n", + " data = get_sharkfin(\n", + " num_units=num_units,\n", + " num_periods=30,\n", + " num_treated=500,\n", + " treatment_start=15,\n", + " seed=231,\n", + " sigma_unit=sigma_unit,\n", + " )\n", + " data = data.rename(columns={\"Y\": \"minutes_watched\", \"unit\": \"user\", \"year\": \"day\"})\n", + "\n", + " return data\n", + "\n", + "\n", + "data = get_data(num_units=100_000, sigma_unit=18)\n", + "data.head()" + ] + }, + { + "cell_type": "markdown", + "id": "b125d2a3", + "metadata": {}, + "source": [ + "We can inspect the data generating process via the `panelview()` function: " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "005e6a4d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "pf.panelview(\n", + " data,\n", + " unit=\"user\",\n", + " time=\"day\",\n", + " treat=\"treat\",\n", + " collapse_to_cohort=True,\n", + " sort_by_timing=True,\n", + " ylab=\"Cohort\",\n", + " xlab=\"Day\",\n", + " title=\"Treatment Assignment Cohorts\",\n", + " figsize=(6, 5),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "a368b4ea", + "metadata": {}, + "source": [ + "We see that the treated cohort switches into treatment after half the time, i.e. from day 15 onwards. " + ] + }, + { + "cell_type": "markdown", + "id": "f9c7a8aa", + "metadata": {}, + "source": [ + "In the next step, we will look at three different ways to compute the average treatment effect of our feature on the outcome: \n", + "\n", + "- First, we will compute a standard \"difference in means\" estimator and conduct a two-sample t-test for inference. We do this both by hand and by means of linear regression. While this estimator is standard, we will show that it is relatively inefficient \n", + " as it throws away valuable information from the pre-treatment periods. \n", + "- In a second step, we improve on the difference in means estimator and include pre-treatment measures of the outcome variable in our regression model. In the tech blog world, this method \n", + " is often referred to as CUPED (which stands for \"Controlled-experiment Using Pre-Experiment Data\" as far as we know). We will demonstrate that CUPED leads to a significant reduction in the variance of our estimators. Pre-treatment averages are a \"good control\" variable because \n", + " a) under unconditional randomization, pre-treatment averages should be uncorrelated with the treatment assignment and b) pre-treatment averages should be highly predictive of the post-treatment outcome. The intuition here is simply that users will behave similarly \"after\" the experiment starts / when receiving the treatment as they did before. 
\n", + "- In a third step, we show that instead of using pre-treatment averages as a control variable, we can simply fit a panel model and control for user and time fixed effects. \n", + "\n", + "All three estimators identify the same average treatment effect, but the CUPED and panel estimators are much more efficient: they produce lower variances. \n", + "\n", + "But first, let's discuss why we are interested in reducing the variance of our estimators. " + ] + }, + { + "cell_type": "markdown", + "id": "d487603a", + "metadata": {}, + "source": [ + "## Variance of Estimators, Statistical Power and Sample Size Requirements\n", + "\n", + "In statistical experiments, we care about power: the probability that we detect an effect when one truly exists. \n", + "\n", + "Power depends on the **signal-to-noise ratio**:\n", + "\n", + "$$\n", + "\\text{Power} \\;\\sim\\; f\\!\\left(\\frac{|\\tau|}{\\text{SE}}\\right)\n", + "$$\n", + "\n", + "where\n", + "\n", + "- $\\tau$ = the true effect size \n", + "- $\\text{SE} \\propto \\frac{\\sigma}{\\sqrt{n}}$ = the standard error of the estimate \n", + "- $\\sigma$ = standard deviation of the outcome \n", + "- $n$ = sample size per group (if balanced) \n", + "\n", + "So, anything that **increases the effect size**, **increases the sample size**, or **reduces outcome variance** will improve power. That's where our interest in variance reduction stems from. 
\n" + ] + }, + { + "cell_type": "markdown", + "id": "ef3befc5", + "metadata": {}, + "source": [ + "## Simple Difference in Means Estimator" + ] + }, + { + "cell_type": "markdown", + "id": "94686a5c", + "metadata": {}, + "source": [ + "The simplest way to analyse an AB test / estimate an average treatment effect is the difference in means estimator: \n", + "$$\n", + " \\hat{\\tau} = \\frac{1}{n_1} \\sum_{i=1}^{n_1} Y_{i,1} - \\frac{1}{n_0} \\sum_{i=1}^{n_0} Y_{i,0}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b037d365", + "metadata": {}, + "source": [ + "We can compute it in a few lines of `pandas` by implementing three steps: \n", + "- First, we throw away all pre-experimental data. \n", + "- Then, we average the post-treatment minutes watched into a single data point per user. \n", + "- Finally, we compute the difference in means of this per-user outcome between the treated and control groups.\n", + "Done! " + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5eeecd81", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
usertotal_minutes_watched
treat
0NaNNaN
1-1387.6301510.761514
\n", + "
" + ], + "text/plain": [ + " user total_minutes_watched\n", + "treat \n", + "0 NaN NaN\n", + "1 -1387.630151 0.761514" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def _difference_in_means_pd(data):\n", + " # standard analysis: throw away pre-experimental data\n", + " data2 = data.copy()\n", + " data_post = data2[data2.day >= 15]\n", + " # average post-treatment minutes watched into a single data point per user\n", + " data_post_agg = (\n", + " data_post.groupby([\"user\", \"treat\"])\n", + " .agg({\"minutes_watched\": \"mean\"})\n", + " .reset_index()\n", + " .rename(columns={\"minutes_watched\": \"total_minutes_watched\"})\n", + " )\n", + " # difference in means: read off the total_minutes_watched column (the user column only averages user ids)\n", + " return data_post_agg.groupby(\"treat\").mean().diff()\n", + "\n", + "\n", + "_difference_in_means_pd(data)" + ] + }, + { + "cell_type": "markdown", + "id": "c5d99260", + "metadata": {}, + "source": [ + "Because linear regression is just a fancy way to compute and compare differences, we could also have estimated this via `pf.feols()`: " + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "01495f78", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "###\n", + "\n", + "Estimation: OLS\n", + "Dep. var.: minutes_watched, Fixed effects: 0\n", + "Inference: iid\n", + "Observations: 1500000\n", + "\n", + "| Coefficient | Estimate | Std. 
Error | t value | Pr(>|t|) | 2.5% | 97.5% |\n", + "|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|\n", + "| Intercept | 0.357 | 0.015 | 24.163 | 0.000 | 0.328 | 0.386 |\n", + "| treat | 0.762 | 0.209 | 3.641 | 0.000 | 0.352 | 1.171 |\n", + "---\n", + "RMSE: 18.07 R2: 0.0 \n" + ] + } + ], + "source": [ + "def _difference_in_means(data):\n", + " data2 = data.copy()\n", + "\n", + " data_post = data2[data2.day >= 15]\n", + " fit = pf.feols(\"minutes_watched ~ treat\", data=data_post)\n", + "\n", + " return fit\n", + "\n", + "\n", + "fit_difference_in_means = _difference_in_means(data)\n", + "fit_difference_in_means.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "c9f44820", + "metadata": {}, + "source": [ + "Note that no aggregation step is needed and that we get identical point estimates, because every user contributes the same number of post-treatment days. As a bonus, we get standard errors and confidence intervals in the process. Keep in mind that the iid standard errors treat the 15 daily observations per user as independent; with repeated observations per user, clustering by user would be the more conservative choice. " + ] + }, + { + "cell_type": "markdown", + "id": "e972baa3", + "metadata": {}, + "source": [ + "## CUPED: Using Pre-Experimental Measures of the Outcome Variable as Controls\n", + "\n", + "Sometimes, throwing away data can be a winning strategy (\"garbage in garbage out\"), but not in this example: we have high-quality pre-experimental measures of the behavior \n", + "of our users at hand, and we should use them in our favor! \n", + "\n", + "More specifically, instead of throwing away all of the pre-experimental measures, we can use them as a control! \n", + "\n", + "This should help reduce the variance of our estimators because pre-experimental behavior is likely to be highly predictive of what users do after the launch of the experiment. Consequently, including these baselines in our regression models should help us \"explain residual errors\" and thereby reduce the variance of our \n", + "estimators. 
\n", + "\n", + "If this works, it's great news, as it allows us to run the same experiment on a smaller number of users and still achieve the same level of power!" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "817bb864", + "metadata": {}, + "outputs": [], + "source": [ + "def _regression_cuped(data):\n", + " data = data.copy()\n", + " data[\"pre_experiment\"] = data[\"day\"] < 15\n", + "\n", + " # average minutes watched per user, separately for pre- and post-treatment days\n", + " agg = data.groupby([\"user\", \"pre_experiment\"], as_index=False).agg(\n", + " minutes_watched=(\"minutes_watched\", \"mean\")\n", + " )\n", + "\n", + " wide = (\n", + " agg.pivot(index=\"user\", columns=\"pre_experiment\", values=\"minutes_watched\")\n", + " .rename(columns={True: \"minutes_pre\", False: \"minutes_post\"})\n", + " .reset_index()\n", + " )\n", + "\n", + " wide = wide.merge(\n", + " data[[\"user\", \"ever_treated\"]].drop_duplicates(), on=\"user\", how=\"left\"\n", + " ).rename(columns={\"ever_treated\": \"treat\"})\n", + "\n", + " # center the pre-treatment metric at its sample mean\n", + " mu_pre = wide[\"minutes_pre\"].mean()\n", + " wide[\"minutes_pre_c\"] = wide[\"minutes_pre\"] - mu_pre\n", + "\n", + " fit_cuped = pf.feols(\n", + " \"minutes_post ~ treat + minutes_pre_c\", data=wide, vcov=\"hetero\"\n", + " )\n", + "\n", + " return fit_cuped, wide\n", + "\n", + "\n", + "fit_cuped, data_cuped = _regression_cuped(data)" + ] + }, + { + "cell_type": "markdown", + "id": "9ae49f77", + "metadata": {}, + "source": [ + "Per user, we now compute the average minutes watched before and after the start of the treatment and then fit a regression model of the following form:" + ] + }, + { + "cell_type": "markdown", + "id": "6df64964", + "metadata": {}, + "source": [ + "$$\n", + " \\text{avg. minutes watched after treatment} = \\alpha + \\beta \\, \\text{treat} + \\gamma (\\text{avg. minutes watched before treatment} - \\text{sample mean of avg. minutes watched before treatment}) + \\epsilon\n", + "$$" + ] + }, + { + "cell_type": "code", + 
"execution_count": 7, + "id": "550da9fc", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
userminutes_postminutes_pretreatminutes_pre_c
007.7478417.76717807.894779
1125.59592824.674535024.802136
22-32.224392-32.1303340-32.002733
33-12.269762-13.1424350-13.014835
44-1.448348-1.3920430-1.264442
\n", + "
" + ], + "text/plain": [ + " user minutes_post minutes_pre treat minutes_pre_c\n", + "0 0 7.747841 7.767178 0 7.894779\n", + "1 1 25.595928 24.674535 0 24.802136\n", + "2 2 -32.224392 -32.130334 0 -32.002733\n", + "3 3 -12.269762 -13.142435 0 -13.014835\n", + "4 4 -1.448348 -1.392043 0 -1.264442" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_cuped.head()" + ] + }, + { + "cell_type": "markdown", + "id": "56861ab2", + "metadata": {}, + "source": [ + "We can compare with the previous results: " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "67d8c3f6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "\n", + "\n", + " \n", + " \n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + "\n", + "\n", + "
\n", + " minutes_watched\n", + " \n", + " minutes_post\n", + "
(1)(2)
coef
treat0.762***
(0.209)
0.192***
(0.029)
minutes_pre_c0.999***
(0.000)
Intercept0.357***
(0.015)
0.360***
(0.002)
stats
Observations1500000100000
S.E. typeiidhetero
R20.0000.999
Adj. R20.0000.999
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell:\n", + "Coefficient \n", + " (Std. Error)
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "GT(_tbl_data= level_0 level_1 0 1\n", + "0 coef treat 0.762***
(0.209) 0.192***
(0.029)\n", + "1 coef minutes_pre_c 0.999***
(0.000)\n", + "2 coef Intercept 0.357***
(0.015) 0.360***
(0.002)\n", + "3 stats Observations 1500000 100000\n", + "4 stats S.E. type iid hetero\n", + "5 stats R2 0.000 0.999\n", + "6 stats Adj. R2 0.000 0.999, _body=, _boxhead=Boxhead([ColInfo(var='level_0', type=, column_label='level_0', column_align='center', column_width=None), ColInfo(var='level_1', type=, column_label='level_1', column_align='center', column_width=None), ColInfo(var='0', type=, column_label='(1)', column_align='center', column_width=None), ColInfo(var='1', type=, column_label='(2)', column_align='center', column_width=None)]), _stub=, _spanners=Spanners([SpannerInfo(spanner_id='minutes_watched', spanner_level=1, spanner_label='minutes_watched', spanner_units=None, spanner_pattern=None, vars=['0'], built=None), SpannerInfo(spanner_id='minutes_post', spanner_level=1, spanner_label='minutes_post', spanner_units=None, spanner_pattern=None, vars=['1'], built=None)]), _heading=Heading(title=None, subtitle=None, preheader=None), _stubhead=None, _source_notes=['Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell:\\nCoefficient \\n (Std. 
Error)'], _footnotes=[], _styles=[], _locale=, _formats=[], _substitutions=[], _options=Options(table_id=OptionsInfo(scss=False, category='table', type='value', value=None), table_caption=OptionsInfo(scss=False, category='table', type='value', value=None), table_width=OptionsInfo(scss=True, category='table', type='px', value='auto'), table_layout=OptionsInfo(scss=True, category='table', type='value', value='fixed'), table_margin_left=OptionsInfo(scss=True, category='table', type='px', value='auto'), table_margin_right=OptionsInfo(scss=True, category='table', type='px', value='auto'), table_background_color=OptionsInfo(scss=True, category='table', type='value', value='#FFFFFF'), table_additional_css=OptionsInfo(scss=False, category='table', type='values', value=[]), table_font_names=OptionsInfo(scss=False, category='table', type='values', value=['-apple-system', 'BlinkMacSystemFont', 'Segoe UI', 'Roboto', 'Oxygen', 'Ubuntu', 'Cantarell', 'Helvetica Neue', 'Fira Sans', 'Droid Sans', 'Arial', 'sans-serif']), table_font_size=OptionsInfo(scss=True, category='table', type='px', value='16px'), table_font_weight=OptionsInfo(scss=True, category='table', type='value', value='normal'), table_font_style=OptionsInfo(scss=True, category='table', type='value', value='normal'), table_font_color=OptionsInfo(scss=True, category='table', type='value', value='#333333'), table_font_color_light=OptionsInfo(scss=True, category='table', type='value', value='#FFFFFF'), table_border_top_include=OptionsInfo(scss=False, category='table', type='boolean', value=True), table_border_top_style=OptionsInfo(scss=True, category='table', type='value', value='solid'), table_border_top_width=OptionsInfo(scss=True, category='table', type='px', value='2px'), table_border_top_color=OptionsInfo(scss=True, category='table', type='value', value='#A8A8A8'), table_border_right_style=OptionsInfo(scss=True, category='table', type='value', value='none'), table_border_right_width=OptionsInfo(scss=True, 
category='table', type='px', value='2px'), table_border_right_color=OptionsInfo(scss=True, category='table', type='value', value='#D3D3D3'), table_border_bottom_include=OptionsInfo(scss=False, category='table', type='boolean', value=True), table_border_bottom_style=OptionsInfo(scss=True, category='table', type='value', value='hidden'), table_border_bottom_width=OptionsInfo(scss=True, category='table', type='px', value='2px'), table_border_bottom_color=OptionsInfo(scss=True, category='table', type='value', value='#A8A8A8'), table_border_left_style=OptionsInfo(scss=True, category='table', type='value', value='none'), table_border_left_width=OptionsInfo(scss=True, category='table', type='px', value='2px'), table_border_left_color=OptionsInfo(scss=True, category='table', type='value', value='#D3D3D3'), heading_background_color=OptionsInfo(scss=True, category='heading', type='value', value=None), heading_align=OptionsInfo(scss=True, category='heading', type='value', value='center'), heading_title_font_size=OptionsInfo(scss=True, category='heading', type='px', value='125%'), heading_title_font_weight=OptionsInfo(scss=True, category='heading', type='value', value='initial'), heading_subtitle_font_size=OptionsInfo(scss=True, category='heading', type='px', value='85%'), heading_subtitle_font_weight=OptionsInfo(scss=True, category='heading', type='value', value='initial'), heading_padding=OptionsInfo(scss=True, category='heading', type='px', value='4px'), heading_padding_horizontal=OptionsInfo(scss=True, category='heading', type='px', value='5px'), heading_border_bottom_style=OptionsInfo(scss=True, category='heading', type='value', value='solid'), heading_border_bottom_width=OptionsInfo(scss=True, category='heading', type='px', value='2px'), heading_border_bottom_color=OptionsInfo(scss=True, category='heading', type='value', value='#D3D3D3'), heading_border_lr_style=OptionsInfo(scss=True, category='heading', type='value', value='none'), 
heading_border_lr_width=OptionsInfo(scss=True, category='heading', type='px', value='1px'), heading_border_lr_color=OptionsInfo(scss=True, category='heading', type='value', value='#D3D3D3'), column_labels_background_color=OptionsInfo(scss=True, category='column_labels', type='value', value=None), column_labels_font_size=OptionsInfo(scss=True, category='column_labels', type='px', value='100%'), column_labels_font_weight=OptionsInfo(scss=True, category='column_labels', type='value', value='normal'), column_labels_text_transform=OptionsInfo(scss=True, category='column_labels', type='value', value='inherit'), column_labels_padding=OptionsInfo(scss=True, category='column_labels', type='px', value='4px'), column_labels_padding_horizontal=OptionsInfo(scss=True, category='column_labels', type='px', value='5px'), column_labels_vlines_style=OptionsInfo(scss=True, category='table_body', type='value', value='none'), column_labels_vlines_width=OptionsInfo(scss=True, category='table_body', type='px', value='0px'), column_labels_vlines_color=OptionsInfo(scss=True, category='table_body', type='value', value='white'), column_labels_border_top_style=OptionsInfo(scss=True, category='column_labels', type='value', value='solid'), column_labels_border_top_width=OptionsInfo(scss=True, category='column_labels', type='px', value='2px'), column_labels_border_top_color=OptionsInfo(scss=True, category='column_labels', type='value', value='black'), column_labels_border_bottom_style=OptionsInfo(scss=True, category='column_labels', type='value', value='solid'), column_labels_border_bottom_width=OptionsInfo(scss=True, category='column_labels', type='px', value='0.5px'), column_labels_border_bottom_color=OptionsInfo(scss=True, category='column_labels', type='value', value='black'), column_labels_border_lr_style=OptionsInfo(scss=True, category='column_labels', type='value', value='none'), column_labels_border_lr_width=OptionsInfo(scss=True, category='column_labels', type='px', value='1px'), 
column_labels_border_lr_color=OptionsInfo(scss=True, category='column_labels', type='value', value='#D3D3D3'), column_labels_hidden=OptionsInfo(scss=False, category='column_labels', type='boolean', value=False), row_group_background_color=OptionsInfo(scss=True, category='row_group', type='value', value=None), row_group_font_size=OptionsInfo(scss=True, category='row_group', type='px', value='0px'), row_group_font_weight=OptionsInfo(scss=True, category='row_group', type='value', value='initial'), row_group_text_transform=OptionsInfo(scss=True, category='row_group', type='value', value='inherit'), row_group_padding=OptionsInfo(scss=True, category='row_group', type='px', value='0px'), row_group_padding_horizontal=OptionsInfo(scss=True, category='row_group', type='px', value='5px'), row_group_border_top_style=OptionsInfo(scss=True, category='row_group', type='value', value='solid'), row_group_border_top_width=OptionsInfo(scss=True, category='row_group', type='px', value='0.5px'), row_group_border_top_color=OptionsInfo(scss=True, category='row_group', type='value', value='black'), row_group_border_right_style=OptionsInfo(scss=True, category='row_group', type='value', value='none'), row_group_border_right_width=OptionsInfo(scss=True, category='row_group', type='px', value='1px'), row_group_border_right_color=OptionsInfo(scss=True, category='row_group', type='value', value='white'), row_group_border_bottom_style=OptionsInfo(scss=True, category='row_group', type='value', value='solid'), row_group_border_bottom_width=OptionsInfo(scss=True, category='row_group', type='px', value='0.5px'), row_group_border_bottom_color=OptionsInfo(scss=True, category='row_group', type='value', value='black'), row_group_border_left_style=OptionsInfo(scss=True, category='row_group', type='value', value='none'), row_group_border_left_width=OptionsInfo(scss=True, category='row_group', type='px', value='1px'), row_group_border_left_color=OptionsInfo(scss=True, category='row_group', type='value', 
value='white'), row_group_as_column=OptionsInfo(scss=False, category='row_group', type='boolean', value=False), table_body_hlines_style=OptionsInfo(scss=True, category='table_body', type='value', value='none'), table_body_hlines_width=OptionsInfo(scss=True, category='table_body', type='px', value='1px'), table_body_hlines_color=OptionsInfo(scss=True, category='table_body', type='value', value='#D3D3D3'), table_body_vlines_style=OptionsInfo(scss=True, category='table_body', type='value', value='none'), table_body_vlines_width=OptionsInfo(scss=True, category='table_body', type='px', value='0px'), table_body_vlines_color=OptionsInfo(scss=True, category='table_body', type='value', value='white'), table_body_border_top_style=OptionsInfo(scss=True, category='table_body', type='value', value='solid'), table_body_border_top_width=OptionsInfo(scss=True, category='table_body', type='px', value='0.5px'), table_body_border_top_color=OptionsInfo(scss=True, category='table_body', type='value', value='black'), table_body_border_bottom_style=OptionsInfo(scss=True, category='table_body', type='value', value='solid'), table_body_border_bottom_width=OptionsInfo(scss=True, category='table_body', type='px', value='2px'), table_body_border_bottom_color=OptionsInfo(scss=True, category='table_body', type='value', value='black'), data_row_padding=OptionsInfo(scss=True, category='data_row', type='px', value='4px'), data_row_padding_horizontal=OptionsInfo(scss=True, category='data_row', type='px', value='5px'), stub_background_color=OptionsInfo(scss=True, category='stub', type='value', value=None), stub_font_size=OptionsInfo(scss=True, category='stub', type='px', value='100%'), stub_font_weight=OptionsInfo(scss=True, category='stub', type='value', value='initial'), stub_text_transform=OptionsInfo(scss=True, category='stub', type='value', value='inherit'), stub_border_style=OptionsInfo(scss=True, category='stub', type='value', value='hidden'), stub_border_width=OptionsInfo(scss=True, 
category='stub', type='px', value='2px'), stub_border_color=OptionsInfo(scss=True, category='stub', type='value', value='#D3D3D3'), stub_row_group_background_color=OptionsInfo(scss=True, category='stub', type='value', value=None), stub_row_group_font_size=OptionsInfo(scss=True, category='stub', type='px', value='100%'), stub_row_group_font_weight=OptionsInfo(scss=True, category='stub', type='value', value='initial'), stub_row_group_text_transform=OptionsInfo(scss=True, category='stub', type='value', value='inherit'), stub_row_group_border_style=OptionsInfo(scss=True, category='stub', type='value', value='solid'), stub_row_group_border_width=OptionsInfo(scss=True, category='stub', type='px', value='2px'), stub_row_group_border_color=OptionsInfo(scss=True, category='stub', type='value', value='#D3D3D3'), source_notes_padding=OptionsInfo(scss=True, category='source_notes', type='px', value='4px'), source_notes_padding_horizontal=OptionsInfo(scss=True, category='source_notes', type='px', value='5px'), source_notes_background_color=OptionsInfo(scss=True, category='source_notes', type='value', value=None), source_notes_font_size=OptionsInfo(scss=True, category='source_notes', type='px', value='90%'), source_notes_border_bottom_style=OptionsInfo(scss=True, category='source_notes', type='value', value='none'), source_notes_border_bottom_width=OptionsInfo(scss=True, category='source_notes', type='px', value='2px'), source_notes_border_bottom_color=OptionsInfo(scss=True, category='source_notes', type='value', value='#D3D3D3'), source_notes_border_lr_style=OptionsInfo(scss=True, category='source_notes', type='value', value='none'), source_notes_border_lr_width=OptionsInfo(scss=True, category='source_notes', type='px', value='2px'), source_notes_border_lr_color=OptionsInfo(scss=True, category='source_notes', type='value', value='#D3D3D3'), source_notes_multiline=OptionsInfo(scss=False, category='source_notes', type='boolean', value=True), 
source_notes_sep=OptionsInfo(scss=False, category='source_notes', type='value', value=' '), row_striping_background_color=OptionsInfo(scss=True, category='row', type='value', value='rgba(128,128,128,0.05)'), row_striping_include_stub=OptionsInfo(scss=False, category='row', type='boolean', value=False), row_striping_include_table_body=OptionsInfo(scss=False, category='row', type='boolean', value=False), container_width=OptionsInfo(scss=False, category='container', type='px', value='auto'), container_height=OptionsInfo(scss=False, category='container', type='px', value='auto'), container_padding_x=OptionsInfo(scss=False, category='container', type='px', value='0px'), container_padding_y=OptionsInfo(scss=False, category='container', type='px', value='10px'), container_overflow_x=OptionsInfo(scss=False, category='container', type='overflow', value='auto'), container_overflow_y=OptionsInfo(scss=False, category='container', type='overflow', value='auto'), quarto_disable_processing=OptionsInfo(scss=False, category='quarto', type='logical', value=False), quarto_use_bootstrap=OptionsInfo(scss=False, category='quarto', type='logical', value=False)), _has_built=False)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pf.etable([fit_difference_in_means, fit_cuped])" + ] + }, + { + "cell_type": "markdown", + "id": "34dc9d20", + "metadata": {}, + "source": [ + "The point estimate is 0.762 for the difference-in-means estimator and 0.192 for the CUPED estimator. Most importantly, the standard error is much smaller for the CUPED estimator (0.029 vs. 0.209). Because we have fitted a regression model with covariates, we use heteroskedasticity-robust standard errors, which tend to be more conservative than the iid errors used in the difference-in-means regression." 
+ ] + }, + { + "cell_type": "markdown", + "id": "2899a0fe", + "metadata": {}, + "source": [ + "## Panel Estimator\n", + "\n", + "Instead of collapsing all pre- and post-period information into a single average (and thereby losing information), we can just as well be lazy and use a panel estimator.\n", + "\n", + "Via `pyfixest`, that's a single function call: " + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "0e6998a8", + "metadata": {}, + "outputs": [], + "source": [ + "fit_panel = pf.feols(\n", + " \"minutes_watched ~ treat | user + day\",\n", + " data=data,\n", + " vcov=\"hetero\",\n", + " demeaner_backend=\"rust\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "52670ecf", + "metadata": {}, + "source": [ + "We can compare all results: " + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "8cf20732", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "<div id=\"wuwkifeozr\" style=\"padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;\">
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "\n", + "\n", + " \n", + " \n", + " \n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + "\n", + "\n", + "
\n", + " minutes_watched\n", + " \n", + " minutes_post\n", + " \n", + " minutes_watched\n", + "
(1)(2)(3)
coef
treat0.762***
(0.209)
0.192***
(0.029)
0.191***
(0.012)
minutes_pre_c0.999***
(0.000)
Intercept0.357***
(0.015)
0.360***
(0.002)
fe
day--x
user--x
stats
Observations15000001000003000000
S.E. typeiidheterohetero
R20.0000.9990.999
R2 Within--0.000
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell:\n", + "Coefficient \n", + " (Std. Error)
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "GT(_tbl_data= level_0 level_1 0 1 \\\n", + "0 coef treat 0.762***
(0.209) 0.192***
(0.029) \n", + "1 coef minutes_pre_c 0.999***
(0.000) \n", + "2 coef Intercept 0.357***
(0.015) 0.360***
(0.002) \n", + "3 fe day - - \n", + "4 fe user - - \n", + "5 stats Observations 1500000 100000 \n", + "6 stats S.E. type iid hetero \n", + "7 stats R2 0.000 0.999 \n", + "8 stats R2 Within - - \n", + "\n", + " 2 \n", + "0 0.191***
(0.012) \n", + "1 \n", + "2 \n", + "3 x \n", + "4 x \n", + "5 3000000 \n", + "6 hetero \n", + "7 0.999 \n", + "8 0.000 , _body=, _boxhead=Boxhead([ColInfo(var='level_0', type=, column_label='level_0', column_align='center', column_width=None), ColInfo(var='level_1', type=, column_label='level_1', column_align='center', column_width=None), ColInfo(var='0', type=, column_label='(1)', column_align='center', column_width=None), ColInfo(var='1', type=, column_label='(2)', column_align='center', column_width=None), ColInfo(var='2', type=, column_label='(3)', column_align='center', column_width=None)]), _stub=, _spanners=Spanners([SpannerInfo(spanner_id='minutes_watched', spanner_level=1, spanner_label='minutes_watched', spanner_units=None, spanner_pattern=None, vars=['0', '2'], built=None), SpannerInfo(spanner_id='minutes_post', spanner_level=1, spanner_label='minutes_post', spanner_units=None, spanner_pattern=None, vars=['1'], built=None)]), _heading=Heading(title=None, subtitle=None, preheader=None), _stubhead=None, _source_notes=['Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell:\\nCoefficient \\n (Std. 
Error)'], _footnotes=[], _styles=[], _locale=, _formats=[], _substitutions=[], _options=Options(table_id=OptionsInfo(scss=False, category='table', type='value', value=None), table_caption=OptionsInfo(scss=False, category='table', type='value', value=None), table_width=OptionsInfo(scss=True, category='table', type='px', value='auto'), table_layout=OptionsInfo(scss=True, category='table', type='value', value='fixed'), table_margin_left=OptionsInfo(scss=True, category='table', type='px', value='auto'), table_margin_right=OptionsInfo(scss=True, category='table', type='px', value='auto'), table_background_color=OptionsInfo(scss=True, category='table', type='value', value='#FFFFFF'), table_additional_css=OptionsInfo(scss=False, category='table', type='values', value=[]), table_font_names=OptionsInfo(scss=False, category='table', type='values', value=['-apple-system', 'BlinkMacSystemFont', 'Segoe UI', 'Roboto', 'Oxygen', 'Ubuntu', 'Cantarell', 'Helvetica Neue', 'Fira Sans', 'Droid Sans', 'Arial', 'sans-serif']), table_font_size=OptionsInfo(scss=True, category='table', type='px', value='16px'), table_font_weight=OptionsInfo(scss=True, category='table', type='value', value='normal'), table_font_style=OptionsInfo(scss=True, category='table', type='value', value='normal'), table_font_color=OptionsInfo(scss=True, category='table', type='value', value='#333333'), table_font_color_light=OptionsInfo(scss=True, category='table', type='value', value='#FFFFFF'), table_border_top_include=OptionsInfo(scss=False, category='table', type='boolean', value=True), table_border_top_style=OptionsInfo(scss=True, category='table', type='value', value='solid'), table_border_top_width=OptionsInfo(scss=True, category='table', type='px', value='2px'), table_border_top_color=OptionsInfo(scss=True, category='table', type='value', value='#A8A8A8'), table_border_right_style=OptionsInfo(scss=True, category='table', type='value', value='none'), table_border_right_width=OptionsInfo(scss=True, 
category='table', type='px', value='2px'), table_border_right_color=OptionsInfo(scss=True, category='table', type='value', value='#D3D3D3'), table_border_bottom_include=OptionsInfo(scss=False, category='table', type='boolean', value=True), table_border_bottom_style=OptionsInfo(scss=True, category='table', type='value', value='hidden'), table_border_bottom_width=OptionsInfo(scss=True, category='table', type='px', value='2px'), table_border_bottom_color=OptionsInfo(scss=True, category='table', type='value', value='#A8A8A8'), table_border_left_style=OptionsInfo(scss=True, category='table', type='value', value='none'), table_border_left_width=OptionsInfo(scss=True, category='table', type='px', value='2px'), table_border_left_color=OptionsInfo(scss=True, category='table', type='value', value='#D3D3D3'), heading_background_color=OptionsInfo(scss=True, category='heading', type='value', value=None), heading_align=OptionsInfo(scss=True, category='heading', type='value', value='center'), heading_title_font_size=OptionsInfo(scss=True, category='heading', type='px', value='125%'), heading_title_font_weight=OptionsInfo(scss=True, category='heading', type='value', value='initial'), heading_subtitle_font_size=OptionsInfo(scss=True, category='heading', type='px', value='85%'), heading_subtitle_font_weight=OptionsInfo(scss=True, category='heading', type='value', value='initial'), heading_padding=OptionsInfo(scss=True, category='heading', type='px', value='4px'), heading_padding_horizontal=OptionsInfo(scss=True, category='heading', type='px', value='5px'), heading_border_bottom_style=OptionsInfo(scss=True, category='heading', type='value', value='solid'), heading_border_bottom_width=OptionsInfo(scss=True, category='heading', type='px', value='2px'), heading_border_bottom_color=OptionsInfo(scss=True, category='heading', type='value', value='#D3D3D3'), heading_border_lr_style=OptionsInfo(scss=True, category='heading', type='value', value='none'), 
heading_border_lr_width=OptionsInfo(scss=True, category='heading', type='px', value='1px'), heading_border_lr_color=OptionsInfo(scss=True, category='heading', type='value', value='#D3D3D3'), column_labels_background_color=OptionsInfo(scss=True, category='column_labels', type='value', value=None), column_labels_font_size=OptionsInfo(scss=True, category='column_labels', type='px', value='100%'), column_labels_font_weight=OptionsInfo(scss=True, category='column_labels', type='value', value='normal'), column_labels_text_transform=OptionsInfo(scss=True, category='column_labels', type='value', value='inherit'), column_labels_padding=OptionsInfo(scss=True, category='column_labels', type='px', value='4px'), column_labels_padding_horizontal=OptionsInfo(scss=True, category='column_labels', type='px', value='5px'), column_labels_vlines_style=OptionsInfo(scss=True, category='table_body', type='value', value='none'), column_labels_vlines_width=OptionsInfo(scss=True, category='table_body', type='px', value='0px'), column_labels_vlines_color=OptionsInfo(scss=True, category='table_body', type='value', value='white'), column_labels_border_top_style=OptionsInfo(scss=True, category='column_labels', type='value', value='solid'), column_labels_border_top_width=OptionsInfo(scss=True, category='column_labels', type='px', value='2px'), column_labels_border_top_color=OptionsInfo(scss=True, category='column_labels', type='value', value='black'), column_labels_border_bottom_style=OptionsInfo(scss=True, category='column_labels', type='value', value='solid'), column_labels_border_bottom_width=OptionsInfo(scss=True, category='column_labels', type='px', value='0.5px'), column_labels_border_bottom_color=OptionsInfo(scss=True, category='column_labels', type='value', value='black'), column_labels_border_lr_style=OptionsInfo(scss=True, category='column_labels', type='value', value='none'), column_labels_border_lr_width=OptionsInfo(scss=True, category='column_labels', type='px', value='1px'), 
column_labels_border_lr_color=OptionsInfo(scss=True, category='column_labels', type='value', value='#D3D3D3'), column_labels_hidden=OptionsInfo(scss=False, category='column_labels', type='boolean', value=False), row_group_background_color=OptionsInfo(scss=True, category='row_group', type='value', value=None), row_group_font_size=OptionsInfo(scss=True, category='row_group', type='px', value='0px'), row_group_font_weight=OptionsInfo(scss=True, category='row_group', type='value', value='initial'), row_group_text_transform=OptionsInfo(scss=True, category='row_group', type='value', value='inherit'), row_group_padding=OptionsInfo(scss=True, category='row_group', type='px', value='0px'), row_group_padding_horizontal=OptionsInfo(scss=True, category='row_group', type='px', value='5px'), row_group_border_top_style=OptionsInfo(scss=True, category='row_group', type='value', value='solid'), row_group_border_top_width=OptionsInfo(scss=True, category='row_group', type='px', value='0.5px'), row_group_border_top_color=OptionsInfo(scss=True, category='row_group', type='value', value='black'), row_group_border_right_style=OptionsInfo(scss=True, category='row_group', type='value', value='none'), row_group_border_right_width=OptionsInfo(scss=True, category='row_group', type='px', value='1px'), row_group_border_right_color=OptionsInfo(scss=True, category='row_group', type='value', value='white'), row_group_border_bottom_style=OptionsInfo(scss=True, category='row_group', type='value', value='solid'), row_group_border_bottom_width=OptionsInfo(scss=True, category='row_group', type='px', value='0.5px'), row_group_border_bottom_color=OptionsInfo(scss=True, category='row_group', type='value', value='black'), row_group_border_left_style=OptionsInfo(scss=True, category='row_group', type='value', value='none'), row_group_border_left_width=OptionsInfo(scss=True, category='row_group', type='px', value='1px'), row_group_border_left_color=OptionsInfo(scss=True, category='row_group', type='value', 
value='white'), row_group_as_column=OptionsInfo(scss=False, category='row_group', type='boolean', value=False), table_body_hlines_style=OptionsInfo(scss=True, category='table_body', type='value', value='none'), table_body_hlines_width=OptionsInfo(scss=True, category='table_body', type='px', value='1px'), table_body_hlines_color=OptionsInfo(scss=True, category='table_body', type='value', value='#D3D3D3'), table_body_vlines_style=OptionsInfo(scss=True, category='table_body', type='value', value='none'), table_body_vlines_width=OptionsInfo(scss=True, category='table_body', type='px', value='0px'), table_body_vlines_color=OptionsInfo(scss=True, category='table_body', type='value', value='white'), table_body_border_top_style=OptionsInfo(scss=True, category='table_body', type='value', value='solid'), table_body_border_top_width=OptionsInfo(scss=True, category='table_body', type='px', value='0.5px'), table_body_border_top_color=OptionsInfo(scss=True, category='table_body', type='value', value='black'), table_body_border_bottom_style=OptionsInfo(scss=True, category='table_body', type='value', value='solid'), table_body_border_bottom_width=OptionsInfo(scss=True, category='table_body', type='px', value='2px'), table_body_border_bottom_color=OptionsInfo(scss=True, category='table_body', type='value', value='black'), data_row_padding=OptionsInfo(scss=True, category='data_row', type='px', value='4px'), data_row_padding_horizontal=OptionsInfo(scss=True, category='data_row', type='px', value='5px'), stub_background_color=OptionsInfo(scss=True, category='stub', type='value', value=None), stub_font_size=OptionsInfo(scss=True, category='stub', type='px', value='100%'), stub_font_weight=OptionsInfo(scss=True, category='stub', type='value', value='initial'), stub_text_transform=OptionsInfo(scss=True, category='stub', type='value', value='inherit'), stub_border_style=OptionsInfo(scss=True, category='stub', type='value', value='hidden'), stub_border_width=OptionsInfo(scss=True, 
category='stub', type='px', value='2px'), stub_border_color=OptionsInfo(scss=True, category='stub', type='value', value='#D3D3D3'), stub_row_group_background_color=OptionsInfo(scss=True, category='stub', type='value', value=None), stub_row_group_font_size=OptionsInfo(scss=True, category='stub', type='px', value='100%'), stub_row_group_font_weight=OptionsInfo(scss=True, category='stub', type='value', value='initial'), stub_row_group_text_transform=OptionsInfo(scss=True, category='stub', type='value', value='inherit'), stub_row_group_border_style=OptionsInfo(scss=True, category='stub', type='value', value='solid'), stub_row_group_border_width=OptionsInfo(scss=True, category='stub', type='px', value='2px'), stub_row_group_border_color=OptionsInfo(scss=True, category='stub', type='value', value='#D3D3D3'), source_notes_padding=OptionsInfo(scss=True, category='source_notes', type='px', value='4px'), source_notes_padding_horizontal=OptionsInfo(scss=True, category='source_notes', type='px', value='5px'), source_notes_background_color=OptionsInfo(scss=True, category='source_notes', type='value', value=None), source_notes_font_size=OptionsInfo(scss=True, category='source_notes', type='px', value='90%'), source_notes_border_bottom_style=OptionsInfo(scss=True, category='source_notes', type='value', value='none'), source_notes_border_bottom_width=OptionsInfo(scss=True, category='source_notes', type='px', value='2px'), source_notes_border_bottom_color=OptionsInfo(scss=True, category='source_notes', type='value', value='#D3D3D3'), source_notes_border_lr_style=OptionsInfo(scss=True, category='source_notes', type='value', value='none'), source_notes_border_lr_width=OptionsInfo(scss=True, category='source_notes', type='px', value='2px'), source_notes_border_lr_color=OptionsInfo(scss=True, category='source_notes', type='value', value='#D3D3D3'), source_notes_multiline=OptionsInfo(scss=False, category='source_notes', type='boolean', value=True), 
source_notes_sep=OptionsInfo(scss=False, category='source_notes', type='value', value=' '), row_striping_background_color=OptionsInfo(scss=True, category='row', type='value', value='rgba(128,128,128,0.05)'), row_striping_include_stub=OptionsInfo(scss=False, category='row', type='boolean', value=False), row_striping_include_table_body=OptionsInfo(scss=False, category='row', type='boolean', value=False), container_width=OptionsInfo(scss=False, category='container', type='px', value='auto'), container_height=OptionsInfo(scss=False, category='container', type='px', value='auto'), container_padding_x=OptionsInfo(scss=False, category='container', type='px', value='0px'), container_padding_y=OptionsInfo(scss=False, category='container', type='px', value='10px'), container_overflow_x=OptionsInfo(scss=False, category='container', type='overflow', value='auto'), container_overflow_y=OptionsInfo(scss=False, category='container', type='overflow', value='auto'), quarto_disable_processing=OptionsInfo(scss=False, category='quarto', type='logical', value=False), quarto_use_bootstrap=OptionsInfo(scss=False, category='quarto', type='logical', value=False)), _has_built=False)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pf.etable([fit_difference_in_means, fit_cuped, fit_panel])" + ] + }, + { + "cell_type": "markdown", + "id": "eb2bab0a", + "metadata": {}, + "source": [ + "The panel estimator almost exactly matches CUPED, and we do even better in terms of variance. \n", + "\n", + "However, because we are working with panel data, the error terms are likely correlated over time within each user. If we ignore this dependence, our standard errors may be underestimated, which in turn can lead to over-rejecting the null hypothesis of no effect. A common way to address this issue is to compute cluster-robust standard errors at the user level, which account for arbitrary heteroskedasticity and autocorrelation within users." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "1342b84e", + "metadata": {}, + "outputs": [], + "source": [ + "fit_panel_crv = pf.feols(\n", + " \"minutes_watched ~ treat | user + day\",\n", + " data=data,\n", + " vcov={\"CRV1\": \"user\"},\n", + " demeaner_backend=\"rust\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "9ba7a9c2", + "metadata": {}, + "source": [ + "Comparing all results, the panel estimator still does quite well in terms of the size of the estimated standard errors. " + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d99943b1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + "\n", + "\n", + "
\n", + " minutes_watched\n", + " \n", + " minutes_post\n", + " \n", + " minutes_watched\n", + "
(1)(2)(3)(4)
coef
treat0.7615***
(0.2092)
0.1918***
(0.0288)
0.1913***
(0.0118)
0.1913***
(0.0287)
minutes_pre_c0.9992***
(0.0001)
Intercept0.3574***
(0.0148)
0.3602***
(0.0021)
fe
day--xx
user--xx
stats
Observations150000010000030000003000000
S.E. typeiidheteroheteroby: user
R20.00000.99870.99850.9985
R2 Within--0.00010.0001
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell:\n", + "Coefficient \n", + " (Std. Error)
\n", + "\n", + "
\n", + " " + ], + "text/plain": [ + "GT(_tbl_data= level_0 level_1 0 \\\n", + "0 coef treat 0.7615***
(0.2092) \n", + "1 coef minutes_pre_c \n", + "2 coef Intercept 0.3574***
(0.0148) \n", + "3 fe day - \n", + "4 fe user - \n", + "5 stats Observations 1500000 \n", + "6 stats S.E. type iid \n", + "7 stats R2 0.0000 \n", + "8 stats R2 Within - \n", + "\n", + " 1 2 3 \n", + "0 0.1918***
(0.0288) 0.1913***
(0.0118) 0.1913***
(0.0287) \n", + "1 0.9992***
(0.0001) \n", + "2 0.3602***
(0.0021) \n", + "3 - x x \n", + "4 - x x \n", + "5 100000 3000000 3000000 \n", + "6 hetero hetero by: user \n", + "7 0.9987 0.9985 0.9985 \n", + "8 - 0.0001 0.0001 , _body=, _boxhead=Boxhead([ColInfo(var='level_0', type=, column_label='level_0', column_align='center', column_width=None), ColInfo(var='level_1', type=, column_label='level_1', column_align='center', column_width=None), ColInfo(var='0', type=, column_label='(1)', column_align='center', column_width=None), ColInfo(var='1', type=, column_label='(2)', column_align='center', column_width=None), ColInfo(var='2', type=, column_label='(3)', column_align='center', column_width=None), ColInfo(var='3', type=, column_label='(4)', column_align='center', column_width=None)]), _stub=, _spanners=Spanners([SpannerInfo(spanner_id='minutes_watched', spanner_level=1, spanner_label='minutes_watched', spanner_units=None, spanner_pattern=None, vars=['0', '2', '3'], built=None), SpannerInfo(spanner_id='minutes_post', spanner_level=1, spanner_label='minutes_post', spanner_units=None, spanner_pattern=None, vars=['1'], built=None)]), _heading=Heading(title=None, subtitle=None, preheader=None), _stubhead=None, _source_notes=['Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell:\\nCoefficient \\n (Std. 
Error)'], _footnotes=[], _styles=[], _locale=, _formats=[], _substitutions=[], _options=Options(table_id=OptionsInfo(scss=False, category='table', type='value', value=None), table_caption=OptionsInfo(scss=False, category='table', type='value', value=None), table_width=OptionsInfo(scss=True, category='table', type='px', value='auto'), table_layout=OptionsInfo(scss=True, category='table', type='value', value='fixed'), table_margin_left=OptionsInfo(scss=True, category='table', type='px', value='auto'), table_margin_right=OptionsInfo(scss=True, category='table', type='px', value='auto'), table_background_color=OptionsInfo(scss=True, category='table', type='value', value='#FFFFFF'), table_additional_css=OptionsInfo(scss=False, category='table', type='values', value=[]), table_font_names=OptionsInfo(scss=False, category='table', type='values', value=['-apple-system', 'BlinkMacSystemFont', 'Segoe UI', 'Roboto', 'Oxygen', 'Ubuntu', 'Cantarell', 'Helvetica Neue', 'Fira Sans', 'Droid Sans', 'Arial', 'sans-serif']), table_font_size=OptionsInfo(scss=True, category='table', type='px', value='16px'), table_font_weight=OptionsInfo(scss=True, category='table', type='value', value='normal'), table_font_style=OptionsInfo(scss=True, category='table', type='value', value='normal'), table_font_color=OptionsInfo(scss=True, category='table', type='value', value='#333333'), table_font_color_light=OptionsInfo(scss=True, category='table', type='value', value='#FFFFFF'), table_border_top_include=OptionsInfo(scss=False, category='table', type='boolean', value=True), table_border_top_style=OptionsInfo(scss=True, category='table', type='value', value='solid'), table_border_top_width=OptionsInfo(scss=True, category='table', type='px', value='2px'), table_border_top_color=OptionsInfo(scss=True, category='table', type='value', value='#A8A8A8'), table_border_right_style=OptionsInfo(scss=True, category='table', type='value', value='none'), table_border_right_width=OptionsInfo(scss=True, 
category='table', type='px', value='2px'), table_border_right_color=OptionsInfo(scss=True, category='table', type='value', value='#D3D3D3'), table_border_bottom_include=OptionsInfo(scss=False, category='table', type='boolean', value=True), table_border_bottom_style=OptionsInfo(scss=True, category='table', type='value', value='hidden'), table_border_bottom_width=OptionsInfo(scss=True, category='table', type='px', value='2px'), table_border_bottom_color=OptionsInfo(scss=True, category='table', type='value', value='#A8A8A8'), table_border_left_style=OptionsInfo(scss=True, category='table', type='value', value='none'), table_border_left_width=OptionsInfo(scss=True, category='table', type='px', value='2px'), table_border_left_color=OptionsInfo(scss=True, category='table', type='value', value='#D3D3D3'), heading_background_color=OptionsInfo(scss=True, category='heading', type='value', value=None), heading_align=OptionsInfo(scss=True, category='heading', type='value', value='center'), heading_title_font_size=OptionsInfo(scss=True, category='heading', type='px', value='125%'), heading_title_font_weight=OptionsInfo(scss=True, category='heading', type='value', value='initial'), heading_subtitle_font_size=OptionsInfo(scss=True, category='heading', type='px', value='85%'), heading_subtitle_font_weight=OptionsInfo(scss=True, category='heading', type='value', value='initial'), heading_padding=OptionsInfo(scss=True, category='heading', type='px', value='4px'), heading_padding_horizontal=OptionsInfo(scss=True, category='heading', type='px', value='5px'), heading_border_bottom_style=OptionsInfo(scss=True, category='heading', type='value', value='solid'), heading_border_bottom_width=OptionsInfo(scss=True, category='heading', type='px', value='2px'), heading_border_bottom_color=OptionsInfo(scss=True, category='heading', type='value', value='#D3D3D3'), heading_border_lr_style=OptionsInfo(scss=True, category='heading', type='value', value='none'), 
heading_border_lr_width=OptionsInfo(scss=True, category='heading', type='px', value='1px'), heading_border_lr_color=OptionsInfo(scss=True, category='heading', type='value', value='#D3D3D3'), column_labels_background_color=OptionsInfo(scss=True, category='column_labels', type='value', value=None), column_labels_font_size=OptionsInfo(scss=True, category='column_labels', type='px', value='100%'), column_labels_font_weight=OptionsInfo(scss=True, category='column_labels', type='value', value='normal'), column_labels_text_transform=OptionsInfo(scss=True, category='column_labels', type='value', value='inherit'), column_labels_padding=OptionsInfo(scss=True, category='column_labels', type='px', value='4px'), column_labels_padding_horizontal=OptionsInfo(scss=True, category='column_labels', type='px', value='5px'), column_labels_vlines_style=OptionsInfo(scss=True, category='table_body', type='value', value='none'), column_labels_vlines_width=OptionsInfo(scss=True, category='table_body', type='px', value='0px'), column_labels_vlines_color=OptionsInfo(scss=True, category='table_body', type='value', value='white'), column_labels_border_top_style=OptionsInfo(scss=True, category='column_labels', type='value', value='solid'), column_labels_border_top_width=OptionsInfo(scss=True, category='column_labels', type='px', value='2px'), column_labels_border_top_color=OptionsInfo(scss=True, category='column_labels', type='value', value='black'), column_labels_border_bottom_style=OptionsInfo(scss=True, category='column_labels', type='value', value='solid'), column_labels_border_bottom_width=OptionsInfo(scss=True, category='column_labels', type='px', value='0.5px'), column_labels_border_bottom_color=OptionsInfo(scss=True, category='column_labels', type='value', value='black'), column_labels_border_lr_style=OptionsInfo(scss=True, category='column_labels', type='value', value='none'), column_labels_border_lr_width=OptionsInfo(scss=True, category='column_labels', type='px', value='1px'), 
column_labels_border_lr_color=OptionsInfo(scss=True, category='column_labels', type='value', value='#D3D3D3'), column_labels_hidden=OptionsInfo(scss=False, category='column_labels', type='boolean', value=False), row_group_background_color=OptionsInfo(scss=True, category='row_group', type='value', value=None), row_group_font_size=OptionsInfo(scss=True, category='row_group', type='px', value='0px'), row_group_font_weight=OptionsInfo(scss=True, category='row_group', type='value', value='initial'), row_group_text_transform=OptionsInfo(scss=True, category='row_group', type='value', value='inherit'), row_group_padding=OptionsInfo(scss=True, category='row_group', type='px', value='0px'), row_group_padding_horizontal=OptionsInfo(scss=True, category='row_group', type='px', value='5px'), row_group_border_top_style=OptionsInfo(scss=True, category='row_group', type='value', value='solid'), row_group_border_top_width=OptionsInfo(scss=True, category='row_group', type='px', value='0.5px'), row_group_border_top_color=OptionsInfo(scss=True, category='row_group', type='value', value='black'), row_group_border_right_style=OptionsInfo(scss=True, category='row_group', type='value', value='none'), row_group_border_right_width=OptionsInfo(scss=True, category='row_group', type='px', value='1px'), row_group_border_right_color=OptionsInfo(scss=True, category='row_group', type='value', value='white'), row_group_border_bottom_style=OptionsInfo(scss=True, category='row_group', type='value', value='solid'), row_group_border_bottom_width=OptionsInfo(scss=True, category='row_group', type='px', value='0.5px'), row_group_border_bottom_color=OptionsInfo(scss=True, category='row_group', type='value', value='black'), row_group_border_left_style=OptionsInfo(scss=True, category='row_group', type='value', value='none'), row_group_border_left_width=OptionsInfo(scss=True, category='row_group', type='px', value='1px'), row_group_border_left_color=OptionsInfo(scss=True, category='row_group', type='value', 
value='white'), row_group_as_column=OptionsInfo(scss=False, category='row_group', type='boolean', value=False), table_body_hlines_style=OptionsInfo(scss=True, category='table_body', type='value', value='none'), table_body_hlines_width=OptionsInfo(scss=True, category='table_body', type='px', value='1px'), table_body_hlines_color=OptionsInfo(scss=True, category='table_body', type='value', value='#D3D3D3'), table_body_vlines_style=OptionsInfo(scss=True, category='table_body', type='value', value='none'), table_body_vlines_width=OptionsInfo(scss=True, category='table_body', type='px', value='0px'), table_body_vlines_color=OptionsInfo(scss=True, category='table_body', type='value', value='white'), table_body_border_top_style=OptionsInfo(scss=True, category='table_body', type='value', value='solid'), table_body_border_top_width=OptionsInfo(scss=True, category='table_body', type='px', value='0.5px'), table_body_border_top_color=OptionsInfo(scss=True, category='table_body', type='value', value='black'), table_body_border_bottom_style=OptionsInfo(scss=True, category='table_body', type='value', value='solid'), table_body_border_bottom_width=OptionsInfo(scss=True, category='table_body', type='px', value='2px'), table_body_border_bottom_color=OptionsInfo(scss=True, category='table_body', type='value', value='black'), data_row_padding=OptionsInfo(scss=True, category='data_row', type='px', value='4px'), data_row_padding_horizontal=OptionsInfo(scss=True, category='data_row', type='px', value='5px'), stub_background_color=OptionsInfo(scss=True, category='stub', type='value', value=None), stub_font_size=OptionsInfo(scss=True, category='stub', type='px', value='100%'), stub_font_weight=OptionsInfo(scss=True, category='stub', type='value', value='initial'), stub_text_transform=OptionsInfo(scss=True, category='stub', type='value', value='inherit'), stub_border_style=OptionsInfo(scss=True, category='stub', type='value', value='hidden'), stub_border_width=OptionsInfo(scss=True, 
category='stub', type='px', value='2px'), stub_border_color=OptionsInfo(scss=True, category='stub', type='value', value='#D3D3D3'), stub_row_group_background_color=OptionsInfo(scss=True, category='stub', type='value', value=None), stub_row_group_font_size=OptionsInfo(scss=True, category='stub', type='px', value='100%'), stub_row_group_font_weight=OptionsInfo(scss=True, category='stub', type='value', value='initial'), stub_row_group_text_transform=OptionsInfo(scss=True, category='stub', type='value', value='inherit'), stub_row_group_border_style=OptionsInfo(scss=True, category='stub', type='value', value='solid'), stub_row_group_border_width=OptionsInfo(scss=True, category='stub', type='px', value='2px'), stub_row_group_border_color=OptionsInfo(scss=True, category='stub', type='value', value='#D3D3D3'), source_notes_padding=OptionsInfo(scss=True, category='source_notes', type='px', value='4px'), source_notes_padding_horizontal=OptionsInfo(scss=True, category='source_notes', type='px', value='5px'), source_notes_background_color=OptionsInfo(scss=True, category='source_notes', type='value', value=None), source_notes_font_size=OptionsInfo(scss=True, category='source_notes', type='px', value='90%'), source_notes_border_bottom_style=OptionsInfo(scss=True, category='source_notes', type='value', value='none'), source_notes_border_bottom_width=OptionsInfo(scss=True, category='source_notes', type='px', value='2px'), source_notes_border_bottom_color=OptionsInfo(scss=True, category='source_notes', type='value', value='#D3D3D3'), source_notes_border_lr_style=OptionsInfo(scss=True, category='source_notes', type='value', value='none'), source_notes_border_lr_width=OptionsInfo(scss=True, category='source_notes', type='px', value='2px'), source_notes_border_lr_color=OptionsInfo(scss=True, category='source_notes', type='value', value='#D3D3D3'), source_notes_multiline=OptionsInfo(scss=False, category='source_notes', type='boolean', value=True), 
source_notes_sep=OptionsInfo(scss=False, category='source_notes', type='value', value=' '), row_striping_background_color=OptionsInfo(scss=True, category='row', type='value', value='rgba(128,128,128,0.05)'), row_striping_include_stub=OptionsInfo(scss=False, category='row', type='boolean', value=False), row_striping_include_table_body=OptionsInfo(scss=False, category='row', type='boolean', value=False), container_width=OptionsInfo(scss=False, category='container', type='px', value='auto'), container_height=OptionsInfo(scss=False, category='container', type='px', value='auto'), container_padding_x=OptionsInfo(scss=False, category='container', type='px', value='0px'), container_padding_y=OptionsInfo(scss=False, category='container', type='px', value='10px'), container_overflow_x=OptionsInfo(scss=False, category='container', type='overflow', value='auto'), container_overflow_y=OptionsInfo(scss=False, category='container', type='overflow', value='auto'), quarto_disable_processing=OptionsInfo(scss=False, category='quarto', type='logical', value=False), quarto_use_bootstrap=OptionsInfo(scss=False, category='quarto', type='logical', value=False)), _has_built=False)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pf.etable(\n", + " [fit_difference_in_means, fit_cuped, fit_panel, fit_panel_crv],\n", + " digits=4,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "d78bd300", + "metadata": {}, + "source": [ + "The panel estimator produces point estimates and standard errors for the treatment variable that are very close to those from CUPED. " + ] + }, + { + "cell_type": "markdown", + "id": "9f4e3bc4", + "metadata": {}, + "source": [ + "The sceptical reader might now object that this is all far from overwhelming evidence. After all, we fit each model only once! So it might well be \n", + "that we were just lucky and our results are driven by sampling error. 
In the next section, we will argue that this is not the case by means of a Monte Carlo simulation. " + ] + }, + { + "cell_type": "markdown", + "id": "bc9cf093", + "metadata": {}, + "source": [ + "### Digression: Heterogeneous Effects over Time via Panel Estimators" + ] + }, + { + "cell_type": "markdown", + "id": "bcf15cb8", + "metadata": {}, + "source": [ + "One advantage of the panel estimator is that it allows us to track treatment effects over time. This lets us detect novelty effects, where users initially \n", + "interact a lot with a new feature (e.g. they spend a lot of time playing a new game) but lose interest over time. " + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "97d0a67c", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fit_panel_time = pf.feols(\n", + " \"minutes_watched ~ i(day, ever_treated, ref = 14) | user + day\",\n", + " vcov={\"CRV1\": \"user\"},\n", + " data=data,\n", + " demeaner_backend=\"rust\",\n", + ")\n", + "fit_panel_time.iplot(\n", + " coord_flip=False,\n", + " plot_backend=\"matplotlib\",\n", + " cat_template=\"{value}\",\n", + " title=\"Event Study Plot\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "bf038777", + "metadata": {}, + "source": [ + "Initially, we observe a large treatment effect that reverts to a null effect on day 23: a clear novelty effect pattern. Clearly, we should not accept this \n", + "experiment as it only adds technical complexity, but has no impact on user behavior beyond the initial effect. When ignoring time heterogeneity (as with the difference in means, cuped, and ATE panel estimators above), we completely miss that the effect fades out. With a dynamic panel estimator, we get it almost for free. \n", + "\n", + "For more examples, see the paper by [Lal et al](https://arxiv.org/abs/2410.09952), which also shows how to estimate very large panel models in [SQL/duckdb](https://github.com/py-econometrics/duckreg) (yep, that's not a joke!). " + ] + }, + { + "cell_type": "markdown", + "id": "41afbbe7", + "metadata": {}, + "source": [ + "## Monte Carlo Simulation" + ] + }, + { + "cell_type": "markdown", + "id": "2a4d8a02", + "metadata": {}, + "source": [ + "To rule out that the examples above were purely do to look, we simply repeat them a couple of times using a monte carlo simulation. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "edd54613", + "metadata": {}, + "outputs": [], + "source": [ + "def _variance_monte_carlo(data):\n", + " fit_dim = _difference_in_means(data)\n", + " fit_cuped, _ = _regression_cuped(data)\n", + " fit_panel = pf.feols(\n", + " \"minutes_watched ~ treat | user + day\",\n", + " data=data,\n", + " vcov={\"CRV1\": \"user\"},\n", + " demeaner_backend=\"rust\",\n", + " )\n", + "\n", + " dim_se = fit_dim.tidy().xs(\"treat\")[\"Std. Error\"]\n", + " cuped_se = fit_cuped.tidy().xs(\"treat\")[\"Std. Error\"]\n", + " panel_se = fit_panel.tidy().xs(\"treat\")[\"Std. Error\"]\n", + "\n", + " return dim_se, cuped_se, panel_se\n", + "\n", + "\n", + "def _run_simulation(N, sigma_unit, n_sim=100):\n", + " dim_se_arr = np.zeros(n_sim)\n", + " cuped_se_arr = np.zeros(n_sim)\n", + " panel_se_arr = np.zeros(n_sim)\n", + "\n", + " for i in tqdm(range(n_sim)):\n", + " data = get_sharkfin(\n", + " num_units=N,\n", + " num_periods=30,\n", + " num_treated=int(N / 2),\n", + " treatment_start=15,\n", + " seed=i,\n", + " sigma_unit=sigma_unit,\n", + " )\n", + " data.rename(\n", + " columns={\"Y\": \"minutes_watched\", \"unit\": \"user\", \"year\": \"day\"},\n", + " inplace=True,\n", + " )\n", + "\n", + " dim_se_arr[i], cuped_se_arr[i], panel_se_arr[i] = _variance_monte_carlo(data)\n", + "\n", + " return pd.Series(\n", + " {\n", + " \"Difference-in-Means\": np.mean(dim_se_arr),\n", + " \"Cuped\": np.mean(cuped_se_arr),\n", + " \"Panel\": np.mean(panel_se_arr),\n", + " }\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4b0fd4eb", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 20/20 [01:30<00:00, 4.53s/it]\n" + ] + }, + { + "data": { + "text/plain": [ + "Difference-in-Means 0.029457\n", + "Cuped 0.004184\n", + "Panel 0.004185\n", + "dtype: float64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": 
"execute_result" + } + ], + "source": [ + "# note that in a proper scientific simulation, we might want to set the\n", + "# number of iterations to be much higher than 10\n", + "se_sim_18 = _run_simulation(N=100_000, sigma_unit=18, n_sim=20)\n", + "display(se_sim_18)" + ] + }, + { + "cell_type": "markdown", + "id": "c24d0c3e", + "metadata": {}, + "source": [ + "So we manage to reduce our standard errors by around 6x! This is pretty fantastic news, as it implies that we can run our experiment on a much smaller sample, but still achieve the same level of power! This is what we want to verify in the last step. " + ] + }, + { + "cell_type": "markdown", + "id": "671ca0d9", + "metadata": {}, + "source": [ + "## Power Simulation \n", + "\n", + "In all simulations, we conduct a two-sided t-test with size $\\alpha = 0.01$. " + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "e04bee49", + "metadata": {}, + "outputs": [], + "source": [ + "def _power(nobs, n_sim):\n", + " res_df = pd.DataFrame()\n", + "\n", + " for N in nobs:\n", + " dim_reject_null_arr = np.zeros(n_sim)\n", + " cuped_reject_null_arr = np.zeros(n_sim)\n", + " panel_reject_null_arr = np.zeros(n_sim)\n", + "\n", + " for i in tqdm(range(n_sim)):\n", + " data = get_sharkfin(\n", + " num_units=N,\n", + " num_periods=30,\n", + " num_treated=int(N / 2),\n", + " treatment_start=15,\n", + " seed=i,\n", + " sigma_unit=18,\n", + " )\n", + " data.rename(\n", + " columns={\"Y\": \"minutes_watched\", \"unit\": \"user\", \"year\": \"day\"},\n", + " inplace=True,\n", + " )\n", + "\n", + " fit_dim = _difference_in_means(data)\n", + " fit_cuped, _ = _regression_cuped(data)\n", + " fit_panel = pf.feols(\n", + " \"minutes_watched ~ treat | user + day\", data=data, vcov={\"CRV1\": \"user\"}\n", + " )\n", + "\n", + " dim_reject_null_arr[i] = fit_dim.pvalue().xs(\"treat\") < 0.01\n", + " cuped_reject_null_arr[i] = fit_cuped.pvalue().xs(\"treat\") < 0.01\n", + " panel_reject_null_arr[i] = 
fit_panel.pvalue().xs(\"treat\") < 0.01\n", + "\n", + " dim_reject_null_mean = np.mean(dim_reject_null_arr)\n", + " cuped_reject_null_mean = np.mean(cuped_reject_null_arr)\n", + " panel_reject_null_mean = np.mean(panel_reject_null_arr)\n", + "\n", + " res = pd.DataFrame(\n", + " {\n", + " \"N\": [N],\n", + " \"Difference-in-Means\": [dim_reject_null_mean],\n", + " \"Cuped\": [cuped_reject_null_mean],\n", + " \"Panel\": [panel_reject_null_mean],\n", + " }\n", + " )\n", + "\n", + " res_df = pd.concat([res_df, res], axis=0, ignore_index=True)\n", + "\n", + " return res_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e9265641", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 10/10 [00:30<00:00, 3.01s/it]\n", + "100%|██████████| 10/10 [00:07<00:00, 1.41it/s]\n", + "100%|██████████| 10/10 [00:06<00:00, 1.46it/s]\n", + "100%|██████████| 10/10 [00:10<00:00, 1.02s/it]\n", + "100%|██████████| 10/10 [00:12<00:00, 1.26s/it]\n", + "100%|██████████| 10/10 [00:53<00:00, 5.31s/it]\n" + ] + } + ], + "source": [ + "# TODO: set n_sim higher\n", + "power_df = _power(nobs=[100, 500, 1000, 5_000, 10_000, 100_000], n_sim=10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6211f87d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NDifference-in-MeansCupedPanel
01000.50.10.1
15000.40.90.9
210000.71.01.0
350000.51.01.0
4100000.61.01.0
51000000.91.01.0
\n", + "
" + ], + "text/plain": [ + " N Difference-in-Means Cuped Panel\n", + "0 100 0.5 0.1 0.1\n", + "1 500 0.4 0.9 0.9\n", + "2 1000 0.7 1.0 1.0\n", + "3 5000 0.5 1.0 1.0\n", + "4 10000 0.6 1.0 1.0\n", + "5 100000 0.9 1.0 1.0" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "display(power_df)" + ] + }, + { + "cell_type": "markdown", + "id": "85cecc4d", + "metadata": {}, + "source": [ + "CUPED and the panel estimator achieve 90% power with 500 observations, while the difference in means-estimator requires a sample that is orders of magnitude larger. \n", + "To conclude, we should mention that these effects stem from a stylized exmaple data set - in practice, the gains from CUPED / panel methods for variance reduction are much lower - \n", + "companies with access to high quality panel data report reductions of 10% to 50% (see e.g. this paper by [Microsoft Bing](https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf))." + ] + }, + { + "cell_type": "markdown", + "id": "9f2394bc", + "metadata": {}, + "source": [ + "## When does CUPED not work? \n", + "\n", + "CUPED only works when we explain a lot of \"residual\" variation, which in our example data set is controlled by the `sigma_unit` parameter. Let's see what happens if we drastically reduce it - we now change the parameter from `18` to `4`." + ] + }, + { + "cell_type": "markdown", + "id": "8a776941", + "metadata": {}, + "source": [ + "We repeat the standard error monte carlo simulation, but set `sigma_unit = 4`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9e308933", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 10/10 [00:48<00:00, 4.89s/it]\n" + ] + } + ], + "source": [ + "se_sim_1 = _run_simulation(N=100_000, sigma_unit=4, n_sim=10)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "16388f84", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Difference-in-Means 0.006852\n", + "Cuped 0.004174\n", + "Panel 0.004186\n", + "dtype: float64" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(se_sim_1)" + ] + }, + { + "cell_type": "markdown", + "id": "00fdf6ac", + "metadata": {}, + "source": [ + "All of a sudden, the difference-in-means estimator has a much lower standard error than before: it is now only about 1.6 times that of CUPED and the panel estimator. For comparison, here are the results with `sigma_unit = 18` again:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "f9b04373", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Difference-in-Means 0.029457\n", + "Cuped 0.004184\n", + "Panel 0.004185\n", + "dtype: float64" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(se_sim_18)" + ] + }, + { + "cell_type": "markdown", + "id": "d78b86e9", + "metadata": {}, + "source": [ + "The reason we achieve much less variance reduction this time is that the addition of pre-treatment covariates / unit fixed effects simply explains less of the otherwise unexplained variance than before. " + ] + }, + { + "cell_type": "markdown", + "id": "e6f98862", + "metadata": {}, + "source": [ + "## Other useful links\n", + "\n", + "- [Matteo Courthoud's great blog post on CUPED](https://matteocourthoud.github.io/post/cuped/) goes through some theory, compares CUPED with other estimators, runs simulation studies, and summarizes the literature. 
\n", + "- [Lal et al on panel experiments](https://arxiv.org/abs/2410.09952) explains why using panel estimators might be a good idea when analysing AB tests, and shows how it can be done efficiently in SQL. " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "docs", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}