Balance Checking¶
When we finish the coarsened exact matching, it is necessary to evaluate the quality of the matching with balance checking methods. When the covariate balance is achieved, the resulting effect estimate is less sensitive to model misspecification and ideally close to true treatment effect (Greifer, 2023). Otherwise, you should fine-tune your coarsening schema further or consider collecting more data.
The following imbalance checking methods are provded:
1️⃣ L1 imbalance score¶
‘L1’: Calculate and return the L1 imbalance score.
L1 imbalance score was introduced by Iacus et al. (2011), and it is used to measure the difference between two multivariate distributions.
\[\mathcal L_{1}(f,g) = \frac{1}{2} \sum_{l_{1}, \dots, l_{k}} \left| f_{l_{1}, \dots, l_{k}} - g_{l_{1}, \dots, l_{k}} \right|\]Here we cross-tabulate the discretized variables as \(X_1, \dots, X_k\) for the treated and control groups separately, and record the \(k\)-dimensional relative frequencies for the treated \(f_{l_{1}, \dots, l_{k}}\) and control \(g_{l_{1}, \dots, l_{k}}\) units.
Advantages
It can look at the entire joint distribution of the covariate space at the same time.
Limitations
It is dependent on the granularity of the categories.
There is not an exact criterion on whether an imbalance score is good enough.
Example
from CEM_LinearInf.balance import balance my_balance = balance(df_match = my_cem.matched_df, # matched dataframe df_all = my_cem.df, # original dataframe confounder_cols = my_cem.confounder_cols, # list of column names of confounders cont_confounder_cols = my_cem.cont_confounder_cols, # list of column names of continuous confounders col_y = 'Y', # column name of result variable col_t = 'T') # column name of treatment variable l1_before, l1_after = my_balance.balance_assessing(method = 'L1')
L1 imbalance score before matching: 0.6316 L1 imbalance score after matching: 0.2895
2️⃣ Standardized Mean Difference¶
‘smd’: Print the standardized mean difference (SMD) summary table and plots of confounders.
SMD is a common way to measure the balance for a single covariate \(X\). It can be interpreted as the distance between the means of the two groups in terms of the standard deviation of the covariate’s distribution (Zhang, et. al. 2019).
\[SMD = \frac{\bar{X}_T-\bar{X}_C}{\sqrt{(S_T^2+S_C^2)/2}}, \bar{X}_T = \frac{\sum_{i \in T}w_{i}X_{i}}{\sum_{i \in T}w_{i}}, S_{T}^2 = \frac{\sum{w_{i}}}{(\sum{w_{i}})^2 - \sum{w_{i}^2}} \sum_{i \in T}w_{i} (X_{i} - \bar{X}_T)^2\]
Advantages
Helps you to identify which confounder is imbalanced.
Limitations
It is a measure of balance for a single covariate, and does not take interactions between covariates into account. It’s possible to have balance for each covariate by itself, but not have balance jointly.
Example
from CEM_LinearInf.balance import balance
my_balance = balance(df_match = my_cem.matched_df, # matched dataframe
df_all = my_cem.df, # original dataframe
confounder_cols = my_cem.confounder_cols, # list of column names of confounders
cont_confounder_cols = my_cem.cont_confounder_cols, # list of column names of continuous confounders
col_y = 'Y', # column name of result variable
col_t = 'T') # column name of treatment variable
my_balance = balance(my_cem.matched_df, my_cem.df, my_cem.confounder_cols, my_cem.cont_confounder_cols)
my_balance.balance_assessing(method = 'smd')
SMD Result
Balance measures
Treated Mean Control Mean SMD Variance Ratio SMD.Threshold(<0.1) \
X1 0.1755 0.0924 0.0956 0.9329 Balanced
X2 -0.1462 -0.1407 -0.0062 1.0103 Balanced
X3 0.1375 0.1331 0.0049 0.9924 Balanced
X7 0.5304 0.5304 0.0000 . Balanced
X9 1.6660 1.6660 -0.0000 . Balanced
Var.Threshold(<2)
X1 Balanced
X2 Balanced
X3 Balanced
X7 .
X9 .
-------------------------
Balance tally for SMD
count
SMD.Threshold(<0.1)
Balanced 5
------------------------------
Variable with the max SMD:
SMD SMD.Threshold(<0.1)
X1 0.0956 Balanced
------------------------------------
Balance tally for Variance ratio
count
Var.Threshold(<2)
Balanced 3
-----------------------------------------
Variable with the max variance ratio:
Variance Ratio Var.Threshold(<2)
X2 1.0103 Balanced
-----------------------------------------
3️⃣ Kolmogorov-Smirnov Statistics¶
‘ks’: Plot Kolmogorov-Smirnov Statistics of confounders before and after matching.
The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution functions of two samples and can be used to measure the similarity of these two distributions. In our situation, we can use K-S score to measure the similarity between the treated group and control group.
Advantages
Helps you to identify which confounder is imbalanced.
Limitations
It is a measure of balance for a single covariate, and does not take interactions between covariates into account. It’s possible to have balance for each covariate by itself, but not have balance jointly.
Example
from CEM_LinearInf.balance import balance my_balance = balance(df_match = my_cem.matched_df, # matched dataframe df_all = my_cem.df, # original dataframe confounder_cols = my_cem.confounder_cols, # list of column names of confounders cont_confounder_cols = my_cem.cont_confounder_cols, # list of column names of continuous confounders col_y = 'Y', # column name of result variable col_t = 'T') # column name of treatment variable my_balance.balance_assessing(method = 'ks')
4️⃣ Density Plot¶
‘density’: Return density plots of confounders before and after matching.
The density plot can be an intuitive and helpful tool for deciding whether adjustment has yielded similar distributions between the groups for given covariates.
Example
from CEM_LinearInf.balance import balance my_balance = balance(df_match = my_cem.matched_df, # matched dataframe df_all = my_cem.df, # original dataframe confounder_cols = my_cem.confounder_cols, # list of column names of confounders cont_confounder_cols = my_cem.cont_confounder_cols, # list of column names of continuous confounders col_y = 'Y', # column name of result variable col_t = 'T') # column name of treatment variable my_balance.balance_assessing(method = 'density')
5️⃣ Empirical Cumulative Density Plot¶
‘ecdf’: Return empirical cumulative density plots of confounders before and after matching.
The empirical cumulative density plot can be an intuitive and helpful tool for deciding whether adjustment has yielded similar distributions between the groups for given covariates.
Example
from CEM_LinearInf.balance import balance my_balance = balance(df_match = my_cem.matched_df, # matched dataframe df_all = my_cem.df, # original dataframe confounder_cols = my_cem.confounder_cols, # list of column names of confounders cont_confounder_cols = my_cem.cont_confounder_cols, # list of column names of continuous confounders col_y = 'Y', # column name of result variable col_t = 'T') # column name of treatment variable my_balance.balance_assessing(method = 'ecdf')
⭐️ Reference¶
Greifer N (2023). cobalt: Covariate Balance Tables and Plots. https://github.com/ngreifer/cobalt.
Iacus, S. M., King, G., and Porro, G. (2011). Multivariate Matching Methods That are Monotonic Imbalance Bounding. Journal of the American Statistical Association, 106(493), 345-361. Retrieved from https://tinyurl.com/y6pq3fyl
Standardized mean difference (SMD) in causal inference. (2021, Oct 31). Retrieved from https://statisticaloddsandends.wordpress.com/2021/10/31/standardized-mean-difference-smd-in-causal-inference/
What is the L1 imbalance measure in causal inference? (2021, Nov 25). Retrieved from https://statisticaloddsandends.wordpress.com/2021/11/25/what-is-the-l1-imbalance-measure-in-causal-inference/
Zhang, Z., Kim, H. J., Lonjon, G., Zhu, Y., & written on behalf of AME Big-Data Clinical Trial Collaborative Group (2019). Balance diagnostics after propensity score matching. Annals of translational medicine, 7(1), 16. https://doi.org/10.21037/atm.2018.12.10