Efficient covariate adjustment in blocked randomized experiments

Background

In a blocked randomized experiment, subjects are divided into groups known as blocks, and then within each block, some fraction of the subjects are selected (typically uniformly at random) to receive an experimental treatment. As with any experiment, a typical objective is to understand the average effect of this treatment on some outcome of interest. As a motivating real-world example, in 2022-23 the state of Rhode Island ran a blocked randomized experiment to evaluate the effectiveness of its Reemployment Services and Eligibility Assessment (RESEA) program in improving the economic outcomes (e.g., wages) of recipients of unemployment insurance (UI) benefits in the state. The RESEA program consists of meetings with state-appointed career counselors to prevent fraud and to assist UI claimants with their job search. The experiment has a natural blocked structure: UI applications are processed in weekly batches, so the subjects in each week form a block.

A challenge with many blocked experiments is that the number of subjects in each block, as well as the proportion of subjects treated in each block, varies across blocks. This has implications for the statistical properties of popular average treatment effect estimators. For instance, a simple “fixed effects” linear regression, which is still widely used, is known to be biased for the average treatment effect when the block sizes and/or treatment proportions vary. This means that even with an infinite sample size, the resulting estimate will not equal the “true” average treatment effect. While it is known how to correct for this bias (and many studies do so), there are no clear guidelines for how to make the estimate as efficient as possible, i.e., to have the smallest possible variance subject to being (approximately) unbiased. A key source of variance reduction is observing additional variables (covariates) for each subject and “adjusting” for them in a principled way. In the Rhode Island data, available covariates include prior earnings and various demographic information (e.g., education level, self-reported race, age).
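To make the bias of the fixed effects regression concrete, here is a minimal sketch in Python using only the standard library. It relies on a standard result that is not stated in this posting: when the outcome is regressed on a treatment indicator plus block indicators, the treatment coefficient targets a weighted average of block-level effects with weights proportional to n * p * (1 - p), where n is the block size and p is the treated fraction. The average treatment effect instead weights blocks by n. The two blocks and all of their numbers are hypothetical and chosen only for illustration; the sketch also assumes the treatment effect differs across blocks.

```python
# Hypothetical two-block experiment; every number here is made up.
# n = block size, p = fraction treated, tau = true average effect in the block.
blocks = [
    {"n": 100, "p": 0.5, "tau": 1.0},
    {"n": 300, "p": 0.1, "tau": 3.0},
]


def true_ate(blocks):
    """Average treatment effect: block effects weighted by block size."""
    total_n = sum(b["n"] for b in blocks)
    return sum(b["n"] * b["tau"] for b in blocks) / total_n


def fixed_effects_limit(blocks):
    """Large-sample target of the fixed effects regression coefficient:
    block effects weighted by n * p * (1 - p) (assumed standard result)."""
    weights = [b["n"] * b["p"] * (1 - b["p"]) for b in blocks]
    weighted_sum = sum(w * b["tau"] for w, b in zip(weights, blocks))
    return weighted_sum / sum(weights)


print(true_ate(blocks))             # 2.5
print(fixed_effects_limit(blocks))  # roughly 2.04
```

In this made-up example the larger block has a small treated fraction, so the fixed effects weighting gives it much less influence than its size warrants, and the regression targets a value well below the average treatment effect no matter how much data is collected.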

The goal of this project is to determine how to perform covariate adjustment as efficiently as possible in blocked experiments. We can explore various estimators, taking the vast existing literature on covariate adjustment in non-blocked experiments as inspiration, and compare them based on both theoretical asymptotic variance calculations and empirical performance in finite samples (in simulations and data examples, possibly including the Rhode Island RESEA dataset).

If a new estimator is developed with lower variance than existing methods, it would be a highly impactful methodological contribution to researchers across many disciplines. If not, it would still be very useful to comprehensively compare and contrast various estimators, providing concrete guidelines for applied researchers. If there is interest, this could also become a substantial contribution to statistical software (e.g., an R package).

Scope

Ideally, this work would involve 1-2 undergraduate students at Harvey Mudd College (HMC) or the other 5Cs. Because getting acquainted with the literature and the fundamental mathematical principles of causal inference requires a substantial investment, it is highly preferred that students both enroll in a 2-credit independent study (6-8 hours/week) during spring 2026 and work on this project full-time for 10 weeks during summer 2026 (with stipend + housing support). However, applicants available for only one of the two periods will still be considered.

Essay

Please write no more than 500 words describing your interest in mathematical statistics and in this project more specifically. Also, please let me know how this project would be useful to your future career and/or personal goals, however nebulous they may be now. I’m not looking for anything specific here; I just want to get to know what excites you!

Additionally, please be sure to briefly describe the following information in your application (does not count towards word limit, and does not need to “flow” with the above paragraphs):

  • Your background in probability (e.g., grade in MATH 62, 151, and/or 157) and any additional statistics courses you may have taken
  • Whether you can commit to either spring or summer only, or to both spring and 10 weeks in the summer
  • How you feel about doing long calculations on paper, and how you feel about doing data analysis in R. Please be honest; you do not need to absolutely love both of these things.
Name of research group, project, or lab
The Stat Methods Lab
Why join this research group or lab?

This will be an opportunity to gain insight into the nature of modern statistics research, so you can get some firsthand experience to determine whether you would potentially be interested in pursuing graduate studies in statistics and/or statistical research roles in government and industry. The methodological nature of the project has the potential to be extremely valuable to researchers in many disciplines that use blocked randomized experiments. More broadly, you will develop the valuable skill of thinking precisely and mathematically about the goals of a randomized experiment and how to come up with (and evaluate) good statistical methods to achieve these goals.

Logistics Information:
Project categories
Mathematics
Data Science
Statistical Modeling
Student ranks applicable
Sophomore
Junior
Student qualifications

As this project involves diving into the mathematical properties of various statistical estimators, a background in mathematical probability (e.g., as evidenced by good performance in MATH 151 or MATH 157) is required. Additional upper-level statistics coursework, particularly MATH 152 (inference) or MATH 158 (statistical linear models), is helpful but not required. Some experience programming for data analysis (preferably in R) is also highly useful.

Time commitment
Spring - Part Time
Summer - Full Time
Compensation
Academic Credit
Paid Research
Number of openings
2
Techniques learned
  • Students will gain a mathematical understanding of the modern theory of causal inference in randomized experiments, in particular how to compute and compare (asymptotic) variances of different estimators
  • Students will learn to design and implement numerical simulations to evaluate the properties of statistical estimators (and associated confidence intervals)
  • Students will develop the ability to understand and scrutinize both methodological and applied research papers in statistics and causal inference, and synthesize results and ideas from a body of work that spans many different disciplines
  • Students will strengthen their written and oral skills to communicate their findings to audiences with a wide range of statistical backgrounds
Project start
Spring 2026
Contact Information:
Mentor
harrli@hmc.edu
Principal Investigator
Name of project director or principal investigator
Harrison Li
Email address of project director or principal investigator
harrli@hmc.edu