Causal inference with two versions of treatment

Accepted at Journal of Educational and Behavioral Statistics ,2020

Recommended citation: Hasegawa, R.B., Deshpande, S.K., Small, D.S., Rosenbaum, P.R. (2020). "Causal inference with two versions of treatment." Journal of Educational and Behavioral Statistics .

Abstract : Causal effects are commonly defined as comparisons of the potential outcomes under treatment and control, but this definition is threatened by the possibility that either the treatment or the control condition is not well defined, existing instead in more than one version. This is often a real possibility in nonexperimental or observational studies of treatments because these treatments occur in the natural or social world without the laboratory control needed to ensure identically the same treatment or control condition occurs in every instance. We consider the simplest case: Either the treatment condition or the control condition exists in two versions that are easily recognized in the data but are of uncertain, perhaps doubtful, relevance, for example, branded Advil versus generic ibuprofen. Common practice does not address versions of treatment: Typically, the issue is either ignored or explicitly stated but assumed to be absent. Common practice is reluctant to address two versions of treatment because the obvious solution entails dividing the data into two parts with two analyses, thereby (a) reducing power to detect versions of treatment in each part, (b) creating problems of multiple inference in coordinating the two analyses, and (c) failing to report a single primary analysis that uses everyone.

We propose and illustrate a new method of analysis that begins with a single primary analysis of everyone that would be correct if the two versions do not differ, adds a second analysis that would be correct were there two different effects for the two versions, controls the family-wise error rate in all assertions made by the several analyses, and yet pays no price in power to detect a constant treatment effect in the primary analysis of everyone. Our method can be applied to analyses of constant additive treatment effects on continuous outcomes. Unlike conventional simultaneous inferences, the new method is coordinating several analyses that are valid under different assumptions, so that one analysis would never be performed if one knew for certain that the assumptions of the other analysis are true. It is a multiple assumptions problem rather than a multiple hypotheses problem. We discuss the relative merits of the method with respect to more conventional approaches to analyzing multiple comparisons. The method is motivated and illustrated using a study of the possibility that repeated head trauma in high school football causes an increase in risk of early onset cognitive decline.


The journal version of the paper is available at [DOI: 10.3102/1076998620914003].

A pre-print version of the paper is available at arXiv:1705.03918