Causal inference is the study of quantifying whether a treatment, policy, or intervention, denoted as A, has a causal effect on an outcome of interest, denoted as Y. What distinguishes a causal effect of A on Y from an associative effect of A on Y, measured for example by the correlation between A and Y, is that under a causal effect, intervening on the treatment A leads to changes in the outcome Y. Hence, a causal effect is a stronger notion of a relationship between A and Y than an associative effect.
A central problem in estimating causal effects is dealing with unmeasured confounding, that is, dealing with other potential explanations for the effect of a treatment on the outcome that are not measured in the data. For example, to establish the causal effect of education on earnings, all factors that may influence both education and earnings must be considered. One factor of concern is the individual’s environment. If an individual grew up in an affluent neighborhood with high-quality schools and many opportunities for employment, the environment may have shaped the individual’s desire to pursue more education and to seek high-paying jobs. As such, intervening on education, say by passing education policies that encourage students to go to college, may have minimal to no effect on future earnings, especially if the student’s neighborhood environment remains unchanged; instead, implementing policies that improve neighborhoods would lead to increases in both education and earnings. Because quantifying one’s living environment is generally difficult, this type of unmeasured confounder is a major and persistent concern in causal inference.
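The confounding argument can be illustrated with a small simulation. The sketch below is hypothetical and not drawn from any real data: it assumes a single confounder U (standing in for neighborhood environment) that drives both the treatment A (education) and the outcome Y (earnings), with linear relationships and a true causal effect of A on Y set to exactly zero. The variable names, coefficients, and sample size are all illustrative assumptions.

```python
import numpy as np


def simulate(n=100_000, seed=0):
    """Draw (U, A, Y) from a toy model in which A has no causal effect on Y."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)            # confounder, e.g. neighborhood environment
    a = u + rng.normal(size=n)        # treatment depends on U
    y = 2.0 * u + rng.normal(size=n)  # outcome depends on U only, not on A
    return u, a, y


def ols_coefficients(y, *regressors):
    """Least-squares fit of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


u, a, y = simulate()

# Regressing Y on A alone, as one would if U were unmeasured.
naive_slope = ols_coefficients(y, a)[1]

# Regressing Y on A and U, which is only possible if U is measured.
adjusted_slope = ols_coefficients(y, a, u)[1]

print(f"slope of A ignoring U:      {naive_slope:.3f}")
print(f"slope of A adjusting for U: {adjusted_slope:.3f}")
```

Under these assumptions, the regression that ignores U reports a clearly nonzero slope for A even though intervening on A would not change Y, while the regression that includes U yields a slope close to zero. In practice U is not observed, so the second regression is unavailable; that is precisely the difficulty described above.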
The rise of big data has brought renewed hope in dealing with unmeasured confounding. In particular, big data typically provides rich and extensive measurements of each individual’s characteristics, offering a greater opportunity to measure confounders that previously went unmeasured. Developing methods that effectively utilize big data to draw causal conclusions is a rapidly growing area of research. Broadly speaking, much of the effort has been devoted to correctly applying machine learning techniques, both to better parse big data from an optimization and computational standpoint and to produce honest causal conclusions, say in the form of adaptive confidence intervals after fitting complex functionals with machine learning techniques.