Causal Inference with Big Data

(06 Dec 2021–23 Dec 2021)

Organizing Committee



Contact Information

General Enquiries: ims(AT)
Scientific Aspects Enquiries: loh(AT)


Causal inference is the study of quantifying whether a treatment, policy, or intervention, denoted as A, has a causal effect on an outcome of interest, denoted as Y. What distinguishes a causal effect of A on Y from an associative effect of A on Y, measured, say, by the correlation between A and Y, is that under a causal effect, intervening on the treatment A leads to changes in the outcome Y. Hence, a causal effect is a stronger notion of a relationship between A and Y than an associative effect.

A central problem in estimating causal effects is dealing with unmeasured confounding, that is, dealing with other potential explanations for the effect of a treatment on the outcome that are not measured in the data. For example, to establish the causal effect of education on earnings, all factors that may influence both education and earnings must be considered. One potential factor of concern is the individual’s environment. If an individual grew up in an affluent neighborhood with high-quality schools and many opportunities for employment, the environment may have shaped the individual’s desire to pursue more education and to seek high-paying jobs. As such, intervening on education, say by passing education policies that encourage students to go to college, may have minimal to no effect on future earnings, especially if the student’s neighborhood environment remains unchanged; instead, implementing policies that improve neighborhoods would lead to increases in both education and earnings. Because quantifying one’s living environment is generally difficult, this type of unmeasured confounder is always a major concern in causal inference.

The rise of big data has brought renewed hope in dealing with unmeasured confounding. In particular, big data typically provides rich and extensive measurements of each individual’s characteristics, offering a greater opportunity to capture confounders that would otherwise go unmeasured. Developing methods that effectively utilize big data to draw causal conclusions is a vigorously growing area of research. Broadly speaking, much of the effort has been devoted to correctly applying machine learning techniques, both to better parse big data from an optimization and computational standpoint and to produce honest causal conclusions, say in the form of adaptive confidence intervals after fitting complex functionals with machine learning techniques.


Tutorial on Regression Tree Methods with Emphasis on Big Data, Missing Values, Propensity Score Estimation, and Causal Inference for Randomized Experiments
Wei-Yin Loh, University of Wisconsin-Madison, USA
6–15 December 2021

Workshop Talks by Invited Speakers
16–23 December 2021


IMS Auditorium

List of Participants

Maria Allayioti University of Southern California, United States of America
Zheng Xin Chai DSO National Laboratories, Singapore
Wei Jie Chee DSO National Laboratories, Singapore
Jeremy Chen National University of Singapore, Singapore
Minhao Benjamin Chen The University of Hong Kong, Hong Kong
Jon Huang Singapore Institute for Clinical Sciences, Singapore
Ta-Cheng Huang National University of Singapore, Singapore
Hyunseung Kang University of Wisconsin-Madison, USA
Jialiang Li National University of Singapore, Singapore
Wei-Yin Loh University of Wisconsin-Madison, USA
Aseem Pahuja National University of Singapore, Singapore
The Hanh Pham Ngee Ann Polytechnic, Singapore
Leyla Ranjbari Universiti Tunku Abdul Rahman, Malaysia
Baoluo Sun National University of Singapore, Singapore
Chin Gee Jacky Tan National University of Singapore, Singapore