Statistical Machine Learning for High Dimensional Data

(13 May 2024–31 May 2024)

Organizing Committee

  • Cheng Li  (National University of Singapore)

Contact Information

General Enquiries: ims-enquiry(AT)

Scientific Aspects Enquiries: or loh(AT)


Rapid developments in modern technologies have made large-scale data readily available in daily life. High-dimensional problems concern datasets in which the number of parameters is orders of magnitude larger than the number of observations. Recent years have seen an upsurge of interest in high-dimensional data analysis. The practical successes, however, have uncovered a myriad of new challenges.

  • Old premises fail in new settings. High-dimensional models and algorithms can exhibit unexpected behavior that challenges classical statistical premises. Notable examples include double descent (no bias-variance tradeoff in over-parameterized prediction) and benign non-convexity (no spurious local optima in matrix/tensor factorization). Classical statistical premises need to be revised to account for these new settings.
  • Various notions of complexity emerge. Complexity is the fundamental tool in modern data science for characterizing the difficulty of learning tasks. Statisticians use sample complexity to measure the number of observations needed for accurate estimation. Computer scientists use computational complexity to measure the number of arithmetic operations their algorithms require. The fundamental challenges of high-dimensional problems lie in computational-statistical tradeoffs, in which sample complexity hinges on and interacts with computational complexity. Current mechanisms for optimizing these tradeoffs are limited.
  • One paradigm may not fit all. Off-the-shelf data analytics tools are increasingly challenged by domain applications. Parametric models facilitate statistical inference but often lack robustness; nonparametric models can increase prediction accuracy but are often more difficult to interpret; classical asymptotic theory holds only for global optima and is brittle for local optima. In these situations, we need to characterize the regimes in which a learning approach succeeds or fails. Challenges associated with ever-growing data applications await breakthroughs in our understanding.

The objective of this workshop is to advance machine learning methodologies by addressing the aforementioned challenges. A fundamental question in data science is to quantify the complexities of data structures, and to relate these measures to suitable models and learning tasks. Over the past decade, a number of communities have begun to make progress on this topic. However, efforts to address these questions are currently separated across multiple disciplines. A substantial, integrated effort involving researchers from the full spectrum of foundational data science is needed to tackle these emerging issues. This workshop aims to partially fill this gap by forming new links between statistics, computer science, optimization, and domain sciences.


IMS is closed on 23 May for the Vesak Day public holiday.

Part I – Tutorial:

In the first half of the program, Professors Cheng Li, Wei-Yin Loh and Wanjie Wang will give tutorial lectures to provide attendees with the background needed to follow the later invited lectures.

Cheng Li: Scalable Bayesian Gaussian Process Modeling of Massive Spatiotemporal Data. Time: TBD

Wei-Yin Loh: Classification and Regression Tree Methods. Time: TBD

Wanjie Wang: Feature Selection in High-dimensional Data. Time: TBD


Part II – Workshop:

The second part of the program will be devoted to hour-long research talks by invited speakers. The program will bring together leading international and local researchers and practitioners in the fields of statistical machine learning, high-dimensional statistics, and big data applications.


Tutorials | 13–19 May 2024 | N/A
Workshop on Statistical Machine Learning for High Dimensional Data | 21–31 May 2024 | N/A


