Statistical Machine Learning for High Dimensional Data

(13 May 2024–31 May 2024)

Organizing Committee

Co-chairs

Jialiang Li (National University of Singapore)

Wei-Yin Loh (University of Wisconsin - Madison)

Miaoyan Wang (University of Wisconsin - Madison)

Members

Cheng Li (National University of Singapore)

Ke-Wei Huang (National University of Singapore)

Jonathan Scarlett (National University of Singapore)

Harold Soh (National University of Singapore)

Wanjie Wang (National University of Singapore)

Contact Information

General Enquiries: ims-enquiry(AT)nus.edu.sg

Scientific Aspects Enquiries: miaoyan.wang(AT)wisc.edu

Overview

Rapid developments in modern technologies have made large-scale data readily available in daily life. High-dimensional problems concern datasets in which the number of parameters is orders of magnitude larger than the number of observations. Recent years have seen an upsurge of interest in high-dimensional data analysis. The practical successes, however, have uncovered a myriad of new challenges.

Old premises fail in new settings. High-dimensional models and algorithms can yield unexpected behaviors that challenge classical statistical premises. Notable examples include double descent (no bias-variance tradeoff in over-parameterized prediction) and benign non-convexity (no spurious local optima in matrix/tensor factorization). Classical statistical premises need to be revised to address the limitations of the new settings.
Various notions of complexity emerge. Complexity is the fundamental tool in modern data science to characterize the difficulty of learning tasks. Statisticians use sample complexity to measure the number of observations needed for accurate estimation. Computer scientists use computational complexity to measure the number of arithmetic operations required by their algorithms. The fundamental challenges of high-dimensional problems lie in the computational-statistical tradeoffs, in which sample complexity hinges on and interacts with computational complexity. Current mechanisms for optimizing these tradeoffs are limited.
One paradigm may not fit all. Off-the-shelf data analytics tools are increasingly challenged by domain applications. Parametric models facilitate statistical inference but often lack robustness; nonparametric models can increase prediction accuracy but they are often more difficult to interpret; classical asymptotic theory holds only for global optima but is brittle for local optima. In these situations, we need to characterize the regimes for which a learning approach succeeds or fails. Challenges associated with ever-growing data applications await breakthroughs in our understanding.

The objective of this workshop is to advance machine learning methodologies by addressing the aforementioned challenges. A fundamental question in data science is to quantify the complexities of data structures, and to relate these measures to suitable models and learning tasks. Over the past decade, a number of communities have begun to make progress on this topic. However, efforts to address these questions are currently separated across multiple disciplines. A substantial, integrated effort involving researchers from the full spectrum of foundational data science is needed to tackle these emerging issues. This workshop aims to partially fill this gap by forming new links between statistics, computer science, optimization, and domain sciences.

Activities

IMS is closed on 22 May for the Vesak Day public holiday.

Part I – Tutorial:

In the first half of the program, Professors Cheng Li, Wei-Yin Loh and Wanjie Wang will give tutorial lectures that provide attendees with the background needed to follow the later invited lectures.

Cheng Li: Scalable Bayesian Gaussian Process Modeling of Massive Spatiotemporal Data.

Wei-Yin Loh: Classification and Regression Tree Methods.

Wanjie Wang: Feature Selection in High-dimensional Data.

Part II – Workshop:

The second part of the program will be devoted to hour-long research talks by invited speakers. The program will bring together international and local leading researchers and practitioners in the fields of statistical machine learning, high-dimensional statistics, and big data applications.

Activity	Date	Abstract
Tutorials	13–17 May 2024	N/A
Workshop on Statistical Machine Learning for High Dimensional Data	21–31 May 2024	View
