Statistical Data Integration
- Sanjay Chaudhuri (National University of Singapore)
- Partha Lahiri (University of Maryland)
- Pedro Luis do Nascimento Silva (IBGE - National School of Statistical Sciences)
- Danny Pfeffermann (National Statistician of Israel and University of Southampton)
Reliable information about a target population in the form of a representative data-set is indispensable for any statistical analysis. The accuracy of the estimates or measures computed by a statistician depends directly on the information contained in the data-set he or she uses. However, such data-sets often lack adequate information to produce estimates of the required accuracy. A data-set may not have enough observations, which reduces the accuracy of the estimates. More frequently, it may not have enough variables to build a meaningful model for the response of interest. Another common problem is that the available data are not representative of the target population because of informative sampling and, what is even more problematic, because of high rates of informative non-response. Even though a statistician would prefer to design and collect appropriate data for the study of interest, data collection is often prohibitively expensive. As a result, one needs to devise procedures either for merging different data-sets or for borrowing information from similar observations within the same data.
Easy collection and storage of big data-sets, obtained from web-based applications, social networks or medical records, provide interesting opportunities. Such data sources provide what a statistician needs, that is, millions of observations on thousands of variables. These data-sets, however, are not collected in any designed way. In other words, they are observational and may not represent the population targeted by the analyst. Use of big data sources, with or without integration with carefully designed survey data, would often be beneficial in computing official statistics, which are required for making policy decisions. Integration of various data sources is a popular topic of research in several branches of current statistics.
The first week of the programme will be devoted to a workshop on statistical data integration. The workshop will consist of expository lectures on different aspects of data integration methods in statistics. The broad topics of the lectures will include small area estimation, statistical methods for record linkage, data confidentiality, disclosure methods and privacy assessment, multiple imputation techniques and generation of synthetic data to protect privacy, big data integration techniques in official statistics, and methods for analysing big data-sets obtained from social networks, online transactions and similar sources. The workshop is designed to be a precursor to a conference that will take place during the second week, where more recent developments in the above topics will be discussed. The programme in the second week will consist of a three-day conference on the current trends in survey statistics.
- Workshop on Statistical Data Integration: 5–8 August 2019
- Daniel Bonnéry, University of Maryland, USA
- Snigdhansu Chatterjee, University of Minnesota, USA
- Cinzia Cirillo, University of Maryland, USA
- Jörg Drechsler, Institute for Employment Research (IAB) of the German Federal Employment Agency (BA), Germany
- Malay Ghosh, University of Florida, USA
- Jiming Jiang, University of California, Davis, USA
- Bani K. Mallick, Texas A&M University, USA
- Indranil Mukhopadhyay, Indian Statistical Institute, Kolkata, India
- Takumi Saegusa, University of Maryland, USA
- Rebecca C Steorts, Duke University, USA
- Jiraphan Suntornchost, Chulalongkorn University, Thailand
- Conference on Current Trends in Survey Statistics: 13–16 August 2019
The conference is fully subscribed and registration is closed.
Please note that our office will be closed on the following public holidays.
– 9 Aug 2019, Singapore National Day.
– 11 Aug 2019, Hari Raya Haji. The following Monday, 12 Aug 2019, will be a public holiday.