Registration is open for the free “world of blended data” series of virtual workshops. Register for the entire series or for just a few workshops.
The Government Statistics (GSS) and Social Statistics (SSS) sections are hosting the series, offered as a part of the ASA Professional Development Program. It targets those who may not be able to travel to conferences but are interested in continuing education opportunities.
Each virtual workshop consists of a one-hour webinar followed by virtual participation in a group discussion and, possibly, a set of activities using data and code provided by the presenter. In each workshop, an acknowledged expert in the field teaches one aspect of blended data, focusing on applications to surveys and censuses (demographic and establishment).
The series will expose participants to the advantages of using combined data sources for developing inferential models and measures while remaining cognizant of the challenges associated with combining large data sets and the potential pitfalls of analyses of blended data, including privacy considerations. Participants will gain familiarity with commonly used machine learning software such as R and Python. Topics, presenters, and dates include the following:
Overview of Blended Data, given by Frauke Kreuter, director of the Joint Program in Survey Methodology, University of Maryland, will emphasize applications of blended data in surveys and censuses. October 17, 3:00 p.m. EDT
How Rare Is Rare? The Importance of Validation, given by Aric LaBarr, associate professor, North Carolina State University’s Institute for Advanced Analytics, will address useful and appropriate methods of model and results validation using blended data, introducing the target shuffling technique. November 21, 1:00 p.m. EST
Introduction to Python for Data Science, given by Hunter Glanz, assistant professor, California Polytechnic State University, will cover how to use Python for data manipulation in preparation for machine learning and present examples using open source government data. January 16, 1:00 p.m. EST
Interpretability vs. Explainability in Machine Learning for High-Stakes Decisions, given by Cynthia Rudin, associate professor, Duke University, will introduce interpretable machine learning models, which come with their own explanations that are faithful to what the model actually computes. The workshop will contrast these models with black box models and present applications from the criminal justice system and health care. TBD
Differential Privacy, given by Matthew Graham, US Census Bureau Center for Economic Studies, will introduce differential privacy concepts with an emphasis on census data (as opposed to sample survey data). Special topics such as formal privacy protection for skewed populations and blended data considerations will also be addressed. TBD
This series began in September and runs through March 2020 (excluding December 2019). At its conclusion, GSS and SSS hope to organize a short series of case studies on real-life applications, also conducted virtually.
Contact Jenny Thompson, GSS chair-elect, with questions or suggestions.