ASA DataFest 2020: A Virtual Experience

Mine Çetinkaya-Rundel, Robert Gould, and Donna LaLonde

Wendy Martinez (left) and Robert Gould during virtual DataFest

    In a typical year, ASA DataFest is a data analysis competition in which teams of up to five students analyze a real and complex data set over the course of one weekend. But spring 2020 was not typical. As many colleges and universities transitioned to a remote format, the ASA DataFest steering committee considered alternatives for the competition. The goal was to adapt DataFest to the new remote environment while maintaining the parts of the event that make it an inviting and valuable experience for students with a wide range of data analysis experience.

    Student Presentations & Data Sets

    UCLA

    Duke University

    University of Edinburgh

    University of Toronto

    The 2020 DataFest was held as a virtual data challenge in which students worked in teams to explore an impact of the COVID-19 pandemic. Given the variety of potential topics, part of what made this year’s challenge unique was that participants had to find their own data set for analysis. In a typical ASA DataFest, by contrast, a surprise data set is revealed to participants at a kick-off event on Friday afternoon, and students work throughout the weekend to analyze the data and derive insights.

    DataFest events were held in April through June, a time when data and modeling about the direct health outcomes of the pandemic were rapidly changing and unreliable. Building models and drawing reliable conclusions about infection, mortality, or recovery rates would require participants to understand the nuances and limitations of the COVID-19 health data at a level that would likely not be feasible in the short span of the DataFest competition. Therefore, to discourage participants from presenting conclusions that could be misleading or harmful, they were advised to “tell us about something affected by the COVID-19 pandemic other than its direct health outcomes.”

    Suggested analysis questions included the following:

    • How has the pandemic affected the airline industry, and what are some potential downstream effects of this other than economic strain on the industry?
    • As a student, how would you quantify the effect of the pandemic on your education?
    • With shelter-in-place / lockdown orders, many workers have started working from home, which requires internet access. How prepared was the nation / your local area for this shift?
    • How has the spread of the pandemic affected people’s opinion of government tracking and privacy?
    • What is the effect of the social distancing / shelter-in-place / lockdown recommendations and policies on pollution?
    • How can we quantify the potential effects on nutrition and general health of the public, outside of those affected by the virus?
    • How are refugees affected by COVID-19?

    We worried that suggesting these potential analysis questions might hamper students’ creativity. This was not the case! Students who participated in the event came up with a wide variety of questions on their own. Educators considering classroom projects using COVID-19 data may find the analysis topics chosen by the winning teams useful as a starting point. Here is a sample:

    • Societal Impacts of the COVID-19 Pandemic on Education in the United States: Analysis of data from the US Census Bureau’s Household Pulse Survey, examining the availability of devices and internet in households with children in public or private schools in the US over a period of four weeks, April 23 – May 26, 2020 (The Data Quails – University of Edinburgh)
    • Relationship Between Dengue Fever Outbreak and Lockdown: Investigation of whether the dengue fever outbreak in Singapore, which coincided with Circuit Breaker (Singapore’s COVID-19 lockdown measures), could be attributed to the Circuit Breaker or, alternatively, whether the Circuit Breaker had worsened the dengue fever outbreak (Team lemonchocolatecheesecake – University of Edinburgh)
    • Dreams in the Time of COVID-19: Exploration of Google search trends as well as sentiment analysis of tweets related to people having vivid dreams during the COVID-19 outbreak (Apoorv Jha – Duke University)
    • How Research Priorities Shift as COVID-19 Progresses: Exploration of the data set provided as part of Kaggle’s COVID-19 Open Research Dataset Challenge (CORD-19) suggesting that research focus shifted from finding a cure to preventative measures for containing COVID-19 (Team N & N – Duke University)
    • Purchasing Behavior via Amazon and Google Trends: Analysis of purchasing behavior data based on Amazon prices and Google Trends (Team Maskman – UCLA)
    • Driving During Quarantine: Investigation of traffic data to evaluate the effectiveness of the call for social distancing in Toronto, as measured by the decrease in the number of people driving in residential areas of the city (Team Shirley Eva – University of Toronto)

    ASA President Wendy Martinez, who virtually welcomed the students participating in the UCLA event, summed up the ASA DataFest experience with this comment: “It is amazing and inspiring what students working together and supported by faculty and volunteer experts are able to accomplish. It makes me optimistic for the future of our profession.”