Q&A with Student Fellows in Data Science

Three fellows from the Data Science for Social Good (DSSG) program offer advice and respond to questions about their experiences, views on data science, and future plans.

Jane Zanzig earned her BS in mathematics and MS in statistics from the University of Washington and is a fellow at the Center for Data Science and Public Policy at The University of Chicago. Her interests are in applying data science to socially relevant problems and increasing representation of under-represented groups in the computational sciences.

Jane Zanzig

How are you spending your summer as a DSSG fellow?

My team is working with the Environmental Protection Agency to prioritize inspections of hazardous waste facilities. We used results from 850,000 historical inspections, records on the type and amount of waste produced at each facility, and tests from other EPA regulatory programs to predict which facilities had the highest probability of violation.

What inspired you to apply?

I was graduating from my master’s program and debating what I wanted to do next. For a while, I had been dead set on doing a PhD, having heard it was the only way to do truly interesting statistical work. However, the projects I had enjoyed most were more applied than theoretical, like estimating infertility prevalence and modeling distributions of storm occurrence. I was really interested in applying the math theory I loved to solve problems that actually matter to people and create solutions that would really be implemented. I actually didn’t decide to apply until a few days before the deadline, when a friend convinced me I would be a good fit. Thank goodness I had two gracious and forgiving professors who wrote me letters on short notice!

Do you recommend fellow statisticians participate in this program in the future? If so, why and what advice do you have for them?

Yes, I definitely do. It showed me a lot about both what I have to learn and what I can offer when working in a data science work environment. My advice for anyone, statistician or otherwise, is to think hard beforehand about what you want to get out of the summer. I know I wanted to strengthen my Python skills, and so I voted for our team to do most of our project in Python. Sometimes when you have the stress of deadlines and “deliverables,” it’s easy to lose sight of these goals you set for yourself, but I tried to look at everything as a learning experience.

The DSSG fellows come from diverse fields. How do you view the relationship of statistics to data science?

I think statistics is essential to data science. We know a lot about how to cast a problem as an optimization problem, which is at the heart of data science. We are also trained to constantly be questioning how the results of our work generalize and what assumptions go into the models we use. Sometimes we have to forget a little bit of what we know about theory when we are building predictive, rather than causal, models—it can sound horrifying to just throw every variable but the kitchen sink into a model and see what shakes out. What I thought was the most interesting part of the summer was the interdisciplinary nature—how differently the people with a computer science background thought about data science workflow than the statisticians and how many tough questions about the implications and context the social scientists brought up. It definitely made the conversation more dynamic.

What advice do you have for young statisticians wanting to work in data science?

This will all seem a bit cliché, but learn to program. Work on more applied projects. Learn how to communicate your work to a variety of audiences. The plus side is that we had the chance to work on all of these this summer at DSSG!

What do you plan to do after you receive your degree/fellowship?

I am going to continue working with the Center for Data Science and Public Policy—we are designing a field experiment with the EPA to see if our model can really improve upon their current process in terms of prioritizing inspections. We’re also partnering with Computing Kids, a company I work for in Seattle, to develop a data science course for high-schoolers with an eye toward bringing more under-represented populations to computing and training teachers to teach computer science.

Amy Hepner is a former community organizer and educator, excited about the potential applications of modern statistics and data science techniques to social impact problems.

Amy Hepner

How are you spending your summer as a DSSG fellow?

I’m working with the Australian Conservation Foundation (ACF), an environmental protection organization, to increase digital engagement. The ACF seeks to use data to gain a deeper understanding of their constituency and improve online advocacy efforts.

What inspired you to apply?

I came to data science from a ‘social good’ background a few years ago, excited about the potential of modern statistical methods and their application to community efforts. DSSG offered me a resource-rich environment in which to work, in collaboration with like-minded people, on the sorts of problems that brought me into this field. I was sold the moment I heard about it.

What advice do you have for young statisticians wanting to work in data science?

Learn to love programming—keep tinkering with different languages and tools until you’re smitten. Figure out what drives you and find people with the same interests. Check out meet-ups. Go to hack-a-thons. Build your skills around the problems you hope to work on.

The DSSG fellows come from diverse fields. How do you view the relationship of statistics to data science?

Not all data science is statistics, and not all statistics is data science. For me, the formal methods our field has developed do play a role in data science, but it’s mostly the intuition I’ve cultivated as a statistician that helps me solve problems. Understanding things like the data-generation processes, bias, variation, classical and Bayesian modeling, information extraction, and outlier impact allows me to make better decisions at each turn when completing a project.

What do you plan to do after you receive your degree/fellowship?

I graduated last May and am looking for work near Pittsburgh, Pennsylvania, at the intersection of data science and social good!

Robin Gong is a PhD student in statistics at Harvard University. Her research interests lie in the foundations of statistical inferential methodologies, as well as applied modeling and causal inference for the social sciences. Gong earned her BS in cognitive psychology from the University of Toronto.

Robin Gong

How are you spending your summer as a DSSG fellow?

For the past summer at DSSG, I’ve been working with three other fellows on the early identification of high-school students who may not graduate on time. In collaboration with three partner public school districts across the country, we used historical student enrollment, academic performance, and discipline data to build at-risk student prediction models powered by statistical and machine-learning methods. In taking this data-driven approach, we aim to better inform schools about the unique challenges individual students are faced with and facilitate focused interventions at an early stage.

What inspired you to apply?

The biggest driving force was my desire to participate in a real-life data project from end to end. I particularly appreciate the exposure we gain to project planning and management through directly communicating with the partners to calibrate inferential goals and resolve issues such as data gaps. The focus on social good accentuates the breadth and relevance of the fellowship projects, ensuring a positive and significant social impact of the work being done. In addition, DSSG fellows’ diverse mix of academic backgrounds creates a rich environment for peer learning, which I have longed to be part of.

Do you recommend fellow statisticians participate in this program in the future? If so, why and what advice do you have for them?

Unequivocally yes, since my experience at DSSG has proven to be an unparalleled hands-on learning experience. I have the following three pieces of advice:

  • Be prepared to get your hands dirty and wrestle with imperfections in real-life data. It takes a lot more effort to get the design matrix ready than running the regression itself!
  • Hold a flexible, creative, and collaborative mentality. Doing a real-life project is like walking through a dark forest—you need to navigate through forking paths, think outside the box, and make friends who will have your back!
  • Treasure the rare opportunity of working alongside 41 brilliant minds! Be humble and take every opportunity to learn about your peers’ work, both inside and outside of DSSG.

The DSSG fellows come from diverse fields. How do you view the relationship of statistics to data science?

Statistics makes up one of the three indispensable ingredients of data science, alongside computer science and the social sciences. With its historical roots in political science (statistik: “science of the state”) and modern ties to machine learning and artificial intelligence, statistics as a discipline can offer much more than a handful of modeling techniques. We as statisticians take pride in doing justice to data through our analysis. The principled approach to formulating and solving any data-related problem reflects the condensed wisdom of centuries of statistical research, which is crucial to ensuring the external validity of the models and conclusions delivered by any data science project.

What advice do you have for young statisticians wanting to work in data science?

  • Build both breadth and depth of knowledge across the foundations of statistical inference, from experimental design and survey sampling to modeling techniques, missing data, and model evaluation methods.
  • Be a competent, self-sufficient coder.
  • Ground your analysis in the context of the real problem by having an extensive understanding of the bigger picture, scientifically and socially.
  • Hone your communication skills to make sure your analysis results really get across to people who need them.

What do you plan to do after you receive your degree/fellowship?

After receiving my PhD, I plan to pursue a tenure-track professorship in statistics at an academic institution with both teaching and research responsibilities.