On-Ramps for Data Science Experiences

Mark Daniel Ward is a professor of statistics and (by courtesy) agricultural and biological engineering, computer science, mathematics, and public health at Purdue University. He is director of the Data Mine.

As the 2023–2024 academic year starts at colleges and universities all over the country, I reflect on ways mentors can create innovative, meaningful experiences for students. Such experiences include not only research guided by a faculty member but also research in which students work directly with mentors from industry, across disciplinary boundaries.

In this regard, I am biased because our undergraduate and graduate students in the Data Mine worked on more than 80 projects during the 2022–2023 academic year, and it appears this year will be even more exciting for our team.

The Data Mine is a program in the Office of the Provost at Purdue University that provides students an on-ramp into the data sciences, even if they do not have a background in data science. We are projecting an enrollment of 1,700 undergraduate and graduate students this fall, not only from Purdue but also from across the United States.

In addition to our program on campus, we coordinate the Lilly Endowment–funded Indiana Data Mine, the ASA’s National Science Foundation–funded National Data Mine Network, and the NSF-funded Developing Experiential Accessible Framework for Partnerships and Opportunities in Data Science for the deaf community (DEAF PODS).

I think students in data science and statistics appreciate opportunities to collaborate with students from other areas. Similarly, students from other disciplines (e.g., business, engineering, health science, and technology) value the opportunity to gain insight from statistics and data science students, who can develop statistical or machine learning models.

Compared with a classroom experience, these interdisciplinary experiences more accurately mirror the environments in which students will work in their full-time careers.

Students seem to enjoy learning about a domain from an industry practitioner. There is simply no substitute for learning directly from a person who has worked in a domain throughout their life. Such mentoring is especially meaningful if that person is willing to share their experiences with students throughout a nine-month partnership.

If a student can meet with colleagues from industry one or two times a week throughout an academic year, I wager such discussions will be one of the student’s most worthwhile experiences in college. Even the most experienced statistics and data science faculty are usually unable to replicate the types of advice a colleague from industry can share with students over an extended period.

As faculty members, I think we should work hard to incorporate deep, rich interactions with industry experts throughout the undergraduate and graduate experience.

I think students are more likely to understand AI-powered tools if they are involved in developing them (rather than, say, only studying the mathematical frameworks or computational aspects). By being included in an AI research project, students will feel not only a sense of ownership but also a sense of belonging and self-efficacy. Moreover, students who build AI models and encounter AI early in their careers will have more time to consider the many ways AI is transforming the companies for which they may work after college or graduate school.

Another aspect of this alignment of students’ careers with statistics and data science is broadening career pathways. By working with a company early in college, a student has the opportunity to develop a first-hand appreciation for the mission of a small or mid-size business or a government agency where they would otherwise never have considered working. In states like Indiana, where the brain drain to coastal states can be a concern, this is a crucial economic issue.

My team members at the Data Mine are delighted when a student chooses, for instance, to work with Beck’s Hybrids (a family-owned seed company) or the Indiana Family and Social Services Administration. (View the full list of partners.) Brad Fruth, director of innovation at Beck’s Hybrids, talks about working with students in a video. One project he discusses focused on optimizing the supply chain at Beck’s Hybrids, and another focused on statistical and data-driven models for choosing test plot locations.

At the end of the video, the interviewer asks, “Would you recommend other corporations or companies to partner with the Data Mine and why?” Fruth responds tongue-in-cheek, “No! Because we want the students for ourselves, and selfishly, this is all ours for world domination. Don’t do it!”

Michael Douglass, program engineer at Raytheon, also loves working with undergraduate and graduate students on real-world research problems. He has repeatedly told our team that working at the Data Mine is now a condition of his employment. In other words, he told his supervisor he will quit if he is unable to continue working with the students. Douglass’ mentoring goes far beyond team meetings with the students. He goes on long bike rides with them and often has meals with them in the dining court of their residence halls. He truly understands the deep impact of mentoring students early in their careers.

Students in the Data Mine are thankful for these experiences. As I was writing this article, first-generation university student Taylor Saunders stopped by my office to discuss her plans for graduate school. When setting up our meeting, she was gushing about her research experience with our team this summer, saying, “I have learned more than I expected and have grown both professionally and personally.” She emphasized she has “found the experience to be both vigorous and exhilarating.”

Saunders is working with a team of peers in our NSF-funded DEAF PODS program. She is studying at Arizona State University but spending the summer at Purdue. Her research project is provided by the Indiana Family and Social Services Administration.

In addition to the mentoring provided by our team, Cristian Guandique, deputy director of data science and engineering, and Matt Kirby, director of engagement and analytics, at the Indiana Family and Social Services Administration meet regularly with the students—both online and at Purdue—to guide their research. This experience enables Saunders and her teammates to gain an immediate understanding of how statistics and data science are practiced in the state government to provide a multitude of services for people in Indiana.

After Saunders completes her summer work with DEAF PODS, she plans to join the ASA’s National Data Mine Network. She wrote to say the following:

I am extremely appreciative to have been given this opportunity, and as of today Jessica [Jud, the Data Mine Senior Manager of Expansion Operations] connected me to [the ASA’s] National Data Mine Network. I am extremely thankful for all of the hard work you and your team put into the Data Mine every day to make these opportunities for myself and other students possible. I spoke to David Glass [the Data Mine managing director of data science] today about further opportunities and mentioned my desire to earn my master’s degree after my graduation date. The Data Mine, and the community here at Purdue, has really inspired me to further invest in my education and skills after my bachelor’s, and I expressed interest in Purdue’s statistics program[…] I am extremely passionate about data science and this program has opened my eyes to all of the possibilities ahead of me.

For faculty developing interdisciplinary applied data science programs, the amount of background needed in computation, mathematics, and statistics is often a key question. In the Data Mine, we have taken the path of allowing students to join early in their studies, without having a background in these topics. We enable the students to learn data science competencies as they work on their research.

However, the amount of background needed for data science is a topic of ongoing debate. On July 13, for instance, Rob Gould weighed in on this topic in a New York Times article titled “In California, a Math Problem: Does Data Science = Algebra II?” Additionally, a team of eight University of California faculty wrote a letter stating that data science can “harm students from such groups by steering them away from being prepared for STEM majors.” These discussions about what training is most appropriate or necessary for data science programs will likely continue for the foreseeable future.

I want to emphasize there are many excellent student opportunities in statistics and data science research throughout the US. For instance, Talitha Washington is doing innovative work as the director of the Atlanta University Center Data Science Initiative. She is also co-PI of the ASA’s National Data Mine Network, which has doubled its number of applicants from last year. This demonstrates that the broad appetite for data science research experiences is increasing among students nationwide, including students pursuing their college degrees at minority-serving institutions.

Also, Sat Gupta and his colleagues at The University of North Carolina at Greensboro offer research experiences for undergraduates. Their research program was launched through an ASA initiative funded by the NSF and continues under the auspices of the university’s own NSF research experiences for undergraduates grant.

I firmly believe early research experiences are some of the most valuable ways faculty and industry mentors can support students. Such experiences also give companies and universities new ways to build innovative partnerships. If you are not yet working with students on data science or statistics research experiences, I encourage you to think about exploring such opportunities.