*Steve Pierson, ASA Director of Science Policy*

###### More and more universities are starting master’s and doctoral programs in data science and analytics—of which statistics is foundational—due to the increasing interest from students and employers. *Amstat News* reached out to those in the statistical community who are involved in such programs to find out more about them. Given their interdisciplinary nature, we identified programs involving faculty with expertise in different disciplines to jointly reply to our questions. We have profiled many universities in our April, June, and December 2017 issues and our January 2018 issue; here are several more.

## University of Central Florida

**Liqiang Ni** is associate professor of statistics in the department of statistics at the University of Central Florida. His research interests include dimension reduction, multivariate analysis, actuarial science, and business intelligence. He has served as the graduate coordinator since August 2017.

**Shunpu Zhang** is a professor of statistics and chair of the department of statistics at the University of Central Florida. His research interests include bioinformatics, functional estimation, health informatics, large-scale hypothesis testing, and big data analytics.

#### MS in Statistical Computing—Data Mining Track

**Year in which first students graduated/are expected to graduate:** 2002

**Number of students currently enrolled:** 60

**Program format:** In-person, 36 credits, comprehensive exam required, either thesis or project-based, full time and part time, graduate assistantships offered on a competitive basis

The required courses fall into two broad components. The first leans toward traditional statistics and includes a two-semester sequence in theoretical statistics and one-semester courses in regression analysis and in logistic regression/GLM. The second leans toward applications: a two-semester sequence in data processing and preparation, including coding, and a two-semester sequence in data mining. Students can also choose from a variety of elective courses.

**What was your primary motivation(s) for developing a master’s (or doctoral) data science/analytics program? What’s been the reaction from students so far?**

In the late 1990s, the statistics community began to realize the great potential in data mining and data science. UCF created one of the earliest data mining programs, inspired in part by SAS Institute and with support from Disney, Florida Hospital, Blue Cross Blue Shield of Florida, Universal Studios, and many local business partners.

The students responded with a good deal of enthusiasm. We have seen a growing need for an educated and talented workforce at the MS level and beyond that can contribute to industry, government, and academia through innovative applications of data analysis methodologies.

**How do you view the relationship between statistics and data science/analytics?**

We believe statistical science is an integral part of data science/analytics. A good data scientist/data analyst must have adequate training in statistics.

**What types of jobs are you preparing your graduates for?**

We are preparing MS graduates largely for jobs in industry. Every year, a few graduates continue their studies in PhD programs.

**What advice do you have for students considering a data science/analytics degree?**

To become good data analysts, students should have a solid foundation in computer programming, mathematics, and statistics. They should also have a keen interest in new developments in data science.

**Describe the employer demand for your graduates/students.**

Demand for our graduates has always exceeded the supply, especially in recent years.

**Do you have any advice for institutions considering the establishment of such a degree?**

We believe a data analytics/data science graduate program resides best in a statistics department with concentrations in computer programming and software development. Open-mindedness is the key to a successful interdisciplinary program.

## University of Michigan

**Michael Elliott** is a professor of biostatistics and research professor of survey methodology. His research interests include survey methods, causal inference, missing data, and longitudinal data analysis with applications to social epidemiology, cancer trials, women’s health, pediatrics, and injury.

**H. V. Jagadish** is Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science. His research has spanned many aspects of big data, including the usability of data that come from multiple heterogeneous sources and have undergone many manipulations.

**XuanLong Nguyen** is associate professor and director of master’s programs in statistics. His research interests include Bayesian nonparametrics, hierarchical models, and machine learning.

**Elizabeth Yakel** is associate dean for academic affairs and professor in the school of information. Her research focuses on data reuse, teaching with primary sources, and the development of standardized metrics to enhance repository processes and the user experience.

**Ji Zhu** is a professor and director of the data science master’s program in statistics. His research interests include statistical learning; network analysis; and statistical modeling in finance, marketing, and biosciences.

#### Data Science Master’s Program

**Year in which first students graduated/are expected to graduate:** 2019–2020

**Partnering departments:** Biostatistics, Electrical Engineering and Computer Science, School of Information, Statistics (administrative unit)

**Program format:** Full time, on campus; requires at least 25 credit hours in core areas including databases, data and web applications, regression, and statistical learning

The program requires students to have demonstrated competence in a basic computing sequence and a basic statistics sequence. Through graduate-level courses, students need to demonstrate expertise in data management and manipulation, as well as in statistical techniques relevant to data science. Students must take at least one advanced elective from each of the following buckets: principles of data science, data analysis, and data science computation. Students will also have an integrative capstone experience through an approved project.

Students with an undergraduate degree in data science would already have obtained a reasonable level of training toward the core skills and may finish the master’s degree in one year. Students with an undergraduate degree in mathematics or physics, statistics or biostatistics, computer science, and other quantitative disciplines should be able to complete all requirements within two years.

**What was your primary motivation(s) for developing a master’s (or doctoral) data science/analytics program? What’s been the reaction from students so far?**

The data science explosion is fueled organically by new data generated from diverse sources, devices, web services, mobile communication, scientific studies, and social media. Data scientists require a versatile and unique set of skills to manage, process, and extract data from these complex information streams, and then interrogate, analyze, visualize, and interpret the information. Nationally, there is a pressing need for data scientists, and, in fact, for people with every level of data science training. The successful launch of our data science major, which has attracted almost 200 students across campus in its first two years, made it clear that our students want to be part of data science. The collaborative approach we take across departments and colleges enables us to pool resources and offer the best our university has for a truly cross-cutting program.

**How do you view the relationship between statistics and data science/analytics?**

Statistics is undoubtedly a major part of data science. The advancement of statistics has always been driven by new data that arise in science or society, whether they come from agricultural measurements, the industrial revolution, or the internet. While data science requires tools from multiple disciplines (e.g., mathematics, computer science) and must work with specific domains of application (e.g., business or health care analytics), statistics and data science are inseparable. From design of experiments to probabilistic modeling, from data exploration to confirmatory testing, and from estimation to prediction, statistics has been the core of data analysis. Statistics without data science will not thrive, and data science without statistics is certainly unsound.

**What types of jobs are you preparing your graduates for?**

This is a new program, but we provide the training the students need to work as data scientists in a wide range of industries, from financial services to health care, from marketing to social networking. We invite companies to our career fair for the students, and we encourage students to take internships to help them understand what they need to prepare for in school.

**What advice do you have for students considering a data science/analytics degree?**

We offer two master’s degrees, one in applied statistics and the other in data science. At present, the applied statistics degree focuses more on modeling and inference, and the data science degree focuses more on data handling and data mining. Some students from the applied statistics program go on to pursue a doctoral degree in statistics, biostatistics, economics, or another quantitative field. We expect data science students to be well versed in data management and programming. However, the two programs increasingly overlap, as we offer more computing courses to applied statistics students and more statistics courses to data science students.

**Describe the employer demand for your graduates/students.**

We do not yet have data on graduates of the data science program, but the vast majority of graduates of our applied statistics program were employed or had entered PhD programs within six months of graduation.

**Do you have any advice for institutions considering the establishment of such a degree?**

Data science programs by nature cross traditional boundaries, but the department of statistics is a natural and ideal home for such programs. To make such programs successful, statistics departments must be willing to modernize their existing curricula to embrace data science and to reach out to faculty in other programs. At Michigan, different programs offer complementary courses in data science and, together, we believe we can attract and accommodate students from diverse backgrounds.

## The Johns Hopkins University

**James Spall** has four appointments at The Johns Hopkins University: principal professional staff at JHU/APL; chair of the applied and computational math program; co-chair of the data science program; and research professor in the department of applied math and statistics. Spall has published extensively in the fields of control systems and statistics.

#### Master of Science in Data Science and Post-Master’s Certificate (PMC) in Data Science

**Year in which first students graduated/are expected to graduate:** Late 2018

**Number of students currently enrolled:** More than 120 fully matriculated students in the MS degree and 0 students in the PMC program. There are additional students who have been given a provisional admission status (additional evaluation and/or coursework required) for both the MS and PMC.

**Partnering departments:** Applied and Computational Mathematics, Computer Science

**Student type:** Nontraditional/part time/continuing education, although there are a few students pursuing the degree full time

**Program format:** Online/in-person/combination; 30 credit hours required in five years for the MS; 18 credit hours required in three years for the PMC

The program combines selected offerings from two existing rigorous graduate degree programs: applied and computational mathematics (ACM) and computer science (CS). On the ACM side, students take a foundational course in statistical methods and data analysis, followed by required courses in optimization, statistical models and regression, and computational statistics. On the CS side, students take a foundational course in algorithms, followed by required courses in databases, visualization, and data science. All students are also required to take one upper-level ACM elective (e.g., data mining, queuing theory, or stochastic optimization) and one upper-level CS elective (e.g., machine learning or big data processing using Hadoop). To qualify, students need to have taken three semesters of calculus (through multivariate), discrete mathematics, Java, and data structures.

**What was your primary motivation(s) for developing a master’s (or doctoral) data science/analytics program? What’s been the reaction from students so far?**

The motivation for starting the program is clear to anybody even slightly paying attention to broad trends in society toward greater quantitative analysis in decision-making and the need for processing and interpreting massive data sets in many diverse fields. JHU had offered a well-received non-credit sequence in data science through Coursera and the school of public health for several years, and the need for a graduate credit program was fairly clear. In response, the JHU Whiting School of Engineering, through its engineering for professionals division, took on the challenge of creating a rigorous, credit-bearing data science program based in both applied math and computer science. Measured by the number of applicants, the data science program has had an overwhelming response since it was rolled out in fall 2016. The cumulative number of applications grew from 0 to more than 2,000 in less than two years.

**How do you view the relationship between statistics and data science/analytics?**

While there is a wide variety of data science programs, all seem to have a substantial basis in statistics. That connection is not surprising when you consider statistics is defined as the field devoted to “the practice or science of collecting and analyzing data”!

While we do not claim to know “the” relationship between statistics and data science, the JHU program in data science is deeply connected to advanced methods in mathematical statistics, modeling, and computational statistics. As such, the prerequisites for the data science program involve mathematics through multivariate calculus (Calculus III), as well as a course in discrete mathematics and exposure to linear algebra and matrix theory.

**What advice do you have for students considering a data science/analytics degree?**

A prospective student needs to be strong in math and adept at programming. Someone considering the program who has not taken mathematics or programming courses in several years prior to starting the program might consider taking a refresher to “hit the ground running.” Also, for the key demographic of students who are working full time or near full time, it is recommended that students initially take only one course at a time. This allows a person to re-acclimate to academic life.

**What types of jobs are you preparing your graduates for? Describe the employer demand for your graduates/students.**

The range of jobs associated with data science, broadly defined, is almost limitless. It seems many large and small employers have people doing data science in some capacity, but without having that label in the job title. Given that most of our students are part time and are partially or fully employer funded, the students are expected to continue with their current employer. For the minority of students not employer funded, we currently have little data regarding employer demand because the program is a new offering. That being said, given the strong demand for the program, there is little doubt that those students in the job market will be able to find relevant positions.

## University of Vermont

**James P. Bagrow** is an assistant professor in mathematics and statistics at the University of Vermont and a member of the Vermont Complex Systems Center. He has degrees in liberal arts (AS) and physics (BS, MS, and PhD).

**Jeffrey S. Buzas** is professor and chair of mathematics and statistics and director of the statistics program. He has degrees in mathematics (BS) and statistics (MS and PhD).

**Margaret J. Eppstein** is professor and chair of computer science at the University of Vermont and the founding director of the Vermont Complex Systems Center. She has a BS in zoology, MS in computer science, and PhD in environmental engineering.

**Peter Sheridan Dodds** is a professor in mathematics and statistics at the University of Vermont, where he is also the director of the Vermont Complex Systems Center and co-director of the Computational Story Lab.

#### PhD in Complex Systems and Data Science

**Year in which first students graduated/are expected to graduate:** 2021

**Partnering departments:** Vermont Complex Systems Center (lead), Mathematics and Statistics, Computer Science

**Program format:** In-person (online being developed), thesis/project or coursework, 30 credit hours, traditional/non-traditional/full-time/part-time/continuing education

We provide students with broad training in computational and theoretical techniques for describing and understanding complex natural and sociotechnical systems, enabling them, where possible, to predict, control, manage, and create such systems.

Our PhD is a natural addition to our educational platform, which already consists of an MS in complex systems and data science and a five-course graduate certificate in complex systems. UVM also now has an undergraduate major in data science.

The major skill sets we aim to train include the following:

- Data wrangling: Methods of data acquisition, storage, manipulation, and curation
- Visualization techniques, with potential for building high-quality web-based applications
- Uncovering complex patterns and correlations in systems through data-fueled machine learning and genetic programming
- Powerful ways of identifying and extracting explanatory, mechanistic stories underlying complex systems—not just how to use black box techniques

Students must have prior coursework or be able to establish competency in the following:

- Calculus
- Coding (Python/R ideal, but not necessary)
- Data structures
- Linear algebra
- Probability and statistics

**What was your primary motivation(s) for developing a master’s (or doctoral) data science/analytics program? What’s been the reaction from students so far?**

The basic motivation was that we live in a kind of renaissance, with so many fields moving from data-scarce to data-rich. Students need a suite of skills to be able to contend with the kinds of broad problem solving they will face in the real world, very likely as parts of teams. These students should not be cogs with narrow training. Student response has been extremely positive.

**How do you view the relationship between statistics and data science/analytics?**

Our PhD and master’s incorporate training in computer science, statistics, mathematics, physics (mechanisms), and complex systems.

**What types of jobs are you preparing your grads for? (If you have had graduates, please summarize the types of jobs they took and in what sector.)**

Data science positions at corporations and in government. Students with the kind of training our PhD will now formally frame have gone on to work for companies, as well as into careers in education.

**What advice do you have for students considering a data science/analytics degree?**

Students should look for data science programs that are truly interdisciplinary. They should be able to develop skills that enable them to explain patterns, not just reproduce them or generate novel ones. While explanation is fundamental to science, it is also crucial in real-world venues to be able to understand and defend, for example, decisions proffered by algorithms, in order to maintain ethical, legal, and assurance standards.

**Describe the employer demand for your grads/students.**

Very strong. We have seen increasing interest in PhD students with deeper training.

**Do you have any advice for institutions considering the establishment of such a degree?**

Just do it. The world has changed, and it is our responsibility to adapt. We have to frame education so students will have a clear path to becoming data scientists. Many essential courses will already exist, but the development of hybrid core courses on data science will likely also be necessary.
