Steve Pierson, ASA Director of Science Policy
More universities are starting master’s programs in data science and analytics, of which statistics is foundational, due to the wide interest from students and employers. Amstat News reached out to those in the statistical community who are involved in such programs. Given their interdisciplinary nature, we identified programs involving faculty with expertise in different disciplines to jointly reply to our questions. In our April issue, we profiled four universities; here are several more.
NC State University
Michael Rappa is a professor of computer science and the founding director of the Institute for Advanced Analytics at North Carolina State University. As head of the institute, he is the originator and principal architect of the nation’s first Master of Science in Analytics, established in 2007.
The Master of Science in Analytics (MSA) is a novel curriculum aimed squarely at producing graduates with the multi-faceted skills needed to draw insights from complex data sets and communicate those insights effectively. It is the product of a three-year collaboration by an interdisciplinary group, including mathematicians, computer scientists, statisticians, economists, geographers, operations researchers, and faculty with expertise in various fields of business and management.
Please describe the basic elements of your data science/analytics curriculum and how the curriculum was developed.
The MSA is a single, fully-integrated course of study—not a menu of core and elective courses—taught exclusively to students in the program. It is highly interactive. Students work in teams and receive personalized coaching to improve their productivity. It is an intensive 10-month learning experience designed to immerse students in the acquisition of practical knowledge and application of methods and techniques. The curriculum is carefully calibrated and continuously updated to meet the evolving challenges facing data scientists. The institute houses classrooms, team rooms, study spaces, and other amenities under one roof, as well as the faculty and staff, who are available to interact with students throughout the day.
Master of Science in Analytics
Year in which the first students graduated/are expected to graduate: 2008
Number of students currently enrolled: 120
Partnering departments: Institute for Advanced Analytics
Program format: In-person, 30 credit hours, full-time, practicum team project
MSA students hone their skills working on challenging problems with actual data shared from sponsoring organizations. The practicum spans eight months and culminates with an executive-level report and presentation to the sponsor. Students work with leading industry-standard programming tools. Since the program’s inception, MSA students have engaged in 134 projects with more than 100 sponsors spanning virtually every industry segment and including some of the world’s leading organizations and best-known brands.
With a decade of experience and hundreds of graduates, the curriculum has a proven track record of producing superior student outcomes.
What was your primary motivation(s) for developing a master’s data science/analytics program? What’s been the reaction from students so far?
The importance of being able to store and quickly manipulate large amounts of electronic data has a history that spans several decades. However, by the 1990s, the rapid emergence of the web presented us with huge amounts of real-time streaming data. The supply of university graduates with the skills needed to manage, analyze, and draw insights from the new data reality wasn’t keeping up with the demand. We saw an opportunity to offer a new graduate degree focused on producing data-savvy students who could meet the evolving needs of employers.
There has been an overwhelming response to the MSA program. Already, in just a decade, it has become one of NC State’s largest graduate degree programs in terms of degrees conferred annually. It’s also the university’s most selective degree program, with more than 1,000 applicants each year and an acceptance rate in the low teens. It’s gratifying to see the quality of students attracted to the program and their passion for data.
The program gets high marks from students. They enjoy the team-driven learning experience and hands-on approach to working with data in their practicum projects. They can see the close connection between what they are learning and what they will be doing on the job as an analytics professional. It greatly enhances their employability.
What types of jobs are you preparing your graduates for?
In the institute’s employment report, you will see each job title for the positions our students enter upon graduation. The positions can be bucketed into three large categories: analysts, consultants, and data scientists. The first two categories come with a variety of adjectives (e.g., risk analyst or integration consultant). The data scientist position is a relatively recent development. It has come on strong in the last five years.
The institute’s annual employment report also provides information about the distribution of employment by industry sector (financial services, software/internet, and consulting are the big three). Typically, MSA graduates land in any one of a dozen industry sectors. There are also government placements, including the armed services, and perhaps one or two graduates who will head into a position within a university.
What advice do you have for students considering a data science/analytics degree?
The ultimate litmus test is your passion for working with data. People will tell you about the great job opportunities and the flashy headlines. Sorry to say it, but there’s really nothing sexy about working with data. It’s hard work. It’s tedious. There are times it will make your head hurt. For every great insight, there are a hundred frustrating dead ends. But if you’re really good and love the work, the potent insight now and then makes it all worth it. This could be said for just about any college major, if you take it seriously and set your sights on performing at the highest level.
Describe the employer demand for your graduates/students.
The institute has a decade of experience and has placed 651 students in the profession—with an unparalleled track record of 90–100% placement by graduation year after year since our inception. We collect comprehensive data on every placement, and indeed every job offer, and keep track of our graduates as they progress in their careers. The current median starting salary is $100,000 for graduates with prior work experience and $90,000 for graduates without prior work experience.
You don’t hear employers lamenting anymore about the shortage of talent. What they fret about is the scarcity of high-quality, well-prepared graduates. The institute is as successful as it is because we have the highest standards for admission and a unique learning format proven to produce high-quality results. Our students are the kind of graduates employers look to recruit.
Do you have any advice for institutions considering the establishment of such a degree?
Unless you have unlimited resources, work together. Put creative energy into the kinds of organizational innovations that will facilitate collaboration across unit boundaries. Universities have their own histories and cultures that define how the academic disciplines fit within departments and colleges and give the institution a set of possibilities and constraints uniquely its own. No single approach to establishing a degree will fit every university. Rely on the members of your faculty who are by their nature boundary-spanners. Enable them to do what they do best.
The Institute for Advanced Analytics is, by design, a university-wide collaboration. The institute brings together faculty in fields such as mathematics, statistics, computer science, operations research, and business disciplines to work together to develop, refine, and deliver the Master of Science in Analytics. The result is, by every measure, a resounding success for us.
Penn State University
Colin J. Neill is an associate professor of software and systems engineering and director of engineering programs. He is the author of more than 80 articles about the development and evolution of complex software and systems and the management and governance thereof. As director of engineering programs, Neill oversees the division’s portfolio of graduate degree programs, including the MPS in data analytics delivered both in residence and online. John I. McCool is a distinguished professor of systems engineering. He has taught courses in statistics, experiment design, reliability, statistical process control, applied data mining, probability models, and optimization. His research includes statistical inference for the Weibull distribution and industrial statistics.
How do you view the relationship between statistics and data science?
Statistics provides the foundational concepts of random sampling, the central limit theorem, common probability distributions, hypothesis testing, and predictive modeling. These concepts undergird the intelligent application of computer intensive data mining tools of the data scientist such as neural networks, decision trees, cluster analysis, and association modeling.
That said, one can certainly ponder the role of statistics in the age of Big Data. Statistics tells us how to infer from samples of the population, so what does that mean when we potentially don’t need to sample the population, given the computation and data storage we have at our disposal?
Please describe the basic elements of your data science curriculum and how it was developed.
It was developed in collaboration with faculty from engineering, business, statistics, information technology, and software engineering/computer science. Separately, we all recognized the value of an interdisciplinary program that covered the techniques, technologies, theory, and application of data science and analytics and sought to create a program that simultaneously spanned that broad expanse, yet dealt with each aspect in depth.
The core of the program covers the central statistics of analytics, as well as the computational statistics and machine learning used in predictive and prescriptive analytics. The program we created has options that allow students to focus on a specific area within data analytics—technologies used in development of such systems, data storage and processing at scale, prescriptive analytics techniques, business analytics focused on applying analytics for strategic advantage, marketing analytics (coming soon), and hopefully other areas as more academic partners come on board.
MPS in Data Analytics
Year in which the first students graduated/are expected to graduate: 2017
Number of students currently enrolled: 200
Partnering departments: School of Graduate Professional Studies (Lead); Smeal College of Business; Applied Statistics; Harold & Inge Marcus Department of Industrial and Manufacturing Engineering
What was your primary motivation(s) for developing a master’s data science program? What’s been the reaction from students so far?
As we said above, we all recognized the value of a program that focused on addressing the data deluge seen in almost every area of the private and public sectors. The student reaction has been phenomenal. Applications have been very strong, and that allows us to maintain high admissions standards so the entire student body is accomplished and driven. This allows for a rich classroom environment—whether virtual in our online program or literal in our face-to-face program. Our students seem particularly energized by the opportunity to engage in faculty research in our Big Data lab, a research group that also functions both physically and virtually.
What types of jobs are you preparing your graduates for?
Since we have multiple options within the program, we are preparing students for a broad array of professional roles, but I personally find the job titles out there aren’t well defined, so I hesitate in using them too categorically. I certainly believe our program can prepare graduates for roles as data scientists, data architects, and data analysts, depending on the option pursued and the electives selected.
What advice do you have for students considering a data science degree?
My main advice would be: Do it! Every job outlook report describes it as everything from the sexiest job of the 21st century to the most highly sought after for the next decade and beyond. One can certainly get related skills in a computer science or statistics degree, but data science combines them, adds to them, and puts them into context, and that is valuable in the marketplace.
Describe the employer demand for your graduates/students.
The demand for data science graduates is incredible. In just more than a year of offering the degree, we have had direct requests for graduates and interns from employers in transportation, logistics, health care, automotive, entertainment, and finance.
Do you have any advice for institutions considering the establishment of such a degree?
Well, we aren’t seeking competition, but if I were to offer advice, I would say find academic partners in the various aspects of data science—statistics, machine learning, data processing and storage, information retrieval—as well as the various domains that are employing analytics so you have domain knowledge, too. Of course, with so many partners, the forming of consensus gets harder, but as the saying goes, the hardest steel is forged in the hottest fire.
University of Vermont
James P. Bagrow is an assistant professor of mathematics and statistics at the University of Vermont and a member of the Vermont Complex Systems Center. He has degrees in liberal arts (AS) and physics (BS, MS, and PhD). Jeffrey S. Buzas is professor and chair of mathematics and statistics and director of the statistics program at the University of Vermont. He has degrees in mathematics (BS) and statistics (MS and PhD). Peter Sheridan Dodds is a professor in mathematics and statistics at the University of Vermont, where he is also the director of the Vermont Complex Systems Center and co-director of the Computational Story Lab. Margaret J. Eppstein is professor and chair of computer science at the University of Vermont and founding director of the Vermont Complex Systems Center. She has a BS in zoology, MS in computer science, and PhD in environmental engineering.
Please describe the basic elements of your data science/analytics curriculum and how the curriculum was developed.
Our program provides students with a transdisciplinary education that prepares them for business environments or a PhD in an analytic field. Our program is more scientific than professional and is unique in its combination of complex systems and data science (CSDS). Throughout the MS in CSDS program, students are challenged to create defensible arguments for their findings, with warnings against the many potential pitfalls associated with exploring large-scale data sets, coupled with the use of computational process-based models that lend insight into emergent properties of complex systems. Admissions requirements include courses in calculus, programming, data structures, linear algebra, and probability and statistics. We offer opportunities for students to make up missing prerequisites.
What was your primary motivation(s) for developing a master’s data science/analytics program? What’s been the reaction from students so far?
Almost all scientific fields have moved from data scarce to data rich, and sophisticated analyses have been made possible by the advent of distributed computing and storage, with accompanying advances in algorithms and theory. As Big Data has become a common thread across disparate disciplines, so too have methods for contending with the many difficulties presented by large-scale data analysis. The program was created to address the opportunities created by these conditions. Faculty were already working on modeling and analysis of complex systems using transdisciplinary approaches, and we already had a five-course Certificate of Graduate Study in Complex Systems, so it was natural to build an MS degree upon this foundation.
Quote from a student course evaluation: “This class changed the way I see the world.”
How do you view the relationship between statistics and data science/analytics?
Data science is at the intersection of statistics and computer science. Data munging, machine learning, visualization, text processing, heterogeneous data types, and web scraping are examples of tasks not typically addressed in traditional statistics programs. Inferential logic/methodologies and design of experiments are not typically taught in computer science programs. A large number of schools have business-themed data science programs. The core data science part of our MS in CSDS at UVM provides more general purpose training, though certainly a career in the business world would be a possible outcome for students.
Master in Complex Systems and Data Science
Year in which the first students graduated/are expected to graduate: 2016
Number of students currently enrolled: Five
Partnering departments: Department of Mathematics and Statistics, Department of Computer Science
Program format: In-person instruction, 30 credits (coursework only, project, and thesis options). Support for finding internships, no graduate teaching assistantships, research assistantships possible for students working with externally funded advisers.
What types of jobs are you preparing your graduates for?
We have developed close relationships with several companies, and they are helping to support our programs. The program is new and we don’t yet have data to address demand for our graduates, but it is worth noting that data scientists are increasingly in demand across the spectrum of occupations in government, finance, corporations, and journalism. The job title of data scientist is now commonplace. With the field popularized by Nate Silver and Moneyball, training in data science is being sought after across the United States. Perhaps the clearest evidence is the growth of data science degrees globally. Also, these degree programs, which are largely master’s level, are easily being filled by applicants within the United States.
What advice do you have for students considering a data science/analytics degree?
We aim to serve students coming from a wide variety of backgrounds and therefore deliberately keep the prerequisites to a minimum. Students must have a bachelor’s degree in a relevant field and prior coursework in computer programming, data structures, calculus, linear algebra, probability, and statistics.
Our program is ideal for students interested in the intersection of statistics, computer science, and mathematics with applications in any of a wide variety of domains. The degree provides more exposure to computing and statistics than the traditional statistics or computer science degrees (respectively), offers unique transdisciplinary courses in complex systems and data science, does not require strictly disciplinary courses (e.g., we do not require computer science–specific courses like operating systems), and provides a great deal of flexibility in customizing coursework to student interests.
Do you have any advice for institutions considering the establishment of such a degree?
The statistics, computer science, and mathematics programs at UVM have a collegial relationship, which has helped significantly in the formation of our BS in data science and our MS in complex systems and data science degrees. We work closely on course scheduling so the courses in the different disciplines do not conflict. Cross-listing of courses also provides for increased options for students. Collaboration is also made easier in that the participating disciplines all reside in the College of Engineering and Mathematical Sciences.
University of Wisconsin-Madison
Mark Craven is a professor in the department of biostatistics and medical informatics at the University of Wisconsin-Madison. His research involves developing machine-learning methods to infer network models of interactions among genes, proteins, environmental factors, and phenotypes of interest.
How do you view the relationship between statistics and data science/analytics?
Data science is the combined use of tools and concepts from statistics/biostatistics and computer science/biomedical informatics for gathering, integrating, analyzing, interpreting, and visualizing data for scientific inquiry and decision-making. In addition to those two core disciplines, data science incorporates case studies, methods, theory, and principles from other fields, including systems engineering, human-centered design, and information sciences. Biomedical data science is focused on the quantitative and computational aspects of generating and using data to further biomedical research, broadly construed.
Please describe the basic elements of your data science/analytics curriculum and how it was developed.
Our program in biomedical data science includes areas such as machine learning and data mining, optimization, database methods, image analysis, formal study design methods for biomedical research, and formal statistical principles for quantifying uncertainty and making inferences.
Master of Biomedical Data Science
Year in which the first students graduated/are expected to graduate: 2017
Number of students currently enrolled: Six
Partnering departments: None
Each student must take a core sequence comprising one course in each of biostatistics, bioinformatics, medical image analysis, and clinical informatics. They also each develop an area of concentration with two additional courses. Examples might include, among others, clinical biostatistics, more advanced bioinformatics or computational biology, or clinical informatics. Students also take a research ethics course and may engage in a capstone research project.
What was your primary motivation(s) for developing a master’s data science/analytics program? What’s been the reaction from students so far?
Recent growth in the size and complexity of data arising in biology, biomedical research, and public health policy—including applications in high-throughput biology, medical image analysis, clinical and health services research, and genetics and genomics—requires continued research and training in the separate disciplines of statistics and computer science, and, as importantly, their synthesis.
Nationwide, the biomedical research community is struggling to manage, share, analyze, and fully exploit expanding quantities of data in the biomedical sciences. The need for a workforce capable of innovating, implementing, and using methods from biomedical data science is widely recognized. This demand has been driven by the following factors:
- The proliferation of high-throughput biological experimental methodologies (e.g., next-generation sequencing, microarrays, SNP arrays) has transformed biology into a data-intensive science.
- Increasingly, biomedical studies and clinical decision-making are integrating and making inferences with varied types of data (genotypes, molecular profiles, images, electronic health records, and population-based data), which heightens the need for sophisticated computational methods.
- Incentives, such as those specified by the Health Information Technology for Economic and Clinical Health (HITECH) Act, are accelerating the adoption and broadening functionality of electronic health records and health care billing records, including application in comparative effectiveness research.
The NIH has clearly identified biomedical data science as an area of priority for increased training for clinical and translational research to proceed at a pace that takes advantage of the tremendous output of scientific and clinical data. The Data and Informatics Working Group of the NIH director’s advisory committee made a specific recommendation to “build capacity by training the workforce in the relevant quantitative sciences such as bioinformatics, biomathematics, biostatistics, and clinical informatics.” Following this report, the NIH formally recognized the need to expand the quantitative sciences workforce and methodology through its Big Data to Knowledge (BD2K) initiative, which has called for innovative new research and training programs focused on the management and analysis of biomedical data. Thus, there is a pressing need and a keen interest among translational researchers for such training.
What types of jobs are you preparing your graduates for?
This is a new program, and we are eagerly anticipating our first graduates. The jobs for which they are preparing are quite varied. Some of our students joined the program with medical or other advanced clinical degrees. They are gaining methodological skills and experience that will complement their clinical training and facilitate their work as medical researchers. Other students—typically coming in with a bachelor’s degree—will be looking for positions in industry in an array of fields including biotechnology, direct-to-consumer genetics, electronic health records development, and medical instruments.
What advice do you have for students considering a data science/analytics degree?
The best advice for students interested in biomedical data science is to develop a basic foundation in mathematics (optimally, at least two semesters of calculus, plus linear algebra) and computer sciences (two semesters), and to develop an interest and some coursework in biology or biomedical investigation.
Upon graduation, students should continue to emphasize all three contributing scientific areas in their professional development, including (bio)statistics and computer sciences. Our students need to be prepared to quickly deploy skills in computer science and data analytics. In addition, a basic foundation in an area of biology or biomedical science is exceptionally valuable. For this type of degree, students need to be a “triple threat,” instead of simply focusing their efforts in one area.
Describe the employer demand for your graduates/students.
Employment opportunities for data scientists are growing rapidly and include numerous and growing opportunities in the health care industry. In a January 2016 article in the Denver Post, Shawn Wang, vice president of data science for Anthem Insurance’s health care analytics department, was quoted as saying, “Data science has been mature for the last couple years in retail, e-commerce, and fintech (financial technology). They’re really strong. We have to leverage those. Our preference is to find people within the health care space, but we know there is a limited supply. It’s not easy.”
This is true in Wisconsin, as well. UW computer sciences professor Jignesh Patel stated in the Milwaukee Journal Sentinel, “Wisconsin has potential in the big data arena, particularly in the arenas of health care IT where Madison has deep expertise …”
Data-driven job search websites and resources bear out this trend. For example:
- The Jobs Rated Report 2016 list of the top 200 jobs at Careercast.com lists data scientist at #1 and statistician at #2. Glassdoor’s list of the 25 best jobs in America also places data scientist at #1.
- The Indeed.com job trend chart for data scientist indicates that data scientist jobs as a fraction of all listings increased approximately eightfold between August 2012 and August 2016.
Do you have any advice for institutions considering the establishment of such a degree?
There are many units on campus with interests and initiatives in data science. We have been successful by focusing on the biological and biomedical application area. I would advise any institution considering this area to build on existing partnerships between statistics, biostatistics, computer sciences, and biomedical informatics. No one unit can or should “own” this area, so proceeding in a broad and inclusive way makes the most sense.
South Dakota State and Dakota State Universities
Thomas Brandenburger has more than 20 years of leadership, academic, and consulting experience in both the private and public sectors. He is a retired U.S. Naval Officer and a former information technology consultant at Perot Systems. Currently, he is an associate professor of statistics at SDSU teaching predictive analytics courses. Jun Liu is an assistant professor of information systems at Dakota State University. He has been serving as the coordinator of the Master of Science in Analytics program since 2014. Liu earned a PhD in MIS from the University of Arizona. His research is in enterprise data management, business intelligence, large-scale networks, and Big Data analytics.
Please describe the basic elements of your data science/analytics curriculum and how the curriculum was developed.
Development Principles. Both universities leveraged their industry ties and sought expert opinions on both content and delivery of material. SDSU has a formal industry advisory board with regional business executives. DSU solicited input regarding our program from global leaders in data science and analytics such as IBM, SAS, and Cloudera. Both universities use industry analytics and data tools. SDSU uses SAS- and R-based platforms for coursework. DSU also uses the SAS Academic initiative free access. Additionally, IBM has collaborated with DSU and allowed no-charge access to software (such as BigInsights and Cognos) and participation in the IBM Academic Skills Cloud pilot program.
A common concern emerged in reviewing other programs nationally: They seemed to be either too theoretical or too managerial for what we were hoping to achieve. We agreed that both programs should share the following three goals:
- Relevancy to the practitioner
- A continuum of skills
- High standards of student output
Leveraging Strengths. The joint program takes advantage of faculty expertise at both universities. DSU’s faculty in information systems has expertise and experience on the IT side of data science and teaches courses such as system development, databases and data warehousing, machine learning and predictive modeling, and Big Data. DSU’s professors in health informatics and information assurance provide expertise in highly applicable areas including health data analytics, forensic statistics, and fraud detection. SDSU’s faculty focuses on the statistics/mathematics side of data science and has deep expertise across the full spectrum of analytics and applied statistics and mathematics practices pertinent to this program.
Master of Science in Data Science (MSDS)/Master of Science in Analytics (MSA)
Websites: South Dakota State and Dakota State universities
Year in which the first students graduated/are expected to graduate: 2015
Number of students currently enrolled: 40 and 70, respectively, at SDSU and DSU
Partnering departments: South Dakota State University Mathematics and Statistics (Brookings, SD) and Dakota State University College of Business and Information Systems (Madison, SD). The two programs share six common core courses that are jointly offered.
Program format: Thirty credit hours. Six core courses (18 credits) are offered jointly with Dakota State University with three courses from SDSU in predictive analytics and modeling and three courses from DSU in Big Data and information systems technologies.
Prerequisite Knowledge Requirements. To level set entrance into the programs, students in both programs are expected to have taken courses or have work experience in programming principles, database design and programming (including familiarity with SQL), and statistical principles before they enter the program. If a student does not meet these requirements, he/she is required to take additional prerequisite courses to cover any gaps.
What was your primary motivation(s) for developing a master’s data science/analytics program? What’s been the reaction from students so far?
The primary goal of both programs was to meet the demand expressed by industry partners and to fulfill the universities’ strategic missions to South Dakota. SDSU offers the MSDS, an MS in mathematics, MS in statistics, and PhD in computational science and statistics. In addition to the MS in analytics, DSU offers an MS in information systems, MS in health informatics, MS in information assurance, MS in applied computer science, DSc in IS, and DSc in cybersecurity.
How do you view the relationship between statistics and data science/analytics?
Data science is a much broader field, encompassing everything related to data, from data cleansing, data manipulation, data storage (including databases and data warehousing), and data analysis (including machine learning, statistics, text mining, social network analysis, etc.) to Big Data analytics. Traditionally, statistics focuses on inference, including testing hypotheses and deriving estimates, while analytics focuses on using machine learning to extract insights from data and to make predictions. Nowadays, we are often dealing with Big Data in different formats that require a very different technology stack than was previously used for traditional statistical analysis. These two fields are rapidly merging. We require our students to have a solid background in statistics and also to keep up with emerging technologies such as Big Data analytics and large-scale machine learning.
What types of jobs are you preparing your graduates for?
South Dakota has a strong banking industry, especially in consumer lending, with many major credit card and student loan companies. Many SDSU graduates have taken jobs in this field in roles that require predicting customer behavior, whether for credit risk, marketing, portfolio management, or forecasting and optimization of customer contact. These all require the analysis and modeling of large transaction-based data sets. Others have gone on to work in the health care industry, where the application of analytics has grown dramatically in a short period. Still others have gone into areas as diverse as large consulting firms, manufacturing, large agriculture firms, consumer retail companies, and private weather forecasting. Many of DSU’s graduates are working as data scientists/analysts in the health care domain in South Dakota and other midwestern states. DSU also has graduates working as data scientists or software engineers with an analytics focus in financial institutions. The jobs the graduates take are representative of the continuum of skills both universities are teaching across the full domain lifecycle of data.
What advice do you have for students considering a data science/analytics degree?
Data science/analytics students should be familiar with the whole process, from data collection, data cleansing, exploratory analysis, and data transformation to data storage and data analysis. They should be good programmers who can use languages such as R and Python to do data processing and data mining. We recommend that any aspiring data scientist learn statistics with a heavy focus on statistical programming using real-world examples. The focus should be on establishing both breadth and depth of skill: not a jack-of-all-trades approach, but rather a Swiss Army knife approach. Be good at several things.
Describe the employer demand for your graduates/students.
Demand for graduates is high. The feedback from employers who have hired our students is positive because of the practical and hands-on nature of our program.
Do you have any advice for institutions considering the establishment of such a degree?
It’s harder than it looks. Traditional university hierarchies do not reward non-research-based activities. This is often an obstacle at research-based universities, so smaller or private universities are sometimes more likely to be able to establish these programs. Ensure the leadership of your university is fully on board and establish the lines of funding at the outset. Find a couple of key external stakeholder companies that have a vested interest in making it happen.
Additionally, it is difficult to recruit professors with strong analytics and data science backgrounds because they are in high demand.
Harvard University
Rafael Irizarry is the director of the health data science master’s program. He has worked on the analysis and signal processing of microarray, next-generation sequencing, and genomic data. Recently, he began developing diagnostic tools and discovering biomarkers. He also develops open-source software, and is one of the leaders and founders of the Bioconductor Project.
Please describe the basic elements of your data science/analytics curriculum and how the curriculum was developed.
The new master’s degree program in health data science provides students with the rigorous quantitative training and essential computing skills needed to manage and analyze health science data to address important questions in public health, medicine, and basic biology. The program trains students to extract knowledge from data and to communicate this knowledge across disciplines.
The first year consists of case-based training in statistical inference, machine learning, and programming, as well as training in public health and biomedical sciences. Through this case-based approach, students simultaneously learn computing skills necessary to manage and analyze data and start gaining experience in answering scientific questions with data. Although these skills are generally applicable, we focus on applications related to public health and the biomedical sciences.
These skills are further developed during an intensive semester-long course during the third semester that focuses on project-based work. This culminating research experience allows students to integrate the knowledge and skill they have attained to answer real-world questions. Program faculty define the projects assigned in this course.
SM in Health Data Science
Year in which the first students graduated/are expected to graduate: 2019
Number of students currently enrolled: Expected matriculation for our first class in fall of 2017 is 16 students
Partnering departments: Biostatistics Department
Program format: 60-credit SM, including hands-on, semester-long, project-based research course (7.5 credits); traditional/full-time program format
A total of 60 credits of coursework is required for the SM in health data science.
This includes a 30-credit ordinally graded core curriculum consisting of the following courses:
- BST 222 Basics of Statistical Inference (Fall, 5 credits)
- BST 260 Introduction to Data Science (Fall, 5 credits)
- BST 261 Data Science II (Spring, 2.5 credits)
- BST 263 Applied Machine Learning (Spring, 5 credits)
- BST 262 Computing for Big Data (Fall, 2.5 credits)
- HDS 325 Health Data Science Practice (Fall, 7.5 credits)
- EPI 201 Introduction to Epidemiology Methods I (Fall, 2.5 credits)
Students are also required to take five credits of coursework in computer science. In addition to the computer science courses, a minimum of 22.5 additional credits come from a list of elective courses offered by the departments of biostatistics, biomedical informatics, computer science, statistics, and epidemiology.
All candidates for admission to master’s programs must have the following:
- An undergraduate degree in mathematical sciences or allied fields
- Practical knowledge of computer scripting and programming, as well as experience with a statistical computing package such as R or Python
- Calculus through multivariable integration
- Excellent written and spoken English
What was your primary motivation(s) for developing a master’s data science/analytics program? What’s been the reaction from students so far?
The main gap we aim to address relates to bringing the subject matter question to the forefront and treating the statistical techniques and computing as tools that help answer the question. Through answering these questions, students learn to connect subject matter to statistical frameworks. We also cover computing and programming in much more depth, teaching R, Python, and techniques for handling data sets that do not fit in memory. Our program also has a stronger focus on machine learning techniques and computing than the traditional statistics master’s.
The program is designed to be an essential bridge between developing a solid understanding of statistical issues and building the computing and programming skills to implement best practices in applied health science research.
How do you view the relationship between statistics and data science/analytics?
Statistical inference and methodology are integral to a data scientist’s toolbox. However, the demand for data science education is surging and traditional courses offered by statistics and biostatistics departments are not meeting all the needs of those seeking this training. Some programs have been adapting by having computing play a more prominent role. While we agree that more training in computing skills is necessary, our main motivation for creating this program was the need to bring applications to the forefront.
Although traditional statistics programs are housed in departments whose faculty perform exactly the kind of research that students interested in data science want to do, educational programs don’t always teach what we do. Our program looks to change this, and we will prepare students to create, connect, and compute with data to answer real-world questions from the public health and biomedical fields.
What types of jobs are you preparing your graduates for?
The SM in health data science is designed to be a terminal professional degree, giving students essential skills that are in demand in a growing data-driven industry. The program also provides a strong foundation for students interested in continuing in a PhD program in biostatistics or other quantitative or computational science with an emphasis on data science.
What advice do you have for students considering a data science/analytics degree?
For those seeking hands-on training that builds skills to apply to real-world problems, a data science/analytics degree offers the rigorous quantitative training and essential computing skills needed to manage and analyze data. The data science/analytics degree is different from a computer science or statistics degree in that it focuses on solving real-world problems with data, rather than on learning theory and methods and using data only as an example.
Describe the employer demand for your graduates/students.
We anticipate strong demand for graduates from this master’s program in health data science. The first data will be available after the first cohort graduates in 2019.
Do you have any advice for institutions considering the establishment of such a degree?
Our main recommendation is that those who develop data science courses should not only have rigorous statistical and computing training, but also experience analyzing data with the main objective of solving real-world problems.