Roger Peng is a professor in the department of biostatistics at the Johns Hopkins Bloomberg School of Public Health. These days, he is working in environmental biostatistics, researching the health effects of air pollution and climate change. He is interested in spatial-temporal data.
Thinking about transitioning into a career in data science? We asked Roger Peng, a professor in the department of biostatistics at the Johns Hopkins Bloomberg School of Public Health, to answer some questions about earning a certification in data science as a way to pivot into the field. He co-directs the data science specialization, a 10-course introduction to data science.
Describe the basic elements of your data science/analytics curriculum and how the curriculum was developed.
Our curriculum is a 10-course sequence that focuses on the entire lifecycle of a data science problem, from formulating and refining a question to getting, cleaning, and wrangling data to modeling and presenting results. Our sequence is based on the R programming language and is designed for people with relatively little background. We end with a final project that integrates all the skills students learned throughout the sequence. Brian Caffo, Jeff Leek, and I designed the sequence with the idea in mind that these are all the skills we would want any data scientist to have if they were working for one of us.
What types of jobs are you preparing your graduates for?
Data science is a diverse field, with many people doing different things. We don’t target any specific type of job, except that we assume a significant part of the job will involve data analysis.
Is there a particular field that would transition well into data science?
At the moment, I think the answer is “no.” There are certainly core skills that are important in data science—such as statistics, programming, and data analysis—but I don’t think there is a field of study that would necessarily give someone a significant advantage when transitioning to data science. That may change in the future as the boundaries of data science shrink or expand and as the demand for data scientists changes. But I have personally seen many people, from all kinds of backgrounds, successfully move into a data science career.
What are the best skills to pick up when transitioning into data science?
This can be a difficult question to answer because the answer shifts as the field evolves. So far, though, some key skills seem to be programming in a data analytic language like R or Python, making basic statistical inferences, having facility with database technologies, and using exploratory data analysis tools. In addition to learning these skills, it’s also useful to become familiar with the communities that surround each of these tools. For example, the R community has grown significantly over the past 20 years, and participating in that community has been valuable beyond simply knowing the language.
What are the top three reasons to earn a certificate in data science as opposed to getting a master’s degree?
I think the top three reasons would be cost, recency, and specificity. Master’s programs are generally far more expensive than certificate programs, so one has to seriously weigh the trade-offs of enrolling in one. I think having a formal degree like a master’s is a good long-term investment, but it is a significant financial expense that cannot be ignored. Certificate programs tend to be a bit more up-to-date than formal degree programs, and they tend to focus on more specific technologies and skills. This can be relevant when applying for jobs that require knowledge of the latest technologies. Finally, many certificate programs are designed in conjunction with employers and will have the advantage of explicitly teaching what those employers are looking for.
What do you find fun about data science?
To paraphrase John Tukey, you get to “play in everyone’s backyard.” Being a data scientist gives me an opportunity to learn about so many areas of interest (scientific, business, and so on), and I’m endlessly fascinated by the specifics of these different problems.
Are data analysts and data scientists the same thing?
My personal opinion is that data analysis is something data scientists do (and they do it a lot!), but data scientists are often responsible for dealing with non-data–related things such as developing analysis requirements, building and engineering software systems, and developing collaborations with other specialties. Many data analysts do these other things, too, but it is not often their responsibility. I think there is significant overlap between the two, but not 100 percent overlap.
How do you view the relationship between statistics and data science?
I see statistics as an important and central element of data science. The work of the data scientist currently involves knowledge from multiple fields, and statistics is one of them.
Can someone who just earned a data science certification become a data scientist immediately?
Right now, that depends on the position they’re looking for and their background. People with scientific and programming backgrounds may be able to go into data science with a certification.
Thanks for the nice suggestions on career guidance and testing. You have provided valuable information.
While I am no mathematician, I enjoy my business analytics doctoral-level class. I love data analytics for marketing and reading research on behavioral sciences and its application to business sales. While I do not want to get a job since I own my TV Agency, learning data analytics will create a competitive edge.