View from a Woman Statistician and Data Scientist in the Era of Big Data

Kelly H. Zou is senior director and analytic science lead, Real-World Data and Analytics, Patient & Health Impact, at Pfizer Inc. She is an elected fellow of the American Statistical Association and an Accredited Professional Statistician. Her research interests include health care policy, Big Data, and outcomes research. She has written more than 130 professional articles and four books.

How or why did you choose statistics as a career path / area of study?

Zou: I took a probability course as an undergraduate student, with a major in mathematics and a minor in physics. I always had the aspiration to study astronomy, astrophysics, or signal processing. The last one turned out to be my PhD thesis topic.

In high school, I won a prize in a Shanghai-wide astronomy contest. As an undergraduate, I received a biography of Albert Einstein as an award from the physics department. I worked with data in physics laboratory experiments as a lab assistant and with equations in the math laboratory as a licensed tutor.

My mathematics adviser noted I was intrigued by probability theory and one day asked me if I had ever thought of statistics. The rest is history! In my junior year of college, I found my calling and destiny: becoming a PhD student in statistics.

I was fortunate to have the most wonderful doctoral thesis adviser, W. Jackson Hall at the University of Rochester, through whom I became an academic “descendant” of Johann Carl Friedrich Gauss. I also had joint post-doctoral advisers Sharon-Lise Normand and Clare M.C. Tempany at Harvard Medical School.

I would say both mathematics and physics led me to statistics and beyond. Luckily, these disciplines have laid a solid theoretical and methodological foundation for sound statistical practice and applications.

Please see the interview I did in Amstat News about my journey in the fields of statistics and data science.

What inspires you about statistics and data science?

Zou: Bhoopathi Rapolu wrote in The Guardian, “A data scientist takes raw data and marries it with analysis to make it accessible and more valuable for an organization. To do this, they need a unique blend of skills—a solid grounding in maths and algorithms and a good understanding of human behaviors, as well as knowledge of the industry they’re working in, to put their findings into context. From here, they can unlock insights from the data sets and start to identify trends.”

To put it bluntly, what fascinates me the most are the following aspects:

  • What you see is what you get (in an empirical frequentist sense)
  • What assumptions are appropriate under what circumstances (in a Bayesian sense)

My doctoral research was on semiparametric methods associated with the assessment and validation of classification accuracy, while my post-doctoral research was on observational data analysis. Together, they have given me sophisticated and practical tools to work as a statistician, a biomedical engineer, a quantitative analyst, or a data scientist, whatever you call the role. The tools may evolve, but an inquisitive nature, mathematical principles, and coding skills will carry each of these roles far along a career path.

Kelly Zou is among those interviewed in “Meet Inspirational Women in Statistics & Data Science.”

I also enjoy chairing the Statistical Partnerships Among Academe, Industry, and Government (SPAIG) Committee of the American Statistical Association, because the statistical profession is fertile ground for collaborations and interdisciplinary research. As chair-elect of the ASA’s Health Policy Statistics Section, I am also interested in policy, quantitative data analysis, and comparative effectiveness research.

What challenges do women face in the statistics and data science professions?

Zou: I, along with other members of the Joint Committee on the Status of Women (JCSW) at Harvard Medical School and Harvard School of Dental Medicine, published an article about gender differences in research grant applications and funding outcomes for medical school faculty. The JCSW found the following: “Gender disparity in grant funding is largely explained by gender disparities in academic rank. Controlling for rank, women and men were equally successful in acquiring grants. However, gender differences in grant application behavior at lower academic ranks also contribute to gender disparity in grant funding for medical science.”

In mathematically oriented professions, women particularly excel in visualizations, analyses, and interpretations. For female students to become part of a future generation of statisticians and data scientists, it is paramount to foster early interest and provide scholarships in science, technology, engineering, and mathematics (STEM) programs, such as those from the National Science Foundation (NSF). Furthermore, tips and opportunities for scientific internships are critical for them to gain real-world hands-on experience, beneficial mentorship, and career development.

In addition, collaborative projects in the era of Big Data may let women data scientists shine with their talents. The SPAIG Committee, for example, recognizes outstanding statistical partnerships with the annual SPAIG Award.

What is the most exciting aspect of your job, and what does a typical day in your job involve?

Zou: Currently, I am senior director and analytic science lead, Real-World Data and Analytics (RWDnA), Patient & Health Impact, at Pfizer Inc. I was previously statistics lead, Statistical Center for Outcomes, Real-World, and Aggregate Data (SCORAD), in the same company. I have also been associate professor at Harvard Medical School and associate director at Barclays Capital.

My detailed experience is provided on the ASA website. The common theme has always been the four Vs of Big Data: Veracity (uncertainty of data), Variety (different forms of data), Velocity (analysis of streaming data), and Volume (scale of data). For example, I have worked on problems involving two-dimensional pixel or three-dimensional voxel image data, financial tick data, electronic health records, medical and insurance claims, and international survey questionnaires such as patient-reported outcomes. These data types pose both unique and common challenges, which require different analytic tools and programs. There is no typical day per se when the teams are dealing with these four Vs.

In the context of my daily work, the term “real-world data” (RWD) comes up frequently. It means “data used for decision making that are not collected in conventional randomized controlled trials.” Such data can be Big Data with the characteristics of the four Vs.

My days are filled with being part of teams and interacting with talented team members who are analytic scientists, data scientists, statisticians, and programmers. I also interact with cross-functional stakeholders such as outcomes researchers, medical and clinical colleagues, epidemiologists, payer insight analysts, and liaisons and collaborators with other organizations.

There are not only face-to-face meetings, but also web-based communications with colleagues in the United States and other parts of the world. Thus, strong communication skills are important for discussing the following effectively with team members: rapid queries of RWD, minor and major analytic needs with various levels of complexity, peer-reviewed publications based on non-interventional observational studies and pragmatic trials, and regulatory interactions.

Besides multiple product areas, I also cover the Asia-Pacific region, including China, which as a large country requires an understanding of country-specific policies on patient privacy protection, data access, storage, and regulatory landscapes. Traveling to collaborate and present on RWD-based topics is also required from time to time.

Recently, for example, I have been fortunate enough to take on international assignments to China, Hong Kong, Japan, Malaysia, and Taiwan. Being a native speaker of Chinese and being educated in the United States certainly give me helpful cultural perspectives. When taking part in the exciting and cutting-edge Big Data and RWD development around the world, it is helpful to develop communication skills, respect different cultures, understand different countries’ requirements and challenges, and gain insights in the relevant context.

What would you say to girls in school/college who may be considering statistics or data science as a study option/career choice?

Zou: Nowadays, a statistician or a data scientist may have a wide background and sharp mind and be a keen observer. Although these are necessary characteristics and elements for success, it is helpful to also master “hard” tool sets such as quantitative training, mathematical training, and computer language and coding skills. On the other hand, “soft” skills such as communicating, identifying what the customer wants, and translating it into quantifiable and actionable results are also keys to a successful career.

Girls in school or college can embrace and strengthen learning opportunities in S-STEM. They can take the attitude that the sky is the limit for their future careers in data science when analyzing and dealing with the four Vs. After all, it is not surprising that Forbes has listed data scientist and statistician as the top two jobs of 2016, respectively. On the other hand, it is also important not to rush into the field before weighing your options carefully.

Do you think the perception of statistics or data science as a male-dominated career can be changed, and if so, how?

Zou: There are many women in statistics. For example, the ASA database contained records for 18,944 members in September 2015, 34.6% of whom were women. Compared with prior years, when the percentage of women was 32% in 2012 and 31.5% in 2010, this shows a slight upward trend in the share of women in the ASA membership.

Then, what about women in statistics and data science? An insightful article published in Wired explained why women appear to be invisible in data science. “The problem with data science in academia is that’s not where the magic happens. It happens at the Googles, Facebooks, Microsofts, and IBMs, as well as startups like Dstillery. This is where the richest data and most interesting problems are—and it’s not accessible to most academics. In fact, for them, getting access requires heavy networking.”

Hence, in my view, women statisticians and data scientists may consider the following advice:

  • Be imaginative, inquisitive, and creative for the four Vs of data
  • Be savvy and master communication skills
  • Challenge the norm, but be mindful of the underlying mechanisms and methodology
  • Expand horizons to include subject-matter expertise areas
  • Gain hands-on experience in hardware and software development
  • Network with others in the quantitative professions
  • Participate in internships and practical training
  • Possess a zest to learn and think beyond the massive data and their surfaces
  • Seek excellent career mentors and sponsors
  • Understand the policies and challenges for data access and analysis

Finally, there is no “one size fits all” path to becoming a successful data scientist, but these useful skill sets may set candidates apart and make them shine, regardless of gender.