How to Work with Data Scientists

Ryan J. Machtmes, GStat, is an independent research consultant and mathematical statistician, member of the ASA Committee on Statistics and Disability, and accredited graduate statistician of the American Statistical Association. He is a longtime member of the American Statistical Association and the Phi Kappa Phi honor society.

The emergence of data science as a significant field of interest in business heralds a challenge to the practice of statistics. How we, as a profession, respond to this challenge will determine our individual and collective futures. Do we evolve with modernity, or relegate ourselves to a subset of potential outcomes and directions?

Data science marks an important shift in what has been the province of traditionally trained statisticians, particularly in business, as data scientists seem best equipped with the technical skills necessary to harness the power of Big Data (as currently defined). Because data science is new, computer science, business, and statistical science are each inclined to claim it, and some statisticians are being forced to adopt data science as their profession to remain competitive in the job market. Responding to this emergent issue is incumbent upon the members of the statistics profession.

As has been advocated by statistics leaders, including ASA presidents, practicing statisticians are data scientists. What could be more central to our role as data stewards than to advocate for appropriate use of these data streams? The advent of new and complex problems of statistical inference is not cause to relegate such problems to the exclusive province of information technologists. Rather, it is cause both to lead the charge for appropriate use of these technologies and to support those statisticians being mandated to assume roles in business intelligence and data science. We need to determine how best to respond to and work within this new paradigm.

The unique contributions mathematical statisticians are prepared to make to Big Data projects cannot be overstated. As indicated in the recently released ASA whitepaper, “Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society”:

Statistical thinking not only helps make scientific discoveries, but it quantifies the reliability, reproducibility, and general uncertainty associated with these discoveries. Because one can easily be fooled by complicated biases and patterns arising by chance, and because statistics has matured around making discoveries from data, statistical thinking will be integral to Big Data challenges.

This does not mean statisticians, data scientists, and information technologists should not work together to solve important Big Data challenges, but rather that statisticians must have equal presence on interdisciplinary teams asked to respond to such issues. The mandate is clear: Statisticians, data scientists, and information technologists need to work together to resolve Big Data challenges. I’ve worked with data scientists and information technologists. The working relationships I shared with these professionals succeeded because of the interdisciplinary nature of the teams. They worked because we complemented one another’s strengths, respected each other’s contributions to the effort, and provided assistance when needed. We formed cohesive teams, and all members were able to share their respective knowledge and experience.

With these examples in mind, I consider the following elements of my interaction tangible best practices other statisticians may apply when working with data scientists:

  1. Respect one another professionally. Without mutual respect, there can be no lasting team structure to help balance the workloads.
  2. Don’t assume you know everything, or that anyone does. Every member of the team contributes something.
  3. Develop one another’s skills and knowledge areas. If there is a data-related concept a statistician doesn’t know, or a statistical concept a data scientist doesn’t know, there is opportunity for professional growth.
  4. Trust each other sufficiently to admit when the other knows something you don’t and use it as a learning experience. It is humbling and exposes vulnerabilities, however fleeting, to admit gaps in one’s knowledge. But, by admitting one’s knowledge gaps in confidence, we can not only learn and eliminate those knowledge gaps, but also help accomplish the mission.
  5. Compromise, without compromising your professional practice as a statistician. It is important that we share knowledge between professionals and learn new skills, but it is also important that we understand when not to compromise sound application of statistical methodology (to say nothing of ethical guidelines, which should never be compromised).
  6. Actively share project leadership when possible. Data scientists and statisticians work from different perspectives and skill sets; as such, each contributes to mission accomplishment.
  7. Continue to discuss and debate (constructively), as discussion leads to intellectual growth. Inherently, working relationships between statisticians and data scientists can be tenuous. Statisticians may think data scientists disregard important assumptions for analysis and underlying theory in an effort to generate an optimal solution to a business intelligence problem, while data scientists may think statisticians are intractable, recalcitrant, and mired in abstruse theoretical considerations. However, by continuing to debate such issues constructively, we are better able to produce analyses that are both efficient and accurate, and help advance the science.
  8. Finally, joke together, as it helps get through the work day.

To be fair, no amount of my belaboring the point will resolve an issue that, much like the dawn of data mining in the 1980s, will take more discussion, debate, and research. While it is important for the debate to continue for the long-term benefit of statistical science, I do hope this column provides an example of the way forward for statisticians working with data scientists. Beyond that, it is my hope this column might contribute positively to the larger ongoing discussion by showing ways traditional statisticians might work in a symbiotic relationship with data scientists for mutual benefit, each learning new skills from the other, with an air of mutual professional respect.