Ryan J. Machtmes, GStat, is an independent research consultant and mathematical statistician, member of the ASA Committee on Statistics and Disability, and accredited graduate statistician of the American Statistical Association. He is a longtime member of the American Statistical Association and the Phi Kappa Phi honor society.
The emergence of data science as a significant field of interest in business heralds a challenge to the practice of statistics. How we, as a profession, respond to this challenge will determine our individual and collective futures. Do we evolve with modernity, or relegate ourselves to a subset of potential outcomes and directions?
Data science marks an important shift in the province of traditionally trained statisticians, particularly in business, as data scientists seem best equipped with the technical skills necessary to harness the power of Big Data (as currently defined). Because data science is new, computer science, business, and statistical science each tend to claim it, and some statisticians are forced to adopt data science as their profession to remain competitive in the job market. Responding to this emergent issue is incumbent upon the members of the statistics profession.
As statistics leaders, including ASA presidents, have advocated, practicing statisticians are data science. What could be more central to our role as data stewards than to advocate for appropriate use of these data streams? The advent of new and complex problems of statistical inference is not cause to relegate such problems to the exclusive province of information technologists. Rather, it is cause both to lead the charge for appropriate use of these technologies and to support those statisticians being mandated to assume the roles of business intelligence and data science. We need to determine how best to respond to and work within this new paradigm.
The unique contributions mathematical statisticians are prepared to make to Big Data projects cannot be overstated. As indicated in the recently released ASA whitepaper, “Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society”:
Statistical thinking not only helps make scientific discoveries, but it quantifies the reliability, reproducibility, and general uncertainty associated with these discoveries. Because one can easily be fooled by complicated biases and patterns arising by chance, and because statistics has matured around making discoveries from data, statistical thinking will be integral to Big Data challenges.
This does not mean statisticians should work apart from data scientists and information technologists on important Big Data challenges; rather, statisticians must have equal presence on the interdisciplinary teams asked to respond to such issues. The mandate is clear: Statisticians, data scientists, and information technologists need to work together to resolve Big Data challenges. I’ve worked with data scientists and information technologists. The working relationships I shared with these professionals succeeded because of the interdisciplinary nature of the teams. They worked because we complemented one another’s strengths, respected each other’s contributions to the effort, and provided assistance when needed. We formed cohesive teams, and all members were able to share their respective knowledge and experience.
With these examples in mind, I consider the following elements of my interaction tangible best practices other statisticians may apply when working with data scientists:
- Respect one another professionally. Without mutual respect, there can be no lasting team structure to help balance the workloads.
- Don’t assume you know everything, or that anyone does. Every member of the team contributes something.
- Develop one another’s skills and knowledge areas. If there is a data-related concept a statistician doesn’t know, or a statistical concept a data scientist doesn’t know, there is opportunity for professional growth.
- Trust each other sufficiently to admit when the other knows something you don’t and use it as a learning experience. It is humbling and exposes vulnerabilities, however fleeting, to admit gaps in one’s knowledge. But by admitting our knowledge gaps in confidence, we can not only learn and close those gaps, but also help accomplish the mission.
- Compromise, without compromising your professional practice as a statistician. It is important that we share knowledge between professionals and learn new skills, but it is also important that we understand when not to compromise sound application of statistical methodology (to say nothing of ethical guidelines, which should never be compromised).
- Actively share project leadership when possible. Data scientists and statisticians work from different perspectives and skill sets; as such, each contributes to mission accomplishment.
- Continue to discuss and debate (constructively), as discussion leads to intellectual growth. Inherently, working relationships between statisticians and data scientists can be tenuous. Statisticians may think data scientists disregard important assumptions for analysis and underlying theory in an effort to generate an optimal solution to a business intelligence problem, while data scientists may think statisticians are intractable and recalcitrant, retreating into obtuse theoretical considerations. However, by continuing to debate such issues constructively, we are better able to produce analyses that are both efficient and accurate, and help advance the science.
- Finally, joke together, as it helps get through the work day.
To be fair, no amount of my belaboring the point will resolve an issue that, much like the dawn of data mining in the 1980s, will take more discussion, debate, and research. While it is important for the debate to continue for the long-term benefit of statistical science, I do hope this column provides an example of the way forward for statisticians working with data scientists. Beyond that, it is my hope this column might positively contribute to the larger ongoing discussion by showing how traditional statisticians might work in a symbiotic relationship with data scientists for mutual benefit, each learning new skills from the other, with an air of mutual professional respect.
Ryan J. Machtmes,
Thank you for a wonderful posting. I wish you and all AMSTAT/ASA professionals all the best and a Happy New Year 2015.
A very well-written message to all of us. Cooperation, collaboration, unity, and respect are hallmarks of any professional, and even more so among the bordering, overlapping sister disciplines mentioned.
I am a statistician first, then an operations researcher and business analyst, and I have mostly worked solo during my many years in corporate staff function roles, as a government contractor, and in other domain areas. I have never failed in any project where we pulled all forces together, and yes, compromise (not a whole lot) must be earned, within reason, for the outcomes to be successful.
It is the only way I know to keep your professional integrity and stability in the workplace, with management support, of course.
Best wishes for even more successful postings and for the ASA's professional advancement. “If you can’t measure, you cannot learn.”
I now suggest that we outline a group structure, with functional, line, and other reporting forms, for a group with one member each from the IT, Big Data science, and math-stat/operations research functions, reporting to a Data Science Officer (DSO). Not all need to be exclusive or full-time resources, but the DSO needs to be a recognized expert and manager as well.
Best wishes,
C.S. Ganti
A good one; read my full text above. I am posting this as a comment/reply on LinkedIn, from where I got this write-up of yours.
Best,
The title throws me. We are Data Science (see Amstat News October 2013) and even Burtch Works agrees: http://www.burtchworks.com/2014/11/17/must-have-skills-to-become-a-data-scientist/
‘Education – … Their [data scientists] most common fields of study are Mathematics and Statistics (32%), followed by Computer Science (19%) and …’ Naturally, for Statistical Data Science (analysis of data) this will be about 100%. See our new discussion: http://goo.gl/VjDg5U
So, courteously, why should I read an article whose title (a profession-limiting title, by the way) proposes to inform me about how to work with myself?