Stumbling backward and landing well
Andrew Beam is a second-year master’s student at North Carolina State University. As an undergraduate, he majored in computer science, computer engineering, and electrical engineering, also at NC State.
Unlike most students entering a graduate statistics program, my background is not in statistics or mathematics. As an undergraduate, I majored in computer science and engineering. I took courses on topics such as ANOVA and basic random variables, but, as I later found out, I knew very little statistical theory (just ask my first-year professors).
After graduating, I found myself working for a biotech company on projects that bordered on both computer science and statistics. I mainly wrote R scripts for data analysis, but increasingly became involved with projects that were statistical in nature and found that my statistical ability was wanting.
Through a fortuitous series of events, I became engaged with the EPA’s computational toxicology research program, in which there are many talented scientists working on a vast array of problems, nearly all of which have a large statistical component. I was working on a nonlinear regression problem and found the standard approach of Gauss-Newton often failed, even with a concerted and meticulous search for initial values. I distinctly remember the supervising statistician telling my coworker and me that what we were embarking on would be difficult for most graduate students in statistics. I naïvely thought it would be no sweat, since I “was so good at programming.” Statistics was easy, right? After struggling for a month or two and getting nowhere, I finally abandoned hubris.
Doing what any good computer scientist would do when at an impasse, I threw more computation at the problem using a parallel search technique. My idea was to view the regression as a searching and optimization problem, something I knew a great deal more about. In this context, I could use methods I was familiar with to solve the problem, such as evolutionary algorithms.
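To make the idea of treating regression as a search problem concrete, here is a minimal sketch in Python. It is not the technique from the paper mentioned in this article. The Hill dose-response model, the function names (`hill`, `sse`, `evolve_fit`), the parameter bounds, and the synthetic data are all assumptions chosen for illustration. The search is a simple elitist evolutionary strategy that keeps the best quarter of the candidates each generation and mutates them. It runs serially, though the fitness evaluations are independent and could be farmed out in parallel.

```python
import random


def hill(x, top, ec50, n):
    """Hill dose-response curve (an illustrative nonlinear model)."""
    return top * x**n / (ec50**n + x**n)


def sse(params, xs, ys):
    """Sum of squared residuals for one candidate parameter vector."""
    top, ec50, n = params
    return sum((y - hill(x, top, ec50, n)) ** 2 for x, y in zip(xs, ys))


def evolve_fit(xs, ys, bounds, pop_size=40, generations=200, seed=0):
    """Minimize the SSE with a simple elitist evolutionary strategy.

    No derivatives and no hand-picked starting values are needed,
    only a (lower, upper) bound for each parameter.
    """
    rng = random.Random(seed)
    # Random initial population drawn uniformly inside the bounds.
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
           for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates by fitness; the best quarter survive unchanged.
        pop.sort(key=lambda p: sse(p, xs, ys))
        parents = pop[: pop_size // 4]
        children = []
        while len(parents) + len(children) < pop_size:
            parent = rng.choice(parents)
            # Gaussian mutation, clipped so each gene stays in bounds.
            child = tuple(
                min(hi, max(lo, g + rng.gauss(0.0, 0.05 * (hi - lo))))
                for g, (lo, hi) in zip(parent, bounds)
            )
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda p: sse(p, xs, ys))
    return best, sse(best, xs, ys)


# Synthetic, noise-free data from assumed "true" parameters (2.0, 5.0, 1.5).
xs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
ys = [hill(x, 2.0, 5.0, 1.5) for x in xs]
bounds = [(0.1, 10.0), (0.1, 100.0), (0.2, 5.0)]  # top, ec50, n

best, best_sse = evolve_fit(xs, ys, bounds)
print("estimated (top, ec50, n):", best)
print("sum of squared residuals:", best_sse)
```

Because the surviving parents are carried over unchanged, the best fit never gets worse from one generation to the next. The trade-off against Gauss-Newton is many more function evaluations in exchange for not depending on good initial values.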
It was at this point that I was put in contact with Alison Motsinger-Reif of the NC State statistics department. She helped refine my original idea, taking time out of her busy schedule to help me, despite my not being a student. The approach worked well, and we put together a paper outlining the technique.
After a year of work, I was offered the chance to pursue a master of science in statistics while working on problems with the EPA computational toxicology group. Given the interesting problems I would be working on and how badly I wanted the statistical training, I eagerly agreed and submitted my application to NC State.
I’m not sure what my expectations were as I entered my first year. I knew my reasons for being there, but I didn’t have a clear picture of what statistics really was as a discipline. Andrew Gelman once remarked in an Amstat News article that he “… was worried that statistics was just too easy to be interesting.” I didn’t expect my classes to be easy (which, of course, they were not), but I think his comment reflects the general lack of understanding of what “real” statisticians do. I certainly had no idea as an engineering student what my statistics professors worked on when they weren’t teaching. Was there more to it?
As my first year progressed and I delved deeper into the world of statistics, I occasionally asked, “What are the ‘big’ questions in statistics?” or “If you could solve just one problem in statistics, what would it be?” Surely I was missing something, but I was unable to find a satisfactory answer.
I was aware of the work being done in other fields: physicists were searching for the Higgs boson (also called the “God” particle) while trying to formulate a theory of everything; mathematicians were hard at work on famous problems such as the Goldbach, Hodge, and twin prime conjectures; computer scientists wanted to know if P = NP; and biologists were untangling the foundations of life. David Hilbert once said, “If I were to awaken after having slept for a thousand years, my first question would be, ‘Has the Riemann hypothesis been proven?’” Was there an equivalent question for the slumbering statistician?
I eventually discovered what “real” statisticians do, and the answer was simpler than I imagined. Everything. Statisticians play a formative role in nearly all scientific disciplines, in addition to developing new statistical theory. John Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.”
We are now being told about the coming “data deluge” and how statistics is the new “sexy” profession. Statistical savoir-faire beyond p < 0.05 is quickly becoming a necessity for most research scientists. When fellow graduate students in other fields learn I am studying statistics, most of them express a longing to know more about it. I know the feeling of statistical confusion and am glad it is receding.
So, if you are like me and stumbled somewhat backward into statistics, be glad for it; you have landed well.
This is not unlike my current path. Thanks for sharing and letting others of us know that we aren’t alone in this quest!
Thanks for sharing your experience. I have a degree in English, but I “discovered” statistics while taking an intro to stats course last fall as a prerequisite for vet school. Three weeks into that class, I decided to change my plans for vet school and pursue stats full on.
Great piece. I was an IT professional, but during my pursuit of a PhD in computer science I stumbled upon machine learning, then data mining, then statistics. I just can’t get it out of my mind.
Statistician is the sexiest job of the next 20 years!
This is a really cool, well-written article. I talked to someone at JSM who was doing the same thing, software engineer to statistician, and the quote that stuck out to me was that he “wanted to solve real problems.” He is now doing environmental statistics as well.
There are a lot of people who went straight into statistics but wish they had a much stronger computer science background. It’s cool to see people respecting other disciplines, rather than pitting them against each other (Statistics vs. Data Science!).
Thanks for sharing, Andrew. Now I more than believe I am on track.
Thanks for sharing. I am a graduate in computer science and mathematics and am working as a developer at a Microsoft-partnered company in the Gambia. I want to do my master’s in statistics and machine learning, but I was not sure whether I would find the field interesting or what challenges I would face. Thanks.