From Computer Scientist to Statistician

Stumbling backward and landing well

Andrew Beam is a second-year master’s student at North Carolina State University. As an undergraduate, he majored in computer science, computer engineering, and electrical engineering, also at NC State.

Unlike most students entering a graduate statistics program, my background is not in statistics or mathematics. As an undergraduate, I majored in computer science and engineering. I took courses on topics such as ANOVA and basic random variables, but, as I later found out, I knew very little statistical theory (just ask my first-year professors).

After graduating, I found myself working for a biotech company on projects that bordered on both computer science and statistics. I mainly wrote R scripts for data analysis, but increasingly became involved with projects that were statistical in nature and found that my statistical ability was wanting.

Through a fortuitous series of events, I became engaged with the EPA’s computational toxicology research program, in which there are many talented scientists working on a vast array of problems, nearly all of which have a large statistical component. I was working on a nonlinear regression problem and found the standard approach of Gauss-Newton often failed, even with a concerted and meticulous search for initial values. I distinctly remember the supervising statistician telling my coworker and me that what we were embarking on would be difficult for most graduate students in statistics. I naïvely thought it would be no sweat, since I “was so good at programming.” Statistics was easy, right? After struggling for a month or two and getting nowhere, I finally abandoned hubris.

Doing what any good computer scientist would do when at an impasse, I threw more computation at the problem using a parallel search technique. My idea was to view the regression as a searching and optimization problem, something I knew a great deal more about. In this context, I could use methods I was familiar with to solve the problem, such as evolutionary algorithms.
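As a rough illustration of this framing (not the actual method, whose details are not given here), a curve fit can be written as a search over parameter values that minimizes the sum of squared errors. The sketch below uses a very simple evolutionary algorithm with truncation selection and Gaussian mutation. The exponential-decay model, the parameter bounds, and all function names are assumptions chosen for the example.

```python
import math
import random


def model(x, a, b):
    """Hypothetical exponential-decay curve; the article does not name its model."""
    return a * math.exp(-b * x)


def sse(params, xs, ys):
    """Sum of squared errors: the fitness that the search tries to minimize."""
    a, b = params
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))


def evolve(xs, ys, bounds, pop_size=40, generations=200, seed=0):
    """Minimal evolutionary search over the parameter box given by `bounds`."""
    rng = random.Random(seed)
    # Start from random candidates spread across the whole box, so no
    # hand-picked initial values are needed.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    # Mutation scale per parameter, shrunk a little each generation.
    step = [(hi - lo) * 0.1 for lo, hi in bounds]
    for _ in range(generations):
        # Keep the best quarter of the population unchanged (elitism).
        pop.sort(key=lambda p: sse(p, xs, ys))
        parents = pop[: pop_size // 4]
        children = []
        while len(parents) + len(children) < pop_size:
            parent = rng.choice(parents)
            # Perturb each parameter and clip it back into its bounds.
            child = [
                min(hi, max(lo, value + rng.gauss(0.0, s)))
                for value, s, (lo, hi) in zip(parent, step, bounds)
            ]
            children.append(child)
        pop = parents + children
        step = [s * 0.98 for s in step]
    return min(pop, key=lambda p: sse(p, xs, ys))


if __name__ == "__main__":
    # Noise-free synthetic data generated from a = 2.0, b = 0.5.
    xs = [0.5 * i for i in range(10)]
    ys = [model(x, 2.0, 0.5) for x in xs]
    fitted = evolve(xs, ys, bounds=[(0.0, 10.0), (0.0, 5.0)])
    print("fitted parameters:", fitted)
```

Because every candidate is evaluated independently, the fitness evaluations within a generation could be spread across processors, which is one way to throw more computation at the problem.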

It was at this point that I was put in contact with Alison Motsinger-Reif of the NC State statistics department. She helped refine my original idea, taking time out of her busy schedule to help me, despite my not being a student. The approach worked well, and we put together a paper outlining the technique.

After a year of work, I was offered the chance to pursue a master of science in statistics while working on problems with the EPA computational toxicology group. Given the interesting problems I would be working on and how badly I wanted the statistical training, I eagerly agreed and submitted my application to NC State.

I’m not sure what my expectations were as I entered my first year. I knew my reasons for being there, but I didn’t have a clear picture of what statistics really was as a discipline. Andrew Gelman once remarked in an Amstat News article that he “… was worried that statistics was just too easy to be interesting.” I didn’t expect my classes to be easy (which, of course, they were not), but I think his comment reflects the general lack of understanding of what “real” statisticians do. I certainly had no idea as an engineering student what my statistics professors worked on when they weren’t teaching. Was there more to it?

As my first year progressed and I delved deeper into the world of statistics, I occasionally asked, “What are the ‘big’ questions in statistics?” or “If you could solve just one problem in statistics, what would it be?” Surely I was missing something, but I was unable to find a satisfactory answer.

I was aware of the work being done in other fields: physicists were searching for the Higgs boson (also called the “God particle”) while trying to formulate a theory of everything; mathematicians were hard at work on famous problems such as the Goldbach, Hodge, and twin prime conjectures; computer scientists wanted to know if P = NP; and biologists were untangling the foundations of life. David Hilbert once said, “If I were to awaken after having slept for a thousand years, my first question would be, ‘Has the Riemann hypothesis been proven?’” Was there an equivalent question for the slumbering statistician?

I eventually discovered what “real” statisticians do, and the answer was simpler than I had imagined: everything. Statisticians play a formative role in nearly every scientific discipline, in addition to developing new statistical theory. John Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.”

We are now being told about the coming “data deluge” and how statistics is the new “sexy” profession. Statistical savoir-faire beyond p < 0.05 is quickly becoming a necessity for most research scientists. When fellow graduate students in other fields learn I am studying statistics, most of them express a longing to know more about it. I know the feeling of statistical confusion and am glad it is receding.

So, if you are like me and stumbled somewhat backward into statistics, be glad for it; you have landed well.