How to Do Statistical Research

Terry Speed

Terry Speed is head of bioinformatics at the Walter & Eliza Hall Institute of Medical Research in Melbourne, Australia, and an active emeritus statistics professor at the University of California, Berkeley. His research interests lie in the application of statistics to genetics and genomics and to related fields such as proteomics, metabolomics, and epigenomics.

Editor’s Note: Adapted from “Terence’s Stuff,” IMS Bulletin, Vol. 34, No. 1

Statistical research, for me, usually begins with either trying to find a half-reasonable answer to a question, where I have found no prior approach exists, or trying to find a 60% reasonable answer based on something that is already half-reasonable. In brief, doing something where there is currently nothing, or doing a little better where there is currently something. If what already exists is pretty good, I’ll use it.

This takes place in a context (data, questions). I lost interest in context-free statistical research long ago, partly because any “standard” or “routine” method, model, tool, or technique is likely to need modification or extension in a new context. Therein lies the chance to do some research, if that interests you. If not, use something “off-the-shelf” and hope it does a good job. (In my experience, finding out whether a given method, model, tool, or technique does the job is frequently a research problem itself.) If there is no standard or routine method, model, tool, or technique, go for it and hope nobody notices until you are done!

A strategy I discourage is “develop theory/model/method, seek application.” Developing theory, a model, or a method suggests you have done some context-free research; already a bad start. The existence proof (Is there a problem?) hasn’t been given. If you then seek an application, you don’t ask, “What is a reasonable way to answer this question, given this data, in this context?” Instead, you ask, “Can I answer the question with this data, in this context, with my theory, model, or method?” Who then considers whether a different (perhaps simpler) answer would have been better?

The ideal research problem in statistics is “do-able,” interesting, and one for which there is not much competition. My strategy for getting there can be summed up as follows:

  • Consulting: Do a very large amount
  • Collaborating: Do quite a bit
  • Research: Do some

Why? A very large amount of consulting means meeting many people and many problems and learning a lot, including where we are ignorant. Then, you might spot some low-hanging fruit. Quite a bit of collaboration gives you an in-depth knowledge of something, rubs your nose in your ignorance, and perhaps motivates you to reduce it a little. Research keeps the brain active and is fun. It also helps careers (fame, fortune), but you know that.

A clarification: For many—perhaps most—of you, the way to do statistical research is to get more data (in context, with questions) through consulting and collaborating. However, for a few of you, it may be to get less data—to find the opportunity to focus on research more and do less consulting and collaborating.

I say do a very large amount of consulting. How can you make this happen, with whom, and how much is “a large amount”? Naturally, the answer depends on your situation. If you are at a university or other research institution, you should have no real difficulty answering these questions. If you are somewhere else, it can be harder.

I say collaborate quite a bit. How do you find collaborators, how do you choose them, and how much is “quite a bit”? The answer here also depends on your situation. Collaboration can arise out of consulting. Collaborate on a topic in which you are interested, with people in the field you like, who are good at what they do, and who are conveniently located so you can see them frequently and become part of the team. When you are asking how much is enough, you can do more! You’ll know. Talk it over with your mentor.

Mentors can help a lot. Help you to get started, help you to carry on, help you know when to stop. Find one! Similarly, your boss can help. S/he should support your efforts, understand your aspirations, accommodate your needs, and see that your efforts are recognized. You may not always be so lucky!

As for Actually Doing Statistical Research …

Most of what I have talked about is arranging the conditions for research opportunities to present themselves; this is by far the major part of the problem. Doing the research is also important. So I offer some quotes and comments (guess the sources!), as well as some of my own experience.

  • Research is 1% inspiration and 99% perspiration.
  • Develop your techniques.
  • If at first you don’t succeed, try, try again. (Then quit. No use being a damn fool about it.)
  • Keep it “as simple as possible, and yet no simpler.”
  • Chance favors prepared minds.
  • My method of overcoming a difficult problem is to go around it.
  • An approximate answer to the right question is worth a good deal more than the exact answer to an approximate problem.
  • Never stop listening to and learning from others.
  • Use all the resources available: CIS, PubMed, etc.
  • Research is the process of going up alleys to see if they are blind.
  • Emulate the masters and mistresses (i.e., copy, but with attribution).