Students Nationally to Compete in ASA DataFest 2015

Jeffrey A. Myers

Students from 20 universities and colleges will analyze a large and complex data set and compete for prizes and the attention of employers during the ASA DataFest 2015 competition—a unique collaboration between academe, students, and industry that will be held this spring around the country.

DataFest is an annual competition in which teams of up to five undergraduate students work to reveal insights from a large and rich data set. This unique program takes data-analysis learning beyond the constraints of the typical statistical science course by enabling the students to work on large data sets. Students from engineering, math, computer science, statistics, social science, and other fields of study participate in the event, which is sponsored nationally by the ASA.

ASA members and their employers are invited to sponsor or participate in ASA DataFest, says Robert Gould, founder of DataFest and chair of the DataFest core organizing team. You can do the latter by serving as a consultant at your local event. Contact ASA Program Director Donna LaLonde or your local event organizer to get involved.

“ASA DataFest is a great opportunity to help the next generation of data professionals. You’ll be impressed by what the students do and you’ll have lots of fun, too. I promise!” Gould said.

The following events are scheduled (check the DataFest website for an up-to-date list and organizer contact information):

  • March 20-22—Duke University (host) with the University of North Carolina and North Carolina State University
  • March 27-29—Purdue University
  • March 27-29—Five Colleges at the University of Massachusetts (host) with Smith College, Hampshire College, Amherst College and Mt. Holyoke College
  • April 10-12—Penn State University
  • April 10-12—Emory University
  • April 10-12—University of Maryland (host) with George Washington University and Georgetown University
  • April 24-26—UCLA (host) with the University of California, Riverside; University of Southern California; Pomona College; and Cal Poly Pomona

During the 48-hour event, which begins on a Friday evening and concludes Sunday afternoon, teams compete head-to-head for prizes in categories such as “Best in Show,” “Best Visualization,” and “Best Use of External Data.” Each team presents its findings to a panel of judges composed of graduate students, professors, and representatives of the company or organization that provides the data set.

Just as important, the student-competitors try to catch the attention of company and organization representatives, who attend the event to offer competitors advice and to identify students with the best analytical skills for potential job opportunities.

Each year, the data and the challenge are different, but the common theme of making sense of Big Data—larger and more complex than the data sets undergraduate students usually encounter in the classroom—is carried over. The data set, which consists of real-world data of current interest to the providing organization or business, is not unveiled until the start of the competition so participants cannot prepare in advance.

For the first ASA DataFest in 2011, the data consisted of 10 million Los Angeles Police Department arrest records spanning a five-year period. In 2012, the data set was provided by micro-lending site Kiva.org, and online dating service eHarmony.com provided the data in 2013. Last year, GridPoint, a company that offers data-driven energy management systems, provided the data set. Organizers have another large data set that will challenge ASA DataFest 2015 competitors.

ASA DataFest was launched by Gould and the statistics department at UCLA in 2011. In five years, it has expanded to its current slate of events.