Discussing Data Science

There is no doubt in my mind that the title “data scientist” comes with sentiments of prestige and utmost responsibility. When one thinks of data science in general, the fields of computer science and statistics seem to come to mind. In my present role, I feel as though I have a unique experience: I can satisfy the expectations of my position; I have a foot in both fields, but belong to neither. Here’s a short story of how I got to where I am and my opinion about where we stand in the ‘data science tug-of-war’ between computer science and statistical science.

Carl Letamendi (Dr. L) is the data scientist at an NYC-based primary education charter school management organization. He earned his PhD in conflict analysis/social science and holds an MBA in finance.

My undergraduate degree is in business, and my master’s degree is in finance, which has a lot to do with statistical methods, forecasting, etc. While I was working on my master’s, I knew I didn’t want to be in banking. It wasn’t of interest to me. As a graduate student during the recession, however, I did become interested in financial crises and in the social consequences and structural violence that resulted from them. I immediately went on to earn my PhD, which was, ironically, in a branch of social sciences called “conflict analysis,” a perfect combination to satisfy my research agenda.

As a PhD student, I had a strong preference for quantitative methods research, and I developed a research interest in quantifying aggregate social behaviors (via indexing) to predict social realities. I created a theory I call “The Cycle of Aggregate Sentiment.”

I was lucky enough to secure two short fellowships at the National Institutes of Health’s National Institute on Drug Abuse (NIH/NIDA) and at the U.S. Department of Agriculture’s Office of Civil Rights, Diversity, and Inclusion (USDA/APHIS/OCRDI), which gave me the opportunity to apply my analytical abilities to challenging projects in public health. However, after completing my fellowships and having my PhD conferred in 2014, I, like most 20-somethings with a PhD, ran into something overly educated millennials often experience: I had an extremely difficult time finding a job! After 300+ job applications and dozens of interviews, I felt like I was being discriminated against for having a terminal degree, for actually qualifying for the position, and for being too young.

One of my former classmates told me her employer needed someone who was quant-savvy and understood finance … in other words, me! After connecting with the CEO of the organization, interviewing, and performing a few SPSS work samples using raw data that was sent to me, I was hired! My wife and I stuffed our belongings into a U-Haul and moved from Florida to the NYC area.

My initial role in the organization was that of an “analyst” (financial and data), but we soon realized I didn’t just “analyze”; the data culture was nonexistent and I had no formal training. Eventually, my superiors decided “data scientist” suited me best, since I am the person the organization turns to for data and I combine the three main skills needed for data science: computer programming, content knowledge, and statistical/quantitative abilities.

I essentially take raw data and students’ assessment scores, decide how these data could be used and what kinds of correlations and assertions I can draw, and develop creative ways to solve any issues the data show me. I do not have a formal “research agenda,” nor is there a set of specific reports the organization expects. It’s basically my job to figure that out and promote a data culture across all elementary and middle schools in our network. It’s been an interesting and lonely road. To date, I believe I am the only data scientist in primary education!

As an outsider with a coveted title, I have noticed “data science” seems to be the rope in a tug-of-war between the fields of statistics and computer science. In my opinion, it is somewhat unfair, but most positions advertised under my job title require strong computer programming skills (SQL, Python, Hadoop, R, C++, etc.). In fact, on the strength of the title alone, I am flooded with emails via LinkedIn from data science/IT recruiters!

I have basic knowledge of Python and R, but even if I were 100% proficient, I wouldn’t use them at work. I use a lot of stats, SPSS, and Excel, but very little programming. However, many data scientist positions I have seen require more programming and coding and less statistics. Personally, I don’t think a computer science graduate can do what a statistics graduate can do, and vice versa. I believe data science requires two fields coming together, just as epidemiology brings together pathologists and statisticians, for instance.

I think data science is a buzzword being used in the tech world, among companies that have what qualifies as Big Data (Facebook, LinkedIn, Twitter, etc.). However, I think they are looking for candidates among themselves, and not among statisticians. It is almost as though our modern statisticians are expected to know how to code. I don’t know if this is something that we, as “quants,” will have to accept and adapt to, or if the field will push coding back to computer science folks and leave the statistics to statisticians. I guess this is one of those instances when it is appropriate to say only time will tell.

I read the ASA and Significance magazines often and one thing I notice is that those who give advice to current statistics students sometimes say they would double major—statistics and computer science—if they could do it all again. But see, I don’t think it’s necessary. I simply think that we, in the field of statistics and in the social sciences, should understand programming to a level of proficiency expected of us to meet the demand. There are free courses out there that can teach the basics. I like to perceive data science, computer science, and statistics as adjacent cogs in a wheel with a similar objective, but they cannot easily replace each other!

If statistics and computer science are engaged in a game of tug-of-war and data science is the rope, I think we should actually use the rope to bind us so we can collaborate by contributing our unique areas of expertise. My hope is that employers will realize a need for both statisticians and computer scientists and advertise positions for both, so as to maximize their analytical potential as an organization.
