
Having What It Takes to Be a Data Scientist

Discover the requisite components of the data scientist skill set to successfully attain data expertise in the big data era

Marketing Program Manager for Data Warehousing, IBM

In 2008, reflecting on their own jobs, engineers Jeff Hammerbacher and D.J. Patil coined the term data scientist for people engaged in a variety of activities in which insight is refined from data. The term gave a name to a phenomenon that had been waiting to happen.

Sure, data warehouses have been around since, well, almost forever, and analytics and mining of data from billions of smart devices have for many years been the serious pursuit of dedicated professionals. The data scientist personifies the quest for knowledge: a job title worthy of taming big data. By 2012, data scientist was more than a hot job title; the Harvard Business Review called it the sexiest job of the twenty-first century.1 But what makes a data scientist a data scientist?

Skill set

As the data scientist title implies, a passion for data and comfort working with it are natural prerequisites. Basic college-level statistics and at least a working knowledge of a statistical programming language such as the open source R language are good starting points for building the skill set. The ability to apply these statistical methods, along with multivariable calculus, to practical problems is also a necessary, high-level skill. Packaged statistical routines and algorithms have their place, but organizations are looking for data scientists who know when and how to use them, and when to build customized analytics instead.

Formal training as a software engineer can also be useful in small companies that have more data challenges than resources to address them. In these situations, a software engineer’s data management and application development experience can come in handy. Data cleansing is a fact of life, and scripting skills are often used to find and fix data quality problems.
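As a minimal illustration of that kind of scripting, the Python sketch below fixes a few common data quality problems in a small in-memory set of records. The records, the field names, the cleaning rules (trimming whitespace, normalizing capitalization, treating blank or non-numeric ages as missing, dropping rows that become duplicates), and the function names `parse_age` and `clean_records` are all hypothetical choices made for illustration, not a prescribed procedure.

```python
from typing import Optional

# Hypothetical raw records with typical quality problems:
# stray whitespace, inconsistent capitalization, a non-numeric age,
# a blank age, and a row that duplicates another once normalized.
RAW_RECORDS = [
    {"name": " Alice ", "city": "boston", "age": "34"},
    {"name": "Bob", "city": "Chicago ", "age": "n/a"},
    {"name": "alice", "city": "Boston", "age": "34"},
    {"name": "Carol", "city": "denver", "age": ""},
]


def parse_age(value: str) -> Optional[int]:
    """Return the age as an int, or None if it is blank or not a number."""
    try:
        return int(value.strip())
    except ValueError:
        return None


def clean_records(records):
    """Normalize each record and drop rows that become exact duplicates."""
    cleaned = []
    seen = set()
    for rec in records:
        row = {
            "name": rec["name"].strip().title(),  # " Alice " -> "Alice"
            "city": rec["city"].strip().title(),  # "boston" -> "Boston"
            "age": parse_age(rec["age"]),         # "n/a" -> None
        }
        key = (row["name"], row["city"], row["age"])
        if key in seen:
            continue  # duplicate once normalized, so skip it
        seen.add(key)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    rows = clean_records(RAW_RECORDS)
    for row in rows:
        print(row)
    # Report the records that still need attention.
    print("Missing ages:", [r["name"] for r in rows if r["age"] is None])
```

Real pipelines would more likely read from files or databases and use a library such as pandas, but the pattern of normalize, validate, and deduplicate is the same.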

Machine learning is the discipline that puts artificial intelligence into practice. It can take the form of supervised learning, in which a model learns from labeled examples, or unsupervised learning, in which it finds structure in unlabeled data, and its aim is to have a machine arrive at conclusions without being explicitly programmed to do so. Machine learning is popular in scientific applications that involve large volumes of disparate data, and data scientists attack this data with algorithms written in programming languages such as Python or R.
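To make the supervised/unsupervised distinction concrete, here is a minimal sketch using only the Python standard library. The data, the labels, and the function names are hypothetical; the supervised half is a nearest-centroid classifier and the unsupervised half is a basic one-dimensional k-means, both chosen for brevity rather than because the article prescribes them. In practice, a data scientist would more likely use a library such as scikit-learn in Python or the `kmeans()` function in R.

```python
from statistics import mean

# Hypothetical one-dimensional measurements, e.g. transaction amounts.
POINTS = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

# --- Supervised learning: each training point comes with a label. ---
LABELS = ["small", "small", "small", "large", "large", "large"]


def train_nearest_centroid(points, labels):
    """Learn one centroid (the mean) per label from labeled examples."""
    return {
        label: mean(p for p, lab in zip(points, labels) if lab == label)
        for label in set(labels)
    }


def predict(centroids, x):
    """Assign x the label whose learned centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# --- Unsupervised learning: no labels; the algorithm finds the groups. ---
def kmeans_1d(points, k=2, iterations=10):
    """Basic k-means on 1-D data, seeded with points spread across the sorted data."""
    ordered = sorted(points)
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


if __name__ == "__main__":
    model = train_nearest_centroid(POINTS, LABELS)
    print(predict(model, 7.0))  # nearest learned centroid is "large"
    centroids, clusters = kmeans_1d(POINTS)
    print(centroids, clusters)
```

Both halves group the same points; the difference is that the supervised model is told what the groups are called, while k-means has to discover the grouping on its own.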

Communicating the gems found in data work is the other half of the battle for data scientists, and data visualization tools support this effort. Successfully conveying findings to different audiences also depends on presentation skills, a soft skill that is not emphasized as much as it should be. Being able to draw effective graphical representations of complex data interactions is a true test of whether someone can think like a data scientist.2

Pathways

Under the broad data scientist umbrella, there are several pathways for developing complementary skill sets. According to an article by Dave Holtz published by Udacity, a higher-education provider and Stanford University spin-off,3 data scientists can be found in the following work roles:

  • Data analyst: This specialist aspires to be a data scientist by building a skill set that includes basic data extraction, visualization, and analytics. These skills form an apprenticeship that builds credibility and can lead to advanced analytics and problem-solving work.
  • Data engineer: Demanding data environments need to be managed, and this area calls for individuals who need not be statistics or machine-learning experts. Too much data arriving too quickly is the immediate problem to solve, and it gives candidates the opportunity to grow, excel, and shine.
  • Contributor at a data provider: An advanced degree in mathematics or statistics is a good fit for this pathway because the role involves delivering analysis or machine-learning packages as a consulting service. The organization’s lifeblood is providing specialized insights that are not readily available elsewhere.
  • Specialist at a data-driven organization: Individuals in this area join an existing data team, so this pathway offers ample opportunities for mentorship. The role can call for a data specialist or possibly a professional with expertise in several areas of big data.

Employer needs

Many employers don’t expect to find one person with 5–10 years of relevant experience in every key skill set outlined here. And, as in most job searches, the fit between an employer and a prospect is the primary factor in candidate selection. With those caveats in mind, consider how Airbnb, a web-based rental agent for accommodations, approaches a search for a data scientist.

Airbnb starts with a routine phone screen that assesses a prospect’s data-driven skills and understanding of Airbnb. A prospect can advance to the next round by passing a data challenge that should be fairly easy for anyone with a data scientist skill set. The on-site data challenge lasts a day, and the Airbnb team works with the applicant to jointly arrive at a solution to a broadly stated problem. The hiring team measures communication ability, teamwork toward a deadline, and attention to solution details. After the prospect passes this major test, additional interviews assess the ability to articulate Airbnb’s core values and to work well with business partners.4

Team membership

For professionals seeking data scientist positions, the job search closely resembles a search for a high-level position at an IT consulting company. Prospects need to research a potential employer’s business model and be ready for case-study problem solving. Applicants can also take self-paced courses to beef up their data analytics skills. A last piece of advice comes from Patil, in a comment about how he now views the data scientist term he helped coin: “People make a mistake by forgetting that data science is a team sport.…[T]here’s not one single data scientist that does it all on [his or her] own.”5

Please share any thoughts or questions in the comments.

1 “Data Scientist: The Sexiest Job of the 21st Century,” by Thomas Davenport and D.J. Patil, Harvard Business Review, October 2012.
2, 3 “8 Skills You Need to Be a Data Scientist,” by Dave Holtz, Udacity.com, November 2014.
4 “How Does Airbnb Hire Data Scientists?” by Riley Newman, Quora.com blog, January 2014.
5 “Data Science Handbook: 3 Tips for Becoming a Data Scientist,” by William Chen and Carl Shan, Venturebeat.com, December 2014.