How everyone can become a data scientist

General Manager, IBM Data and AI, IBM

We all know how it works: you walk into a doctor’s office complaining about some ache or pain, so they take your temperature, get you on the scale, check your blood pressure and perhaps even get out the rubber hammer.

This approach, which is common throughout the world, relies as much on instinct and gut feeling as on medical training and experience. The data collected in a typical office visit is only a fraction of what could be collected if health were redefined as a data problem.

In the near future, collecting data and applying it to solve healthcare problems will transform the cost and effectiveness of medicine—the question is how quickly we can get there.

This week, IBM announced a set of tools, technology and processes to bring data science to the masses. Said another way, armed with IBM technology, everyone can be a data scientist.

Democratizing the access to data in your organization

Every organization sees Hadoop as an open-source, rapidly evolving platform capable of collecting and economically storing a large corpus of data, waiting to be tapped. Yet most organizations are not fully realizing the value of Hadoop because they lack the skilled data scientists and developers needed to extract valuable insight.

IBM wants to make everyone a data scientist, and developments this week include:

  • The announcement of a new version of BigInsights that centers on providing In-Hadoop Analytics, including text analytics, interactive web tooling for text analytics, seamless integration of R via BigR and enhanced machine learning optimized for Hadoop scale, all aimed at providing the data science and analytics to uncover hidden patterns in the data
  • Confirmation of a firm commitment to open source with the Open Data Platform initiative (IBM is a Platinum Founding Member, along with GE, Verizon, Hortonworks, Pivotal, Teradata, SAS and a number of other Hadoop leaders)
  • Sponsorship of a new curriculum to advance data science and big data skills by making courses freely available to the over 230,000 registrants on the

Medicine in the data era

Getting back to my healthcare example, if you are using an activity tracker, or one of the many healthcare apps available to consumers today, you can imagine how data can influence healthcare. One day, you’ll walk into a doctor’s office and the physician will immediately know why you are there. She won’t need to take your vitals, as she will receive that data directly from your wristband device every day. Instead, the discussion will immediately turn to potential treatments, along with the probability of success with each one. With this quick diagnosis, based on more data and fewer opinions, you are on your way after ten minutes, with confidence. This is medicine in the data era, administered by a physician steeped in mathematics and statistics. In the data era, even doctors will become data scientists.

This post is adapted from my book, “Big Data Revolution: What farmers, doctors, and insurance agents teach us about discovering big data patterns.”
