Next-generation data science: Open analytics ecosystems

Big Data Evangelist, IBM

Open data science is proving to be a seedbed for innovation in the world economy. Open data science projects are revolutionizing the fabric of business in diverse industries, spawning new ecosystems of innovation.

Open team collaboration is essential for unlocking creativity in data science. Creativity comes when people from many backgrounds, roles and skill sets use open source data science tools—such as Apache Spark, R, and Apache Hadoop—to develop and deploy new designs for working and living. Data science initiatives foster innovation when teams combine the key roles and skill sets in pursuit of common objectives:

  • Data scientists can use data science tools for teasing out the insights they’re looking for and for making those insights actionable immediately through applications, visualizations and other consumables.
  • Data engineers can build data processing pipelines that leverage machine learning, stream computing and other capabilities to ingest data from disparate sources, aggregate and cleanse it, and deliver it downstream to smart applications of all sorts.
  • Business analysts can use statistical exploration tools to answer domain-specific questions quickly and easily, without needing IT assistance.
  • Application developers can use algorithmic capabilities to endow their applications with cognitive smarts that learn from fresh data and take actions that are continually optimized in keeping with contextual, predictive and environmental variables.
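
To make the data engineer's role above more concrete, here is a minimal sketch of an ingest, cleanse and aggregate pipeline in plain Python. Everything in it is an assumption made for illustration: the two sample sources, the field names and the helper functions are hypothetical and do not come from Spark, R, Hadoop or any other tool named in this post. It also cleanses before aggregating, which is one reasonable ordering and not the only one.

```python
from collections import defaultdict

# Hypothetical records from two "disparate sources"; field names are illustrative only.
web_events = [
    {"customer": "a1", "amount": "20.50"},
    {"customer": "b2", "amount": None},       # missing value, dropped during cleansing
]
store_events = [
    {"customer": "a1", "amount": "4.50"},
    {"customer": "c3", "amount": " 12.50 "},  # stray whitespace, stripped during cleansing
]


def ingest(*sources):
    """Merge the records from every source into a single stream."""
    for source in sources:
        yield from source


def cleanse(records):
    """Drop records with no amount and convert the remaining amounts to floats."""
    for record in records:
        if record["amount"] is None:
            continue
        yield {
            "customer": record["customer"],
            "amount": float(record["amount"].strip()),
        }


def aggregate(records):
    """Total the amount per customer."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer"]] += record["amount"]
    return dict(totals)


def run_pipeline(*sources):
    """Chain the stages; the result is what a downstream application would consume."""
    return aggregate(cleanse(ingest(*sources)))


totals = run_pipeline(web_events, store_events)
```

In a real deployment each stage would more likely be a distributed job than a Python generator, but the shape stays the same: small, composable steps that a team can test and swap out independently.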

Tool capabilities

The pivotal importance of Spark and R in team data science stems from their ability, within open analytics ecosystems, to meet several objectives:

  • Facilitate the democratization of self-service data analytics development across enterprises and communities, especially when these programming tools are accessible from within teams’ primary development workbench
  • Enable distributed teams to address bigger data-centric problems and reap commensurately larger business results more rapidly than ever, especially when accessed in a shared, public cloud service
  • Make development of high-performance analytics applications faster, more flexible and easier, especially when using them with browser-based notebooks that support code, text, interactive visualization, math and media
  • Provide a unified execution model for big data processing and analytics capabilities all in one environment, especially when deployed in conjunction with Apache Hadoop, NoSQL databases and other cloud-based data platforms
  • Reduce the amount of code and number of tools needed to combine a deep stack of cognitive capabilities in a single application, especially when used in conjunction with rich libraries of machine learning, streaming analytics, graph computing, natural-language processing and other algorithms
  • Allow teams to refine analytics applications interactively and iteratively, especially when used in conjunction with data and model governance features that are integrated into the data lakes around which the data science development lifecycle revolves

Tool innovation

Next-generation open analytics tools, such as IBM Data Science Experience (DSX), provide a critical cloud-based, self-service enabler for decentralized teams to develop innovative data applications. To realize the productivity benefits of cloud-based collaboration in your business ecosystem, join the DSX. And if you’re a working data scientist, data engineer or data application developer, register to attend the IBM DataFirst Launch Event, 27 September 2016, in New York, New York. It offers the opportunity to engage with open source community leaders and practitioners and learn how to accelerate processes for putting data to work in your burgeoning cognitive business.