Innovative business applications: The disruptive potential of open data science

Big Data Evangelist, IBM

Open data science is proving to be a seedbed for innovation in the world economy. Innovators in many industries are at the forefront of using the tools and techniques of open data science to build new designs for working and living.

Beyond IT, open data science projects are revolutionizing business and development across a wide range of industries. Creativity comes when people from many backgrounds, roles and skill sets use open source data science tools, such as Apache Spark, R and Apache Hadoop, to develop and deploy new designs for working and living.

Collaborative creativity

Open team collaboration is essential for unlocking creativity in data science. Rob Thomas, vice president of product development for IBM Analytics, said in a recent blog post that tools are essential for making the most of powerful, new distributed computing environments such as the Internet of Things (IoT). One example of such a new open source tool is Quarks, which can help teams leverage Spark to drive algorithmic intelligence to the edge of the IoT.

Data science initiatives can foster innovative designs and disruptive applications when teams combine the key roles and skill sets in pursuit of common objectives. Data scientists can use data science tools to tease out the insights they’re looking for and to make those insights actionable immediately through applications, visualizations and other consumables. Business analysts can use statistical exploration tools to answer domain-specific questions quickly, easily and without needing IT assistance. Application developers can use algorithmic capabilities to endow their apps with cognitive smarts that learn from fresh data and take actions that are continually optimized in keeping with contextual, predictive and environmental variables. And data engineers can build data processing pipelines that leverage machine learning, stream computing and other capabilities to ingest data from disparate sources, aggregate and cleanse it, and deliver it downstream to smart applications of all sorts.
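To make the data engineering role a little more concrete, here is a minimal sketch of an ingest, cleanse and aggregate pipeline. It uses plain Python rather than Spark so that it runs anywhere, and everything in it is an assumption made for illustration: the two sources, the record fields, the sample values and the function names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records from two disparate sources (illustrative data only).
SENSOR_FEED = [
    {"device": "pump-1", "temp_c": 71.5},
    {"device": "pump-1", "temp_c": None},      # missing reading
    {"device": "pump-2", "temp_c": 64.0},
]
MAINTENANCE_LOG = [
    {"device": "PUMP-1 ", "temp_c": "73.5"},   # inconsistent name, string value
    {"device": "pump-2", "temp_c": "66.0"},
]


def ingest(*sources):
    """Yield records from every source in turn."""
    for source in sources:
        yield from source


def cleanse(records):
    """Drop records with missing readings and normalize names and types."""
    for record in records:
        if record["temp_c"] is None:
            continue
        yield {
            "device": record["device"].strip().lower(),
            "temp_c": float(record["temp_c"]),
        }


def aggregate(records):
    """Average the readings per device."""
    readings = defaultdict(list)
    for record in records:
        readings[record["device"]].append(record["temp_c"])
    return {device: mean(values) for device, values in readings.items()}


def run_pipeline():
    """Ingest, cleanse and aggregate; the result is what a downstream app would consume."""
    return aggregate(cleanse(ingest(SENSOR_FEED, MAINTENANCE_LOG)))


if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch the cleansing step runs before aggregation so that malformed records never reach the averages. A production pipeline would typically express the same stages with a distributed engine and streaming sources instead of in-memory lists.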

Open analytics tools provide a critical enabler for decentralized teams to develop innovative applications in a complex world. The pivotal importance of Spark and R in these efforts stems from several factors: 

  • Facilitating the democratization of self-service data analytics development across enterprises and communities, especially when these programming tools are accessible from within teams’ primary development workbench
  • Enabling distributed teams to address bigger data-centric problems and reap commensurately larger business results more rapidly than ever, especially when accessed in a shared, public cloud service
  • Accelerating development of high-performance analytics applications and making it more flexible and easier, especially when used with browser-based notebooks that support code, text, interactive visualization, math and media
  • Providing a unified execution model for big data processing and analytics capabilities all in one environment, especially when deployed in conjunction with Hadoop, NoSQL databases and other cloud-based data platforms
  • Reducing the amount of code and number of tools needed to combine a deep stack of cognitive capabilities in a single application, especially when used in conjunction with rich libraries of machine learning, streaming analytics, graph computing, natural-language processing and other algorithms
  • Allowing teams to refine analytics applications interactively and iteratively, especially when used in conjunction with data and model governance features that are integrated into the data lakes around which the data science development lifecycle revolves

Advanced applications

A common theme in open data science initiatives is the use of Spark to develop applications that deliver predictive, real-time and machine-learning capabilities to the point of action. Leveraging these and other capabilities, IBM customers are using Spark for a range of applications: 

  • Anomaly detection in cybersecurity and antifraud measures
  • Automated pattern detection in the physical sciences
  • Churn reduction in customer relationship management (CRM)
  • Customer experience optimization in mobile apps
  • Predictive maintenance in the Internet of Things
  • Predictive merchandising through in-store beacons
  • Real-time performance insights in competitive athletics
  • Real-time recommendation engines in ecommerce
  • Targeted offers in outbound marketing 

In addition, research and development (R&D) communities worldwide are experimenting with a dizzying array of new applications that bring Spark, R and open data science into every sphere of our lives. The Spark Technology Center also has a number of projects in development, including the following:

  • AMBER Alert Aid: This Spark application supports the broadcasting of the most serious missing-children cases through the AMBER Alert system. It uses the analytics capabilities of Spark to find vehicles in car traffic video feeds that match descriptions in AMBER Alert reports.
  • Bluemix Genomics: This Spark application enables scientists to understand how genetics contribute to complex diseases. It facilitates processing and analysis of massive amounts of genome data.
  • Red Rock: This Spark application enables users to act on real-time, data-driven insights discovered from Twitter. It transforms a huge volume of Twitter data into an easy-to-digest set of visualizations accessible to a general audience.
  • Search by Selfie: This Spark application allows for real-time facial detection, recognition and intelligence in customer engagement scenarios. It puts instant and continual facial recognition within reach of business users outside of large-scale enterprises, such as retailers, event planners or security firms, with potential applications for missing persons as well. Users can capture a photo, extract key features, transform those features to normalize the data of the faces and train facial-recognition models in Spark.
  • SETI + Spark Explore Space: This Spark application analyzes 100 million radio events that have been collected over several years to identify faint signals indicative of intelligent extraterrestrial life. It uses sophisticated mathematical models and machine-learning algorithms to separate terrestrial interference from signals truly of interest. The Search for Extraterrestrial Intelligence (SETI) Institute’s mission is to explore, understand and explain the origin and nature of life in the universe.
  • Tone Analyzer with Watson + Spark + Twitter: This Spark application is used to sift through Twitter data in real time to gauge customer emotions on multiple tone dimensions, ranging from anger to cheerfulness to openness.
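
The normalize-then-train flow mentioned for Search by Selfie can be illustrated with a toy example. The sketch below is not the project's method and does not use Spark: it is a plain-Python nearest-centroid classifier, chosen only because it is short. The three-dimensional feature vectors, the labels and the function names are all made up for illustration; real systems extract far richer features from images.

```python
import math
from collections import defaultdict


def normalize(vector):
    """Scale a feature vector to unit length (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def train(samples):
    """Compute one centroid per label from (label, features) pairs."""
    grouped = defaultdict(list)
    for label, features in samples:
        grouped[label].append(normalize(features))
    centroids = {}
    for label, vectors in grouped.items():
        size = len(vectors)
        centroids[label] = [sum(column) / size for column in zip(*vectors)]
    return centroids


def predict(centroids, features):
    """Return the label whose centroid is closest to the normalized features."""
    point = normalize(features)
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


# Made-up feature vectors standing in for features extracted from photos.
TRAINING = [
    ("person_a", [0.9, 0.1, 0.2]),
    ("person_a", [1.8, 0.3, 0.5]),
    ("person_b", [0.1, 0.8, 0.7]),
    ("person_b", [0.2, 1.5, 1.6]),
]

if __name__ == "__main__":
    model = train(TRAINING)
    print(predict(model, [2.7, 0.4, 0.6]))
```

Normalizing first means that two vectors pointing in the same direction are treated alike even when their magnitudes differ, which is why the scaled-up query in the example still lands near the first group.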

Key announcements

At the Apache Spark Maker Community Event on 6 June 2016, IBM is sharing important announcements to help customers use Spark, R and open data science to drive business innovations. The evening should be of keen interest to data scientists, data application developers and data engineers. It features special announcements, a keynote, a panel discussion and a hall of innovation. Leading industry figures who have already committed to participate include John Akred, CTO at Silicon Valley Data Science; Matthew Conley, data scientist at Tesla Motors; Ritika Gunnar, vice president of offering management for IBM Analytics; and Todd Holloway, director of content science and algorithms at Netflix.

Register for this in-person event. If you can’t attend, then register to watch a livestream presentation of the event.