Data Scientists: Illuminate Your Patterns with Pictures
Scientific inquiry is all about finding non-obvious patterns in observational data. It's no surprise that this is also the core of data science.
Patterns may be obvious to any sentient creature, or they may be deeply invisible - until we invent the conceptual or technological tools to bring them to the surface. The conceptual tools may be groundbreaking paradigm shifts, such as the "thought experiment" that shaped Einstein's insight into special relativity, or powerful new frameworks of visual notation, such as Feynman's diagrams of subatomic particle interactions.
Patterns feel ghostly and unreal until we can actually see them, on some level, with our eyes. The chief technological tools are whatever scientists and engineers can use to bring these ghosts to light. In the realm of the subatomic, the magical inventions have been visualization technologies such as the cloud chamber and the scanning tunneling microscope (the latter was invented at IBM, by the way).
Most real-world data science serves commercial interests, rather than pure science. But the restless search for deep patterns is no less critical in the business wars than among geniuses vying for Nobel Prizes. Today's data scientists have two broad sets of pattern-sensing tools: advanced visualizations and statistical algorithms. No advanced analytic toolkit is complete without a best-of-breed library of both, with visualizations serving as the core interface at every step in the development, maintenance, and governance processes. You will find these complementary technologies - visualizations and algorithms - supported within IBM SPSS Modeler and in the complementary Big Data platforms, such as IBM Netezza Analytics, IBM InfoSphere BigInsights, and IBM InfoSphere Streams, where data is stored and resource-hungry computations are performed.
Visual patterns serve the following core functions in the data-science lifecycle:
- Framing the opportunity, problem, and solution: The broader pattern for data science is the business landscape, which has geographic, temporal, process, and other highly visual contexts. Making the case for your next data-science project - be it in social media marketing, supply-chain optimization, or whatever - might hinge on your ability to frame the stakes through well-chosen visuals. Depending on the complexity of the problem, this might be as simple as a quick PowerPoint chart with graphical builds. Or it may involve dynamic visualizations generated from your advanced analytic tools. Or, if you need to present a more complex narrative, you might develop an elegant infographic that combines these visuals with detailed in-context annotations and a dynamic process flow, illustrating how the proposed solution will achieve key business metrics.
- Focusing collaboration: The principal patterns that keep data science development teams on track are visual. That's why leading analytics tools all use wizard-driven graphical approaches for defining sources, variables, functions, and dependencies at all steps in the process. These visual patterns - for data discovery, acquisition, preparation, and modeling - provide a common frame of reference for statistical analysts (who know the algorithms) and business analysts (who know the applications) to work toward common objectives.
- Illuminating relationships: At the heart of data science are the patterns, also known as "graphs," showing relationships among myriad variables in the underlying data. Depending on what you're looking for, these graphs can be of mind-boggling complexity, especially as we incorporate a wider array of profile, transaction, temporal, geospatial, social, behavioral, and other factors into the underlying statistical models. To some degree, you may need to rely on machine-learning and artificial-intelligence technologies to surface the most meaningful patterns from this welter. But you'll never eliminate the need for interactive visual exploration by teams of expert humans.
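To make the "graphs" in the last item concrete, here is a minimal Python sketch, offered as one possible illustration and not as the method of any particular tool. It links pairs of variables whose Pearson correlation is strong, giving a small relationship graph that a visualization tool could then draw. The function names, the 0.8 threshold, and the sample data are all assumptions invented for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def relationship_graph(columns, threshold=0.8):
    """Map each variable to the set of variables with |r| >= threshold."""
    graph = {name: set() for name in columns}
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical sample data, made up purely for illustration.
columns = {
    "ad_spend": [1, 2, 3, 4, 5],
    "site_visits": [2, 4, 6, 8, 10],
    "returns": [5, 4, 3, 2, 1],
    "weather_index": [1, -1, 0, -1, 1],
}

graph = relationship_graph(columns)
for name, neighbors in graph.items():
    print(name, "->", sorted(neighbors))
```

With real data the number of variables, and so the number of edges, grows quickly, which is why interactive visual exploration matters: a correlation threshold can propose candidate relationships, but it cannot say which ones are meaningful.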
Pattern thinking is the core of any creative endeavor, and the soul of science is creative problem solving. Science progresses when we find new ways of visualizing patterns that have heretofore lain just below the surface.
How will you drive new visualization paradigms, practices, and tools into your enterprise data-science program?
For More Information:
- IBM's big data platform
- The big data conversation
- Follow IBM big data on Twitter
- on the IBM Netezza data warehouse appliance