Top analytics tools in 2016

CEO & Co-Founder, Jigsaw Academy, The Online School of Analytics

Data analysis is not cut and dried; it rarely provides results in absolute terms. Rather, many tools, techniques and processes can help dissect data, structuring it into actionable insights. As we look toward the future of data analytics, we can expect certain trends in tools and technologies to dominate the analytics space:

  • Data analysis frameworks
  • Visualization frameworks
  • Model deployment frameworks

Data analysis frameworks

Open-source frameworks such as R, with its increasingly mature ecosystem, and Python, with its pandas and scikit-learn libraries, seem poised to continue their dominance of the analytics space. In particular, certain projects in the Python ecosystem seem ripe for quick adoption:

  • blaze
    Modern data scientists work with myriad data sources, ranging from CSV files and SQL databases to Apache Hadoop clusters. The blaze expression engine helps data scientists use a consistent API to work with a full range of data sources, lightening the cognitive load required by use of varied frameworks.
  • bcolz
    By providing the ability to do processing on disk rather than in memory, this interesting project aims to find a middle ground between using Hadoop for cluster processing and using local machines for in-memory computations, thereby providing a ready solution when data is too large to fit in memory but not large enough to require a Hadoop cluster.
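The middle ground bcolz targets can be illustrated, without bcolz itself, by chunked out-of-core processing: rather than loading a data set into memory, read it from disk in fixed-size chunks and reduce each chunk as you go. A minimal standard-library sketch of that idea (the file contents and chunk size are illustrative, not part of the bcolz API):

```python
import csv
import os
import tempfile

def chunked_sum(path, column, chunk_size=1000):
    """Sum a numeric CSV column without loading the whole file into memory."""
    total = 0.0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        chunk = []
        for row in reader:
            chunk.append(float(row[column]))
            if len(chunk) >= chunk_size:
                total += sum(chunk)   # reduce one chunk, then discard it
                chunk = []
        total += sum(chunk)           # reduce the remaining partial chunk
    return total

# Demo: write a small CSV to disk, then aggregate it chunk by chunk.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["value"])
    for i in range(10):
        writer.writerow([i])
    path = f.name

print(chunked_sum(path, "value", chunk_size=3))  # 0+1+...+9 = 45.0
os.remove(path)
```

Memory use here is bounded by the chunk size rather than the file size, which is exactly the trade-off that makes on-disk columnar stores attractive between the in-memory and cluster regimes.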

The R and Python ecosystems, of course, are only the beginning: the Apache Spark framework is also seeing rapid adoption, not least because it offers APIs in both R and Python.

Building on a general trend of using open-source ecosystems, we can also expect to see a move toward distribution-based approaches. Anaconda, for example, offers distributions for both Python and R, and Canopy offers a Python distribution geared toward data science. And no one will be surprised if we see analytics software such as R or Python integrated directly into standard databases.

Beyond open-source frameworks, a growing body of tools is helping business users interact directly with data while helping them produce guided data analysis. Tools such as IBM Watson, for example, attempt to abstract the data science process away from the user. Although such an approach is still in its infancy, it offers what appears to be a very promising framework for data analysis.

Visualization frameworks

Visualizations are on the verge of being dominated by the use of web technologies such as JavaScript frameworks. After all, everyone wants to create dynamic visualizations, but not everyone is a web developer—or has the time to spend writing JavaScript code. Understandably, then, certain frameworks have been rapidly gaining in popularity:

  • plotly
    Offering APIs in Python, R and Matlab, this data visualization tool has been making a name for itself and seems on track for increasingly broad adoption.
  • bokeh
    This library may be exclusive to Python, but it shows strong potential for rapid future adoption.
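What tools like plotly and bokeh have in common is the pattern described above: you write Python (or R), and the library serializes your data to JSON and emits an HTML page that hands it to a JavaScript charting runtime. A hypothetical standard-library sketch of that pattern follows; the `drawChart` call is a placeholder, not a real API from either library:

```python
import json

def render_chart_html(x, y, title="Chart"):
    """Emit a self-contained HTML page that passes Python data to a JS chart.

    This mimics what plotly/bokeh do for you: the user writes Python,
    the library writes the JavaScript scaffolding.
    """
    payload = json.dumps({"x": list(x), "y": list(y), "title": title})
    return f"""<!DOCTYPE html>
<html>
<head><title>{title}</title></head>
<body>
<div id="chart"></div>
<script>
  var spec = {payload};        // data serialized from Python
  // drawChart("chart", spec); // hypothetical JS charting call
</script>
</body>
</html>"""

html = render_chart_html([1, 2, 3], [2, 4, 8], title="Growth")
print('"x": [1, 2, 3]' in html)  # the Python data is embedded as JSON
```

The appeal of the real libraries is precisely that this glue (and the actual charting code) is generated for you, so a dynamic browser visualization never requires writing JavaScript by hand.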

What’s more, these two examples are only the beginning. We should expect to see JavaScript-based frameworks that offer APIs in R and Python continue to evolve as they see increasing adoption.

Model deployment frameworks

When it comes to deploying models, many service providers are willing to replicate the SaaS model on premises, notably the following:

  • Domino Data Labs
  • Yhat
  • OpenCPU
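At their core, services like these wrap a trained model in an HTTP API: deployment means serializing the model as an artifact and exposing a predict routine that other systems can call. A minimal standard-library sketch of that shape (the linear model and request format are illustrative assumptions, not any provider's actual API):

```python
import json
import pickle

# A trivial "trained model": just coefficients. A real deployment would
# pickle a fitted scikit-learn estimator in the same way.
model = {"intercept": 1.0, "coef": 2.5}

# Serialize the model as a deployment artifact.
artifact = pickle.dumps(model)

def predict(artifact, features):
    """Deserialize the artifact and score one observation."""
    m = pickle.loads(artifact)
    return m["intercept"] + sum(m["coef"] * x for x in features)

def handle_request(artifact, body):
    """What a /predict endpoint would do with a JSON request body."""
    features = json.loads(body)["features"]
    return json.dumps({"prediction": predict(artifact, features)})

print(handle_request(artifact, '{"features": [2.0]}'))  # {"prediction": 6.0}
```

The deployment frameworks listed above handle the parts this sketch omits: hosting, versioning the artifact, scaling the endpoint and managing access.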

What’s more, in addition to needing to deploy models, we’re also seeing a growing need to document code. Accordingly, we might expect to see a version control system similar to GitHub but geared toward data science, offering the ability to track different versions of data sets.
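Such a system would likely track data set versions the way Git tracks files: by content address. Hashing each snapshot yields a stable identifier that changes exactly when the data changes. A minimal sketch of that idea, with the in-memory store standing in for real storage (all names here are illustrative):

```python
import hashlib

def dataset_version(rows):
    """Content-address a data set: identical data yields an identical ID."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()[:12]  # short ID, Git-style

store = {}  # version ID -> snapshot; a stand-in for a real data store

def commit(store, rows):
    """Snapshot the data set under its content-derived version ID."""
    vid = dataset_version(rows)
    store[vid] = list(rows)
    return vid

v1 = commit(store, [("alice", 1), ("bob", 2)])
v2 = commit(store, [("alice", 1), ("bob", 3)])  # one value changed
print(v1 != v2)          # True: any change produces a new version ID
print(len(store) == 2)   # True: both versions remain retrievable
```

Content addressing also gives de-duplication for free: committing an unchanged data set produces the same ID and stores nothing new.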

Going forward, we anticipate that data and analytics tools will see increased implementation in mainstream business processes, and we expect such use to guide organizations toward a data-driven approach to decision making. For now, keep your eye on the foregoing tools—you won’t want to miss seeing how they reshape the world of data.
