Propelling the Future of Big Data and Data Science

Announcing the release of IBM Big SQL 5.0 on the Hortonworks Data Platform

Offering Manager, Hadoop, Analytics, IBM

Data is a potent business resource and the key to gaining and maintaining competitive advantage. Last month, IBM and Hortonworks announced a partnership to bring data science to the world on an open platform. The partnership offers Hortonworks Data Platform (HDP) together with IBM Data Science Experience (DSX) and IBM Big SQL, helping everyone from data scientists to business leaders better analyze and manage their data and accelerate data-driven decision making. This collaboration brings together an industry-leading, 100% open source Hadoop distribution, an innovative data science platform, and a powerful SQL engine for complex queries. 

This partnership began with our founding memberships in the Open Data Platform Initiative (ODPi.org). Our ODPi history demonstrates the commitment of IBM and Hortonworks to open source, so the choice is clear: go with an open strategy for data science and Hadoop with IBM and Hortonworks, or risk getting locked into a proprietary distribution. Our clients will benefit from the innovation this partnership will foster. 

Today we are thrilled to announce that IBM Big SQL 5.0 is available, leveraging the power of Hortonworks Data Platform and other data storage options to provide our clients with a powerful modern data warehouse architecture. Big SQL exploits Apache Hive, HBase, and Spark concurrently for best-in-class analytics capabilities on Hadoop clusters with hundreds of terabytes or petabytes of data, while also providing federated query, administration and security features. 

We know your data is valuable. IBM will also be offering support for Hortonworks Data Platform and Hortonworks DataFlow to give peace of mind, problem resolution support, and maintenance protection to help reduce the risk, time to market, and costs required to deploy any applications powered by them.

What’s New in Big SQL 5.0?

Big SQL V5.0 is designed to make it easier to unlock value from Apache Hadoop. Alongside performance improvements, the enterprise-ready Big SQL 5.0 adds a broad set of new capabilities and benefits: 

  1. Intuitively access and leverage the power of Spark 2.1 to analyze the data residing in Hortonworks Data Platform via: 
  • Apache Spark 2.1 integration, featuring efficient coordination between Spark executors and Big SQL workers.
  • Federated access, through IBM Fluid Query technology, to relational database management system (RDBMS) sources outside of Hadoop, such as Oracle, IBM DB2, and IBM PureData System for Analytics.
  • Support for IBM OpenPOWER servers, enabling clients to run SQL and Spark on a rich family of Linux servers.
  • Capability to create Big SQL tables in a location that is specified as a WebHDFS uniform resource identifier (URI).
  • Capability to create tables that reside on external object stores, with support for the Amazon S3 protocol.
  • A Spark connector for connecting to new data sources such as SAP HANA and Cassandra. 
  2. Ingest, shape, curate, and publish data while adhering to the security policies required in a typical enterprise via: 
  • Apache Ranger security in the Big SQL service, which allows:
    • Permissions set in Hive to be propagated to Big SQL tables and views.
    • Permissions set for Hive through the Ranger service to be propagated to Big SQL tables and views.
  • Optimized resource utilization through YARN/Slider integration (technical preview).
  • Automatic synchronization of the Big SQL and Hive catalogs, which is enabled by default and ensures that Hive table updates are available to Big SQL almost immediately.
  • Out-of-the-box support for Oracle's SQL and PL/SQL dialects, which enables many applications that were written against Oracle to run in Big SQL virtually unchanged.
  3. Experience improved consumability, performance, and efficiency, with the appropriate UI to intuitively manage, monitor, and administer the solution, via: 
  • Enhanced federation with easier configuration, additional source support, and easier nickname generation through Big SQL’s UI, Data Server Manager (DSM).
  • New elastic boost option to create logical Big SQL workers on top of existing Big SQL workers. Logical Big SQL workers help to improve performance and scalability by making better use of available CPU resources.
  • An ANSI-compliant SQL parser that can successfully execute all 99 TPC-DS queries unmodified at 100TB, even when running 4 concurrent query streams. Big SQL is the only SQL-on-Hadoop engine that can do this, and it does so 3x faster than Spark SQL while using far fewer resources.
  • Improved performance through enhanced ANALYZE execution for statistics management.
  • Improved efficiency in the handling of the ORC file format, which leads to improved performance during query processing.  

We are very excited to join forces with Hortonworks to drive innovation and leadership in the open source community, providing our clients with the opportunity to maximize the value of their data. We hope to join you on your big data journey.  For more information about the release, please visit the IBM Hadoop page.