Databases that learn
Applying the power of Neural Networks to SQL processing in Db2
In 1952, IBMer Arthur Samuel created the first implementation of a machine learning system in America — to play checkers.
At first, the system was beatable. Samuel continued to improve the learning capabilities of his checkers program, and in part trained the program by having it play thousands of games against itself. By 1961, Samuel’s programs played the fourth-ranked checkers player in America and won. This demonstrated a level of play not yet achieved by a computer. The evolution of machine learning continued and the range of applicable domains expanded.
By 1997, the IBM Deep Blue computer was able to rival the best chess prodigies of the world, beating World Chess Champion Garry Kasparov. In 2011, IBM Watson technology competed against legendary Jeopardy! champions Brad Rutter and Ken Jennings, winning the first-place prize of $1 million. Today, improvements to learning algorithms, combined with cheaper and more powerful computational capacity and plentiful data, are making it possible for machine learning to go mainstream. The algorithms that power machine learning, and that made these feats possible, are now automating the mundane and providing deep insight into the very complex without the need for explicit human programming of rules and heuristics.
Next-generation SQL processing powered by machine learning
Machine learning is being used at the heart of next-generation methods for self-driving cars, facial recognition, fraud detection and much more. At IBM, we’re applying machine learning methods to SQL processing so databases can literally learn from experience.
SQL is the industry standard language for accessing, querying and manipulating structured data. It is used by stock markets, investment banks, hospitals, logistics firms, insurance companies, pension funds, manufacturers, small companies and large ones. It is a data language that is at once rich in capabilities, elegant in its descriptive power and ubiquitous in use. It’s everywhere. Improving SQL processing literally helps a vast cross section of industries.
SQL offers a fascinating opportunity for the application of machine learning. In particular, for complex query processing, an individual query may have thousands or even millions of possible execution strategies that will produce a correct result. While all of these strategies are correct, some will perform much faster than others. The role of the database SQL compiler is to define the best execution strategy to use to produce the answer to a query. It’s a lot like driving to work. There are many paths you could take, but depending on traffic, construction, weather and unforeseen events, some routes may be preferable. The optimal path may change from day to day or hour to hour. In much the same way, the best strategy to compute a SQL query can be subtle to find and can easily change for a given query as the data inside the database shifts over time and workload pressures vary.
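To get a feel for how quickly the number of strategies grows, consider the join order alone. The sketch below is a deliberate simplification: it counts only "left-deep" plans, where tables are joined one at a time, and it ignores other choices such as join method and index use, each of which multiplies the count further. The function name is hypothetical and is not part of Db2.

```python
from math import factorial


def left_deep_join_orders(num_tables: int) -> int:
    """Count the orderings in which num_tables tables can be joined one at a time."""
    if num_tables < 1:
        raise ValueError("need at least one table")
    # Any permutation of the tables is a valid left-deep join order.
    return factorial(num_tables)


for n in (3, 5, 8, 10):
    print(n, "tables:", left_deep_join_orders(n), "possible join orders")
```

Even under these restrictions, a query over ten tables already has more than three million candidate join orders.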
Classically, SQL databases such as Db2 use a sophisticated model to evaluate and select the best execution strategy for each query based on data statistics and advanced modeling of CPU, RAM, network and I/O. This method, used widely across the database industry since its introduction by IBM in 1979, does a great job.
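The idea of statistics-driven strategy selection can be illustrated with a toy example. This is a minimal sketch, not Db2's cost model: the table names, row counts, and selectivities are invented for illustration, and "cost" here is just the total number of intermediate rows produced, whereas a real optimizer also models CPU, RAM, network and I/O.

```python
from itertools import permutations

# Hypothetical statistics: estimated row count per table.
ROW_COUNTS = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Hypothetical join selectivities: fraction of row pairs that satisfy the join.
SELECTIVITY = {
    frozenset(("orders", "customers")): 1 / 50_000,
    frozenset(("customers", "regions")): 1 / 20,
}


def estimate_cost(order):
    """Estimate a join order's cost as the sum of its intermediate result sizes."""
    joined = [order[0]]
    rows = ROW_COUNTS[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Apply every join predicate linking the new table to those already
        # joined; with no predicate the join is a cross product (selectivity 1).
        selectivity = 1.0
        for previous in joined:
            selectivity *= SELECTIVITY.get(frozenset((previous, table)), 1.0)
        rows = rows * ROW_COUNTS[table] * selectivity
        cost += rows
        joined.append(table)
    return cost


def choose_plan(tables):
    """Enumerate every join order and return the one with the lowest estimate."""
    return min(permutations(tables), key=estimate_cost)


best = choose_plan(ROW_COUNTS)
print("chosen join order:", " -> ".join(best))
print("estimated cost:", round(estimate_cost(best)))
```

In this toy setup, orders that join the two small tables first and bring in the large table last come out cheapest, because they keep intermediate results small. If the statistics are stale or the selectivity estimates are wrong, the same procedure can pick a poor plan, which is where learning from observed executions has room to help.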
Even so, with the application of machine learning methods, we believe it is possible for a database to learn from experience and continuously improve. This will take us to improved levels of stability, consistency, and performance never seen in the decades-old and heavily researched domain of SQL processing. We achieve this by applying neural network methods to SQL processing in Db2. Neural networks are a machine learning paradigm that emulates the way scientists believe our brains learn over time.
For everyone working in data, a database that finds better strategies to compute query results is an exciting benefit. Performance will become more consistent, and database administrators will spend far less time tuning to squeeze out the final few percent of performance or debugging troublesome cases.
To learn more about the latest advancements in machine learning, register to watch "Machine Learning Everywhere," a one-hour web broadcast that features comments from thought leaders such as world chess champion Garry Kasparov and General Manager of IBM Analytics Rob Thomas. In addition to the main topic, you’ll see a special demo of Db2 with machine learning applied to SQL execution.