Making Watson Smarter… Faster
Behind every supercomputer is a talented DBA
February 2011 saw a lot of excitement, as the IBM® Watson™ supercomputer beat the humans Ken Jennings and Brad Rutter on Jeopardy! But if you look behind Watson’s wonder, you’ll find some database technicians whose jobs weren’t that much different from the jobs of many database administrators (DBAs). Without the long hours they worked tuning the DB2 database that stored metadata associated with Watson’s thought process, the outcome might have been different—or at least taken a lot longer to deliver.
Although Watson made the result look effortless, a lot of work was going on behind the scenes. The question-and-answer data for Watson itself was stored in the UIMA format, which is highly suited for the analysis of unstructured data. However, only a subset of the analysis metadata was important to understand how Watson arrived at its answers; this data was pulled out of the UIMA analysis metadata and stored in a Derby open-source database.
A custom Web application, the Watson Error Analysis Tool (WEAT), was then used to visualize the data. For example, after a series of test matches, the team would use the WEAT tool to see how Watson was thinking. “We wanted to see why it chose the wrong answer from its top options,” says Eddie Epstein, the IBM engineer responsible for Watson’s scalability. “What caused a wrong answer to be ranked ahead of a better one?”
But WEAT wasn’t perfect. Running a single query could take minutes, and a group of developers all trying to use WEAT at the same time only made things worse. WEAT was working, and Watson was getting smarter, but the IBM team needed to make progress faster.
Enter Tong Fin, who implemented several key changes. First, he moved the data from the Derby database to an IBM DB2® database. This migration immediately helped improve performance, particularly when multiple WEAT instances were accessing a common database. He also optimized the schema of the metadata within DB2 and achieved an order-of-magnitude speedup for the slowest queries.
Finally, when Fin looked at the WEAT metadata results, he realized that for a large number of queries, Watson was accessing the same common set of data. Only a smaller number of queries needed the remaining, less commonly used data. Fin separated that less-common data into a second table, and then optimized the first table to run even faster.
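For readers who want to picture this kind of split, here is a minimal sketch of the general technique, often called vertical partitioning. It is an illustration only: the article does not describe the actual WEAT schema, so the table names (candidate_hot, candidate_cold), the columns, and the sample rows are all hypothetical, and the sketch uses Python's built-in SQLite module rather than DB2 so that it runs anywhere.

```python
import sqlite3

# Hypothetical sample data: (id, question_id, answer, score, trace).
# The first four fields stand in for data most queries need;
# "trace" stands in for bulky data that only a few queries read.
SAMPLE_ROWS = [
    (1, "q1", "answer A", 0.91, "long trace for A"),
    (2, "q1", "answer B", 0.42, "long trace for B"),
    (3, "q2", "answer C", 0.77, "long trace for C"),
]


def build_split_tables(conn, rows):
    """Create a narrow 'hot' table and a 'cold' side table, then load both."""
    conn.executescript(
        """
        CREATE TABLE candidate_hot (
            id          INTEGER PRIMARY KEY,
            question_id TEXT NOT NULL,
            answer      TEXT NOT NULL,
            score       REAL NOT NULL
        );
        -- Index chosen for the common lookup: answers for one question.
        CREATE INDEX idx_hot_question ON candidate_hot (question_id, score);

        CREATE TABLE candidate_cold (
            id    INTEGER PRIMARY KEY REFERENCES candidate_hot (id),
            trace TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO candidate_hot VALUES (?, ?, ?, ?)",
        [row[:4] for row in rows],
    )
    conn.executemany(
        "INSERT INTO candidate_cold VALUES (?, ?)",
        [(row[0], row[4]) for row in rows],
    )
    conn.commit()


def top_answers(conn, question_id):
    """Common query: reads only the narrow hot table."""
    cur = conn.execute(
        "SELECT answer, score FROM candidate_hot "
        "WHERE question_id = ? ORDER BY score DESC",
        (question_id,),
    )
    return cur.fetchall()


def answer_with_trace(conn, row_id):
    """Less common query: joins in the cold table only when it is needed."""
    cur = conn.execute(
        "SELECT h.answer, c.trace FROM candidate_hot AS h "
        "JOIN candidate_cold AS c ON c.id = h.id WHERE h.id = ?",
        (row_id,),
    )
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
build_split_tables(conn, SAMPLE_ROWS)
```

The idea is that the frequent query never touches the bulky columns, so the first table stays small and easy to index, while the occasional query that needs the extra detail pays for one join on the shared key.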
The result of Fin’s work? WEAT ran faster, the development process went faster, and well, you know the rest. Watson set a new mark for what computers can achieve—and a DBA helped it get there.