Informix 12.10: New Grid Queries
Experience a paradigm shift in thinking about database queries
IBM® Informix® 12.10 was announced and released on March 26, 2013, but I was in South Africa working on a project and missed the launch webcast. Two days later, I was speaking at the launch of the South Africa Informix User Group with Stuart Litel, president of IIUG, and Jerry Keesee, Director, IBM Informix Database Development. Jerry also missed the Informix 12.10 launch webcast because he was working with customers and presenting at the new Informix user group meeting in South Africa. This is an exciting release—and if you have not seen the webcast, you can watch the replay here.
One of the most interesting topics covered in the launch webcast was the benchmark by Pronto Software, an IBM Informix enterprise resource planning (ERP) vendor that showed major performance improvements with the new release. My beta testing with the software also reflects a performance improvement; more will be forthcoming on that topic. For an excellent overview of the new features, read this article by Joseph Veleeparambil and Shawn Moe.
One element of the new release that I think represents a real game-changer is the new grid queries feature. We have always been able to perform distributed queries in Informix, where you select data that specifies an external source. In distributed queries, you have to know where the external data is located—that is, which database and server to query. In grid queries, you don’t have to know the database or the server where the data resides. The tables, databases, and servers just have to be defined as part of the same grid. This also means that you could distribute your very large data tables across many servers and run queries as if you were querying one table. (Does this remind anyone of Informix XPS?)
Let’s try a simple example with three Informix servers called s1, s2, and s3. Suppose each has a database called “parts” with a table called “inventory.” The table has the same structure on each server; the data is not replicated so it will be unique to each server. A traditional distributed query to get all the parts that need to be ordered would be a union of three queries specifying the “database@server:table.” It might look like this:
Select partnum, sku, qty from parts@s1:inventory where order_date >= today
Select partnum, sku, qty from parts@s2:inventory where order_date >= today
Select partnum, sku, qty from parts@s3:inventory where order_date >= today;
In a distributed query, you must specify the remote database, the server, and the table to get results.
In a grid query, however, once you have created and identified the databases, servers, and tables as part of a grid, you do only one query. This query might be:
Select partnum, sku, qty from inventory GRID ALL grid1
where order_date >= today;
You can also set an environment variable, SELECT_GRID_ALL, which indicates that all queries are grid queries. Once this is set, you do not even need to include the keyword GRID in your SQL. For example:
Select partnum, sku, qty from inventory
where order_date >= today;
All three examples will return the same results. The last example will be the easiest to implement and get right. With the environment variable, your queries do not even change. You could set up front-end query tools to use the grid—tools that do not even need to know they are doing a grid query. The paradigm shift is that we can do queries on a grid and not have to worry about where the data is located, or what server it is on. The grid becomes the database.
Feedback? Questions? Let me know in the comments.
|[followbutton username='IBMdatamag' count='false' lang='en' theme='light']|