
Granting a Gateway to Powerful Analytics: Part 2

Consumption points keep the business intelligence tail from wagging the dog

Senior Principal Consultant, Brightlight Business Analytics, a division of Sirius Computer Solutions

Part 1 of this article examined some of the subtle pitfalls and caveats of tightly coupling IBM® PureData™ Analytics powered by IBM Netezza® technology to a business intelligence (BI) tool. This coupling seems to be the de facto configuration for these two technologies, and the realization that another approach is preferable often arrives only later. What is the alternative? First and foremost, establish an entry point for the end-user base. The objective may be to support a particular tool or application, or simply to provide a portal for a downstream data process; in short, the primary aim is a one-stop shop for the end-user base. That entry point should not give users access to physical data structures, let them see how data is organized on the machine, or let them pick and choose among raw tables. It should simply serve up the interface and, through it, the content.

Establish database entryways

The entry point is a database acting as a proxy, or consumption point, which serves as the sole point of entry for the end-user base. Working at the database level also allows a high degree of security and control. The consumption point hosts no data of its own; it behaves as a pass-through mechanism, or portal, to other databases. Consider the following three data-delivery mechanisms (a minimal sketch of a consumption point follows the list):

  • Synonyms to tables in another database: They carry no metadata with them, which leaves a BI tool running blind.
  • Views: They are a good option—more on views shortly.
  • Stored procedures: Useful only to a very limited degree. I would argue that the only reason to implement a stored procedure in a BI and data warehouse solution, regardless of platform, is to bridge or adapt the weaknesses of the BI tool to the strengths of the PureData and Netezza platform. Many have valiantly attempted to use stored procedures for back-end extract-load-transform (ELT) work, but such solutions are neither sustainable nor scalable on the back end of any database engine. Why imagine that Netezza is exempt?
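
Concretely, and as a minimal sketch only, the pass-through arrangement might look like the following in nzsql, with hypothetical names throughout (a CONSUME consumption-point database, a physical EDW database, a FACT_ORDERS table, and a BI_USERS group):

    -- Create the consumption point; it hosts no tables of its own.
    CREATE DATABASE CONSUME;

    -- Connect to it (nzsql meta-command), then install the pass-through.
    \c CONSUME

    -- A cross-database view: the only object the end-user base ever sees.
    -- EDW..FACT_ORDERS is the physical table in the physical database.
    CREATE VIEW V_ORDERS AS
        SELECT ORDER_ID, ORDER_DATE, MARKET_BASKET_TOTAL
        FROM EDW..FACT_ORDERS;

    -- Security and control live here, at the database level.
    GRANT SELECT ON V_ORDERS TO BI_USERS;

Because the view lives in CONSUME, the physical table behind it can be reorganized or redistributed at will; only the view definition ever needs to be reinstalled.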

If considering the use of views, another caveat pops up: nested views, or views calling views. Reusing the SQL logic of an existing view seems eminently practical as a delivery alternative. However, nested views aren't viable in any other database engine, so again, why think that Netezza is exempt? This is especially true when any one of the tables under the view could grow to a size that the view cannot optimize; once a view is nested, every view consuming it starts to drag. I've seen folks nest these things nine levels deep.

Without nesting, however, how can logic be reused? Setting up a design-time implementation of the view that shares snippets of common data definition language (DDL) is easy enough to do, and it allows common logic to be shared without nesting. The same is true of a stored procedure. By setting up the textual content of views and stored procedures in a bash shell and $ parameterizing things such as cross-database references, both are resolved at the time of implementation, at design time, when they are installed (see the sketch below). The example presented in part 1 of this article, inverting the three-part query when the BI tool must perform a complex join, is a good candidate for a stored procedure. The point is to implement only a handful of stored procedures to handle problematic issues; avoid proliferating them until they dominate the implementation. In addition, whether using views or stored procedures, I strongly suggest auto-generating a large portion of their contents instead of handcrafting them. Whether with an automated tool or a custom-built factory, the investment is worth it, and the logic never has to be edited by hand again.
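
As a minimal sketch of that design-time approach, again with hypothetical names and assuming nzsql is on the path, the reusable logic lives in shell variables, and the $ parameters are resolved once, when the view is installed, rather than at run time through nesting:

    #!/bin/bash
    # Design-time view factory (illustrative names throughout).
    PHYS_DB=EDW          # physical database the view points at
    CONS_DB=CONSUME      # consumption-point database

    # A shared SQL snippet, reused by splicing it in at install time
    # rather than by nesting one view inside another.
    COMMON_FILTER="WHERE ORDER_STATUS = 'SHIPPED'"

    # The cross-database reference and the filter are both resolved
    # here, at design time, before the view ever reaches the machine.
    SQL="CREATE OR REPLACE VIEW V_SHIPPED_ORDERS AS
         SELECT ORDER_ID, ORDER_DATE, MARKET_BASKET_TOTAL
         FROM ${PHYS_DB}..FACT_ORDERS
         ${COMMON_FILTER};"

    nzsql -d "$CONS_DB" -c "$SQL"

If the common snippet changes, simply regenerate and reinstall the views that use it; no view ever has to call another view to share the logic.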

Customize the end-user experience with a view

Many discriminating analysts and end users don't like the stodgy look of all-uppercase column names such as MARKET_BASKET_TOTAL; they prefer Market Basket Total. To satisfy this preference, simply script a function that reads the catalog and builds an installation-ready view whose aliases recast the column names into the desired format. What if a column or row must be masked based on the end user's database group permissions or some other logical value? Views transparently shape this reality as well, which means the physical database attributes never have to change: serve up the data exactly as the end-user base wants it while preserving physical data performance.

Shaping logic at the consumption point decouples the end-user base from the physical Netezza structures, providing the freedom to shape and tune those structures without destabilizing the end-user platform. It also provides a high degree of security and control over what the end user sees: tables, columns, and rows can be included or excluded for a highly customized user experience that common, wide-open data access cannot match. For downstream, non-end-user processes that consume data from the machine, a consumption point likewise reveals exactly which tables are being accessed, rather than leaving consumers free to raid any data point they choose. As a result, both BI tools and downstream consumers are decoupled and adapted, and the payoff is a healthy, durable relationship with an extremely powerful data-processing machine.
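
As a closing sketch, again with hypothetical table, column, and group names, a single shaping view can deliver both the friendly labels and the row masking described above. Here the group-permission masking is approximated with a user-to-region mapping table keyed off CURRENT_USER, and the aliases could just as easily be generated by a script that reads the catalog (for example, Netezza's _V_RELATION_COLUMN view) instead of being typed by hand:

    -- Friendly, mixed-case labels via quoted aliases; the physical
    -- columns stay all-uppercase and untouched.
    CREATE OR REPLACE VIEW V_MARKET_BASKET AS
        SELECT MARKET_BASKET_TOTAL AS "Market Basket Total",
               STORE_REGION        AS "Store Region"
        FROM EDW..FACT_MARKET_BASKET
        -- Row masking: each user sees only the regions mapped to the
        -- account used to connect.
        WHERE STORE_REGION IN (SELECT REGION
                               FROM EDW..USER_REGION_MAP
                               WHERE USER_NAME = CURRENT_USER);

Please share any thoughts or questions in the comments.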