Navigating the Database Programming Terrain
8 Powerful tips for transitioning to the IBM Netezza platform
Okay, it happened to you. After a couple of years of trying to get Microsoft SQL Server to use more than eight processors at once, somebody decided to transition to a warehousing platform, and IBM® PureData™ for Analytics powered by Netezza technology was selected. It’s not like your company is going to find any Netezza programmers to hire, and you were a star on the legacy system, so now you have to be a star on the new one. How different can databases be?
Clearly, it happened to me. Nothing is as easy to program in as SQL Server, but the rough terrain in Netezza can make it more challenging than MySQL, Oracle Database, IBM DB2® database software, or even PostgreSQL. Here are eight tips to ease the transition followed by a comprehensive SQL example (see figure):
1. Create procedures in Aginity Workbench for Netezza with Ctrl+F5
When using Aginity Workbench for Netezza, and a stored procedure will not compile, simply press Ctrl+F5 or run as batch to compile stored procedures. In addition, select Edit Procedure or Create Procedure from the object browser’s context-sensitive menu to get a window in which pressing F5 executes the Create or Alter Procedure command.
2. Be clear on the difference between nzsql and nzplsql
SQL Server lets you define variables and employ control structures in scripts. Netezza does not. SQL Server lets you use common table expressions (CTEs) within stored procedures. Netezza does not. In SQL Server, you can write your script and paste some boilerplate CREATE PROCEDURE code around it. In Netezza, moving back and forth between scripts and functions, while more useful, is also much more work.
3. Hack down a query to see the error
Aginity Workbench does not show the error for a significantly long query, and there is no way to fix this behavior. Just remove parts of the query and hope they didn’t contribute to the error until the query is small enough that the error message shows at least part of the error.
4. Do not mess around with case sensitivity
With the default collation, SQL Server provides one record for a set grouped across a field containing mixed case and uppercase strings of the same characters. Netezza returns two records. The mixed-case data is probably better suited for reporting, and it sorts higher than the uppercase string. Queries are going to look a lot like this one:
SELECT max(happy) from mable_ GROUP BY UPPER(happy);
DROP TABLE mable_;
Including the GROUP BY UPPER statement is undoubtedly going to create a performance hit; if the query is slow, this clause could be the problem—but it probably isn’t.
5. Use EXECUTE AS CALLER to share temporary tables
The EXECUTE AS CALLER statement will make temporary tables visible with the expected names in _v_table and _v_relation_column, against which you will want to validate. Netezza does not have table variables. Passing a temporary table with a known name and shape is one of the sneaky ways to get data out of a stored procedure that provides one return value of arbitrary type or one result set. And if EXECUTE AS OWNER is used, the temporary table is not going to be available to two procedures with different owners. Note that this statement should go to production only if the caller is going to have access to the required data.
6. Write a DROP for every CREATE TEMPORARY TABLE
And do it right away. It can be placed at the end of the procedure if desired, but at some point in the future somebody is going to call the stored procedure twice in the same user session. Netezza will then fail to create that temporary table because it already exists.
7. Rejoice that loops are pretty easy
Cursor navigation can be pretty easy in Netezza. There is none of SQL Server’s nonsense about lining up variables and fetching records a bit at a time. Declare a RECORD-type variable; call the—case-insensitive—field name with a dot; and then step through records with a FOR rec in <query> LOOP; <do stuff>; END LOOP; syntax. LOOP is such a great feature that it would be my great pleasure to elaborate on it, but that’s all there is to it.
8. DECLARE variables and ALIAS parameters
“DECLARE variables” isn’t a tip so much as it is a requirement, but here is a helpful way to think about it: each variable declaration is a statement. That’s why they are separated by semicolons and not commas. Netezza doesn’t require an ALIAS for parameters, of course. But even if you do prefer $2 to pSector, please consider the human cost. Whoever is tasked with maintaining and extending the code—which will likely be you—is going to appreciate a separate, representatively named variable for each variable use.
Netezza is an awesomely powerful tool for warehousing, analytics, and statistical analysis. Hopefully, these tips will smooth some of the bumps in its adoption.
CREATE TABLE tblSelect13Characters AS SELECT 'Hello, World!' AS PROMPT
-- You aren't going to have a chance to distribute and organize later
DISTRIBUTE ON (PROMPT) ORGANIZE ON (PROMPT);
/* CREATE OR REPLACE is cool, right? See, it's not all pain. Name, parameter set, return type, language. There are no OUTPUT parameters. You get one and only one return value. You can finesse this in various ways - especially using the XML in the SQL Toolkit -- but it's now a fact of the world you live in.*/
CREATE OR REPLACE PROCEDURE SELECT13CHARACTERS(NCHAR(13)) RETURNS REFTABLE(tblSelect13Characters)
/* As far as I know, the parser can only handle one language, but you have to tell it which one you're using every time. */
LANGUAGE NZPLSQL AS
/* BEGIN_PROC is some sort of macro for people who tried to use stored procedures before version 6.0 really appreciate.*/
-- DECLARE is optional; you only need it if you want to define variables or alias paramters
-- I strongly recommend ALIASing your parameters. It's just so much more readable.
pPrompt ALIAS FOR $1;
-- 'p' for Parameters, 'v' for Variable, as suggested by Netēzza.
vLength byteint; -- which you may know as tinyint
-- if you want to iterate through a result set, you are going to need a RECORD type variable
rec RECORD; -- I always call it 'rec' because the documentation does
vTimeOfDay := timeofday();
/* return data to the user through notices. If you're hard-working and committed, this can be a workaround to the 'one return value' rule */
-- String substitution is indifferent to data type
RAISE NOTICE 'ENTERING SELECT13CHARACTERS(nchar(13)) at %', vTimeOfDay;
-- Assignment is a colon and an equals sign. PL/SQL derived from PASCAL, remember.
vLength := CHARACTER_LENGTH(TRIM(CAST(pPrompt AS nvarchar(13))));
IF vLength > 13 THEN
-- Note the semicolons at the end of each line. They're not just for CTEs!
RAISE NOTICE 'There is no way to get here. I just wanted to show off ELSE IF.
Man, I love strong typing';
ELSE IF vLength < 13 THEN -- this is the ELSE for one IF block, and an IF for another
-- EXCEPTION starts looking for a handler up the call stack
RAISE EXCEPTION 'This procedure only selects 13-character strings. % does not qualify', pPrompt;
ELSE -- This is the ELSE for the second distinct IF block
RAISE NOTICE 'Loading % into return data set', pPrompt;
-- Two IFs mean two END IFs!
END IF; END IF;
-- So, this is the model I tend to use. Load a table named OUTPUT, then copy it into the REFTABLE.
CREATE TABLE OUTPUT_ AS SELEcT pPrompt AS PROMPT;
-- Heck, let's show the caller what we're going to return
FOR rec in SELECT * FROm OUTPUT_ ORDER BY PROMPT DESC LOOP
RAISE NOTICE 'The thirteen-character string is ''%''', rec.prompt;
/* All of your hard work comes down to this. You load your data into the REFTABLE. It is then available to the caller, who is just waiting for the thread of execution. */
EXECUTE IMMEDIATE 'INSERT INTO ' || REFTABLENAME || ' SELECT * FROM OUTPUT_;';
/* You have to manage your own temporary tables! Without the DROP for every temporary table you create in the procedure, your proc might work fine for a while. But, one day, a process will reuse a session. And you get unexpected failures that will be a huge pain to track down. */
DROP TABLE OUTPUT_;
-- END_PROC has to follow END, to balance the BEGIN_PROC
/* SELECT, EXECUTE and CALL all work the same. You can mix it up, but someone in your group will probably set a standard at some point */
SELECT SELECT13CHARACTERS('Hello, World!');
EXECUTE SELECT13CHARACTERS('Bye bye, Love');
CALL SELECT13CHARACTERS('13 Characters');
/* Run this on its own. You have to be in a stored procedure to handle exceptions
SELECT SELECT13CHARACTERS('Too Short'); */
/* Suprise! Your data is silently truncated. */
SELECT SELECT13CHARACTERS('By way of contrast, this is way too long');
/* To manipulate the data, you have to paste it into another temporary table. Copying is very fast in Netezza, so long as you keep the distribution key consistent */
CREATE TEMPORARY TABLE LIVETOSERVE_ AS SELECT SELECT13CHARACTERS('live to serve')
DISTRIBUTE ON (PROMPT) ORGANiZE ON (PROMPT);
SELECT 'I ' || PROMPT FROM LIVETOSERVE_;
-- even your scripts have to drop their temporary tables!
DROP TABLE LIVETOSERVE_;
-- We need to drop the stored procedure before the REFTABLE
DROP PROCEDURE SELECT13CHARACTERS(nchar(13));
DROP TABLE tblSelect13Characters;
Sample SQL exhibiting application of the tips for transition
Please share any thoughts or questions in the comments.