Data Scientists: Run Your Mad Experiments

June 18, 2012
Smarter business is a game of incremental improvements. It depends on your ability to produce a steady stream of innovations in your operational processes. Incremental tweaking is not usually a glamorous activity, and minute process adjustments don't usually call attention to themselves. That's a good thing, because you can roll them out in stealth, without competitors suspecting or customers detecting any disruption in your quality of service.

Week's Worth of Koby Quick-Hits June 11th - 15th, 2012

June 18, 2012
Here are the quick-hit ponderings that I posted on the IBM Netezza Facebook page this past week. I went deeper on the themes of information glut, DW appliance POCs, handheld petabytes, and online recommendation engines. I also started a new thread on crowdsourcing in big data modeling, development, and exploration. I posted most of these while attending and presenting at the Hadoop Summit (more on that in next week's quick-hits).

Healthy Data: Setting Achievable Goals

June 14, 2012
So far, we’ve discussed the definition of “healthy data” and provided…

What Skills are Essential for Big Data?

June 13, 2012
The planetary economy now spins on an axis of big data. Each of us feels pressure to evolve our skills in order to stay ahead of the big data curve. We do this both to remain employable and to seize new opportunities for professional and personal growth.

Next Best Action in Real Time: The True Test of Big Data

June 11, 2012
In business, every moment is a moment of truth. Every moment can spell the difference between keeping a customer or losing them to a rival that makes them a better offer or delivers a superior experience. And no two moments are ever the same. If you don't seize that tiny window of opportunity, you've lost it, and possibly the customer, forever.

Week's Worth of Koby Quick-Hits June 4th - 8th, 2012

June 11, 2012
Here are the quick-hit ponderings that I posted on the IBM Netezza Facebook page this past week. I went deeper on machine learning, continued my meditation on all-in-memory computing, put out some more Hadoop thoughts in advance of next week's Hadoop Summit (where IBM's Anjul Bhambhri will speak on the convergence of Hadoop and data warehousing), and tried to anchor social sentiment in the nitty-gritty of behavioral propensity. I also opened a new thread of meditation: the value of proofs of concept (POCs) in the data warehousing (DW) appliance procurement process.

Exploring Uncharted Data: Is there any insight out there?

June 8, 2012
The biggest table in any Netezza database that I know of has over 600 billion rows! That’s the claim made by our customer, Catalina Marketing. So although most of the data in the world is not relational, there is a huge amount of relational data, and IBM technologies are more than capable of performing the most complex analytics on it. Netezza has extensive libraries of in-database analytic functions[1] to support SPSS, SAS, R, and other analytic tools and languages. And Netezza's special capability for handling ad-hoc queries means that if your data is relational, or can be mapped conclusively to a relational schema (like the CDRs I wrote about in a previous post), it is a great platform for analytics. If!

Informing the Demand Side: Opening Data Warehouses to Collaborative Applications

June 8, 2012
As our populations grow in a world of limited resources, governments and individuals seek ways to lighten our load on the planet. In the Smart Grid R&D Program, Pacific Northwest National Laboratory (PNNL) investigates how modernizing the electric grid can help the US meet its carbon management goals. In "The Smart Grid: An Estimation of the Energy and CO2 Benefits," a team from PNNL identifies nine mechanisms by which the Smart Grid will reduce carbon emissions by 442 million metric tons, or 12 per cent, by 2030. Making the grid smart will save the nation the equivalent of 66 coal power stations, or enough electricity to power 70 million of today's homes. PNNL’s Smart Grid is probably the largest consumer collaboration underway in the US, and this collaboration, which addresses the information needs of both the supply and demand sides of electricity economics, contributes enormously to the success of the program. Consumers on the grid receive real-time pricing, and these signals inform their decisions on how and when they consume electric power. PNNL’s report attributes a quarter of the total saving to the conservation effect of consumer information and feedback systems.

Healthy Data: Getting with the Program

June 6, 2012
In my last post, we reviewed what…