The Internet of Things: A time series data challenge
The world is becoming more and more instrumented, interconnected and intelligent, resulting in mountains of newly generated data. With storage costs falling significantly, companies now want to analyze this instrument-generated data, including meter, temperature and all other types of sensor readings collected over time. Sensor data is among the most widespread types of big data, and because each reading is tied to a point in time, it is known as time-series data.
So many records, so little time
Since the early 1980s, the relational database management system (RDBMS) has been the de facto storage system. But an RDBMS in its native form was not designed to store or manage time-series data. For example, a power utility company may have one million customers and store one reading per customer per month. That amounts to one million readings per month, or 12 million readings per year, stored in a relational database table. Today, however, the same companies want to understand power usage not just monthly for billing, but at more frequent intervals for load balancing, customer relationship management (CRM) and more. So this data is now collected multiple times each day. That is a lot of data!
If meter data is collected 10 times per hour (every six minutes), each meter sends 87,600 readings a year (10 readings per hour multiplied by 24 hours per day, then multiplied by 365 days per year). Compared with 12 readings a year, that is a 7,300-fold increase in data volume!
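The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the constant and function names are illustrative, and, like the text, it ignores leap years.

```python
# Per-meter reading counts from the paragraph above.
READINGS_PER_HOUR = 10        # one reading every six minutes
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365           # leap years ignored, as in the text
MONTHLY_READINGS_PER_YEAR = 12

def readings_per_year(per_hour: int = READINGS_PER_HOUR) -> int:
    """Readings one meter sends in a 365-day year."""
    return per_hour * HOURS_PER_DAY * DAYS_PER_YEAR

def growth_factor(per_hour: int = READINGS_PER_HOUR) -> float:
    """How many times more readings than once-a-month collection."""
    return readings_per_year(per_hour) / MONTHLY_READINGS_PER_YEAR

print(readings_per_year())  # 87600
print(growth_factor())      # 7300.0
```

Changing `per_hour` shows how quickly the volume scales with the collection interval.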
A regular relational database would store each meter reading as a separate row, which means that the table grows vertically. This creates a lot of duplication: static meter information, such as the meter ID, is repeated in every row, and every row also carries its own time stamp. A database administrator (DBA) will tell you that you can reduce this duplication by normalizing the data into multiple tables and creating indexes. But a good DBA will also tell you that they avoid indexes where possible and that, for analytical queries, de-normalizing is the way to go. This is especially relevant at this scale of data.
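To make the duplication concrete, here is a minimal Python sketch of the row-per-reading (vertical) layout. The `MeterReadingRow` type, its field names and the sample values are hypothetical stand-ins for a relational table, not an Informix schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class MeterReadingRow:
    """One row of a hypothetical row-per-reading table (the vertical layout)."""
    meter_id: str            # static attribute, repeated in every row
    reading_time: datetime   # each row carries its own time stamp
    kwh: float               # the value that actually changes

def rows_for_meter(meter_id: str, start: datetime, interval: timedelta,
                   values: list) -> list:
    """Expand one meter's readings into one row per reading."""
    return [
        MeterReadingRow(meter_id, start + i * interval, value)
        for i, value in enumerate(values)
    ]

# Three readings from one meter become three rows, each repeating the meter ID.
rows = rows_for_meter("M-001", datetime(2014, 1, 1), timedelta(minutes=6),
                      [1.2, 1.4, 1.1])
```

With 87,600 readings per meter per year, the same meter ID would be written 87,600 times.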
With one million meters reporting, that is 87.6 billion new rows per year (1,000,000 meters multiplied by 87,600 readings). Even if the table can hold that much data, loading and querying it will take too long to meet client service level agreements (SLAs). So, to handle time-series big data, you need a time-series solution.
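The fleet-wide figure follows from the per-meter count, and dividing it by the number of seconds in a year gives the sustained load rate the database must absorb. A minimal sketch, assuming the same 10-readings-per-hour rate and a 365-day year; the names are illustrative.

```python
METERS = 1_000_000
READINGS_PER_METER_PER_YEAR = 10 * 24 * 365   # 87,600 readings per meter
SECONDS_PER_YEAR = 365 * 24 * 60 * 60         # 31,536,000 seconds

def rows_per_year(meters: int = METERS) -> int:
    """New rows per year when every reading is stored as its own row."""
    return meters * READINGS_PER_METER_PER_YEAR

def rows_per_second(meters: int = METERS) -> float:
    """Average sustained insert rate needed to keep up with the meters."""
    return rows_per_year(meters) / SECONDS_PER_YEAR

print(f"{rows_per_year():,}")      # 87,600,000,000
print(round(rows_per_second()))    # 2778
```

For one million meters this works out to roughly 2,778 inserted rows per second, around the clock, before any analytical query runs.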
Static and dynamic data to the rescue
IBM Informix is an object-relational database management system that offers a technology, created as an add-on, called TimeSeries. TimeSeries is built on a simple but high-performing concept: it splits the data into two parts, static and dynamic. The static data is the meter information that does not change, such as the meter ID; the dynamic data is the information that does change, such as the meter reading itself.
With Informix TimeSeries, all 87,600 yearly entries for a meter are still stored in a single row. Although it is a single table for storage and retrieval, you can think of it as two tables: a “front” table and a “rear” table. The front table holds the static information and behaves like a regular relational table with a primary key. The dynamic meter readings are stored in the rear table, ordered by time. Each meter’s data therefore occupies a single row, and every new entry grows that row horizontally.
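The front/rear idea can be modeled in Python as one record per meter whose static fields are stored once and whose readings list grows with each new entry. This is a simplified, hypothetical model for illustration only, not Informix's storage format or API. It also assumes readings arrive at a fixed interval (every six minutes, as in the earlier example), so each reading's time stamp can be derived from its position rather than stored.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class MeterTimeSeriesRow:
    """Hypothetical model of one meter's single row.

    The static 'front' part (meter_id, start, interval) is stored once;
    the dynamic 'rear' part (readings) grows horizontally with each entry.
    """
    meter_id: str
    start: datetime          # time of the first reading
    interval: timedelta      # assumption: readings arrive at a fixed interval
    readings: list = field(default_factory=list)

    def append(self, value: float) -> None:
        """Add the next reading; the meter ID is not repeated."""
        self.readings.append(value)

    def timestamp_of(self, index: int) -> datetime:
        """Derive a reading's time stamp from its position in the series."""
        return self.start + index * self.interval

    def readings_between(self, begin: datetime, end: datetime) -> list:
        """Return (time stamp, value) pairs with begin <= time stamp < end."""
        return [
            (self.timestamp_of(i), value)
            for i, value in enumerate(self.readings)
            if begin <= self.timestamp_of(i) < end
        ]

row = MeterTimeSeriesRow("M-001", datetime(2014, 1, 1), timedelta(minutes=6))
for value in (1.2, 1.4, 1.1):
    row.append(value)
```

Compared with a row-per-reading layout, the meter ID here is held once per meter instead of once per reading.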
The best of both worlds
Informix TimeSeries stores data much more efficiently by combining the best of both worlds: the ease of use of the relational SQL interface and the storage efficiency of a time-series design. Benchmarks conducted by several organizations report results that strongly favor the Informix TimeSeries solution, as described in these papers:
- Dealing with More and More Data: Challenges for the Utility CIO
- Scalability Demonstrated for Meter Data Management
The time-series big data challenge is one of the most prominent issues when venturing into the world of the Internet of Things, and it needs to be handled appropriately. IBM Informix TimeSeries does exactly that.
Informix for everyone
Learn more about Informix TimeSeries in this new white paper written for chief information officers, and in the IBM Redbooks publication "Solving Business Problems with Informix TimeSeries."
And if you are a new-age developer who does not want to learn a new API, or you are restricted to certain languages or platforms, rest assured: IBM Informix supports a representational state transfer (REST) API. Because REST works over HTTP, you can use any programming language or platform that supports HTTP. The API provides unified access to NoSQL/JSON collections, relational database tables and Informix time-series data.
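As an illustration of "any language that supports HTTP", the following Python sketch builds a GET request using only the standard library. The base URL, database and table names, the `/database/table` path layout and the `query` parameter are all assumptions for illustration; check the Informix REST API documentation for the actual URL format and parameters.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

def build_get_request(base_url: str, database: str, table: str,
                      filter_doc: Optional[dict] = None) -> Request:
    """Build (but do not send) a GET request for rows in a table or collection.

    Assumption: resources are addressed as <base_url>/<database>/<table>,
    and an optional JSON filter goes in a hypothetical 'query' parameter.
    """
    url = f"{base_url.rstrip('/')}/{quote(database)}/{quote(table)}"
    if filter_doc:
        url += "?" + urlencode({"query": json.dumps(filter_doc)})
    return Request(url, headers={"Accept": "application/json"}, method="GET")

def fetch_json(req: Request, timeout: float = 10.0):
    """Send the request and decode the JSON response body."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Hypothetical server and names; nothing is sent until fetch_json() is called.
req = build_get_request("http://localhost:8080", "meterdb", "meter_readings",
                        {"meter_id": "M-001"})
print(req.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL before pointing `fetch_json` at a real server.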
For a fuller explanation, consider attending the session “REST: The Key to Driverless, Unified Access to JSON, Relational and Temporal Spatial Data” at the IBM Insight Conference in Las Vegas, NV, October 26 through 30.