#ITIQSpotlight: Trends and challenges in hybrid cloud analytics
In any successful modern organization, analytics is likely to play a central role in helping decision-makers design and execute effective business strategies. At IBM, as we work with clients across the globe, we’re seeing ever-increasing levels of maturity and confidence in data-driven business models.
However, analytics technologies and practices continue to develop and evolve at a rapid pace—which means that it’s necessary for even the most mature analytics adopters to take a step back, challenge their current perspectives, and engage with some fresh thinking. Regularly reviewing your assumptions and revising your strategy is the best way to ensure that you build an analytics capability that will truly stand the test of time.
In an upcoming series of articles, we’ll be talking to clients who are making time for this kind of blue-sky thinking, and evolving their analytics capabilities to master the disruptive forces in their markets. As an introduction, we decided to explore some of the issues with Braden Callahan, an IBM Spark and Hadoop Evangelist.
Braden Callahan is a Spark and Hadoop Evangelist at IBM. Braden has been working in the data and analytics space for many years. His first exposure to big data came while working at Demand Media—and inspired him to build an analytics practice from scratch, helping the company track depreciation and revenue cycles for millions of individual assets. Following subsequent experience working with data warehousing and pure-play Hadoop vendors, he now focuses on helping organizations blend all aspects of analytics—both traditional and big data—into holistic solutions.
Andrea Braida: In many industries, disruption is the number-one concern—companies either need to find ways to combat disruptive new entrants to their traditional markets, or embrace disruptive technologies themselves to take the fight to their competitors. In your experience, to what extent do your clients see big data analytics as a key tool for dealing with disruption?
Braden Callahan: Among companies that are currently embracing big data technologies, I’m seeing two extremes.
The less-mature adopters are focusing primarily on technologies like Apache Hadoop as a pure infrastructure play—using it to store and archive data in a cost-effective way. They know that all their data is potentially important, so they want to keep it for future use—but they aren’t at a stage where they are ready to actively realize its potential.
The second group is at the other end of the spectrum: they are fully invested in experimenting with predictive models, machine learning, artificial intelligence and cognitive solutions to take full advantage of their big data sets. In particular, they’re looking to move away from tracking general trends and mass audiences, and toward finding insights at a much finer grain.
For example, in retail, it’s no longer about broad demographic swathes, but about individual buying patterns and offers. That’s massively disruptive because it gives retailers a level of precision that empowers them to pinpoint exactly where, when and how to speak to each customer, giving them a huge advantage over their more traditional rivals. At the same time, it requires massive computing power, which is where technologies like Hadoop and Apache Spark come in.
Andrea Braida: So if an organization wants to move into that second group, how can they align their analytics strategy to embrace the disruptive capabilities of Hadoop and Spark?
Braden Callahan: It’s clear that there’s a big trend towards a rapid prototyping approach: experimenting with data, building analytical apps quickly, and seeing whether they deliver enough value to fit into the main strategic roadmap.
However, the stumbling block for companies that want to adopt a rapid prototyping philosophy is that there are so many tools and approaches available, and it’s very difficult to work out where you need to invest to make the vision a reality.
That’s where a cloud strategy comes in. With cloud tools, you can try out a tool or start a project, and if it doesn’t work out, you can stop. There’s none of the traditional up-front investment or inertia that comes with selecting, purchasing and installing tools on-premise.
So cloud is a huge disruptor, because it frees you up to focus on learning the tools themselves rather than pouring time and effort into set-up and administration. It means you can experiment and fail fast—and if the costs and risks of experimenting are low enough, then it’s worth being experimental. Experimentation is where disruptive ideas and new business capabilities come from.
Andrea Braida: As this kind of thinking catches on, are we heading towards a tipping-point where organizations will start to focus more on Hadoop and Spark, and less on traditional analytics and data warehousing?
Braden Callahan: Two years ago, I would have said that most companies thought of big data technologies as a way to augment the things that they were already doing with their data warehouses. For example, they would use Hadoop as a staging area for large historical data sets, and load them into the data warehouse for analysis when needed.
Today, I think we’re moving towards a different mindset, where Hadoop and Spark actually start to take over some of the roles that a data warehouse traditionally played. In particular, with Spark, you can do a lot of your analyses without needing to move your data into the warehouse. It’s fast enough to run queries that you’d previously have needed a data warehouse for, and it scales much more cost-effectively. Even more importantly, it’s the first technology that really makes big data analytics approachable for most users. You don’t need to learn Java and write complex MapReduce jobs; you can take your existing skills in SQL, Python or Scala, and translate them into the Spark environment for instant results.
Traditional data warehouses aren’t going to go away, but I think we’ll see companies do more of their deeper analysis outside of the data warehouse. That’s what’s going to push us towards that tipping-point.
Andrea Braida: So looking at the next three years, where do you see companies placing their analytics investments?
Braden Callahan: I think people will focus on building efficient, effective hybrid clouds for analytics. The hype is that you can put everything in the cloud, but actually, in most cases, you can’t. Almost inevitably, you will have some data that is too sensitive to live outside your firewall, or there will be security or regulatory considerations that are going to prevent an all-cloud strategy from working out.
So investing in a hybrid approach is going to be the most viable option—but the challenges there are around implementing seamless security between cloud and on-premise systems, and allowing the different environments to interact in a low-cost, high-performance way. Tools to solve these problems do exist, but they’re only just starting to be adopted.
On the analytics side specifically, the question is: what is the best blend of on-premise and cloud technologies? For example, at the moment, really large Hadoop clusters are generally still run on-premise, but I think we’ll see this change as cloud services get more mature.
Andrea Braida: Who are the stakeholders that are driving these disruptive analytics investments? Are these all top-down, corporate-level initiatives, or are we seeing a more organic approach led by individual lines of business?
Braden Callahan: It’s interesting: as organizations in general become more data-driven, line-of-business teams tend to want to be more involved in the development of analytics. They want to build solutions for their own domain-specific problems, and the approach of using cloud tools for rapid prototyping empowers them to do just that.
At the same time, though, there’s a tension here, because there’s still a need for governance and oversight—particularly as data protection and security concerns are coming to the fore. If the line of business has a completely free rein, it becomes very difficult to know where all your data is, and make sure you’re stewarding it responsibly. On the other hand, if you give too much control to the central IT function, it can stifle the creativity you need to be innovative and disruptive.
Andrea Braida: So as usual, it’s not all about the technology—it’s about the organizational culture too. How do companies go about defusing this tension between freedom and control?
Braden Callahan: One good approach is to give the line of business a lot of freedom to create their own prototypes, but to ensure that when they’ve found a solution that works for them, they hand it over to IT to implement in a really robust, productionized way.
Another model that can be very successful is to create a team within the business that bridges the IT and line-of-business organizations, with the specific goal of doing deep analyses and designing algorithms to solve very targeted business problems. This team is able to focus on hard problems, without getting involved in the operational responsibilities and distractions that IT and business teams have. It’s a model borrowed from the finance sector, where you have teams of quants who are real experts in their field, and you want to keep them free to do their best work.
It’s a difficult balance though—with both approaches, you need a lot of trust and communication between the different teams to make sure everyone understands and works towards the same goal. It’s a challenge that different organizations need to solve in their own ways; there’s no silver bullet.
It’s exactly these issues—where technology and culture collide—that we’ll be seeking to explore in our upcoming #ITIQSpotlight article series. We’ll be talking with IBM clients about how they see their analytics strategies evolving to help them master disruptive influences in their industry; whether they are reaching that tipping-point between traditional and big data analytics; and how they are building new organizational structures and working practices to unleash creativity, while still maintaining control.
Watch this space for the next #ITIQSpotlight, and until then, I recommend checking out this related discussion on Data Warehousing.