
Hadoop and Spark coaches: Training and insights for all maturity levels

Portfolio Marketing Manager, IBM

Prasad Pandit is program director for product management, IBM Open Platform (IOP), at IBM, and has more than 20 years of experience in the technology sector. Pandit worked as a developer, architect and senior consultant before moving into product management. Working day to day with IBM clients at every stage of their big data journey gives him a broad perspective on the challenges they face and the specific use cases in which technologies such as Apache Hadoop and Apache Spark can add value.

Hadoop and Spark are at the core of many of today’s big data architectures, but different organizations are at different levels of maturity in their adoption of these technologies. In a recent interview with Andrea Braida, Pandit was asked to provide deeper insight into how IBM is evolving its Hadoop and Spark services to help clients at each maturity level address the very different challenges they face.

Why are coaching, guidance and subject matter expertise such critical issues for Hadoop and Spark deployments in particular?

The big data market is still evolving very quickly, and the technology is very complex; it’s a big ecosystem with lots of different components, all of which are being developed by separate, largely independent teams. Companies that have been using Hadoop for many years can still struggle to keep up with the latest developments. And for beginners, even dipping a toe in the water can be a daunting prospect.

Many of the technologies that large enterprises rely on for data management and analytics have been around for decades. Relational databases were first developed in the 1970s, for example, and have been used in industry and taught in universities ever since. As a result, although administering an IBM DB2 or Oracle database is a skilled job, a relatively large pool of talent is available to do it.

By contrast, Hadoop is only just over 10 years old, and a real shortage of people with the skills to set up and manage a cluster still exists. And because Spark only really took off in 2014, the skills gap there is even more severe. Many companies are looking for a partner who can bring the right expertise to the table and also act as a mentor to help them build up their own capabilities in house.


What kinds of trends in Hadoop and Spark adoption has IBM identified, and how do these trends feed into the design of the new IBM Lifecycle Support Services for IOP?

We’ve really been led by the demand patterns we see in the market. Broadly speaking, the majority of clients who are looking for support on their big data journey fall into one of three main categories, which relate to their maturity level or position in the Hadoop or Spark lifecycle.

First, we’re seeing a lot of organizations that are interested in big data, but need to experiment and find the right use cases before they commit to their first deployment. Our Developer Assist service provides a block of hours with IBM experts to explore their technical- and deployment-related questions or even work through a specific use case.

For example, we have a client in the banking industry that wanted to invest in Spark to accelerate its analytics processes. The organization wanted to take data from many different departmental systems, load it into Spark Resilient Distributed Datasets (RDDs), persist that data in memory and run queries in real time. This approach would help with all kinds of use cases, such as providing insight for call center teams or fraud detection teams within seconds of a new transaction being processed.

The Developer Assist Support Plan is well suited for this type of use case. IBM experts helped the organization work through the main technical considerations and understand how to organize and prioritize demands on a Spark cluster in a dynamic fashion. This approach helped to minimize the latency between data creation in the source systems and data visualization in the end-user dashboards.

How about organizations that have already identified their use case and want to go ahead with a deployment?

That’s the next big step on the big data journey, and another place where our clients often find that they need extra support. Many organizations lack the skill set to deploy, configure and manage Hadoop clusters that can scale and meet their production demands. That initial hurdle of getting the cluster into production is a point at which many organizations stumble, and getting it right is very important. We have seen many cases in which a badly configured environment seemed to work initially but went on to cause numerous problems and crises further down the road.

To give clients a boost over this challenge, we’ve packaged up a set of best-practices configuration and deployment methodologies into our new Initial Install and Planning service. IBM experts build the cluster and work with the in-house team to create development, test and production environments as required, giving the organization’s data scientists the platform they need to get their Hadoop and Spark workloads up and running.


And I’d guess that the third service is for organizations that have already completed their deployment. If they are already so far along the maturity curve, why do they need support?

This service comes down to the fact that, as we already discussed, big data technologies are still evolving at a rapid rate. Even for organizations at the leading edge of Hadoop and Spark adoption, keeping track of which component is due to release a new version, and whether that new version will break compatibility with any of the other components, is still challenging. So is deciding how and when to upgrade.
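As a rough illustration of the bookkeeping this involves, the following Python sketch checks whether upgrading one component would break another. It is a minimal sketch under stated assumptions: the component names (`storage`, `query_engine`, `scheduler`), the version ranges and the `broken_by` helper are all hypothetical and do not describe any real Hadoop or Spark release or any IBM tooling.

```python
# Hypothetical compatibility table, invented for illustration only.
# Each component lists the inclusive (min, max) versions of the
# dependencies it is assumed to work with.
REQUIRES = {
    "query_engine": {"storage": ((2, 0), (2, 9))},
    "scheduler": {"storage": ((2, 4), (3, 9)), "query_engine": ((1, 0), (1, 9))},
}


def parse(version):
    """Turn a string such as '2.7' into (2, 7) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def broken_by(installed, component, new_version):
    """Return the components whose requirements a proposed upgrade would violate."""
    # Copy the installed versions and apply the proposed upgrade to the copy.
    proposed = dict(installed)
    proposed[component] = new_version
    problems = []
    for dependent, needs in sorted(REQUIRES.items()):
        if dependent not in proposed:
            continue
        for dependency, (low, high) in needs.items():
            if dependency not in proposed:
                continue
            if not low <= parse(proposed[dependency]) <= high:
                problems.append(dependent)
                break
    return problems


installed = {"storage": "2.7", "query_engine": "1.2", "scheduler": "0.9"}
print(broken_by(installed, "storage", "3.0"))  # major upgrade of storage
print(broken_by(installed, "storage", "2.8"))  # minor upgrade of storage
```

In this made-up table, moving `storage` to 3.0 falls outside the range that `query_engine` accepts, while 2.8 stays inside every range. A real ecosystem has many more components and far less tidy constraints, which is why the tracking is hard to do by hand.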

In some cases, we’re seeing clients who are really pushing the limits of what the current technology can do, and they are waiting on the open source community to catch up with their requirements. For example, one client in the insurance industry has a very advanced Hadoop environment and is currently relying on some change requests that need to be committed by the open source community. It also needs advice on where some of the components are headed because the functionality it requires doesn’t exist in the current stable releases of the components it uses.

What these kinds of clients really need is a dedicated expert who can act as an interface between them and the many individual development teams across the Hadoop ecosystem. Our Designated Support Engineer service is designed to fill that gap. It provides advice, mentoring, configuration planning and reviews, and follow-up and resolution of support tickets, and it effectively provides a single point of contact for any Hadoop- or Spark-related problems.

The focus is on long-term project success over a 3-, 6- or 12-month period, helping clients make the right strategic decisions about how to enhance their environment over time. In the case of the insurer I mentioned, for example, one of the key services our engineer provided was a detailed review of the recent Spark 2.0.0 release and whether it would provide the capabilities the insurer needed for future projects.

“The purpose of these new services is to make selecting the right engagement to suit clients’ needs easier and more convenient than ever for them. We’re 100 percent committed to delivering whatever our clients need to succeed with Hadoop and Spark.” —Prasad Pandit

What if my needs don’t fall neatly into one of these three categories?

That’s absolutely not an issue. Identifying the three categories has helped us formalize some of our support services into a format that will make sense for a large proportion of clients, but it won’t restrict the scope or availability of the services we offer. If a client needs something a little bit different, we can customize a package that is right for that client.

The purpose of IBM Lifecycle Support Services for IOP is to augment the all-round services that IBM already offers and enable clients to get the most out of their relationship with IBM. It also makes selecting the right engagement to suit their needs easier and more convenient than ever for them. We’re 100 percent committed to delivering whatever our clients need to succeed with Hadoop and Spark.

Where can I learn more about IBM Lifecycle Support Services for IOP?

You can visit our Hadoop site to learn more about Hadoop and the service options. And of course you can always speak to your IBM sales team.