Taking a strategic approach to data access, movement and preparation
Hernando Borda, offering manager for IBM Bluemix Data Connect at IBM Analytics, plays an instrumental role in building out the capabilities of IBM Bluemix Data Connect, a cloud-based data preparation and movement service. Borda’s background is in computer science and software engineering, and he is a named author on an IBM patent for multithreaded processes.
A new way to extract, transform and load
We interviewed Borda to find out why he believes IBM Bluemix Data Connect is a well-suited tool for a new strategic approach to data preparation.
What are the key challenges that IBM Bluemix Data Connect aims to solve?
Well, everyone agrees that data analytics is a growing source of value and competitive differentiation for enterprises. What’s also clear is that it’s challenging for data professionals—including application developers, business analysts and data scientists—to quickly get their hands on the good-quality data they need. According to a Forrester study, these people are spending up to 80 percent of their time finding and refining data, which means they don’t have much time to actually do useful analysis with it.
A key challenge is that enterprise data is, rightly, subject to heavy governance and security. Data is arguably the most valuable asset for all businesses, and IT professionals need to safeguard it by controlling access. This safeguarding leads to the ask-and-wait cycle: data professionals submit a request for a particular set of data, wait for the response, find that it’s not exactly what they were expecting, resubmit a refined request, and so on. The cycle is frustrating for them, and it’s just as frustrating for IT staff, who need to put aside more interesting and valuable work to try to interpret ad hoc requests. In addition to the administrative costs and inefficiencies, this iterative cycle of tactical requests introduces significant delays, making it challenging for the enterprise to seize new opportunities.
Analyzing data from multiple sources can yield richer insights, and therefore data scientists are also faced with the challenge of getting access to multiple and disparate sources of data including on-premises data and data from the cloud. For example, they might want to understand the impact of weather conditions on sales. In this scenario, not only do they need to source the appropriate internal data by negotiating with IT, but they also need to combine it with data from weather.com to create a hybrid data set. Read the report “Don’t let data preparation get in the way of your analytics.” It discusses how leaving data preparation until the end can kill your analytics, and why cloud-based data refinement services are a game changer for data science professionals.
Where does IBM Bluemix Data Connect fit in?
IBM Bluemix Data Connect is a self-service data preparation and movement solution that enables business users to load data from multiple sources, transform it and deliver it to multiple targets. It automates tedious and time-consuming tasks to shape, format and get data ready, and it enables data professionals to preview their data, cleanse it and deliver it for downstream analytics. And because the service sits on the IBM Bluemix platform, pushing the cleansed and profiled data straight into analytics services such as dashDB and IBM Watson Analytics can be easy and seamless.
It also provides a user-friendly, spreadsheet-like interface that empowers data professionals to find the data they need, get it into the format they want and deliver it to their preferred analytics tools. By giving data professionals fast, self-service access to relevant and easily consumable data, IBM Bluemix Data Connect cuts out the middleman and reduces time to insight. Because we’re open to a large number of sources and targets, IBM Bluemix Data Connect also fits well in the hybrid cloud and on-premises scenario I outlined earlier for trying to combine sales data with weather data. It provides not only connectivity to the most common and widely used data sources in the cloud, but also secure gateway technology to reach into on-premises data behind the firewall.
And it enables data professionals to join data from multiple sources; assess its quality; filter out unwanted, low-quality or null sets; and apply functions such as string transformations or unit conversions. They then can sort the data into the right format for their downstream analysis or application. IBM Bluemix Data Connect then transforms the data according to the defined actions and pushes it to the next stage. Read more about it in “IBM DataWorks: Smarter data preparation for the next generation of analytics.”
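To make the steps Borda lists more concrete, here is a minimal sketch in plain Python of the same kind of preparation flow: join two sources, filter out rows with nulls, apply a string transformation and a unit conversion, then sort. It does not use or represent the IBM Bluemix Data Connect interface or API. The sample records, field names and the `prepare` and `fahrenheit_to_celsius` helpers are all hypothetical and exist only for illustration.

```python
# Hypothetical sample data: field names and values are illustrative assumptions.
sales = [
    {"date": "2016-10-03", "store": " las vegas ", "units_sold": 120},
    {"date": "2016-10-01", "store": "LAS VEGAS", "units_sold": 95},
    {"date": "2016-10-02", "store": "las vegas", "units_sold": None},
]
weather = [
    {"date": "2016-10-01", "temp_f": 86.0},
    {"date": "2016-10-02", "temp_f": 77.0},
    {"date": "2016-10-03", "temp_f": 68.0},
]


def fahrenheit_to_celsius(temp_f):
    """Unit conversion: degrees Fahrenheit to degrees Celsius, one decimal."""
    return round((temp_f - 32) * 5 / 9, 1)


def prepare(sales_rows, weather_rows):
    """Join sales with weather on date, clean the rows, and sort by date."""
    # Index the weather source by its join key.
    weather_by_date = {row["date"]: row for row in weather_rows}

    prepared = []
    for sale in sales_rows:
        match = weather_by_date.get(sale["date"])
        # Filter: skip rows with no weather match or a null sales figure.
        if match is None or sale["units_sold"] is None:
            continue
        prepared.append({
            "date": sale["date"],
            # String transformation: trim whitespace, normalize capitalization.
            "store": sale["store"].strip().title(),
            "units_sold": sale["units_sold"],
            "temp_c": fahrenheit_to_celsius(match["temp_f"]),
        })

    # Sort into the order the downstream analysis expects.
    return sorted(prepared, key=lambda row: row["date"])


rows = prepare(sales, weather)
for row in rows:
    print(row)
```

In this sketch the row for 2016-10-02 is dropped because its sales figure is null, and the remaining rows come out in date order with normalized store names and temperatures in Celsius.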
How will IBM Bluemix Data Connect change the way businesses go about data preparation?
The traditional, tactical approach of having data professionals make iterative requests to the IT department is no longer fit for purpose: neither side can afford all the effort and delay that it involves. Anecdotally, the classic approach results in 80 percent of time spent on data preparation versus just 20 percent on analytics. The usual workaround in the past was for the business to go behind IT’s back and build its own silos. This shadow IT approach is a bad idea in terms of data governance and security, as well as being hugely inefficient. Equally, data preparation shouldn’t be an afterthought; if handled appropriately from the beginning, it can yield richer insights.
The IBM Bluemix Data Connect service enables a new strategic approach to data preparation, acting as a central point of control that can be managed by the business and supervised by IT. It enables the business to go at the speed it wants without compromising vital governance and security around enterprise data. And it frees up IT staff from a huge number of time-consuming, low-value and frankly boring data preparation tasks. Our tagline is that IBM Bluemix Data Connect allows you to ACT on your data, where ACT is an acronym for these functions:
- Access your data, both on premises and from the cloud
- Clean your data, to resolve any mismatches when you combine different sources
- Transform your data, to get it into the format you need for your applications or downstream analytics
Of course, IBM is not the only player in the data preparation space, but it is the only one that can reach your data across different clouds, whether they are Amazon AWS or Microsoft Azure clouds. We also have the advantage in terms of our ability to provide end-to-end services to businesses, helping them every step of the way from sourcing data to using the full ecosystem of analytical and development tools available in the cloud. We’re also evolving very fast and continually adding new tools to make transforming data faster and easier for knowledge users, but that topic is for another time.
A new understanding of data preparation and movement
Join us at IBM Insight at World of Watson 2016, 24–27 October 2016, at Mandalay Bay in Las Vegas, Nevada. In particular, attend a presentation and live demo showing how business users can benefit from IBM Bluemix Data Connect and IBM Watson Analytics to produce great business insights. This session takes place Wednesday, 26 October 2016, and gives you a deep understanding of what data preparation is, what problem it targets and why it is relevant to the business. You also see the service live in the context of a business scenario and learn how to turn data made relevant with IBM Bluemix Data Connect into business insights with Watson Analytics. The demo showcases how self-service data preparation empowers users to blend data coming from disparate sources and deliver it to Watson Analytics, where it is visualized to derive predictions about business behaviors. And this process takes place without needing assistance from IT or technical skills to transform raw data into insights.