Productionizing Your Big Data: A Checklist of Key Considerations
Big data can’t prove its business value if it remains in a perpetual proof-of-concept phase, without the availability, security, backup and recovery, and other robust assurances we take for granted on enterprise information technology (IT). How can you prepare your big-data deployment for delivery into a production IT environment such as your corporate data center? And what exactly does it mean to say that big data, or any IT initiative, is truly production-ready?
Production-readiness means that your big-data investment is fit to realize its full operational potential. If you think “productionizing” can be done in a single step, such as by, say, introducing HDFS NameNode redundancy, then you need a cold slap of reality. Productionizing demands a lifecycle focus that encompasses all of your big-data platforms, not just a single one (e.g., Hadoop/HDFS), and addresses more than just a single requirement (e.g., ensuring a highly available distributed file system).
Productionizing involves jumping through a series of procedural hoops to ensure that your big-data investment can function as a reliable business asset. Here are several high-level considerations to keep in mind as you ready your big-data initiative for primetime deployment:
Stakeholders: Have you aligned your big-data initiatives with stakeholder requirements? If stakeholders haven’t clearly specified their requirements or expectations for your big-data initiative, it’s not production-ready. The criteria of production-readiness must conform to what stakeholders require, and that depends greatly on the use cases and applications they have in mind for big data. Service-level agreements (SLAs) vary widely for big data deployed as an enterprise data warehouse (EDW), as opposed to an exploratory data-science sandbox, an unstructured information transformation tier, a queryable archive, or some other use. SLAs for performance, availability, security, governance, compliance, monitoring, auditing and so forth will depend on the particulars of each big-data application, and on how your enterprise prioritizes them by criticality.
Stacks: Have you hardened your big-data technology stack – databases, middleware, applications, tools, etc. – to address the full range of SLAs associated with the chief use cases? If the big-data platform does not meet the availability, security and other robustness requirements expected of most enterprise infrastructure, it’s not production-ready. Ideally, all production-grade big-data platforms should benefit from a common set of enterprise management tools such as described in this blog. Key guidelines in this respect are:
- Leverage your big-data solution provider’s high availability, security, resource provisioning, mixed-workload management, performance optimization, health monitoring, policy management, job scheduling and other cluster management features;
- Ensure high availability on your big-data clusters by implementing redundancy across all nodes, with load balancing, auto-failover, resynchronization and hot standbys;
- Perform thorough regression testing of every layer in your target big-data deployment prior to going live, making sure your data, jobs and applications won’t crash or encounter bottlenecks in daily operations; and
- Avoid moving big-data analytics jobs to your clusters until you’ve hardened the latter for 24x7 availability and ease of configuration and administration.
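The redundancy guideline above can be sanity-checked with a back-of-the-envelope calculation before any hardware is ordered. The sketch below is illustrative only: the function names are hypothetical, and it assumes that replicas fail independently and that the service stays up as long as any one replica is up, which real clusters only approximate.

```python
from typing import Optional


def redundant_availability(node_availability: float, replicas: int) -> float:
    """Availability of a service that survives while at least one replica is up.

    Assumes independent failures (an idealization, not a guarantee).
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    return 1.0 - (1.0 - node_availability) ** replicas


def replicas_needed(node_availability: float, target: float,
                    max_replicas: int = 10) -> Optional[int]:
    """Smallest replica count that meets the target, or None if none does."""
    for n in range(1, max_replicas + 1):
        if redundant_availability(node_availability, n) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical figures: each node is up 99% of the time.
    print(redundant_availability(0.99, 2))
    print(replicas_needed(0.99, 0.99999))
```

Under these assumptions, two 99%-available replicas yield roughly 99.99% availability. Correlated failures (shared racks, power or network) lower the real figure, so treat the result as an upper bound when comparing it against an SLA.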
Scalability: Have you architected your environment for modular scaling to keep pace with inexorable growth in data volumes, velocities and varieties? If you can’t provision, add, or reallocate new storage, compute and network capacity on the big-data platform in a fast, cost-effective, modular way to meet new requirements, the platform is not production-ready. Key guidelines in this respect are:
- Scale your big data through scale-in, scale-up and scale-out techniques, per this blog;
- Accelerate your big data with workload-optimized integrated systems fit for cloud deployment, per this blog;
- Optimize your big data’s distributed storage layer, per this blog; and
- Retune and rebalance your big data workloads regularly, per this blog.
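The scalability question above is partly arithmetic: how long until current capacity runs out at the present growth rate? The following is a minimal sketch under stated assumptions, namely steady compound monthly growth and a planning threshold (80% here) at which new capacity should already be in place. The function name, parameters and sample figures are hypothetical.

```python
import math
from typing import Optional


def months_until_full(used_tb: float, capacity_tb: float,
                      monthly_growth: float,
                      threshold: float = 0.8) -> Optional[int]:
    """Months until usage reaches threshold * capacity under compound growth.

    Returns 0 if usage is already at or past the threshold, and None if the
    data is not growing (the threshold would never be reached).
    """
    if used_tb <= 0 or capacity_tb <= 0:
        raise ValueError("used_tb and capacity_tb must be positive")
    limit = capacity_tb * threshold
    if used_tb >= limit:
        return 0
    if monthly_growth <= 0:
        return None
    # Solve used * (1 + g)^m >= limit for the smallest whole month m.
    return math.ceil(math.log(limit / used_tb) / math.log(1.0 + monthly_growth))


if __name__ == "__main__":
    # Hypothetical cluster: 100 TB used of 500 TB, growing 10% per month.
    print(months_until_full(100, 500, 0.10))
```

If the answer is shorter than your procurement and provisioning lead time, the platform cannot yet scale in the fast, modular way described here.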
Skillsets: Have you beefed up your organization’s big-data skillsets for maximum productivity? If your staff lacks the requisite database, integration and analytics skills and tools to support your big-data initiatives over their expected life, your platform is not production-ready. Don’t go deep on big data until your staff skills are upgraded. Key guidelines in this respect are:
- Upgrade the skills of DBAs, data integration specialists, data scientists and business analysts to support big-data best practices in deployment, modeling, management and optimization;
- Bring in big-data consultants to help you identify requirements, plan your roadmap, bootstrap your internal competency center, and assist in initial big-data project deployment, development, modeling, optimization and management;
- Recruit experienced big-data professionals who can tune configuration settings and weigh the trade-offs involved; and
- Connect your team into the worldwide community for your big-data technology or platform in order to learn from emerging best practices.
Seamless service: Have you re-engineered your data management and analytics IT processes to support disparate big-data initiatives seamlessly? If you can’t provide trouble response, user training and other support functions in an efficient, reliable fashion that’s consistent with existing operations, your big-data platform is not production-ready. Key considerations in this respect:
- Provide big-data users with a “single throat to choke” for support, service and maintenance;
- Offer consulting support to users for planning, deployment, integration, optimization, customization and management of their specific big-data initiatives;
- Deliver 24x7 support with quick-turnaround on-site response on issues;
- Manage your end-to-end big-data environment with unified system and solution management consoles; and
- Automate big-data support functions to the maximum extent feasible.
To the extent that your enterprise already has a mature enterprise data warehousing (EDW) program in production, you should use that as the template for your big-data platform. There is absolutely no need to redefine “productionizing” for big data’s sake.