
Data scientists and sustainable governance

Senior Technical Staff Member, IBM Analytics

The emergence of the data scientist as a critical role in helping organizations exploit big data-related technologies presents new challenges and opportunities for those responsible for the overall governance of this new data management landscape. While the potential benefits to the organization of harnessing new insight from data scientists to deliver competitive value are clear, what is less clear is how, in practice, to govern and manage data scientists sustainably over the long term.

In many organizations, the traditional view of governance as a necessary bureaucratic evil is anathema to the aura of pioneering innovation surrounding the data scientist. However, in reality, if the activities of the data scientist are to deliver the expected benefits to the business as a whole, it is necessary to ensure they are working within the context of an overall system of governance. 

Cultural considerations

The culture of each enterprise is unique and will influence many aspects of that organization’s approach to governance. The premium placed by senior executives on innovation risk management will directly impact the style and substance of the overall governance program. Regarding the treatment of data scientists, in some organizations they are seen as rock stars who will come up with the golden nuggets of insight if given enough latitude, whereas in others there may be a strong focus on constraining their activities to ensure conformance with existing policies and procedures. These are probably two extremes, and the appropriate governance approach to data scientists will depend on where along that spectrum the organization lies.

A balanced approach 

In the end, the output and insights of the data scientist must make their way into mainstream data analytics team development activities. Giving data scientists free rein to analyze and investigate without considering whether such work relates to the real business challenges or determining how to incorporate the work in a managed and orderly way is probably not sustainable over the long term. However, stifling data scientists’ activities with layers of red tape is not likely to result in the levels of insight they were hired to deliver in the first place.

So the issue comes down to balancing the need for nurturing the innovation and curiosity of the data scientist with a realistic and useful set of governance policies, standards and processes. Like most aspects related to governance, this becomes a more cultural and organizational challenge than a technological one. Many organizations that have recently invested in data scientists have also been investing in governance programs. In fact, it is hard to see how any large organization can build out a logical data warehouse or data lake infrastructure without both of these functions. 

Practical governance 

Emerging best practice for a sustainable governance program emphasizes a practical, pragmatic approach, with adequate communication and feedback from the participants to those managing the process. Applying a similar best practice to the governance of data scientists is critical.

Fundamentally, the governance program spans three different dimensions, all of which are important to the governance of data scientists: 

  • Focus on productivity: This is the aspect of the governance program that ensures it doesn’t get in the way of innovation and that it is ready to react to change. The data scientist provides a critical viewpoint on the areas of the landscape that are likely to change as the organization evolves, so the governance program should remain alive to that input, which is key to its own evolution.
  • Focus on collaboration: The data scientist is only one person on a typical data analytics team, and the data governance program must encourage the active participation of all players on that team. The team members must have the right levels of incentive, visibility and awareness so everyone can benefit from the governance program—for example, by making sure data scientists understand and are motivated to ensure their outputs are truly incorporated into the overall data landscape. 
  • Focus on quality: In the end, organizations cannot lose sight of the quality that is essential for any governance program. Once again, the data scientist can be seen as a key contributor here, especially as someone who can assist with the definition of the meaning and classification of any new data sources and data types that must be addressed by the governance program. 

If organizations intend to fully exploit the new data resources available to them, they must determine how to harness the innovation of data scientists in the context of a sustainable and practical approach to governance.

Please register to hear me and others speak at IBM Insight 2015, 25–29 October, in Las Vegas.