Are data artists essential for big data success?
This session was one part of a colloquium run by IBM’s Technical Consultancy Group, the UK & Ireland affiliate of the Academy of Technology, which hosted 40 leading academics in big data and analytics from UK universities last Wednesday at the Royal Academy of Engineering in London. See my post, "Big data is multi-disciplinary," for more background on this event.
Discussion focused on seven wide-ranging presentations from both industry and academia. They considered the issues that arise in the collection, maintenance and manipulation of large volumes of data; the types of applications that can be developed using data from a wide variety of sources; and some of the barriers to the adoption of big data technologies.
Vonu Thakuriah and Mirco Musolesi, from the recently formed Urban Data Centre at Glasgow University and the Mobile Computing Centre at Birmingham University respectively, introduced us to some of the challenges associated with collecting and maintaining representative data sources, and to some of the applications being developed on top of them. Some of the most useful application areas result from combining data from social media and other open data sources (such as weather data) with telecommunications data, both from mobile phones and from call data records. This data is being used to develop applications in areas as diverse as transport planning, predicting where people will be as they move around cities, and modelling the spread of epidemics.
Many of these applications require the development of statistical predictive models, and there was a lively debate on the predictive capabilities of these models. In contrast, presentations from Patrick Dantressangle (IBM) and Lorenzo Cavallaro (Royal Holloway, University of London) described the challenges of real-time analysis of IP traffic and application traffic within networks to provide security and compliance monitoring capabilities. These applications rely heavily on the ability to accurately classify IP network traffic using streaming technologies, and they highlighted the challenges of developing adaptive classifier models that can react to rapidly changing environments.
Mike Holcombe, from the Advanced Computing Research Centre at Sheffield University, described a number of applications being developed to perform functions such as advanced text analysis for information retrieval on very large data sets. Mike also introduced the topic of modelling complex systems using large-scale agent-based modelling techniques, which can be applied in areas as diverse as economic modelling and chemical diffusion in cerebral blood vessels. The ensuing debate considered the possibilities for deriving causative models from large volumes of data, enabling the development of better agent models that cover a wider range of observed behaviours.
Various aspects of the barriers to the adoption of big data technologies were covered by Mark Birkin (University of Leeds) and Rupert Ward (University of Huddersfield). These presentations concentrated on the adoption of university research by industry, and on the development of relevant training courses within academia to meet the growing need for a new generation of data scientists and engineers — and possibly data artists and even data lawyers. The development of course material from an industrial perspective by IBM was acknowledged to be a valuable contribution to this debate. The industrial perspective on adoption was discussed in relation to experience gained from deploying big data solutions into a typical enterprise. The issues were not about the technology, but about integration with existing systems and business processes, specifically in relation to the confidentiality of the data being used.
There was simply not enough time to discuss all of the issues raised, but the session was certainly thought-provoking and the debates continue. Stay tuned for more posts on this colloquium, right here on the Hub.