Data Open to the Masses

A recently launched W3C working group aims to establish best practices and vocabularies for an open data ecosystem

Chairman, Data Governance Council, IBM

Open data is live data in a database or catalog outside a firewall that is open to every person and every potential purpose. The open data movement began in 2005 with the International Aid Transparency Initiative (IATI). The IATI seeks to publish data about all foreign aid donations, projects, people, resources, and impacts worldwide. Its goal is to inject transparency and information sharing into a realm that previously has been opaque and corrupt. The IATI inspired governments around the world to adopt transparency as a tool for improving civil services, and in 2008 the US created an open data portal through which US government public data could be published on the Internet for civic uses.


Gathering steam

The US portal was a model for open data movements around the world, and today city and state governments are publishing a dizzying array of public information in open data catalogs. Application developers now gather around local open data fountains of information to create applications that augment social services in their areas. They organize hackathons on workday evenings in which 20 to 40 developers meet to brainstorm uses for a type of data. The hackathons combine political town hall events with workshops and LAN parties. Developers bring their laptops and do rapid prototyping in Ruby, Python, Perl, and JavaScript.

In three hours, they develop rough interfaces and analytical designs. They combine knowledge of local problems with civic pride, a desire to make a difference, and the ambition to develop an application that gets noticed. Some seek social connections, business networking, and entrepreneurial opportunities, while others show up for food, free beer, and a challenging intellectual environment. Not every idea gets coded, not all code is good code, and many prototypes end up in the bin later.

But these hackathons are transforming enterprise IT in the cities where they are organized on a weekly basis. Just a few years ago, city IT departments could not imagine publishing data without a purpose or application that used it. Now they are putting everything out there with the hope that it inspires others to develop uses that IT could not imagine. Sometimes developers help municipalities understand things about themselves that their IT staffs didn't know because governmental IT groups can be compartmentalized just like the departments they serve. Open data allows people outside the internal structures to identify patterns the people inside were unable to discover.

Some cities such as Chicago, New York, and San Francisco have aggressive agendas to publish all their data in open data formats by 2015, and these new data sets are free resources from which anyone can generate value and improve government services and quality of life in cities and states.

But it isn't perfect. Data published in open data catalogs often lacks common quality standards. Information about its creation, point of origin, age, and internal usage is mostly absent from those catalogs. And even if that information is provided, every city or state has its own methods for identifying it. Without common descriptors of how data was derived, where it came from, and to what degree the publishing authority itself trusts it, a lot of open data is assumed to be authoritative when it just isn't.
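As a rough illustration of the gap described above, the following Python sketch checks catalog entries for a handful of provenance descriptors. The descriptor names and the two sample entries are hypothetical, invented for this example; no city catalog is known to use exactly these fields.

```python
# Hypothetical descriptor names chosen for illustration; real catalogs differ.
REQUIRED_DESCRIPTORS = ("source", "created", "derivation", "publisher_trust")


def missing_descriptors(entry):
    """Return the descriptors that are absent or empty in a catalog entry."""
    return [name for name in REQUIRED_DESCRIPTORS if not entry.get(name)]


# Two made-up catalog entries, for illustration only.
entries = [
    {
        "title": "Pothole reports",
        "source": "311 call center",
        "created": "2013-11-02",
        "derivation": "exported nightly from the service request system",
        "publisher_trust": "unverified",
    },
    {"title": "Bike rack locations", "created": "2012-06-15"},
]

# Map each data set title to the descriptors it fails to provide.
report = {entry["title"]: missing_descriptors(entry) for entry in entries}

for title, missing in report.items():
    print(title, "->", missing or "complete")
```

A check like this only works across cities if everyone agrees on the descriptor names, which is the kind of agreement a common standard would provide.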


Establishing a standard

In December 2013, the World Wide Web Consortium (W3C) launched a new working group to develop common standards to address these issues. The Data on the Web Best Practices working group seeks to build open data best practices and vocabularies to enable cities and states publishing open data to describe data lineage, quality, veracity, and derivation. Governmental data published in open data formats under this standard should be more reliable, valuable, and comparable than data published without it.

Hadley Beeman, Yaso Cordova, and I cochair this working group. We envision a world in which states and nations can use open data published by towns and cities to gain enhanced understanding about their urban environments and civic governments. We see opportunities for regions to analyze common open data to identify opportunities to cut carbon dioxide emissions, improve traffic safety, speed disaster recovery services, and manage public resources such as water. Open data has the potential to transform how citizens interact with civic government, and we intend the Data on the Web Best Practices working group to provide common open standards and best practices that can empower that transformation.

Our goal is to deliver the Data on the Web Best Practices recommendation to develop the open data ecosystem, provide guidance to publishers, and build trust in the data. The working group will build on and extend the work completed in the Government Linked Data working group by taking a domain- and technology-agnostic approach to cover the following aspects:

  • Establishing vocabulary rules to enable data sharing, comparability, and interoperability
  • Designing and managing Uniform Resource Identifiers (URIs) for persistence
  • Guiding the provision of metadata
  • Publishing and accessing versions of data sets
  • Making controlled vocabularies accessible as URI sets
  • Providing technical factors for consideration when choosing data sets for publication
  • Offering technical factors that affect the potential use of open data for innovation, efficiency, and commercial exploitation
  • Preserving data
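Two of the aspects above, persistent URIs and versioned data sets, can be sketched in a few lines of Python. This is only one possible convention under stated assumptions: the `example.org` namespace, the `/version/` path segment, and the use of ISO 8601 dates as version labels are all hypothetical and are not taken from the working group's output.

```python
from urllib.parse import quote

BASE = "http://example.org/data"  # hypothetical publisher namespace


def dataset_uri(slug, version=None):
    """Build a dataset URI; the unversioned form always names the data set itself."""
    uri = f"{BASE}/{quote(slug, safe='')}"
    if version is None:
        return uri
    return f"{uri}/version/{quote(version, safe='')}"


def latest_version(versions):
    """Pick the most recent label, assuming ISO 8601 dates that sort as text."""
    return max(versions)


# Made-up version labels for a made-up data set.
versions = ["2013-12-01", "2014-01-15", "2014-03-02"]

stable = dataset_uri("traffic-counts")
current = dataset_uri("traffic-counts", latest_version(versions))

print(stable)
print(current)
```

Keeping the unversioned URI stable lets applications link to the data set once, while each dated URI continues to identify exactly the snapshot it was minted for.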

Evidence of implementation will be gathered from national or sector-specific guidelines that reference the best practices. The working group will also develop vocabularies, published as working group notes, including the following two new vocabularies to support the data ecosystem:

  • Quality and Granularity Description Vocabulary: This vocabulary is foreseen as an extension to the Data Catalog Vocabulary (DCAT) to cover the quality of the data, how frequently it is updated, whether it accepts user corrections, what persistence commitments apply, and so on. When used by publishers, this vocabulary fosters trust in the data among developers.
  • Data Usage Description Vocabulary: This vocabulary describes how one or more data sets are used. Where data is used in an application, it facilitates a description of what the application does and what problem it helps to solve. This description can improve discoverability of the application. Where data is used in other contexts, such as in research, it facilitates provision of information about which data was used and how it was used during the research. This information can link to and be cited within published papers. In these and other scenarios, using this vocabulary seeks to encourage the continued publication of the data on which the usage depends.
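To make the two planned vocabularies more concrete, here is a minimal Python sketch of a JSON-LD-style record for one data set. The `dcat:` and `dct:` terms come from DCAT and Dublin Core; every `ex:` term is a hypothetical placeholder, since the quality and usage vocabularies described above had not been defined. The data set and application are invented, and the plain string used for the update frequency is a simplification.

```python
import json

# dcat: and dct: are existing vocabularies.
# ex: terms are hypothetical stand-ins for the two planned vocabularies.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "ex": "http://example.org/vocab#",
    },
    "@id": "http://example.org/dataset/pothole-reports",
    "@type": "dcat:Dataset",
    "dct:title": "Pothole reports",
    # Quality descriptors: update frequency, corrections, persistence.
    "dct:accrualPeriodicity": "daily",
    "ex:acceptsUserCorrections": True,
    "ex:persistenceCommitment": "URI maintained for at least five years",
    # Usage descriptors: which application uses the data and why.
    "ex:usedBy": [
        {
            "@type": "ex:Application",
            "dct:title": "Street repair tracker",
            "ex:problemAddressed": "Shows residents which reported potholes are still open",
        }
    ],
}


def quality_summary(record):
    """Collect the quality-related statements a developer might check first."""
    keys = (
        "dct:accrualPeriodicity",
        "ex:acceptsUserCorrections",
        "ex:persistenceCommitment",
    )
    return {key: record[key] for key in keys if key in record}


summary = quality_summary(dataset)
print(json.dumps(summary, indent=2))
```

Because the quality and usage statements sit in the same record as the catalog entry, a developer or researcher could read them with the same tools used to read the catalog itself.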


Leveling the playing field

The Data on the Web Best Practices working group's charter and all the proceedings of the working group are public and, of course, published as open data. Working group participation is open to W3C members, but we will host a series of public workshops around the world in 2014 to encourage open data community participation and input. We want this working group to include all the diverse voices and points of view from the global open data community on every continent.

Supported by common standards, open data can generate tremendous economic and social value for citizens and governments. Transparency is also the best tool to fight growing state surveillance and Internet balkanization. We want to live in a world in which information is freely available, and through taxation we already pay governments to collect our data. When a government publishes our data in open formats, it levels the playing field and gives citizens the opportunity to empower themselves with information and to determine their own needs and purposes.

Whether you are a data governance professional, a big data scientist, an application developer, or just an interested citizen and public advocate, open data is an important movement and the world needs your skill and attention. If you are a member of W3C, please consider joining the Data on the Web Best Practices working group. If not, stay tuned for public workshop announcements, and consider participating to add your ideas and experiences to the creation of open data standards. And please share any thoughts or questions in the comments.

