Monday, March 19, 2012

From open data to useful data

At BarCamp Canberra on Saturday I led a discussion asking how we can help governments take the step from open data (releasing raw datasets - not always in an easily reusable format) towards usable and useful data (releasing raw datasets in easily reusable formats plus tools that can be used to visualise them).

To frame this discussion I like to think of open data as a form of online community, one that largely involves numbers rather than words.

Organisations that establish a word-based community using a forum, blog, wiki, Facebook page or similar online channel but fail to provide context as to how and why people should engage, or fail to feed and participate in the discussion, are likely either to receive little engagement or to have their engagement spin out of control.

Equally I believe that raw data released without context as to how and why people should engage, and without data visualisation tools to aid participation in a data discussion, is likely to experience the same fate.

With no context and no leadership from the data providers, others will fill the informational gap - sometimes maliciously. There are also fewer opportunities for the data providers to use the data to tell good stories - how crime has decreased, how vaccination reduces fatalities, how the government's expenditure on social services is delivering good outcomes.

Certainly there will always be some people with the technical experience and commitment to take raw open data, transform it into a usable form and then build a visualisation or mash-up around it to tell a story.

However these people represent a tiny minority in the community. They need a combination of skill, interest and time. I estimate they make up less than 5% of society, possibly well under 1%.

To attract the interest and involvement of others, the barriers to participation must be extremely low (the lesson taught by Facebook and Twitter) and the ability to get a useful outcome with minimal personal effort must be very high (the lesson taught by Google).

The discussion on the weekend seemed to crystallise into two groups. One felt that governments needed to do more to 'raise the bar' on the data they released - expending additional effort to ensure it was more usable and useful for the public.

The other view was that governments have fulfilled their transparency and accountability goals by releasing data to the community, that further work on the data redirects government funds from vital services and activities, and that there is little or no evidence of value in doing further work on open data (beyond releasing it in whatever form the government holds it).

I think there's some truth in both views - but also some major perceptual holes.

I don't think it necessarily needs to be government expending the additional effort. With appropriate philanthropic funding a not-for-profit organisation could help bridge the gap between open and usable data, taking what the government releases and reprocessing it into outputs that tell stories.

However I also don't accept the view that there is no evidence of value in doing further work on open data to make datasets more usable.

In fact it could be that doing this work adds immense value in certain cases. Without sufficient research and evidence either way, this is an opinion, not a fact - although the evidence I've seen from the ABS through the census program (here's my personal infographic by the way) suggests that they achieved enormous awareness and increased understanding by doing more than releasing tables of numbers - using visualisations to make the numbers come alive.

Indeed there is other evidence that taking raw data and doing more work on it is worthwhile in a number of situations. Train and bus timetables are an example. Why does government not simply release these as raw data and have commercial entities produce the timetables at a profit? Clearly there must be sufficient value in their production to justify governments producing slick and visual timetables and route maps.

Some may argue that this is service delivery, not open data (as someone did in the discussion). I personally cannot see the difference. Whenever government chooses to add value to data it is doing so to deliver some form of service - whatever the data happens to be.

Is there greater service delivery utility in producing timetables (where commercial entities would step in if government did not) or in providing a visual guide to government budgets (where commercial interests would not step in)?

Either way the goal is to make the data more useful and usable to people. If anything the government should focus its funds on data where commercial interests are not prepared to do the job.

However this is still talking around the nub of the matter - open data is not helping many people because openness doesn't mean usefulness or usability.

I believe we need either a government agency or a not-for-profit organisation to short-circuit the debate and provide evidence of how data can be meaningful with context and visualisations.

Now, who would like to help me put together a not-for-profit to do this?


  1. Surely OpenAustralia already fits that bill for an AU Open Data collective?

    And for the quality vs quantity discussion - surely we just need to re-use concepts like ?

  2. The challenge is that data custodians are asked (told?) by central organisations to publish datasets, within existing resources. So they typically do as little as they can - and then get on with their "real" work. Enthusiastic members of the community can then massage the data to enhance its usability, but who will warrant the correctness of the end product? Or does this really matter, as long as 90% of community-produced data is correct?

  3. Daniel, I agree with the reuse of concepts. OpenAustralia *may* fit the bill - however I don't always like having all eggs in one basket. OpenAustralia has many ideas competing for attention.

    Kerry, you're absolutely right. However I think it currently requires far more than enthusiasm to massage the data - and where do you centrally store it? Drawing from the National Library's newspaper digitisation program, a system which allows people to reformat and republish data centrally (with whatever caveats are appropriate) would support this. Any mistakes in the data would be caught relatively quickly in my view.

  4. I agree completely with what Craig is saying. Making data accessible in a technical sense is all well and good, but it has to be usable.

    After all, if data is not usable and cannot be easily explored and thought about, then what we are doing is putting up barriers that prevent people from contributing in a positive way. Crowd sourcing, community participation and innovation suffer as a result.

  5. I don't think anyone is seriously arguing the absolute value of open data. Government policy is for data to be provided. The issue is the relative value of doing additional work to that data in the face of competition for the same resources.

    This issue is influenced by the ability of third parties to undertake this value adding for profit or other motivations. While some data does have to be presented in a more advanced form as part of the government's service delivery obligations (a readable bus timetable for example), not all data needs to be presented in other than its raw form.

    The amount of resources expended on adding value to raw data, not required for its original purpose, is ultimately a matter of policy. Policy is about choices. Good choices are based on evidence. This is the issue of relative value. To be convincing, proponents of government adding value to data, beyond that required by current policy, need to provide evidence that such expenditure is a better choice for scarce resources than the other alternatives.

    (This is my personal opinion, not official comment.)

    1. Hi John, I agree with your personal opinion regarding evidence.

      This leads to a second question: how should agencies go about collecting evidence?

      The National Library has collected a lot of evidence regarding people's willingness to interact with the data they have released.

      Do we have other agencies conducting similar initiatives (or call them pilots) in order to collect the evidence required to define good policy?

  6. Actually, mistakes in the data are probably not going to be picked up very much - if we're talking about toilet locations, addresses of barbecues and businesses that have had health violations. Or if they are picked up, the impact will be minor; in the same way that a miscorrection of text in an old digitised newspaper is unlikely to have much significance.

    What will be more significant is government data where mistakes are potentially harmful, like emergency information about flooded river crossings. If these are modified incorrectly and published, there will be a reckoning.

    So, the question of warrant will be real and needs to be addressed in some way.