Saturday, June 28, 2008

Make government data freely available

An interesting article was released in the Yale Journal of Law & Technology earlier this year discussing a view that government should focus on providing usable data online rather than full-blown websites.

Titled Government Data and the Invisible Hand, its premise was explained quite simply in the abstract:
Rather than struggling, as it currently does, to design [web ]sites that meet each end-user need, we argue that the executive branch should focus on creating a simple, reliable and publicly accessible infrastructure that exposes the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data.
This approach is very much at odds with the current approach in both the US and Australia, where in most cases the respective governments provide both the data and all the interpretation, designed to meet the needs of specific audiences.

Under the current approach, data can become difficult to extract, or is presented in ways that are not useful. This makes these websites difficult to use. They are also expensive to develop and maintain, and difficult to keep current.

The approach in practice

I've encountered both approaches in Australian government websites.

In a past role, managing the website for a private sector water and energy utility, one of the most consistently trafficked areas of our website was local weather. This section was only peripherally related to the main focus of the site; however, the level of usage made it important to retain.

We did not run this weather service ourselves. Instead we used a raw data feed provided by the Bureau of Meteorology (BOM) for free. The data was simply customised and presented attractively on our website.

We were not the only organisation using this data - a number of other organisations had built businesses through providing weather information, sometimes combined with video, maps, commentary or other feeds. These sites collectively attracted more traffic than the BOM itself.

To my recollection, provided this data was not packaged and directly resold commercially, the BOM had a policy of giving away the data freely.

This approach helped ensure that the public were able to access accurate information, to the public good. It is important to note that BOM data was collected and processed by people and equipment already paid for out of the public purse.

On the other hand, the Australian Bureau of Statistics (ABS) provided a great deal of the data I used in my day-to-day role.

This data was preprocessed by the ABS into tables or Excel documents - often chunks of information that were not much use to my audience.

My team spent many hours manually deconstructing and reconstructing the ABS data into different forms to make it useful for our corporate needs.

The ABS did not provide its data as a raw feed. While the ABS did give away its data for free online - and this was fantastically useful - the processing overhead that went into its website inevitably made the data less timely, reducing its value in a commercial sense.

So, in comparison, the BOM:

  • Gave its data away for free online (public access to public data)
  • Required no data analysis (lower cost to the agency, faster to market)
  • Received referrals from everyone reusing the data (reach)
  • Benefited from enormous innovation in how the data was 'mashed' with other sources, analysed and presented (lower cost to the agency, and the risk of misinterpretation transferred to the private sector)

While the ABS:

  • Gave its reports away for free online - but not the raw data (public value, but less timely)
  • Provided intensive data processing (quality assurance, but higher cost to the agency and slower to market)
  • Saw limited online reuse, and therefore fewer referrals (lower reach)
  • Saw no innovation in data analysis and presentation (higher analysis/presentation cost to the agency, with any risk of misinterpretation staying with the agency)
In my view the BOM's approach both lowers costs and risks for the agency and delivers greater public benefit, greater data use, innovation and agency reach (referrals).

By the way, it's worth pointing out that the BOM runs the most trafficked government website in Australia. The ABS, despite offering a wider range of statistics, is much further down the list.

Can the data approach be used across other agencies?

I believe it can. Even in my agency we release numbers and resources which could be indexed and provided in a raw data form for reuse.

We also have a website estimator for calculation purposes. There are around a dozen unofficial estimators that do a similar job - several providing virtually the same results as the official estimator. However those 'fan' estimators cost the public purse nothing to create or maintain.

So if members of the public are prepared to create these tools, why should the agency?

Granted, this last example is a little trickier than that - the estimation process is time-consuming and maximising accuracy is a key goal.

However there are other government websites and tools which could and would be delivered by private organisations and individuals, if only the government allowed access to the data stream.

Level the playing field

Note that the article does not suggest that government should stop analysing data and presenting this analysis in websites.

What it suggests is to provide the raw data on a level playing field, thereby allowing private and public organisations the same capacity to use it.
The best way to ensure that the government allows private parties to compete on equal terms in the provision of government data is to require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large.
This means that government agencies such as the ABS can continue to provide reports for people who choose not to do their own analysis.

However it opens the field to innovation and the use of various data sources to make connections that government, in its siloed form, is less able to make.

This levelling is critical - if the government wants to see innovation it should not hold back the 'secret sauce'. The data needs to be available in a way that allows private and other public enterprises to use it in an equal way.

Open systems are available today via standards such as XML and RSS - look at how Google syndicates maps and ads or how Facebook allows the creation and dissemination of applications.
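To sketch how low the barrier to reuse is once a feed exists: the few lines below parse a hypothetical RSS 2.0 feed of weather observations. The feed content, city names and readings are invented for illustration - this is not a real BOM endpoint - but any reuser consuming a genuine feed would need little more than this to extract the data for re-presentation.

```python
# Minimal sketch of consuming an open government data feed.
# The feed below is a hypothetical, hard-coded example; a real
# reuser would fetch it over HTTP from the agency's URL.
import xml.etree.ElementTree as ET

FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Observations</title>
    <item><title>Sydney</title><description>22.5 C</description></item>
    <item><title>Melbourne</title><description>18.1 C</description></item>
  </channel>
</rss>"""

def parse_feed(xml_text):
    """Extract (location, reading) pairs from an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("description"))
            for item in root.iter("item")]

for city, reading in parse_feed(FEED):
    print(f"{city}: {reading}")
```

From here the reuser is free to restyle, combine or map the data however their audience needs - which is exactly the division of labour the article argues for.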

In conclusion

Government has a crucial role to play in the collection of data across the country. This is a task well suited to the public sector as it is in the public interest that this be available.

However government doesn't have the systems or culture best suited to interpreting and combining this data, or to making it useful for individuals and organisations.

Government should provide interpretations - however it should not hold an artificial monopoly over this.

By allowing other organisations to access the raw information, innovation in its presentation can occur more rapidly, providing deeper insights for the public good.

Make government data freely available.

Does anyone have other examples of where government collected data has become freely available? I'd love to blog about the successes.


  1. Craig - great post! I'd like to add a longer comment but time won't permit - will probably blog soon. Formerly a Manager of GIS, long time knowledge manager, I know this is a BIG area of opportunity. There are MANY good examples when you look at US spatial data, which if created by government is placed in the public domain by default. Result: lots of mapping mashups. See also Paul Ramsey's blog.

    See also WRON set up to demonstrate and encourage sharing water resource info.

    Will follow with interest.....

  2. Craig, this is an excellent post. It reminds me of the arguments that have always swirled around how dependent the US Government is on the private sector for republishing certain types of government-sourced information. I've put some of my thoughts here:

    Dennis McDonald
    Alexandria, Virginia USA

  3. The Yale paper suggests discussion forums & wikis (which ABS uses on its new betaworks site), visualisation (also available on betaworks), google, RSS feeds, email updates, all already in use by ABS. ABS have also piloted the 'National Data Network'. The NDN comprises a central administrative hub, a centralised metadata catalogue of info from many sources and a participant facility to publish resources and register users (nodes). At 750000 pages, the ABS website has been a maze for users for some time, but the new access facilities are fantastic. ABS will always be constrained by issues of confidentiality, to the point where the dangers of cross-classifying any two data sources have to be measured before anything can be released...something BOM doesn't have to deal with.

  4. Great points! I've got one to add, I've blogged about it here. The short summary, is that Government also must create national standards for data, for example in terms of controlled vocabularies, etc, to ensure that data can in fact be aggregated nationally. It is no good if each council classifies roads differently, to actually produce a national aggregated dataset of roads with wildly differing definitions. Cheers Gav

  5. Great post! We are an example of private-sector-distributing-public-info. We provide public procurement information at and make it available via web services and widgets for subscribers worldwide.

  6. In the UK, the government has just kicked off a competition to invite ideas for reusing government data in creative ways, with a 20,000 GBP prize:

  7. Some working examples of what you can do if Govt data is in an open format.