Wednesday, December 11, 2013

How do we avoid the chicken & egg of open data (and the failure of the open data movement)?

Open data drives economic value - there's little dispute on this score, after a range of reports have indicated the massive value that open data can unlock for an economy.

Capgemini estimates that open data was worth 32 billion euros to Europe in 2010, growing at 7% per year, while McKinsey estimates the global value at US$3 trillion per year, and the UK estimated earlier this year (PDF) that the value of releasing its geospatial data alone as open data would be 13 million pounds per year by 2016.

There's been a range of similar reports across the world (as aggregated by the Open Data Institute) - all of which point to a similar conclusion.

However realising this economic value, in productivity, efficiencies and direct revenue, is dependent on governments doing one thing that they've so far failed to do - releasing open data in a planned, consistent and managed way.

Thus far most governments have followed a haphazard route to open data, releasing the 'low hanging fruit' first (data already in releasable form, with few privacy concerns and considered 'low risk' as it doesn't call into question government decisions), and then progressively releasing esoteric and almost random data sets at inconsistent intervals.

Many governments have clear processes for individuals and organisations to request the release of specific data sets - however a clear process which doesn't support the goal is of little value.

These requests have little influence on agency decisions on releasing data and I have yet to see any government mandate that these requests need to be officially considered and actioned or responded to within a set timeframe.

Without any real weight or structure, processes for requesting data sets can't be relied on by people seeking to build a business on open data.

Data consistency is an even bigger issue. In nations like Australia the federal and state governments each have their own open data sites. However there's no agreed national strategy on data release. Every jurisdiction releases different sets of data, with few attempts to aggregate state-level data into national datasets covering all jurisdictions.

Even when similar data sets are released by different states, this is complicated at the back end by different laws, different collection techniques and frequencies, and differences in analysis and measurement approaches - not to mention differences in formats and naming conventions. This can make it costly, if not impossible, for businesses or individuals to aggregate data from different states and use it for a national goal.
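To make the aggregation problem concrete, here is a minimal sketch of the harmonisation work involved. Everything in it is invented for illustration - the state names, field names, units and figures are hypothetical - but it shows why two jurisdictions publishing the same measure under different schemas, units and frequencies must each be normalised before any national figure is possible.

```python
# Hypothetical: State A publishes monthly public transport patronage
# in thousands of trips, under one naming scheme...
state_a = [
    {"Month": "2013-01", "Patronage ('000)": 512.0},
    {"Month": "2013-02", "Patronage ('000)": 498.0},
]

# ...while State B publishes the same measure quarterly, as raw counts,
# with entirely different field names.
state_b = [
    {"quarter": "2013-Q1", "trips": 1450000},
]

def normalise_a(rows):
    # Convert thousands to raw counts and roll months up into quarters
    # so the two feeds share a common reporting period.
    totals = {}
    for r in rows:
        year, month = r["Month"].split("-")
        quarter = "%s-Q%d" % (year, (int(month) - 1) // 3 + 1)
        totals[quarter] = totals.get(quarter, 0) + int(r["Patronage ('000)"] * 1000)
    return [{"state": "A", "period": q, "trips": t} for q, t in sorted(totals.items())]

def normalise_b(rows):
    # State B already uses raw quarterly counts; only rename the fields.
    return [{"state": "B", "period": r["quarter"], "trips": r["trips"]} for r in rows]

# Only once both feeds are mapped onto one schema can a national
# dataset be assembled - and every schema change upstream breaks this.
national = normalise_a(state_a) + normalise_b(state_b)
```

The per-jurisdiction `normalise_*` functions are exactly the cost being described: every extra state with its own conventions means another bespoke mapping that someone has to write and maintain.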

On top of this, many agencies still resist calls to release data. Some due to a closed culture or a hope that 'open data' is a passing fad, others due to the costs of reviewing and releasing data (without any ability to offset them in fees or additional funding) and some due to concerns around data quality, political impact or reputational damage to the agencies themselves.

My fear is that we're reaching a chicken and egg impasse - agencies and governments are reluctant to do the work and spend the money required to develop consistent data release approaches and mandates without seeing some of the economic value from open data realised. Meanwhile, individuals and organisations are reluctant to build business models on a resource that is not reliably available or of a consistent quality.

There's no commercial model for open data if governments can turn off specific data sets, or entire open data initiatives, at a whim (as we saw recently during the US Government shutdown). Businesses need to be able to count on regular publication of the data they use to build and inform their enterprise.

There's also a lot less value for governments in releasing their data if companies are reluctant to use it due to concerns over the situation above.

So how should countries avoid the chicken and egg issue in open data?

There are two approaches that I have considered that are likely to work, if used in tandem.

Firstly, governments must mandate open data release and take appropriate steps to develop ongoing data release approaches, which clearly and publicly state what data will be released, at what frequency and quality level. This should include a data audit establishing what an agency owns (and may release) and what it doesn't own, as well as the collection costs and frequency of specific datasets.

To maximise the value of this approach for states within a nation there needs to be a national accord on data, with states (or as many as possible) developing and agreeing on a consistent framework for data release which works towards normalising the collection, analysis and release of data so that it can be aggregated into national datasets.

Secondly, thought needs to be put into the difference between open and free data. Individuals and organisations who use government open data for personal, educational or not-for-profit purposes should be able to access and reuse the data for free. However, where they are using open data for profit (above an appropriate threshold), there should be scope for financial contracts to be put in place, just as there is for most other resources used to generate profits.

This approach would provide a revenue stream to the government agencies releasing the data, helping offset the collection and publication costs. Contracts should also be structured to provide assurance to data users that the data will be released on a set timetable and to a defined quality level throughout the life of the contract.

Significant thought would need to go into how these financial contracts are structured, with significant flexibility built in - for example, allowing cost recovery for developers, who may spend many hours developing and maintaining the services they build with government open data, and avoiding an upfront fee model that becomes a barrier to new entrants making profitable use of open data. There would also need to be national consistency in these contracts for state data - potentially a major challenge in Australia.

However if implemented thoughtfully and with significant consultation and ongoing review, a combination of rigour in data release and cost-recovery for profitable use of government open data would avoid the emerging chicken and egg issue and provide a solid and sustainable foundation for realising economic value from open data - value that would help support Australia's economy, social equity, education and scientific research into the future.


  1. I feel a third step needs to be present: be open to conversation about how to receive benefits from the community *back* into your data.

    A fantastic example recently.

    We as a mapping community noticed the Adelaide Metro took a bold step to use an open routing engine.

    Beyond taking the open data for bus stops or doing normal tracing, we had no idea how we could make the data more useful for them (and in turn, ourselves as users of the metro system).

    That all changed the moment the DTPI engaged in communication.

    As a result, we're taking open data for roads, spatial boundaries and more, and ingesting it into OpenStreetMap - knowing it's directly useful to a government agency, and to ourselves, simply by providing a better, more useful, service-oriented format.

    Without government explaining how we can help them help us, there's often a missing piece of the puzzle.

  2. Thanks for sharing this insightful post about open data. I hope the government won't take this lightly and instead takes action now.

  3. Charging users, whether they be data suppliers or data users, to leverage a platform for 'open' data makes little economic sense.

    The economic good provided by open data is real. The network of supply and demand just takes a time to become established. Initiatives today are paving the way for open data to become an intrinsic part of our social fabric, so a few halted steps along the way during the early years is no big deal.

    I believe it is far too early to point at any failures in consistency or quality and suggest we need to monetise the market to somehow bring about a shift in quality. I would be more concerned that such actions would adversely misdirect the current momentum we are seeing.

    Fundamentally, we don't seek to generate a revenue stream from public parks or suburban roads. Public goods, of which data is one, should not be restricted from providing that good freely. The tax system does an OK job at helping us redistribute our wealth into funding public programs already. We don't need to put in place an expensive, and most likely impossible to manage, restriction on the use of open data in order to somehow monetise its release. Companies which make revenue from free and open data will simply pay taxes.


