Tuesday, October 02, 2012

Making APIs for government data - should agencies do this or leave it to third parties?

APIs (Application Programming Interfaces) are a way of interacting with data (usually over the web) that frees users from relying on particular applications, and from having to do complex programming, in order to reuse the data in interesting ways.
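To illustrate, here is a minimal sketch of what consuming an API looks like. The dataset, field names and values below are invented for illustration - a real API would return JSON like this over the web, and a few lines of code could then answer a question, with no format-specific application required.

```python
import json

# Hypothetical example: a JSON payload of the kind a government API
# might return. The fields and records here are invented.
response_body = """
[
  {"name": "Civic Library", "suburb": "Canberra City", "open_saturday": true},
  {"name": "Dickson Library", "suburb": "Dickson", "open_saturday": false}
]
"""

records = json.loads(response_body)

# One line of ordinary code answers a question about the data.
open_weekends = [r["name"] for r in records if r["open_saturday"]]
print(open_weekends)
```

The same question asked of a PDF release would require a PDF reader and manual extraction; asked of an API, it is a one-liner in any programming language.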

Unfortunately few government agencies go the extra distance to release their data with an API, instead publishing it in specific data formats that require specific applications to access.

This is a real shame, as APIs essentially make data application-free - great for accessibility, and both easier and faster for any web user or website to reuse the data effectively.

It is often relatively easy to create APIs from an agency's released data, as demonstrated by the Farmers Market API example from Code for America, which took less than an hour to convert from a spreadsheet into a map visualisation.
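The conversion step involved can be sketched in a few lines. This is an assumption-laden illustration, not the actual Code for America code: the spreadsheet columns and market names below are invented, but the shape of the work - read a CSV dump, emit JSON records an API or map tool can serve - is the same.

```python
import csv
import io
import json

# Hypothetical spreadsheet extract; the columns and rows are invented
# for illustration, not taken from any real agency dataset.
csv_dump = """market_name,suburb,latitude,longitude
Capital Region Farmers Market,Watson,-35.237,149.157
Southside Farmers Market,Phillip,-35.348,149.086
"""

# Read the CSV rows into dictionaries, then emit JSON - the format
# most web APIs (and map visualisations) expect.
rows = list(csv.DictReader(io.StringIO(csv_dump)))
api_payload = json.dumps(rows, indent=2)
print(api_payload)
```

From there, the JSON can be uploaded to any hosting service that exposes it over HTTP - which is why a motivated third party can stand up an unofficial API in under an hour.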

Agencies can certainly take the position that they don't want to do the extra work (however little it may be) to provide APIs for their public data and leave it up to third parties to do this - wherever and whenever they wish.

This is a choice, however, that comes with risks.

Where an agency simply 'dumps' data - in a PDF, CSV, Shapefile or other format online, whether via their site or via a central open data site - they are giving up control and introducing risk.

If a third party decides to create an API to make a dataset easier to access, reuse or mash up, they can easily do so by downloading the dataset, doing various conversions and clean-ups, and uploading it to an appropriate service that provides an API (as in the Farmers Market API example).

Through this process the agency loses control over the data. The API, and the data it draws on, are not held on the agency's servers or in a place the agency can easily update. The data may contain introduced (even inadvertent) errors.

The agency cannot control the data's currency (through updates), which means that people using the third party API might be accessing (and relying on) old and out-dated data.

The agency even loses the ability to track how many people download or use the data, so they can't tell how popular it may be.

These risks can lead to all kinds of issues for agencies - from journalists publishing stories, to people making financial decisions, all relying on out-dated government data.

Agencies might see a particular dataset as unpopular due to low traffic from users of their site, and decide to cease publishing it - when in reality it is one of the most popular datasets they hold, which is precisely why a third party built an API for it, and that API is where all the users go to access it.

As a result of these risks agencies need to consider carefully whether they should - or should not - provide APIs themselves for the data they release.

Open data doesn't have to mean an agency loses control of the datasets it releases, but to retain control they need to actively consider the API question.

Do they make it easy for people to access and reuse their data directly, retaining more control over accuracy and currency, or do they leave it to a third party - with an unknown agenda, and an unknown capability to maintain the data - to do so?

Agency management should consider this choice carefully when releasing data, rather than automatically defaulting to just releasing that CSV, PDF, Shapefile or other file.


  1. I don't think it's an either/or. There is plenty of scope for both depending on the situation.

    But there are wrong ways to go about it. Take, for example, Canberra's ACTION buses. The free MyBus app was denied real-time data. Instead ACTION spent $12.5 million to have an app developed commercially, which I understand was only available for iOS and cost $2.99 to download. Then in July this year the data was released to Google Transit anyway.

  2. That's old school and new school thinking colliding.

    I don't mind a government making that mistake once, provided they learn from their error and improve.