Tuesday, August 06, 2013

Is it easy for non-programmers to reuse government open data?

Opening up data is one thing, but using it in a productive way is another.

Data may be released in formats that are hard to reuse, and it may be 'dirty' (containing mistakes) or incomplete.

However when organisations release data in machine-readable formats, with a reasonable level of completeness, it can be surprisingly easy for even a novice with no programming experience to reuse it in meaningful ways.

Below are two examples of how I've recently reused very different sets of data: an example of data released directly by a government agency, and an example of how to capture and reuse data that is public but technically not open.

Example 1: Mapping Australian polling places

Earlier today @Maxious tweeted the release of the Australian Electoral Commission's (AEC) expected polling places for the federal election as a CSV file. CSV is a standard format, like a basic spreadsheet, where every value is separated from the next by a comma, making it easy to import into (or export from) Microsoft Excel, OpenOffice Calc, Google Spreadsheet or other spreadsheets or databases.
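To see just how simple the CSV format is, here's a minimal sketch in Python using only the standard library. The column names and rows are invented for the example - they are not the AEC's actual headers:

```python
import csv
import io

# A tiny illustrative CSV in the same comma-separated style as the AEC file;
# the column names here are made up for the example, not the AEC's real headers.
sample = """State,PollingPlace,Latitude,Longitude
NSW,Sydney Town Hall,-33.8731,151.2069
VIC,Melbourne Town Hall,-37.8152,144.9668
"""

# DictReader maps each data row to the header names, so values can be
# looked up by column name rather than position.
rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["PollingPlace"], row["Latitude"], row["Longitude"])
```

The same few lines work whether the file has two rows or eight thousand, which is why CSV imports so cleanly into spreadsheets and databases.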

The polling locations data is valuable, but in the CSV format simply appears as lines and lines of data. I thought it would be interesting and useful to visually map the polling locations on a map of Australia, making it easy for people to find the polling booths nearest to them.

So I downloaded the CSV file from the AEC website (www.aec.gov.au/About_AEC/cea-notices/election-pp.htm) and went to Google Drive, which supports a type of spreadsheet called Fusion Tables which can map geographic data.

Fortunately the AEC was smart enough to include latitude and longitude for each polling location. This can be easily mapped by Fusion Tables. The CSV also contained address, postcode and state information, which I could also have used, less accurately, to map the locations.

I uploaded the CSV into a newly created Fusion Table, which automatically organised the data into columns and used the Lat/Long coordinates to map the locations - job done! Or so I thought....

When I looked at the map, it only showed NSW polling locations - about 2,400 of them - while the original CSV listed over 8,000.

Clearly something hadn't worked properly, so I tried reloading the data into a new Fusion Table - with the same result - it didn't seem to be a problem with the CSV or the import process.

I opened the CSV in Microsoft Excel and studied the data. There were many columns of data I didn't need for the map, so I deleted them - reducing the size of the spreadsheet by tens of thousands of cells.
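The same column-pruning can be done without Excel. A minimal sketch in Python's standard library - the column names kept and dropped here are assumptions for illustration, not the AEC's real headers:

```python
import csv
import io

# Illustrative input with extra columns we don't need for mapping.
source = """State,Division,PremisesName,Latitude,Longitude,Status
NSW,Sydney,Sydney Town Hall,-33.8731,151.2069,Current
VIC,Melbourne,Melbourne Town Hall,-37.8152,144.9668,Current
"""

keep = ["PremisesName", "Latitude", "Longitude"]  # columns needed for the map

# DictWriter with extrasaction="ignore" silently drops every column
# that isn't in the 'keep' list.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore")
writer.writeheader()
for row in csv.DictReader(io.StringIO(source)):
    writer.writerow(row)

slimmed = out.getvalue()
print(slimmed)
```

Pointing the reader at the real downloaded file instead of the inline string gives a slimmed CSV ready to re-upload to Fusion Tables.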

I reimported the CSV into a Fusion Table and it worked! All eight and a half thousand expected polling locations appeared on the map. Clearly there had been too much (extraneous) data for Fusion to manage.

From here finishing the map was easy. It was simply a process of making the data a little more presentable by renaming datasheets and editing the information box that appears when a polling location is clicked on.

I shared my Fusion Table and published the map so people could view and embed it (see below).

You can view (but not edit) my full Fusion Table at: https://www.google.com/fusiontables/DataSource?docid=1kzLZTqNRkXMu1w4eBdsOLRakx3S8FLHziu6PdbU



So job done - map created with useful information when you click a red dot.

However, these are only expected polling places - the AEC may update this information at any time as they confirm or remove specific polling places.

My map is current as at 6 August 2013, however it may become out-of-date quite fast. How do I ensure my map updates when the AEC updates its CSV?

The short answer is that I can't - using my Google Fusion Table.

Because the AEC has chosen to release the data in a format easy for them (a CSV, straight from their internal systems), it is less useful for outsiders who wish to keep their maps or mash-ups current.

A programmer would be able to write a script that checked the AEC page each day to see if the CSV had updated, download it into a program that updated a map and published it to the web with the changes - even providing a history of which polling stations were added or removed over time.
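A minimal sketch of that change-detection script in Python, using only the standard library. The URL and the rebuild_map function are illustrative assumptions, not a real AEC endpoint:

```python
import hashlib
import urllib.request
from pathlib import Path

CSV_URL = "https://www.aec.gov.au/example/polling-places.csv"  # illustrative URL
STATE_FILE = Path("last_hash.txt")

def fetch(url: str) -> bytes:
    """Download the raw CSV bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def has_changed(data: bytes) -> bool:
    """Compare a hash of the new download with the last one we saw."""
    new_hash = hashlib.sha256(data).hexdigest()
    old_hash = STATE_FILE.read_text() if STATE_FILE.exists() else ""
    if new_hash != old_hash:
        STATE_FILE.write_text(new_hash)
        return True
    return False

# Run daily (e.g. from cron): if the file changed, regenerate the map.
# data = fetch(CSV_URL)
# if has_changed(data):
#     rebuild_map(data)  # hypothetical function that republishes the map
```

Scheduled once a day, a script like this is all the 'monitoring' a programmer would need - which is precisely the capability non-programmers are left without.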

However the broader community, including me, doesn't have the programming skills to do this - and shouldn't need them.

To replicate what the programmer could do in a few lines, any non-programmer, such as me, would have to manually check the page, download the updated CSV (assuming the page provides a clue that it has changed), manually delete all unneeded columns (again) and upload the data into my Fusion Table, simply to keep my map current.

Of course, if the AEC had spent a little more time on their data - releasing it as a datafeed or an API (Application Programming Interface), it would be easy even for non-programmers to reuse the data in a tool like Google Maps for public visualisation - or the AEC could have taken the one additional step necessary to map the information themselves (still providing the raw data), providing a far more useful resource for the community.

This is one of the challenges with open data - releasing it in formats useful for the audience, rather than the agency.

Agencies often choose to release data in what they see as the fastest and easiest solution for them, even though it greatly increases the risk that their data will be reused online in out-of-date or inappropriate ways. Imagine the same issue with a listing of illegal drugs, accident hotspots or holiday dates - anyone who relied on old data, because it didn't automatically update in apps or third-party websites, would potentially be at significant risk.

However with a little more effort and thought, agencies can release their data in ways that bias online reuse towards remaining current and accurate - such as via APIs, which automatically update the information whenever a user accesses a mobile app or website that draws from it. With some data, APIs can potentially save lives - as well as reduce the risks to both agencies and developers.
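To illustrate the difference, here's a minimal sketch of how a consumer might read from a hypothetical JSON polling-place API. The endpoint shape, field names and values are invented for the example - the AEC offers no such API:

```python
import json

# A hypothetical JSON response from an AEC-style polling-place API.
# Because each request returns the latest data, every app or website
# drawing from the API is current without any manual re-downloading.
response_body = """{
  "updated": "2013-08-06T09:00:00Z",
  "polling_places": [
    {"name": "Sydney Town Hall", "lat": -33.8731, "lon": 151.2069}
  ]
}"""

data = json.loads(response_body)
for place in data["polling_places"]:
    print(place["name"], place["lat"], place["lon"])
```

The key design difference is where staleness lives: with a CSV, every downstream copy ages independently; with an API, there is one authoritative source that consumers query on demand.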

Example 2: Analysing agency tweets

I'm interested in what government agencies say online and have been tracking the use of Twitter by Australian governments, including local, state and federal agencies, for six years. I track these accounts using my @egovau Twitter account, in two Twitter lists (as the maximum list size is 500 accounts):


Now it's great to track these accounts within Twitter, but how can I easily get a sense of which agencies are most active or have the largest following?

Followerwonk.com Twitter report
I use followerwonk.com for this purpose - a tool which can capture a snapshot of the number of followers, tweets and other details of every account at a particular time. In fact it is so good that I actually pay money for it.

These snapshots can be downloaded as CSVs and analysed in spreadsheets - which makes it easy to identify the most and least active government Twitter users (as I've blogged about in an infographic).
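That spreadsheet analysis can also be sketched in a few lines of Python. The rows and column names below are assumptions in the style of a snapshot export, not Followerwonk's actual CSV headers:

```python
import csv
import io

# Illustrative snapshot rows in the style of a follower/tweet-count export;
# the column names are assumptions, not Followerwonk's real headers.
snapshot = """screen_name,followers,tweets
@agencyA,12000,3400
@councilB,800,150
@agencyC,5600,9100
"""

rows = list(csv.DictReader(io.StringIO(snapshot)))

# Rank accounts by how many tweets they have posted, most active first.
by_activity = sorted(rows, key=lambda r: int(r["tweets"]), reverse=True)
for r in by_activity:
    print(r["screen_name"], r["tweets"])
```

Sorting by the `followers` column instead would rank the accounts by audience size rather than activity.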

However what Followerwonk doesn't do is capture and archive the actual tweets from the roughly 890 Australian government agencies and councils that use Twitter. If I want to analyse what they actually say in their tweets, rather than simply count them, I need different tools.

While it is reasonably easy to archive the tweets from an individual Twitter account (you can download your own tweets from Twitter directly), or tweets that use particular terms or hashtags, using a tool like TweetArchivist (which is really useful for tracking conferences), it is harder to capture all the tweets from a large number of Twitter accounts at once - even if they are in the same Twitter list.

I've previously captured some Twitter list tweets using paper.li, which turns them into a daily 'newspaper'. In fact I have mapped Australian federal parliamentarian tweets, by house and party, for those who want a daily dose of political discussion in a condensed form.

The beauty of this approach is that paper.li updates as I update my @egovaupollies Twitter lists (where I follow Australian federal politicians) - the use of this datafeed ensures the 'newspapers' are always current.

However paper.li only selectively captures and reports tweets and doesn't allow them to be downloaded in a structured way. It doesn't really help me archive my government agency Twitter lists.

I have tried using a number of tools without success, including the fantastic IFTTT (If This, Then That) site, which allows the creation of 'recipes' which perform actions between different online social networks and web 2.0 tools. I have used IFTTT previously to do things such as automate the change of my Facebook profile image when I change my image in Twitter.

However the fantastic Digital Inspirations blog, written by Amit Agarwal, provides useful code 'recipes' that can be adapted to do all kinds of things by non-programmers.

I tried one of Amit's 'recipes' for converting a Twitter list into an RSS feed, however found it didn't work properly as Twitter had changed its own code. I tweeted to Amit (@labnol) and he graciously replied with a link to an updated post, A Simple Way to Create RSS Feeds for Twitter, which did indeed provide a simple way of doing this, with a step-by-step video.

I followed the video and, using the Twitter Widgets page and the Google script that Amit provided, was able to quickly create the RSS feeds I needed for my Twitter lists (one feed per list).

You can view these RSS feeds using the following (unpretty) web addresses:


However I had a new issue: taking the tweets from the RSS feeds and archiving them in a structured way into a spreadsheet or database for later analysis.

I thought it would be relatively easy to find a free online or downloadable RSS reader which could archive all the tweets from these RSS feeds. I was wrong.

I could not find an RSS reader designed to capture, store and archive RSS feeds - only ones designed to format and view them.
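For readers comfortable running a short script, archiving a feed needs only Python's standard library. A sketch - the feed items below are invented stand-ins for the Twitter-list feed, and the output column names are my own:

```python
import csv
import io
import xml.etree.ElementTree as ET

# An illustrative RSS 2.0 snippet standing in for the Twitter-list feed.
rss = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>egovau list</title>
<item><title>@agency1: Service update</title>
<link>https://example.com/1</link>
<pubDate>Tue, 06 Aug 2013 10:00:00 GMT</pubDate></item>
<item><title>@council2: Road closure notice</title>
<link>https://example.com/2</link>
<pubDate>Tue, 06 Aug 2013 11:00:00 GMT</pubDate></item>
</channel></rss>"""

# Walk every <item> in the feed and append it to a CSV-style archive.
archive = io.StringIO()
writer = csv.writer(archive)
writer.writerow(["published", "tweet", "link"])
for item in ET.fromstring(rss).iter("item"):
    writer.writerow([
        item.findtext("pubDate"),
        item.findtext("title"),
        item.findtext("link"),
    ])

print(archive.getvalue())
```

Fetching the live feed URL on a schedule and appending to a file, rather than parsing an inline string, would turn this into a running archive - but of course that is exactly the kind of scripting a non-programmer shouldn't have to do.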

So I went back to IFTTT and searched for a recipe that might help.

Here I found the recipe, Backup RSS Feed to Google Spreadsheet by Martin Hawksey.

The recipe was simple. All I had to do was put in my first RSS feed (above) and adjust the name of the spreadsheet in which it would be stored. Then I activated the recipe, which connected to my Google Drive and created an archival spreadsheet that updated every time a government agency or council on the list tweeted.

As I had two lists, I replicated the recipe, using the second RSS feed and a new spreadsheet name. Then I left it to see what happened....

A few hours later, checking back, the spreadsheets were growing, with about a hundred tweets between them.

I am now able to easily analyse this data to build a picture of what government agencies actually talk about, providing insights that would otherwise never be captured (stay tuned!).

In this case study the government data was already public and visible in agency Twitter accounts, however it was not really 'open' - neither easy to capture nor easy to reuse. No government agency or local council in Australia that I am aware of currently releases its tweets as open data or in any specific format, such as RSS, which could be captured and stored (even though many use RSS for media releases).

However these tweets are also useful government data. The tweets are able to paint a picture of how government uses social media, what they talk about, how they say it and who they interact with. It has both historic value for the country as well as current value for understanding what different agencies and local governments are focused on today.

Capturing and reusing these government tweets was harder than reusing the data from the AEC. The AEC at least released the poll locations as open data, albeit in an imperfectly reusable form.

However using some ingenuity, but without any coding, it was still possible for a non-programmer to capture all of government's tweets and make them more useful.

Conclusion

There's still a long, long way for agencies to go with open data. Right now the data released around the country by state and local jurisdictions is often hard to match up, being in different formats, collected in different ways, presented differently and often not directly comparable from jurisdiction to jurisdiction. Federally there isn't the same issue, however different agencies use different geographic areas, different terminology and different formats for releasing data.

Much data remains unreleased, and even where data is technically public (such as tweets or Facebook updates), archives of this data are not always easily available to the public.

However there are now many tools online which can help make some of this imperfect public data more usable and useful - and you no longer need to be a programmer to do it.


Tuesday, July 30, 2013

Will Gov 2.0 initiatives be created by individuals or organisations for profit or illegal goals?

As an advocate for Gov 2.0 and open data I frequently encourage government agencies to work more closely with communities, tapping their expertise and experience to improve the operations of government.

However I'm not blind to the risks of community involvement.

Welcoming the crowd risks welcoming individuals, groups and organisations with agendas which may include commercial, criminal or extreme goals, which may not reflect the community at large.

For example, right now there's a major push on to encourage the 10% of adult Australians who are not yet registered to vote to do so before the upcoming federal election. In particular roughly half a million young Australians are not yet registered to vote.

The Australian Electoral Commission (AEC), the government body responsible for managing electoral processes, has a campaign targeting younger voters, sending ambassadors to major music festivals and advertising through appropriate channels to reach this group.

Likewise TripleJ, a publicly funded radio station, is working with the AEC with the RockEnrol campaign to encourage the same goal.

Alongside these government-supported approaches are two independent campaigns, one supporting enrolment directly (Enrol for Gold), the second supporting it indirectly by informing potential new voters (Virgin Voters).

The first is from GetUp, an Australian grass-roots advocacy group. GetUp has created a campaign 'Enrol for Gold' which is giving a total of $40,000 in prize money in a competition for people who enrol to vote after 21 July. It's an interesting approach to encourage enrolment - one that a government could not use, but could be very appealing to elements of the community and support the overall AEC goals to raise the level of enrolment.

This campaign, although independent from the AEC, has clear information in their Terms and Privacy Policy which restrict any reuse of the information collected in the competition for any other purpose.

The second independent enrolment campaign is called Virgin Voters. The campaign is designed around supporting first-time voters to make good decisions with their federal vote.

The Canberra Times has been told that Virgin Voters was created to find and follow eight first-time voters through the federal election campaign to create a program about their experiences.

The site includes information a first-time voter will need to know, such as how Australia's political system works, who gets to vote, how to vote and details on Australia's 35 federal political parties. Very supportively there's information for both eligible young voters and for high school students (and their teachers) who might be following the process, but still ineligible to vote.

The site invites people to participate in television, radio, social media and print as an 'official VirginVoters Voice' through its VoiceBox approach, and also encourages first-time voters to sign up to the campaign's Facebook page and Twitter feed.

The site bills itself as the voice of first-time voters and claims to be 'the most innovative social media commentary on any Federal Election'.

Despite the grand promise, the Virgin Voters site (at the time of writing) has little information about who is operating the site, why they are running it, who is funding it or whether the site is for profit.

There's also no privacy policy (at the time of writing). That's right, the site doesn't explain what happens to information submitted by people to VoiceBox, or how it will be used. This is disturbing to any experienced internet user and I hope they fix it soon.

With a little digging, and some Twitter enquiries (where I did not get a specific answer), I've discovered Virgin Voters is run by the organisation credited on the site with its design (although there's no link). This is Pineapple Media, a company that specialises in creating programming and promotions for television, radio and print.

The person credited as the contact for Virgin Voters is the principal of Pineapple Media, Richard Attieh - although this is currently not explicitly mentioned on the Virgin Voters site.

So is Virgin Voters a genuine Gov 2.0 initiative from a concerned individual and his organisation to support Australian democracy by giving first-time voters a voice in media?

Or is it an attempt to use the federal election and the naivety of first-time (often 18-year-old) voters to make profits for a media company by providing talent for programming?

I think Richard and Virgin Voters mean well, but will leave it up to readers to form their own conclusion.

What I believe this example demonstrates is that while there are many civic minded people and organisations who are using Government 2.0 approaches to help support, influence or improve government transparency in a positive way, there is room for the same or similar approaches to be used for pure commercial goals.

It may even be possible to use the guise of Government 2.0 to seek to achieve extreme or criminal goals.

What will it mean for government in the future if third parties use government data or piggyback on government goals in inappropriate ways?

Will there need to be better citizen education to help the community to make informed choices on who they provide information to, or more policing of online initiatives purporting to support government goals and programs?

Will governments rely on existing laws and frameworks, or need to legislate how and when government programs may be mentioned, leveraged and engaged with?

I think these are questions that most governments have not yet even engaged with.


Monday, July 29, 2013

Complete the 2013 Community Management survey for Australia and New Zealand

Complete the 2013 Community Manager survey
Quiip and Delib Australia have launched the second annual online community management survey for Australia and New Zealand.

The survey aims to help local organisations and individuals better understand the skills required to work in these professions, help uncover role challenges, training and support needs and the actual work and salaries that online community management and social media management professionals can expect.

The results of the survey will be presented at Swarm later this year and released online as a free report.

For more information visit Quiip's site at quiip.com.au/online-community-management-2013-survey.

To complete the survey go to au.citizenspace.com/app/delib-au/cmsurvey2013 or click on the button above.

For a copy of last year's report visit: quiip.com.au/2013/03/26/australian-community-manager-benchmark-report

Note: I'm involved in the design and management and will be involved in the analysis and reporting for this survey. The goal is to provide information that organisations can use to design community management and social media management roles and to help identify the training and support individuals working in these professions require to be most effective.


Thursday, July 25, 2013

Social media impacts on ICT teams - presentation from the Technology in Government conference

Over the last two days I've been down at the Technology in Government conference - an event I thought went very well, with a great group of speakers (including the UK Government's CIO Liam Maxwell).

I gave a presentation this morning, and chaired the afternoon, for the Connected Government stream and have uploaded my presentation for wider access.

In it I discussed the impact of social media on agency ICT teams and some potential approaches they can take to work with business areas to ensure that agency goals are met with a minimum of intra-agency friction.

Overall my message was that social media must be engaged with, not ignored, in government and agency ICT teams have a role to play.

There are several stances ICT teams can take - whether as a leader, supporter or observer of agency social media efforts - and, depending on this stance, they could take on a greater or lesser involvement in the various roles required to implement a successful social media approach.

Social media offers benefits for ICT teams, as it does for other areas of agencies - it is simply up to ICT leadership to either step up and work with business areas in a closer ongoing way, or stay out of the way and allow other areas of an agency to move forward.




Tuesday, July 23, 2013

Fantastic article: The more things change: Technology, government and the public sector

Martin Stewart-Weeks, Senior Director, Public Sector, Cisco Consulting Services, has written a fantastic article on the potential for technology to disrupt and create new possibilities for governments and the public sector.

The article discusses how technology is changing the shape and speed of government, as well as many jobs in the public sector, and looks at potential models for reshaping the public service to meet the needs of the 21st Century.

The article was presented at the Australian Government Leaders Network event in July 2013 and, with Martin's permission, I've included a copy below.

It is well worth a read! 



