Sunday, July 31, 2011

Overviewing the Government Web Presence

The Government released a list of about 440 current websites, organised by department, which was fairly easy to turn into some PHP code. I'm hoping to release the code soon (once I've got GitHub working), but for now you can see the results below. Both projects try to give an overview of the Government's total web presence - it'd be interesting to repeat this, say, yearly to see what changes.
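The "repeat it yearly" idea could start very simply: treat each year's list as a set of URLs and compare the sets. Here's a minimal sketch in Python (not the original PHP). It assumes each snapshot is a list of (department, URL) pairs. The function names and example URLs are made up for illustration.

```python
# A minimal sketch, assuming each snapshot is a list of (department, url)
# pairs. The real list's format isn't described here; names are hypothetical.
from collections import defaultdict


def group_by_department(sites):
    """Map each department to the sorted list of its site URLs."""
    grouped = defaultdict(list)
    for department, url in sites:
        grouped[department].append(url)
    return {dept: sorted(urls) for dept, urls in grouped.items()}


def compare_snapshots(old_sites, new_sites):
    """Return (added, removed) URLs between two snapshots, e.g. a year apart."""
    old_urls = {url for _, url in old_sites}
    new_urls = {url for _, url in new_sites}
    return sorted(new_urls - old_urls), sorted(old_urls - new_urls)


if __name__ == "__main__":
    # Hypothetical example data using the reserved .example domain.
    snapshot_2011 = [
        ("Department A", "http://www.dept-a.example"),
        ("Department A", "http://campaign.dept-a.example"),
        ("Department B", "http://www.dept-b.example"),
    ]
    snapshot_2012 = [
        ("Department A", "http://www.dept-a.example"),
        ("Department B", "http://www.dept-b.example"),
        ("Department B", "http://new.dept-b.example"),
    ]
    # Sites per department in the first snapshot.
    for dept, urls in group_by_department(snapshot_2011).items():
        print(dept, len(urls))
    # What appeared and disappeared between the two snapshots.
    added, removed = compare_snapshots(snapshot_2011, snapshot_2012)
    print("added:", added)
    print("removed:", removed)
```

Keying on the URL alone means a site that moves between departments won't show up as a change. Comparing (department, URL) pairs instead would catch that.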

Click here for a page showing screenshots for all 400+ homepages

I also scraped the listed pages and did a bit of processing to turn the content into Wordles. OK, not terribly exciting. But I kind of like the idea of turning open data into "art".
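The processing step boils down to a few stages. You strip the HTML, split the text into words, drop the common words and count what's left. Here's a minimal sketch using only Python's standard library. The tiny stop-word list and the function names are placeholders, not the filtering actually used for the Wordles below.

```python
# A minimal sketch: turn scraped homepage HTML into word counts for a
# word cloud. STOP_WORDS is a hypothetical placeholder list.
import re
from collections import Counter
from html.parser import HTMLParser

STOP_WORDS = {"the", "and", "of", "to", "a", "in", "for", "on", "home", "contact"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_counts(html, stop_words=STOP_WORDS):
    """Count lower-cased words in the page text, ignoring stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Letters, optionally with one internal apostrophe (e.g. "government's").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return Counter(w for w in words if w not in stop_words)
```

Running `word_counts(html)` on each page and adding the resulting `Counter` objects together gives counts across all sites. `.most_common(n)` then picks out the top words for the cloud.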

Here's the front page content with some common and site-related words removed (click through for original):

Wordle: UK Government websites [unfiltered]

Here's the same data but with more common words removed by

Wordle: UK Government websites [filtered]

As I say, I hope to post the code and data soon, even though it's not much work to re-create... In the meantime, any suggestions are welcome.

Friday, July 15, 2011

"Forget the data."

Holy crap, Emma Mulqueeny's (@hubmum) blog post from yesterday on the next challenge for Open Data is possibly the. Best. Thing. I. Have. Read. In. A. Long. Time.

In particular:
Open data? Awesome, and we are making tracks.

Open Government? HARD, and we are not banging on that door yet.
So what’s the next challenge for Open Government data?

Forget the data.

Find a way to enable these revolutionary ideas, apps, websites and widgets that save time, money and mind-numbing frustration from those who have to engage with government.

Do that, and only that.
This is the conversation we need to be having. Why? Not to work out "how to do it", but because it questions what is valuable and necessary in government.

Open data isn't a technical thing. It's about relevance. If you could do everything, what would you do? If you were hungry, would you eat, or would you talk about how to find out what the best thing to eat is and what the best way of eating it is?

"Open data" that lacks a medium for turning creative use into real-world change is irrelevant. It's what bad businesses do - they invent a million great things, but never actually want people to use them. Instead they use them as examples to tout how great and creative they are, in the strange hope that people will think a symbol of progress is as good as progress itself.

Until, that is, someone comes along and not only has a better idea, but also actually builds it. For everyone to use.

Is that difficult? Of course - building stuff requires foresight, management, flexibility and the wisdom of knowing what your goal is. Do people do it all the time? Look around you.

Open data needs to be about other things now - including how it's funded, what the audiences are, and what the future holds. But none of these are about data. None of these are technical. We already have a society that runs on data, so data itself isn't a new paradigm.

We can't keep thinking of open data - and possibly even our entire creative efforts - as some kind of "continual prototype". We need to apply it like we applied sewage systems and electricity.

We need to understand that this isn't just about making the game easier to play, but about a whole new game.

Wednesday, July 13, 2011

Open data funding - experiments and ecosystems

Paying for the Open

The funding aspect of open data development came up at Open Data Brighton & Hove (#odbh) last night - who should (or shouldn't) pay for it?

While one camp says that there are lots of people who will build on top of open data for free and for passion, the camp at the other end of the hall wants to see return on investment for work paid for. The latter works both ways - people want to be paid to develop, and people want to pay for development. If the payback is enough, of course.

In a sense, both camps are "right" - the model you believe in depends on your daily interests, daily funding models, and where else you get money from. So it's easy to see that some people are fine building free side-projects, while for others it's a day job. Sometimes one person may have a foot on both sides, depending on what's going on that particular day/week/whatever.

This will always be the case. So it's really really important to understand that there is no "correct" model. Any open data ecosystem needs to fundamentally take this into account. Making data available is great - some people will run and play with it. But working out funding and collaboration is also great. Both are essential, even in the context of open-source, cutbacks, austerity and liberal progressiveness etc etc.


The more I think about #odbh, the more I notice how much I'm influenced by the openness of the Bitcoin community. Other open communities exist, of course, and do similar things, I'm sure, but Bitcoin is the one I'm closest to at the moment.

(Background sidenote: Ignore what Bitcoin is, and whether it's a good idea or not. The relevant and important point is how people are organising around it.)

One funding model that seems to work is the "Bounties" model - a kind of funding pledge, but one based on identifying desired functionality rather than, say, group activity or a band's next output. This list of bounties isn't complete, but it illustrates how it works and the kind of work people want done.

Could this work for open data development? If people are serious about wanting an idea turned into reality, shouldn't they put their wallet where their mouth is? Does it offer a "third way" between working for free and having to "prove" your idea in advance?

I suppose what I'd envision is a bit like the Ideas section on, but with more ... oomph, more "I really want this" instead of "This'd be nice".


To wrap up, what this says to me is that open data is more than just about getting data out there, and even more than just about how we weave data into our everyday lives. It's about how we commission progress, how we organise collaboration, and how we identify needs.

All stuff we've been doing for ages, really. But here's a chance to try new ways of approaching old problems, and to bring all of that experimentation together. To create a very real open ecosystem.

Wednesday, July 06, 2011

Averages in the Tabloids

Comparing the original report on public-vs-private pay to the tabloid coverage just makes me want to give up talking.

Nearly all of the caveats in the report are left out of the tabloid piece. All of the interesting analysis is omitted. The result is a headline designed to get people angry. The common name for this is "ignorance". Irresponsible ignorance.

As the report points out extremely clearly, many factors affect what is basically a comparison of averages between apples and oranges:
  • The public sector has recently outsourced low-cost jobs, pushing up averages

  • The public sector similarly has more educated people, pushing up averages (at no time does the tabloid ask what value is added by staff)...

  • ... but also, people with a degree earn almost 6% less in the public sector than they would in the private
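Here's a toy example in Python with entirely made-up figures (not the report's numbers). It shows why comparing raw averages across sectors can mislead. Each group earns less in the "public" sector, yet the public average comes out higher because that sector has a bigger share of graduates. This is a composition effect, a form of Simpson's paradox. Dropping some low-paid jobs, as with outsourcing, then raises the average further without anyone's pay changing.

```python
# Hypothetical, made-up figures: NOT taken from the report.
# Each sector is a list of (group, headcount, average_pay) tuples.
public = [("graduate", 60, 30_000), ("non-graduate", 40, 20_000)]
private = [("graduate", 30, 32_000), ("non-graduate", 70, 21_000)]

# The same public sector after half its lower-paid jobs are outsourced.
public_after_outsourcing = [("graduate", 60, 30_000), ("non-graduate", 20, 20_000)]


def overall_mean(sector):
    """Headcount-weighted mean pay across all groups in a sector."""
    total_pay = sum(count * pay for _, count, pay in sector)
    total_staff = sum(count for _, count, _ in sector)
    return total_pay / total_staff


def group_pay(sector):
    """Average pay for each group within a sector."""
    return {group: pay for group, _, pay in sector}


print("public mean:", overall_mean(public))    # 26000.0
print("private mean:", overall_mean(private))  # 24300.0
for group in ("graduate", "non-graduate"):
    # Within each group, public pay is lower than private pay.
    print(group, group_pay(public)[group], "vs", group_pay(private)[group])
print("public mean after outsourcing:", overall_mean(public_after_outsourcing))  # 27500.0
```

Neither headline average says anything about whether like-for-like staff are paid more or less. Only the within-group comparison does.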

There are some other interesting points such as whether banks are classified as public or private, age and gender differences, and comparison between highest and lowest earners in each sector.

For me though, this is a reminder that averages are hard - especially for people who "just want to read a newspaper". Understanding evidence is tricky, and presenting it is even trickier (something we tried to take into account on the Improving Visualisation project). It's too easy to fool people, and the tabloids keep. doing. this. all. the. time.
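Here's a quick illustration of why "the average" is ambiguous. It uses a handful of made-up salaries where one high earner drags the mean well above what most people earn. It relies on Python's standard `statistics` module, and the salary figures are hypothetical.

```python
import statistics

# Hypothetical annual salaries for seven people: one high earner skews the mean.
salaries = [18_000, 18_000, 20_000, 22_000, 25_000, 30_000, 150_000]


def summarise(values):
    """Return the three common 'averages' for the same data."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


for name, value in summarise(salaries).items():
    print(f"{name}: {value:,.0f}")
```

Here the mean comes out at about 40,400, the median at 22,000 and the mode at 18,000. That's three defensible "averages" for the same seven people, and a headline can pick whichever suits it.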

How much statistics do people need to know to engage "fairly" with democracy? Should they need to know about mean, median, mode? Should they trust the media? Should they trust statisticians?