Monday, January 23, 2012

"Open Data" Needs to Die

Amongst all the UK GovCamp 2012 buzz, point #18 from Tom Sprints' write-up caught me as being one of the more curious:
18. A lot of “open data” sessions just seemed to me to be variations on a theme, and didn’t sell themselves to me at all. I am therefore worried that some of those discussions are either very esoteric, or insufficiently informed by people who understand the issues rather than the tech.

Where have we come from?

As a data geek (I like the word "mechanic" myself), it's been intriguing to see the conversation around "open data" change over successive GovCamps. A few years back, the question was heartily "How can we get hold of data?" - Tim Berners-Lee was starting out on his comeback tour, and mySociety were beginning to show that data could be made useful with some clever tools.

As I remember it (likely in a fairly biased narrative way), the conversation then switched fairly rapidly into "What's the best way to open up data?" - in terms of what data and what platforms were most useful to developers. Suddenly data stores had (experimental) APIs, and the public realm had massive amounts of spending data. There was some loose rhetoric about transparency and accountability, while developers picked things apart with fine Excel toothcombs.

Then things got more interesting, as it turned out everything that had happened so far didn't automagically lead to Amazing Stuff Happening. The question became a necessary "So what?" - as if transparency and accountability weren't enough by themselves! The topic turned to users and reasons and (more often) to interesting examples. Surely, somebody was clamouring for this stuff after all this?

I'm kind of hoping this explains something about why "open data" sessions are a bit fumbly-jumbly now.

Open data got complicated, quickly. Because data is complicated. Jump to the present, and conversations rapidly flit between all of the above either because everybody is involved at the same time, or the people who should be involved, aren't.

"Open Data" is harmful

Or both. The paradox is that it's become difficult to talk about open data firstly because those who were talking about it from one point of view are now talking about it from many points of view. And secondly because those who weren't talking about it before aren't talking about it now. Data silos still exist. Most people still use Excel. Statisticians still output reports.

The term "open data" is meaningless now. Not just meaningless - actively harmful. If you're used to talking about it, then the conversation has begun to fragment and coalesce around more subtle outcrops. And if you're not used to talking about it, then you're put off because nobody can explain what it means - and more importantly, what it means to you. So you carry on as normal.

My session at GovCamp on Data Engagement was, in retrospect, an attempt to get back to the previous question of "So what?". What I really want to do is fence the conversation off from the technical, economic and political aspects of data (although I'm still into all these things) and focus on the why. I desperately tried not to use the term "open data" because I think it would have distracted the discussion. (To be honest, I wanted to find something better than "data engagement" too, hence the phrase "Everyday data".)

And I'm really glad that some of the idea got taken up on day 2 by Tim Davies and others. A "Charter" for engaging with data really starts to delve into how we think about how to make data useful.

I admit I'm a little afraid that the term "Open Data Engagement" just makes the discussion even more vague. What does that mean to you if you have no idea what it is, or what Open Data is supposed to be? Is it all at risk of becoming another buzzword? What about "Data Usability", or "Public Data Engagement"? I'm still aware just how much I hate the terms "Public Understanding of Science" and "Public Engagement with Science". Are we going round in circles?

Should we call a Stats Spade a Stats Spade?

Many people with useful, everyday data and databases really don't think in terms of data. Because the data is about stuff they know, they think of it as "information". Maybe even a "resource". But ask them what "data" they have and they'll probably give you a back-up of their website.

One of the interesting points coming out of the Data Engagement session was that people deal with data all the time - think football, Formula 1, house prices, etc. But do people even refer to this as "data"? Or - more likely - do they call it "stats"? Mention "stats" and people think of tables, averages, and counts.

In a way, "stats" makes sense where "data" doesn't. "Information" makes sense where "data" doesn't. "Data" is tricky because it's all of this and more. It's figures, it's formats, it's visualisations. No wonder even those who understand this get confused when talking to each other. The more you try to take "Data" into the real world, the less the term applies.

Should the "open data" moniker be scrapped in favour of more "useful" terms like these? Would this make talking about implementing it more difficult, or easier? After all, any conversation on how to make data useful quickly turns away from talk of even databases and on to other issues (standards, protocols, best practice, comprehension).

Maybe if we talk about our bus times as "public information", and spending figures as "spending figures", then people will be interested in them, and we can stop trying to work out what "open" means.


Sunday, January 22, 2012

UKGovCamp 2012 - 5x5 (plus one)

So Friday and Saturday were host to the indispensable UKGovCamp 2012 - a huge gathering of people interested in making public stuff better with technology, roughly speaking. I got along to the Friday day all about talking (rather than Saturday's doing), and gorged myself on thoughtmeat. Seriously, I was feeling dizzy by lunchtime.

Somehow I think I managed to carry on talking sensibly enough to feel useful. I also took notes and recordings of the sessions I was in, but here are 5 points from each that struck some kind of chord with me. They're a mix of things people said, and stuff I thought, but I think I can remember which was which.

I've put up audio from each session here.

Session 1: Data Viz + maps issues + challenges
- Vicky Sargent

This seemed to focus largely on data quality.
  • Poor quality data can be exposed through openness.
  • Different users/uses want different levels of quality/reliability. 
  • Bringing together those who want reliability with those who want usability is hard.
  • Getting useful "infoporn" is hard.
  • Start by knowing what you're trying to achieve.

Session 2: Open Data as a Business Model
- John Sheridan

This went into how to sustainably fund both open data, and businesses based on it.
  • It's not just a choice between "open"/public and "closed"/private.
  • Perceptions of data reliability (including being up-to-date) are inherently linked to data management and its economics.
  • i.e. Some people think you need a tightly controlled team / contract / business model to maintain data quality... Whereas others think openness is a viable form of reliability. (Cf. Wikipedia "vs" Encyclopaedia Britannica)
  • Licensing offers multiple funding models depending on end-user, a la open source software. Chris Taggart doing a lot of this with
  • Does the data business model depend on the size/resources of the dataset/audience?

Session 3: LinkedGov tool to clean up & link data!
- Dan Paul Smith

This was a demonstration of the really impressive work being put together by @LinkedGov.
  • Software that extends Google Refine to let you easily link data structures and tidy data.
  • If letting people edit data ("cleaning", "linking", etc), you have to be careful not to introduce "new" data such as assumed defaults for new values. 
  • Suddenly linked/semantic data is starting to look really powerful. I'm almost converted :-)
  • The ability to "modularise" links to other external data lists has huge implications for data as a Distributed Ecosystem.
  • Metadata for what's been edited needs to be accessible and clear, to understand who's done what over the lifetime of a dataset.
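
To make the "no new data" and edit-metadata points a bit more concrete, here's a rough sketch in Python. It is not how LinkedGov or Google Refine work; the `Dataset`, `Edit` and `tidy_whitespace` names are all made up for illustration. The idea is simply that a cleaning step leaves missing values missing (no assumed defaults), and every change gets logged with who made it and why.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Edit:
    """One logged change to a single cell (hypothetical structure)."""
    row: int
    column: str
    old: Optional[str]
    new: Optional[str]
    editor: str
    reason: str
    at: str  # ISO 8601 timestamp in UTC


@dataclass
class Dataset:
    """Rows are plain dicts; history records every edit made to them."""
    rows: list
    history: list = field(default_factory=list)

    def set_value(self, row, column, new, editor, reason):
        old = self.rows[row].get(column)
        if old == new:
            return  # nothing changed, so nothing to log
        self.rows[row][column] = new
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append(Edit(row, column, old, new, editor, reason, stamp))


def tidy_whitespace(ds, column, editor):
    """Trim stray whitespace, but never invent a value for a missing cell."""
    for i, record in enumerate(ds.rows):
        value = record.get(column)
        if value is None:
            continue  # missing stays missing: no assumed default
        ds.set_value(i, column, value.strip(), editor, "trim whitespace")


ds = Dataset(rows=[{"council": " Brighton "}, {"council": None}, {"council": "Hove"}])
tidy_whitespace(ds, "council", editor="alice")
for edit in ds.history:
    print(edit.editor, edit.column, repr(edit.old), "->", repr(edit.new))
```

Anyone picking up the dataset later can read `ds.history` to see who did what, which is roughly the kind of metadata the last bullet is asking for.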

Session 4: Data Engagement / Everyday Data
- me

This was an attempt to think about how to turn data into something everyday, and not perceived as a "technical" thing. "Slides" and audio available.
  • People love data if it's about something they love - e.g. football, F1, sales...
  • Language used is massively important - often 2 groups will talk about the same thing, but in totally different ways.
  • A range of "necessary" precision was brought up again - how can you transform the complexity of data into simplicity without misleading?
  • Does data visualisation have to be 'comprehensively accurate', or can it just be enough to get people to ask more questions?
  • Give data context and it's easier to turn data into feedback and so learn from it.
There were some great examples, and some amazing ideas coming out of this for me, which I'll blogpost properly soon.

Session 5: Network society engagement
- Catherine Howe

This wasn't to do with data at all... This was about the advantages offered by moving to a more networked, more engaging approach to decision-making.
  • Catherine claims that the current system of engagement/consultation is actually a method to mitigate our own ongoing disappointment in political participation.
  • If done better, political participation can be enlightening, rewarding, and fun.
  • Bringing people together as part of the consultation process can mean they understand it/others more, and are less disappointed if they don't get what they want.
  • Feedback as part of the consultation is vital to success. In effect, consultation moves towards conversation rather than just gathering views. (Does this mean that changing people's views is an objective of a networked approach? Does this raise questions around the accuracy of the final result, or does a more involved process and more post-process feedback negate this?)
  • While I generally agree, I wasn't sure how much success from trials was down to using a networked approach, and how much was down to just using a different approach (i.e. novelty can often be fun in itself) - more consultation iterations needed?

I'm running out of time so won't go over Mike Bracken's speech or the Closing Note. Here are 5 random, general points instead:
  • The data landscape is slowly coming together in my head. I know I know something important about it, but I don't know what it is yet. Like Cooper in Twin Peaks. You know, when he has that dream. I need to mull it over and chuck some rocks at a bottle.
  • It feels weird not having twitter usernames on name tags.
  • GovCampers are a bunch of (mostly beardless) ale-swillers. Much to the surprise of the pub.
  • The engineering/development going into the new single domain is seriously good.
  • The medium T-shirts this year are definitely smaller than the medium T-shirts last year. Or did I get a ladies' one?

Friday, January 13, 2012

Pintless Debate

[In which the debate for/against the regulation of pub companies is ultimately broken down into the futility of arguments.]

The Parliamentary debate on the future of pub licenses has me hooked. Living in Brighton, it's difficult to describe, or even imagine, just what effect local pubs have on everyday life - from evening entertainment, to decent food, to convenient meeting and organising places, to Damned Good Beer.

So it was great to see my MP Caroline Lucas weighing in with views from the Landlord of the Greys in Hanover - in fact, this was why I clicked through to the rest of the debate.

Two Pints, Please

In a nutshell, the debate is a classic "is market self-regulation enough?" argument. Most voices in this one argue that large pub companies ("pubco's") have too much power when it comes to setting a) rents for licensees, and b) rules and rates for "guest beers" and other things that help make pubs "interesting" (or affordable).

The motion moves for regulation to free up licensees from this "beer tie", and for an independent body to review the self-regulatory nature of pubs.

But as you read through, it becomes clear that the debate is really about:

1. BIS' response to CAMRA's complaint appearing to be taken fairly word-for-word from a BBPA (British Beer & Pub Association) submission without much further input - recently discovered through an FOI request.

2. The Government's "weak" action of apparently rubber-stamping the self-regulatory guidelines as what should constitute the statutory code. (See Adrian Bailey's comment.)

3. What seem to be otherwise fairly "liquid" but one-sided negotiations between tenants/licensees and the pubco's (see here for example).

Brian Binley makes a very interesting point about the unsustainable debt model used by pubco's basically being passed on to landlords - and hence on to consumers, who unsurprisingly either go to a cheaper local pub (if one exists) or the supermarket. Andrew Bridgen goes on to call it "almost feudal".

How [the] debate rages

But over time, the debate threatens to emerge from its pretence of being about the pub model, and turn into an attack on the political process that is driving it (or being driven by it). At this point, the debate breaks down into 3 types of discourse:

1. Anecdotal/qualitative rhetoric: Stories from constituents, traders, etc. I suggest that the Select Committees' evidence also falls under this as they adopt an "interview" style approach. (Also, here's a good SC report from 2009 on the matter.)

2. Statistical evidence for/against intervention: Ed Davey seems to use stats more than others, for example.

3. Attacks against process and character: With the nature of the BIS response and its apparent "close ties" to the BBPA being thrown open by the FOI request above, this is a third line of argument which seeks to undermine both of the above, on matters of personal principle.

There are also appeals to "external" authority. The OFT, for instance, seem keen not to be involved, which leads some to say they're not relevant, but others to say this merely means regulation has no place in an apparently successful market.

Welcome to politics. What's interesting is how - or if - each of these types of argument "trumps" the others. In other words, should we give pubs more choice over beer because

a) a lot of people say it's a problem?
b) data suggests there is a link between lack of freedom, and pubs closing?
c) the people behind the non-choice have too much economic and political power?

In my mind, this is a bit of a paper-scissors-stone situation. Can any of these really be more important than the others, or do they just lead to a cycle of disagreement? How much does each of these - or all of them combined - duly influence any voting on the matter? And should I really have bought that four-pack of Speckled Hen from Sainsbury's today?

Exit, Stage Left

I also liked the general response to Ed Davey's comment which reads a little like the script for a bad school comedy play:

Brian Binley: Will the hon. Gentleman give way?
Edward Davey: No, I want to make some progress. 
[Hon. Members: “Oh!”]


Martin Horwood: Will my hon. Friend give way?
Edward Davey: No, but I will in a second.
Brian Binley: Will the Minister give way now?