Sunday, January 23, 2011

#UKGC11: Hooking into DIY Data

Following on from data relativity, the second thing that grabbed me at #ukgc11 (UKGovCamp 2011) happened during Will Perrin’s introduction to “Making a Difference with Data”, and was followed up in conversation with Helen Jeffrey (@imhelenj) on community-led data. Will talked of comparing his local authority’s data on lamp-post repair times to his own count of how long a lamp-post had been broken (and encouraging crime in the meantime). Helen talked about data that a group of volunteers had collected themselves, and then turned into a report to feed into government decision-making.

In both cases, what struck me was that people don’t have the aversion to, or fear of, data that is often assumed - if they start by generating it themselves.

To go back to data relativity, the most confusing and scary part of data is figuring out the thought processes and assumptions that have gone into a dataset, as well as figuring out what the hell’s important (often around 0.01%-0.5% of the data) and what’s “noise” - to an individual. Self-generated data doesn’t suffer from this, because all the scary background and assumption bits are part of the citizen’s/volunteer’s mindset and experience. Voilà: understanding data comes from experience. And as such, successful engagement with data is about creation as well as consumption.

OK, it’s clearly a little more complicated than that, but it’s a principle that’s often far too implicit for some datasets (crowdsourced maps, etc), and far too often forgotten about for others (most centrally-gathered stats). There are also a whole bunch of stereotypes about what “people” “want” from “data”, and often these stereotypes do little except re-establish the status quo. When data comes up against real-world users (yes, even geeks), and the magic “fails to happen”, we’re left wondering if natural engagement is such a given after all.

It’s difficult to get excited about thousands of datasets when you have no idea where to start or how they can be relevant to you. It’s much, much easier to get excited about data that is relevant to you, that you understand, and that you can see will benefit you. I think that’s why I love the idea of Mappiness or Christian Nold’s maps - both involve getting people to create data they find interesting, and through this “personal” data, the need for and relevance of other data suddenly becomes immediate and appreciated. (For example, if I’m feeling most happy on street X, what other properties does that street have, and how do they relate to me - house prices? Pollution? Streetworks?)
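To make that hook concrete, here’s a minimal Python sketch - the street names, happiness scores, and pollution figures are all invented - showing how one piece of self-generated data turns a second dataset into something instantly relevant:

```python
# A toy sketch of the "personal data as a hook" idea: self-generated
# happiness scores per street (à la Mappiness) joined with a second,
# hypothetical open dataset. Every name and figure here is invented.

happiness = {          # street -> average self-reported happiness (0-10)
    "Mill Road": 8.2,
    "Station Lane": 5.1,
    "High Street": 6.7,
}

pollution = {          # street -> hypothetical NO2 reading (µg/m³)
    "Mill Road": 21,
    "Station Lane": 48,
    "High Street": 33,
}

# The join: once you care about a street, the other dataset's row
# for that street is suddenly worth looking at.
combined = {
    street: (score, pollution.get(street))
    for street, score in happiness.items()
}

happiest = max(combined, key=lambda s: combined[s][0])
print(happiest, combined[happiest])
```

The join itself is trivial; the point is the direction of travel - you start from data you made, and the open dataset becomes the answer to a question you already have.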

This seems to be an issue that is bubbling along - everyone knows that organisations collect data, for instance, so an open data system worth its salt will take that into account. But there are still assumptions about the scale and complexity of that data, I would argue - whereas really, data can be as simple as counting, and everyone can count.
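As a toy illustration of “data can be as simple as counting” - the streets and tallies below are invented - a volunteer’s field notes are already a dataset the moment they’re tallied:

```python
from collections import Counter

# Hypothetical field notes: one entry per broken lamp-post a
# volunteer walked past, recorded by street name.
sightings = [
    "Mill Road", "Mill Road", "Station Lane",
    "Mill Road", "High Street",
]

# Counting the entries is the entire "analysis" step.
broken_per_street = Counter(sightings)
print(broken_per_street.most_common(1))  # [('Mill Road', 3)]
```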

Saturday, January 22, 2011

UKGC11: A General Theory of Data Relativity

First of a small series of small thoughts coming out of UKGovCamp 2011.

One of the key points emerging for me is just how much “data” is tied to the groups or people using it - not just the content, but the structure of it, the tools used to manage it, the background of the data, the assumptions behind it, and so on and so on. This comes up all over the place - standardised, central taxonomies often fall out of favour for being a jack-of-all-trades, useful to none. File formats are a direct result of people wanting data as an easy-to-edit spreadsheet, an easy-to-email PDF, or an easy-to-parse data file.

More fundamentally, even the understanding of what a dataset means becomes embedded in the structure of the data. If something is being measured, what defines that thing? What assumptions are inherent to the way data is measured? Is a van a form of car? More importantly, why are these definitions in place? Reasons are forgotten long before hard drives expire.

If you assume that all data is “relative” - i.e. a combination of the data itself and the people viewing it - what does this mean for linking it? Do we need more effort on translation? Or do we need more effort on fuzzy inferences between metadata, rather than direct mapping? (I suspect the Semantic Web rears its head here, but to me it always feels like a simpler solution is waiting to be seized on.)
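As a rough sketch of what “fuzzy inference between metadata” might look like at its very simplest - the column names below are invented, and real dataset linkage would need far more than string similarity - Python’s standard library can already guess a mapping between two sets of field names rather than requiring a hand-written one:

```python
from difflib import SequenceMatcher

# Two hypothetical datasets describing the same lamp-post repairs,
# each with its own naming conventions for the columns.
ours   = ["street_name", "repair_time_days", "lamp_id"]
theirs = ["StreetName", "RepairTime", "LampPostID"]

def best_match(name, candidates):
    # Score every candidate by case-insensitive string similarity
    # and return the closest one - a fuzzy guess, not a direct mapping.
    score = lambda a, b: SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return max(candidates, key=lambda c: score(name, c))

mapping = {name: best_match(name, theirs) for name in ours}
print(mapping)
```

A human (or richer metadata about origins and units) would still have to confirm the guesses - which is exactly the “questions about a dataset” point below.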

Knowing how and where to ask questions about a dataset is a huge part of this - metadata about origins and background is vital; questions help build up an idea of how to fit a dataset into your own world view, your own data model, or your own database. Perhaps development needs to focus on making these links between contexts as transparent as possible, rather than fixing a single, over-arching context in place to fit all.

Tuesday, January 11, 2011

Why democracy is getting routed around

Recent events have brought me back to thinking about an old topic - what's the best way of making a point? This time round, though, I'm slightly more fatalistic about it all.

What seems more obvious now is that established democracy, in its current form, is being outpaced - traditional representative democracy is no longer a priority, to put it bluntly.

Why? Two reasons:

1. Communication has changed. Everyone knows this. Everyone is routing around voting. Electronic voting is boring - we have electronic memes now. Public meeting videos are boring - we have hashtags now. I am talking to national and international strangers about the future of politics more than I talk to my neighbours. Never mind AV, we need something more than simple, single representation.

2. The topics of politics have changed - or, rather, they change faster and faster. We have a while to go before the singularity, but nonetheless, we no longer believe the future is 'distant'. Good sci-fi is becoming rare. We do not dare to imagine what we'll be able to do in 4 years' time, let alone know which party we'll want in government to cope with it when we get there. The future is flexible; party politics is boring. Far better to rely on the fluidity of networks and social knowledge than on the heavy infrastructures of politics.

We have secure and instant comms, so we have Wikileaks. We have flashmobs, so we have street protests. We have crowdsourcing, so we have inspirational projects, both as showcases by individuals and as industry-standard, open source giants. Bit by bit, there are people doing stuff, instead of waiting for politics to change for them.

Screw "Government 2.0". Food prices, climate change, economic sustainability, education, wisdom? The next decade will tell us if modern democracy is even out of beta-testing yet. If government is to survive in any respectable form beyond its current version, it needs to "get" reality - the kind of reality that everything else is now trying to work out how to do better.