Thursday, November 25, 2010

Structured data: Accessible magic?

Magic Numbers (source)

Have you ever dreamt in SQL? Few of us have, and we only ever speak of it in hushed, yet secretly astounded tones. Database development is a weird way of looking at the world. Those who venture into it too deeply may never come back to being "normal".

But at the same time, it's also like any language - reaching a state of fluency involves a fundamental shift in thinking. To think in French is to adopt a new philosophy, and the same is true of database logic; linking and manipulating rows of data is a world away from editing each row by hand.
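To make that contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, names and rows are all invented for illustration; the point is only that a single statement describes a change to every matching row, and a join links two tables without anyone looking at the individual rows.

```python
import sqlite3


def build_demo():
    """Create a throwaway in-memory database with two linked tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER,
            title TEXT,
            status TEXT
        );
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Charles');
        INSERT INTO posts VALUES
            (1, 1, 'Notes', 'draft'),
            (2, 1, 'Engines', 'draft'),
            (3, 2, 'Tables', 'draft');
    """)
    return conn


def publish_all_by(conn, author_name):
    """Mark every post by one author as published; returns the number of rows changed."""
    cur = conn.execute(
        """UPDATE posts SET status = 'published'
           WHERE author_id IN (SELECT id FROM authors WHERE name = ?)""",
        (author_name,),
    )
    return cur.rowcount


def titles_with_authors(conn):
    """Link posts to their authors with a join, ordered by post id."""
    return conn.execute(
        """SELECT authors.name, posts.title, posts.status
           FROM posts JOIN authors ON authors.id = posts.author_id
           ORDER BY posts.id"""
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(publish_all_by(conn, "Ada"))
    for row in titles_with_authors(conn):
        print(row)
```

The same change in a spreadsheet would mean finding each of Ada's rows and retyping the status cell; here the instruction stays the same whether there are three rows or three million.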

To explain the importance of structured data, is it important to first get across this conceptual paradigm shift? Is the ultimate draw of structured data tied inherently to a new way of seeing the world? One in which we, as data/content hackers, don't see the data at all, but merely instruct a computer to do stuff with it.

Maybe this explains the "format divide" between those publishing in "closed" formats, and those publishing in open ones. If you do not know how to automate the data-munging process, then you do stuff by hand, you take a long time to do it, and you have absolutely no need for "structures" other than those in your head.

This happens everywhere, all the time: half the world lives in Excel during office hours. At some point, computers became popular as difference engines, but not necessarily good at being them. Human operators became part of the machines, rather than directors of them. In a way, this mirrored the huge factory production lines, and the endless supermarket checkouts, so most humans simply accepted this as the new way of life. Any sufficiently different technology is a form of magic. Processing as a manual task.

This happens today. Never forget that. Seeing data is more important than defining a structure for it, because structure is *hard*. Datasets have peculiarities, errors, and specifics that resist simple structuring. And changing these structures is effort - effort that involves communicating these changes to others. In short, it's easier to stick with "sloppy" data if you're not using the right tools. It's even easier if the other people using the data don't care about the tools either. Content is King.

So how do we bridge this divide between "manual data labour" and "magic"? On the upside, I believe it must happen, as data - and talk of data - becomes a public matter. Those not structuring their data will need to structure it, or face a new kind of exclusion - call it "un-APIness" perhaps.

But this doesn't help us to move into a culture of automation, of magic, which I think is important because it determines what you *believe* you can do with data. Understanding structured data is essential in coming up with new services, new applications, and new answers.

Working with people to build answers will help too. It's not enough to just want "raw data now" - to build bridges, we need to build real things based on data. We, as geeks, need to find out what people actually want. We need to show that questions can be answered with "magic", but also be open enough to demonstrate that structuring data has a direct impact on what can be done, and how quickly.

Change the tools. Rethink processes. It's time to end the conveyor-belt, factory line approach to data.