Monday, July 09, 2012

Kasabi, Nuclear Power, and the Data Developer Dichotomy

Today's big news is that Talis are shutting down Kasabi, their linked data platform, moving away from semantic data, and losing Leigh Dodds. Talis have probably been the most visible entity in the commercial linked-data realm, and with good reason. They showcased an impressive local data portal for the Open Data Cities Conference a while back in Brighton, and drive various high-profile services like Fix My Street.

But it turns out the demand for the linked data platform just isn't enough to support a business. As a data developer, I'm obviously intrigued by this move from Talis.
(Spoiler: There's a lot of background here, and I don't even answer my original question. Feel free to skip to the end.)

How I Learned to Stop Worrying And Love Data


Your front room? [img]
Back at Uni, I read that nuclear power isn't just about physics, technology and war. To wield nukes, a state needs certain organisational structures to support their centralised nature - security structures to protect them, political structures to commission, deploy or maintain them, market structures to permit or prevent sales, and so on. In short, technologies and their usage depend on social-political contexts - and vice versa.

At the other extreme, well away from nuclear power, micro-generation of power is slowly gaining a footing. Feeding power into the grid from solar and other electricity generators also requires contexts to be put into place, but the nature and distribution of these are different to those of nuclear power. The consultations yield different emotions, the networks require bi-directional structures, as do tariffs and security measures.

Overall, there is a big difference between the "central" nature of nuclear power, and the "decentralised" nature of micro-generation. This maze of contexts is what Open Data is trying to navigate right now. And Linked Data is part of that, often trying to straddle worlds.

Platform Non-Wars


Once upon a time, XML was the saviour of the data world. Up until then, people had put up with CSVs and related two-dimensional grids. Data was a person-to-person communication tool, not a machine-to-machine one. Machines kept their own data, and made it available through their own interface. So passing data around was a presentation job, and people liked doing that in 2D.

XML changed that by structuring data differently - the context of XML was a machine-to-machine medium. Some simple XML can be coded by hand, such as HTML. Other XML (and lots of HTML) is really intended only to be generated and read by machines, which will do the validation, parsing and rendering without the aid of human hands.

XML can afford to be complex then, because the context is strictly defined by standards and parsers. Maybe XML is the nuclear power of the data format world.

Then people discovered JSON, a much more lightweight format which was still supposed to be machine-readable, but which also caters more to a second context - person-readable. Because JSON is so lightweight, it's much easier to debug by hand. And while there are huge industries booming around XML, the lure of JSON appeals to those who don't have rigid structures and definitions in place, who need to do something quickly and often by hand, at least at first.

Like nuclear and micro-generation though, XML and JSON aren't a "versus" or a battle. They're both just about what's appropriate for what context. This is an Important Point.
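As a rough illustration of that point, here is a minimal Python sketch that reads the same record from both formats. The record itself (a made-up micro-generation station with the field names id, name and output) is invented purely for this example, and the helper names from_xml and from_json are hypothetical. Only the standard library is used.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration only.
XML_DOC = """<station id="s1">
  <name>Brighton</name>
  <output unit="kW">3.2</output>
</station>"""

JSON_DOC = '{"id": "s1", "name": "Brighton", "output_kw": 3.2}'


def from_xml(text):
    """Parse the XML form; attributes, elements and types are handled explicitly."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),                         # attribute
        "name": root.findtext("name"),                # child element text
        "output_kw": float(root.findtext("output")),  # text must be converted by hand
    }


def from_json(text):
    """Parse the JSON form; the structure maps directly onto a dict."""
    return json.loads(text)


if __name__ == "__main__":
    # Both routes end up with the same record.
    print(from_xml(XML_DOC) == from_json(JSON_DOC))
```

Neither route is "better" here. The XML version carries more explicit structure (attributes, units) that a schema could police, while the JSON version is quicker to read and poke at by hand.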

Open Wide, Please


What could possibly go wrong? [img]
This is where we return to Open Data, and the position of Linked Data within.

Right now, I argue that Open Data is trying too hard to cover everything in one breath. It's doing this because it very rapidly became a cultural symbol rather than an engineering symbol. With the agenda being led by "transparency" and its associated technical underpinnings (yes, the 5-stars of open data are a political quest, not a technical one), we tend to overlook the idea that Data - the other part - is not a standard but something which requires context.

Without context, data is just entropy, and entropy is already handled pretty well by our underlying electronic veins: the binary transistor.

In other words, is "Open Data" an oxymoron? If something is to build on its openness, it needs to flow, and if it is to flow from place to place, it needs to acquire meaning - at which point, is it no longer "Data"? Are we better off talking about "Open Information" - and if so, what does that mean for our tools?

So where do Linked Data and the plight of Kasabi fit into this? From the above, we can see that as "data" moves around the ecosystem, the people using it want different things from it, and will do so using different tools. This is a translation task - to adopt someone else's data, not only do you need to know how the data works in itself, you also need to know how it integrates with your own data. (This is also why Data Engagement is so important.)

The data developer's instinct is to build something generic and beautiful. This is further compounded by a commercial instinct that conforms to economies of both scale and scope.

But in reality data often resists the genericism that technical and economic efficiency loves. A generic data handling system would need to be so generic that it could do anything - at which point, is it any different to any database? Any computer? In her Open Data Cities Conference talk a few months ago, Emer Coleman said that "People are messy." And if people are messy, then people talking to other people is even messier - think Tower of Babel. In the real world, data is just as messy, and messier than we think it is.


One Chance Out Between Two Worlds


Not your usual armchair auditor [img]
This is why it's difficult to talk about Open Data and Linked Data in the same sentence, or sell all-encompassing data tools, or come up with "universal" standards that try to transcend contexts. The contexts differ, and translating data from one to another also means translating mindsets, working practices, learning processes, and organisational structures. Data Relativity is still being ignored.

Right now, Linked Data and the Semantic Web are in a funny position. They aim big, and are trying to solve an important problem about data quality. But this big aim means big technology and paradigm shifts - putting Linked Data much more into the realm of an "Enterprise" app in the same way that nuclear power and XML require a certain, quite hefty, amount of planning and structuring to achieve.

A lot of large organisations with "heavy", nuclear-style data have a lot of systems already in place, and a lot of knowledge and resources tied into these - in other words, they have a lot of momentum. This is where Linked Data could make a real difference, I think, but the inertia that needs to be overcome is a fundamental issue. Not only that, but because the success of Linked Data is inherently tied to network externalities, that inertia is multiplied up through inter-organisational lines. For it to be good, many people need to adopt it at the same time.

On the other hand, the economic agenda of Open Data wants armchair auditors and clever freelance developers to innovate quickly for low cost. And the tools, knowledge and skills these kinds of people have are geared up not toward "Enterprise-level" data, but toward quick-fire, loosely-knit, human-readable innovation. Find a small problem, and solve it quickly.

These two worlds are currently circling each other. All too often "Open Data" becomes confused with "Big Data" because there's a lot of it. Some of the challenge involves filtering out "Open Data" so much that it's invisible - background noise - leaving just the "Small Data", the useful stuff. Learning a 5-page API reference doesn't help this.

The other challenge is to make the link between "Big Data" and "Small Data" bi-directional. Or omni-directional. There's still massive amounts of work to do around microgeneration of data and feeding user-generated data into "the grid". There's still no real work around standards for data sharing, as far as I know, other than those that exist for content such as RSS, etc.

Right now though, the main challenge is just getting out of the Open Data mindset that data needs to be Big and Open in order to work. It's really important not to get too caught up in the lure of simplicity and easy wins for quick economic and political gain. Ecosystems are not built on top of economies of scale and scope.


Remember Babel.