Data as an Ecosystem
A year ago, I asked some questions around the relativity of data. It feels like I have some vague answers now, even if they're shifting about still.
The BBC are currently showing an awesome series about the history of plants on Earth, called "How to Grow a Planet" - but it's not just about plants, it's about how plants developed alongside rocks and sunlight, and then how they developed alongside animals. It's a story of opportunity and of symbiotic relationships - in other words, of how co-dependence can lead to vitality and huge, thriving ecosystems.
The idea of data as an ecosystem isn't a new one. But it is one that's easy to forget, especially when we're staring at technical formats or CSV files, or when we're on a crusade for a much larger philosophy. However, understanding how different data users interact is key to data itself thriving. During the session, I drew something like this:
At the time, I wanted to capture the important thing - the links from one group to another. In other words, data flows from user to user, from group to group, and gets transformed as it does so. This is the fundamental reason why data is good - not because it is easily automate-able, or easily packageable into binary, but because it can and does mean different things to different people. And yet we are also intrinsically linked through that flow of data at the same time. All different, yet all the same.
Each group has different needs and different backgrounds, and so each link in the diagram, at a data-level, will involve different...
- Access needs
- Numerical understanding
- Contextual, real-world understanding
- Quality needs
- Reliability needs
- etc.
A systemic/network view of data moving around means we can move from a "one size fits all" perspective to a narrower, more specific viewpoint. This means we can ask questions such as:
- What is our role and our position in the system?
- Who are our audiences (current and potential) and what do they want?
This gives us a less scary starting point than the idea that every data-handling party needs to cater to every other data-wanting party. Not everyone needs to do everything. We've been grappling with this over the last few years, with the shift from central authorities specifically providing data to an "end user" (if such a user exists) to providing "raw" data for a different audience. But the debate is still murky and needs to be made more explicit.
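As a rough illustration of the network view above, here is a minimal Python sketch that treats each link as a (provider, audience, needs) entry and answers the two questions for any one group. Everything in it is an assumption made up for the example: the group names, the needs attached to each link, and the helper functions audiences, providers and needs are hypothetical, not taken from the diagram.

```python
# Illustrative only: the group names and needs below are made up for this sketch.
FLOWS = [
    # (provider, audience, needs attached to that link)
    ("central authority", "developers", {"access": "raw files", "quality": "documented"}),
    ("central authority", "journalists", {"access": "spreadsheet", "quality": "sourced"}),
    ("developers", "local residents", {"access": "web app", "quality": "summarised"}),
]


def audiences(flows, group):
    """Groups that receive data directly from `group`."""
    return sorted({dst for src, dst, _ in flows if src == group})


def providers(flows, group):
    """Groups that `group` receives data from."""
    return sorted({src for src, dst, _ in flows if dst == group})


def needs(flows, src, dst):
    """Needs recorded on the link from `src` to `dst`, or None if there is no such link."""
    for s, d, link_needs in flows:
        if (s, d) == (src, dst):
            return link_needs
    return None
```

Asking audiences(FLOWS, "developers") or needs(FLOWS, "developers", "local residents") keeps the question local to one group and one link, rather than asking what "everyone" needs from the data.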
Data Engagement Charter - A Star Map?
This is where a Charter can really help, I think - not just in terms of giving people an easy-to-follow guide, but also as a way to really map out the different uses and users of data. Si Whitehouse's blog post on the subject has a 4-way use-case diagram positing 4 different types of people who want to find data:
Can we use this to start working out what the most important needs of each are, and more importantly, who could be filling those needs? And vice versa - if you have an existing audience who are struggling to engage with your data, what can you do to make it meet their needs?
I wanted to use this image to illustrate this post, for some reason:
At some level it represents an opening up of barriers, which can summarise the ongoing efforts to open up data. But at another level it's a great image because it mirrors both the links in the first image above, and the two-dimensional axes sitting nonchalantly in the Use Cases diagram. It not-so-neatly sums up the way in which data flow changes according to what the data is being used for right now. Sometimes cars need to flow. Sometimes trains need to flow. With the right barriers and the right flexibility, we can have both. Or all, depending on how many users we're talking about.
So there's no "right" answer to how data should be opened up, any more than there's a "right" answer to which programming language I should use. There is what's most appropriate, perhaps, and certainly what's most fun. Linked data and SPARQL are good in some cases. Excel and Word reports are fine in other cases.
How to Have a Healthy Data Ecosystem
Having said that, it's not quite true. To keep an ecosystem alive means sticking to some rules, otherwise we end up with fragmentation, infestation, and/or the possible collapse of large parts of that ecosystem.
The two main components of a healthy system are, for me, openness and feedback.
Openness allows for flexibility and the ability to rapidly make new connections - vital when the environment changes (climate-wise, but also economically, politically, and technologically). Without openness, things can exist for a while, sure, but there is little resilience as time goes on.
Which standard gets used for data doesn't matter as much as whether we can interface with it in the future. We need to avoid data lock-in (or lock-out, depending on where you're standing) - whether that comes from legal, technical or economic aspects. There may be reasons for certain barriers (such as privacy), but these shouldn't translate into a general attitude of making data difficult to obtain. Legal barriers should not prevent technical access from being as easy as possible, for example.
Feedback is also vital, as it provides a way to negotiate the value of data without resorting to a "survival of the fittest" regime. Feedback in a data sense means that the fruits of Data Engagement benefit both parties involved - and that the link between parties gets stronger as a result of it being used.
In other words, we need to consistently and continually make sure that data is useful, otherwise the disadvantages of providing it and acquiring it will outweigh the benefits.
A Data Engagement Charter is an awesome first step to understanding and realising this. Such a Charter, alongside technical prowess and legal openness (another post), can be a very real call-to-arms for this recipe for an ecosystem. To make data successful, we have to understand that we are co-dependent on each other, and that it cannot be a one-way flow, even if we try to make it one.