Data Determinism
March 20th, 2009 Rafael Alvarado
[Note: this is a post from a now-defunct blog. I’m republishing it because of this recent article about Google, “Google designer leaves, blaming data-centrism”. It is not a defense of what appears to be Google’s data fetishism (but who can blame them?); it just got me thinkin’ …]
I want to say that the usual practice of beginning with the user interface, as the artifact that guides the conversation between clients and developers, is wrong. It looks good and sounds right: The client will be using the interface, right? The code is a black box that ought to be subject to intense refactoring, right?
True all, but we also know that the stack of application development rests on the database layer, moves into the various layers of code and so-called business logic, and ends with the user interface. The layer that has the most effect on what is possible to code or display downstream is the database layer. That is, what you choose as a database format and model will constrain what you can do on the presentation end, whereas presentation technologies rarely have effects in the opposite direction (a possible exception is Flash before Flex). This stack holds even if you’re not using MVC; it’s more or less encoded in the way our current software tools work. For example, even if you code in pure JavaScript, your stack is going to begin with JSON or XML or the DOM, and end with CSS.
Call this view “data determinism.” As such, it probably suffers from the same criticisms that have been leveled at other forms of determinism, such as historical materialism, which holds that infrastructure (work behaviors, technologies, etc.) determines superstructure (religious beliefs, laws, etc.). I’ll accept that, if you (the critic) accept that the data level at least constrains the other levels, and that the other levels, to have an effect, must be able to modify the data level. Well, then, there’s the rub: once the data model is written, it doesn’t get changed a whole lot. The web designers don’t have a lot to talk to the DBAs about, and the two groups rarely know how to have a conversation. In fact, with enterprise databases, you get the tail-wagging-the-dog effect: “We can’t do that because the database only accepts this kind of data.” So there.
Anyway, I believe that the conversation with clients should begin with the data model, using perhaps simplified E‑R diagrams, but ultimately getting at a kind of ontology. What are the salient categories and relations and processes that describe the domain in question? This is a conversation that clients can have with developers, and usually it is a great conversation, and not constrained by an arbitrary visual artifact that can direct conversation along a false groove.
A couple of principles follow from data determinism:
- The requirements process should be preceded (replaced?) by an ontology-discovery process. And the proper method to use here is ethnography.
- The database should be designed with flexibility in mind. I prefer very simple semantic web structures (triples, graphs, etc.) that can be filtered by more specific ontology layers. This process I want to discuss in another post or two.
- After this is in place, then the discussion should move to the level of visual artifacts, such as interfaces. Both the client and developer will have a better idea of what is possible.
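To make the second principle a little more concrete, here is a minimal sketch in Python of what I mean by simple triples filtered through an ontology layer. Everything in it is hypothetical and invented for illustration: the `Triple` type, the sample data, the `ONTOLOGY` mapping, and the `view` function are assumptions, not the code of any actual project or any particular semantic web library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str


# The flexible layer: a flat bag of statements with no fixed schema.
# All of this data is made up for the example.
TRIPLES = [
    Triple("letter_12", "type", "Letter"),
    Triple("letter_12", "author", "person_3"),
    Triple("letter_12", "shelfmark", "MS 44"),
    Triple("person_3", "type", "Person"),
    Triple("person_3", "name", "Jane Doe"),
    Triple("person_3", "shoe_size", "7"),
]

# The ontology layer: for each category, the predicates that matter
# to this particular domain. It can change without touching TRIPLES.
ONTOLOGY = {
    "Letter": {"author", "shelfmark"},
    "Person": {"name"},
}


def view(triples, ontology):
    """Filter raw triples into records shaped by the ontology layer."""
    # First pass: find the declared category of each subject.
    kinds = {t.subject: t.obj for t in triples if t.predicate == "type"}
    records = {}
    for t in triples:
        kind = kinds.get(t.subject)
        # Keep a statement only if the ontology knows this category
        # and lists this predicate for it.
        if kind in ontology and t.predicate in ontology[kind]:
            records.setdefault(t.subject, {"type": kind})[t.predicate] = t.obj
    return records


if __name__ == "__main__":
    for subject, record in view(TRIPLES, ONTOLOGY).items():
        print(subject, record)
```

The stored triples never change here; only the ontology does. Adding `"shoe_size"` to the `Person` entry would expose that statement without any change to the underlying data, which is the kind of flexibility I have in mind.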
This is essentially a codification of the application development process I’ve developed with clients in academia, where I have developed several web-based applications for humanities computing projects.
Looking back at this post, I see that it has significance for two things that currently occupy my mind: Edupunk and the RAW DATA NOW movement (both, interestingly, represented by folks at UMW …)
1. Data determinism provides an under-the-hood rationale for Edupunk: Enterprise apps tend to hide data and the database, reifying it into a natural, immutable condition that interfaces and behaviors have to conform to. Think of Blackboard. Also consider that it was over the data model that they took Desire2Learn to court.
2. Data determinism also helps explain why data, among all the things we call “information,” needs to be free (in spite of the fact that it apparently does not want to be). RAW DATA NOW, as Tim Berners-Lee recently exhorted the audience at TED. Because if it determines everything, we need to have access to it: raw, and without undue mediation by nice-looking interfaces or toothy EULAs.