I was flipping through Chris Date’s SQL and Relational Theory and came across this little gem, which paraphrases “Codd’s own stated objectives in introducing his relational model.” I think this bears repeating today because I have felt for a while that the NoSQL movement has a significantly different set of goals, which is fine, but seems to be ignoring some of the things that make the relational model nice to work with. I wonder whether it is necessarily either-or, or whether some of these NoSQL systems can work toward satisfying more of the needs that relational database systems satisfy, without sacrificing the speed and ease of distribution that have made the NoSQL concept popular.
Here are the stated objectives, quoting Date:
- To provide a high degree of data independence
- To provide a community view of the data of spartan simplicity, so that a wide variety of users in an enterprise, ranging from the most computer naive to the most computer sophisticated, can interact with a common model (while not prohibiting superimposed user views for specialized purposes)
- To simplify the potentially formidable job of the database administrator
- To introduce a theoretical foundation, albeit modest, into database management (a field sadly lacking in solid principles and guidelines)
- To merge the fact retrieval and file management fields in preparation for the addition at a later time of inferential services in the commercial world
- To lift database application programming to a new level–a level in which sets (and more specifically relations) are treated as operands instead of being processed element by element
I want to ponder these objectives for a bit before drawing too many conclusions, but a few things seem starkly obvious.
The need to build indexes by hand in NoSQL systems in order to search (efficiently or not) by different criteria is a step away from the relational model’s goal of data independence, because these indexes are likely to be built with a particular application in mind, sometimes (often?) to the disadvantage of other applications that require a different view of the data.
To further that point, if the indexes designed into the database are insufficient, applications will probably have to drop back to processing one record at a time, rather than working with data sets as units, unless all application developers have enough control over the database system to make the needed changes.
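To make the two points above concrete, here is a minimal sketch in Python. It assumes a toy key-value store modeled as a plain dict, which is not any particular NoSQL product, and uses the standard library's sqlite3 module for the relational side. The names `users`, `city_index`, and the helper functions are all hypothetical, invented for illustration.

```python
import sqlite3

# Hypothetical key-value "store": opaque key -> record.
users = {
    "u1": {"name": "Ann", "city": "Oslo", "age": 31},
    "u2": {"name": "Bob", "city": "Lima", "age": 45},
    "u3": {"name": "Cy", "city": "Oslo", "age": 27},
}

# Hand-built secondary index, written by the one application that
# needed lookups by city. The store itself knows nothing about it.
city_index = {}
for key, rec in users.items():
    city_index.setdefault(rec["city"], set()).add(key)


def names_in_city(city):
    """Served by the index the first application happened to build."""
    return sorted(users[k]["name"] for k in city_index.get(city, ()))


def names_older_than(age):
    """A second application asks a different question. No index covers
    it, so it falls back to visiting one record at a time."""
    result = []
    for rec in users.values():
        if rec["age"] > age:
            result.append(rec["name"])
    return sorted(result)


# Relational counterpart: both questions are set-level expressions
# over the same table, and neither caller names an access path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, city TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(k, r["name"], r["city"], r["age"]) for k, r in users.items()],
)


def sql_names_in_city(city):
    rows = conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]


def sql_names_older_than(age):
    rows = conn.execute(
        "SELECT name FROM users WHERE age > ? ORDER BY name", (age,)
    )
    return [name for (name,) in rows]
```

In the relational version, an index on `age` could be added later by whoever administers the database without touching either query, which is roughly what data independence buys. In the dict version, the second application has to either write its own index maintenance code or live with the scan.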
The database administrator is no doubt at a disadvantage today with NoSQL systems, though this is more of a tools issue than a fundamental design issue. The “how do I query the database?” comic sums up the current situation amusingly.
A theoretical foundation of NoSQL systems is hard to find. Most of the theory seems to be in regard to eventual consistency and other issues related more to distributed systems than data modeling in the abstract. This will surely come with time, though as soon as you get into the details of data modeling in NoSQL systems, you really have to specify which one, as they are more different than they are similar. A theory of data management with key-value stores seems, to me, unenlightening at first glance.
Whatever the base model is, if NoSQL databases are here to stay, I think we are going to see a need for some theoretical foundations to manage the growing complexity of our data models given the new strengths and limitations of NoSQL systems.