Those Who Don't Know History…
Category Theory for DBAs — Part 1, Why?

Mathematics and information technology have had a symbiotic relationship from the very start, and many useful tools and concepts were born of it. This cross-pollination began before the first computer was built and laid the foundation for programming, the internet and the first ‘modern’ databases.

However, in recent years the mathematical foundations of (relational) databases have started to be somewhat forgotten. Meanwhile, mathematics has gone its own way, while databases still support only the bare minimum of relational algebra. And while the popularity of NoSQL suggests a consensus that it’s time for a more modern approach, any attempt that fails to properly recognise what relational databases are risks reinventing relational databases before getting to the real problem.

Take graph databases, for example: an attempt to modernise databases by basing them on graph theory (invented by Euler in 1735). Not only are relations and graphs similar in many ways, but the obvious generalisation from graphs to hypergraphs (analogous to moving from binary to $n$-ary relations) results in much the same model as the relational databases described by Edgar F. Codd (though the equivalence requires the use of NULLs, another concept that seems to keep getting reinvented).
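To make the analogy a little more concrete, here is a minimal Python sketch, assuming relations are represented as plain sets of tuples. A directed graph is then simply a binary relation (a set of pairs), and an $n$-ary relation plays the role of a set of ordered hyperedges. The names `edges`, `enrolled`, `neighbours` and `project` are hypothetical and chosen only for illustration; the sketch makes no attempt to model NULLs.

```python
# Assumption: relations are modelled as Python sets of tuples.

# A directed graph is just a binary relation: a set of (source, target) pairs.
edges = {("alice", "bob"), ("bob", "carol")}

# An n-ary relation: each tuple links several vertices at once, much like an
# (ordered) hyperedge. Hypothetical example: (student, course, grade).
enrolled = {("alice", "algebra", "A"), ("bob", "topology", "B")}


def neighbours(relation, vertex):
    """Return the targets reachable from `vertex` in one step of a binary relation."""
    return {target for (source, target) in relation if source == vertex}


def project(relation, *positions):
    """Relational projection: keep only the tuple components at the given positions."""
    return {tuple(row[i] for i in positions) for row in relation}


# Projecting the ternary relation onto two positions yields a binary relation,
# which can be treated as an ordinary graph again.
student_course = project(enrolled, 0, 1)
```

Here `neighbours(edges, "alice")` gives `{"bob"}`, and `neighbours(student_course, "alice")` gives `{"algebra"}`: the same graph operation works on a binary relation whether it was written down directly or derived from a wider one.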

While such efforts do bring some much-needed fresh air, they would be more effective if they did not spend their energy reinventing parts that already exist. And while Codd’s work in the 1970s was important, little of the mathematics developed since then has found its way into databases. This suggests several areas for improvement.

Firstly, awareness of the mathematical notions behind databases needs to be improved, in order to avoid reinventing concepts simply because they are not part of ‘ordinary’ SQL or have been forgotten, even when those concepts have been known to mathematicians for centuries.

Secondly, it is time to bring the mathematics used in databases up to at least the latter half of the 20th century. As important as Codd’s relational algebra was, it didn’t exactly represent the cutting edge of mathematics at the time. Of the cutting-edge mathematics of that period, category theory in particular has since found many uses in other areas (including information technology), so it is time to bring it to the world of databases as well.

There is, however, one small problem which will need to be tackled first.

Figure 1. The problem. (Diagram: the potential audience is the overlap of those who read articles about databases and those who read articles containing mathematics.)

Healthy cross-pollination between mathematics and databases cannot happen without people interested in both. Therefore, anyone still reading this is cordially invited to read the other posts in this series as they come online. The first few will necessarily focus on increasing the potential audience, by reviewing the mathematical notions that relational databases were built on (e.g. relations) and by showing some useful applications as bait to interest more people in mathematics (or databases).

More in this series:

  1. Those Who Don't Know History…
  2. Putting the Relational back in Relational Databases