Saying the above out loud is almost like publicly admitting you’ve got lepra today, because all the hype seems to be geared towards NoSQL and document based database systems, while RDBMS types of systems doesn’t seem to get much love. However, somebody needs to say it out loud, and it might as well be me. And what I’m saying is that SQL is (almost) always superior to NoSQL. To illustrate my point let’s examine what you can do with SQL that’s simply not possible with most NoSQL systems.
- You can create join statements
- You can extract meaningful statistics
- Partial record updates
- You’ve got referential integrity
- Etc …
Let’s go through the above features one at the time to see the power of such features. To make this as much “hands on” as possible, please do the following first please.
- Register a Hyperlambda cloudlet here
- Install the SQLite Chinook DB plugin from the “Management/Plugins” section – Screenshot below
Then go to “Tools/SQL Studio” and click the load button, for then to choose the SQL script called “chinook_aggregate_profitable_countries”. Make sure you choose the “chinook” database and execute the script. The result should resemble the following.
Doing something such as the above with MongoDB, Couchbase, CosmosDB or DynamoDB could easily turn into a 6 months long software development project by itself. We did everything in less than 5 minutes, assuming you followed the hands on parts. And the result was that we now know which countries are the most profitable countries in regards to music buyers according to the Chinook database. Below is the SQL for reference purposes.
select BillingCountry, round(sum(Total),2) as Revenue from Invoice group by BillingCountry order by Revenue Desc;
Statistics such as the above often depends upon combining joins with group by constructs, and aggregates. Now try to perform a join statement in your CosmosDB installation to extract similar statistics. I’ll just eat my popcorn in the meantime …
OK, this is only a “partially true argument” since (some) NoSQL systems actually supports partial record updates – However, the last time I checked, CosmosDB did not support it, and unless you’ve got it, you’ll need to lock database records somehow as you’re updating them. There are 3 different techniques for locking database records, and they’re all disastrous in regards to implementation details, and/or the resulting software.
- Optimistic locking
- Pessimistic locking
- Ignore locking
To start out with the worst, ignoring locking implies you are 100% destined to experience race conditions in the future. Having race conditions on your database, is a guarantee of making sure your data becomes garbage over time.
Going for optimistic locking at best explodes the complexity of your codebase. Going for pessimistic locking ensures that you’ll go insane if one of your employees goes on holiday with a web form open on his computer, in addition to that it explodes the complexity of your code of course. I wrote about database locks a year ago at DZone in case you’re interested.
With partial record updates, the problem simply vanishes more or less. To understand why, read the above article for details. Some NoSQL database systems do support partial record updates, such as CouchBase if I remember correctly, but most struggles with these kind of constructs.
This one is the big one (pun!). Most NoSQL database systems poses restrictions upon you in regards to the size of your documents. I think the maximum size in Cosmos is 2MB. The mantra for NoSQL believers of course is as follows …
You don’t need referential integrity because you can just store everything in one document
“Good luck with that when the maximum document size is 2MB” – Is my answer to these people 😀
Maybe this is why Twitter only allows you to use 140 characters …? 😉
Besides, storing everything in one document implies zero normalisation, making it almost impossible to update stuff, where multiple documents are having the same value, and that value needs to be changed over time.
Below is a list of SQL statements that are more or less impossible to execute in NoSQL database systems.
select ar.Name, count(*) as count from Album al, Artist ar where al.ArtistId = ar.ArtistId group by al.ArtistId order by count desc limit 25
The above counts how many records all artists have released.
select mt.Name, count(mt.MediaTypeId) as Amount from MediaType mt inner join Track t on mt.MediaTypeId = t.MediaTypeId group by mt.MediaTypeId order by Amount desc;
The above calculates how much profit was made from each audio format.
select distinct c.Email, c.FirstName, c.LastName, g.name from Customer c inner join Invoice i on c.CustomerId = i.CustomerId inner join InvoiceLine ii on i.InvoiceId = ii.InvoiceId inner join Track t ON ii.TrackId = t.TrackId inner join Genre g ON t.GenreId = g.GenreId where g.Name = "Rock" order by c.Email
The above finds the names and emails of all customers that are listening to rock music.
select ar.Name AS ArtistName, count(t.TrackId) as TrackCount from Track t inner join Album al on t.AlbumId = al.AlbumId inner join Artist ar on al.ArtistId = ar.ArtistId inner join Genre g on t.GenreId = g.GenreId where g.GenreId = 1 group by al.ArtistId order by TrackCount desc limit 10
The above counts how many songs each band has created, and orders such that the most productive bands are listed first.
Every now and then you really need a NoSQL database system. There are problem domains where document based databases are better. However, 99.99999999% of the time a relational SQL based database is simply a bajillion times better. My rule of thumb is as follows …
Am I to create Twitter, Google or Facebook? If the answer is no, I’m using SQL … 😉