
Saying the above out loud is almost like publicly admitting you've got lepra today, because all the hype seems to be geared towards NoSQL and document based database systems, while RDBMS types of systems doesn't seem to get much love. However, somebody needs to say it out loud, and it might as well be me. And what I'm saying is that SQL is (almost) always superior to NoSQL. To illustrate my point let's examine what you can do with SQL that's simply not possible with most NoSQL systems.
- You can create join statements
- You can extract meaningful statistics
- Partial record updates
- You've got referential integrity
- Etc ...
Let's go through the above features one at the time to see the power of such features. To make this as much "hands on" as possible, please do the following first please.
- Register a Hyperlambda cloudlet here
- Install the SQLite Chinook DB plugin from the "Management/Plugins" section - Screenshot below
Then go to "Tools/SQL Studio" and click the load button, for then to choose the SQL script called "chinook_aggregate_profitable_countries". Make sure you choose the "chinook" database and execute the script. The result should resemble the following.
Doing something such as the above with MongoDB, Couchbase, CosmosDB or DynamoDB could easily turn into a 6 months long software development project by itself. We did everything in less than 5 minutes, assuming you followed the hands on parts. And the result was that we now know which countries are the most profitable countries in regards to music buyers according to the Chinook database. Below is the SQL for reference purposes.
select BillingCountry, round(sum(Total),2) as Revenue
from Invoice
group by BillingCountry
order by Revenue Desc;
Statistics such as the above often depends upon combining joins with group by constructs, and aggregates. Now try to perform a join statement in your CosmosDB installation to extract similar statistics. I'll just eat my popcorn in the meantime ...
Partial record updates
OK, this is only a "partially true argument" since (some) NoSQL systems actually supports partial record updates - However, the last time I checked, CosmosDB did not support it, and unless you've got it, you'll need to lock database records somehow as you're updating them. There are 3 different techniques for locking database records, and they're all disastrous in regards to implementation details, and/or the resulting software.
- Optimistic locking
- Pessimistic locking
- Ignore locking
To start out with the worst, ignoring locking implies you are 100% destined to experience race conditions in the future. Having race conditions on your database, is a guarantee of making sure your data becomes garbage over time.
Going for optimistic locking at best explodes the complexity of your codebase. Going for pessimistic locking ensures that you'll go insane if one of your employees goes on holiday with a web form open on his computer, in addition to that it explodes the complexity of your code of course. I wrote about database locks a year ago at DZone in case you're interested.
With partial record updates, the problem simply vanishes more or less. To understand why, read the above article for details. Some NoSQL database systems do support partial record updates, such as CouchBase if I remember correctly, but most struggles with these kind of constructs.
Referential integrity
This one is the big one (pun!). Most NoSQL database systems poses restrictions upon you in regards to the size of your documents. I think the maximum size in Cosmos is 2MB. The mantra for NoSQL believers of course is as follows ...
You don't need referential integrity because you can just store everything in one document
"Good luck with that when the maximum document size is 2MB" - Is my answer to these people :D
Maybe this is why Twitter only allows you to use 140 characters ...? ;)
Besides, storing everything in one document implies zero normalisation, making it almost impossible to update stuff, where multiple documents are having the same value, and that value needs to be changed over time.
Things you can't do with NoSQL
Below is a list of SQL statements that are more or less impossible to execute in NoSQL database systems.
Counting records from artists
select ar.Name, count(*) as count
from Album al, Artist ar where al.ArtistId = ar.ArtistId
group by al.ArtistId
order by count desc
limit 25
The above counts how many records all artists have released.
Calculate audio format profitability
select mt.Name, count(mt.MediaTypeId) as Amount
from MediaType mt
inner join Track t on mt.MediaTypeId = t.MediaTypeId
group by mt.MediaTypeId
order by Amount desc;
The above calculates how much profit was made from each audio format.
Get info about genre consumers
select distinct c.Email, c.FirstName, c.LastName, g.name
from Customer c
inner join Invoice i on c.CustomerId = i.CustomerId
inner join InvoiceLine ii on i.InvoiceId = ii.InvoiceId
inner join Track t ON ii.TrackId = t.TrackId
inner join Genre g ON t.GenreId = g.GenreId
where g.Name = "Rock"
order by c.Email
The above finds the names and emails of all customers that are listening to rock music.
Count songs by artists
select ar.Name AS ArtistName, count(t.TrackId) as TrackCount
from Track t
inner join Album al on t.AlbumId = al.AlbumId
inner join Artist ar on al.ArtistId = ar.ArtistId
inner join Genre g on t.GenreId = g.GenreId
where g.GenreId = 1
group by al.ArtistId
order by TrackCount desc
limit 10
The above counts how many songs each band has created, and orders such that the most productive bands are listed first.
Conclusion
Every now and then you really need a NoSQL database system. There are problem domains where document based databases are better. However, 99.99999999% of the time a relational SQL based database is simply a bajillion times better. My rule of thumb is as follows ...
Am I to create Twitter, Google or Facebook? If the answer is no, I'm using SQL ... ;)