I’m a fairly opinionated guy, to the point where it often gets me in trouble. In 2009 I wrote an article for a Norwegian IT magazine warning software developers about lock-in on the web. In the article I referred to Silverlight and Adobe Flash/Flex as ActiveX2. The article generated 1,500+ comments, and became the most commented article the magazine had published in its 30+ years of existence, or something like that. 98% of those commenting believed I was, pardon my French, “bat sh*t crazy” at the time. My main argument was that spending resources on Silverlight and other ActiveX types of technologies was money wasted, and that people should embrace open standards and the web instead.
3 years later Steve Jobs publicly said on stage that he would never support Silverlight or other ActiveX types of products on any of Apple’s products. Over the next 5 years, every single Silverlight product was canned, and the industry collectively probably lost half a trillion dollars in wasted effort. The project nearly snuffed out Microsoft, and I suspect Silverlight was a big part of the reason why Steve Ballmer left the ship. Microsoft alone had probably spent billions of dollars on Silverlight. A former colleague of mine used to be a Silverlight MVP. Today he doesn’t even put it on his CV, I suspect because he finds it embarrassing.
I’ve also got a lot of other “hall of fame” types of articles, such as the one where I’m arguing that OOP is a mass psychosis. I’ve also been an avid advocate for bringing back LOC count as a performance measure. It could be argued that, for weird reasons, I’m consistently in disagreement with whatever everybody else seems to agree upon. Not sure what to say here really, besides maybe …
I was born like this …?
Anyways, I’m going sideways here. Let’s get back to the subject, and the subject is SQL or NoSQL …
NoSQL has some advantages that SQL does not have. For insane scalability requirements, a database that guarantees consistency simply doesn’t cut it. It’s a matter of physics really, more than software development. However, that’s it, end of debate. This is where NoSQL’s advantages end. And 99.9999999999% of all software systems ever created don’t have insane scalability requirements. Sorry MongoDB, Couchbase, and Cassandra, I’m not building Google, Facebook, or Twitter, so I don’t need NoSQL. The statistical probability that you’ll need insane scalability is also roughly 0.00000001%.
I’ve written about this before, but all of the following things are close to impossible to achieve with NoSQL.
- Joining data
- Grouping and aggregates (any type of statistics really)
- Partial record updates (most NoSQL databases don’t allow for this)
- Consistency, as in making sure that once a record is updated, subsequent selects return the updated data, and not stale or old data
- Etc, etc, etc
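To make the list above a bit more concrete, here is a minimal sketch of what these operations look like in SQL, using Python’s built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and the sample rows are all made up for illustration; they are assumptions, not something from a real system.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Alice', 'Oslo'), (2, 'Bob', 'Bergen');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

# Joining, grouping, and aggregating in a single statement.
totals = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

# Partial record update: only the city column is touched.
with conn:  # commits on success, rolls back on exception
    conn.execute("UPDATE customers SET city = ? WHERE id = ?", ("Trondheim", 1))

# Consistency: a select issued after the update sees the new value.
name, city = conn.execute(
    "SELECT name, city FROM customers WHERE id = 1"
).fetchone()
```

Each of these is a single declarative statement here. How much of this you would have to hand-roll in application code with a NoSQL store depends on the product, so treat the comparison as a rough illustration rather than a benchmark.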
The use cases for SQL, where NoSQL is arguably madness to use, are systems resembling the following.
- Any type of accounting
- Any type of system requiring statistics
- Any type of system with complex filtering of data, such as data mining types of systems
- Any system requiring consistency
To understand the difference between NoSQL and SQL is to understand the CAP theorem. For all practical concerns, SQL focuses on Consistency, while NoSQL focuses on Availability. And once the network between your servers breaks down, you can’t have both. This was more or less proven by the guy who wrote the CAP theorem. To reduce it down to a simple question, ask yourself the following.
Do I need high quality data?
If the answer is no, go for NoSQL. You’re basically building Twitter 2.0, and availability is probably more important than high quality data. However, if you’ve got important data where quality is king, go for SQL. Having high quality data is not even possible in theory with NoSQL because of its focus on availability, sacrificing consistency in the process. To illustrate why this matters, imagine having a jar of cookies. You pass the jar around to friends at a party. When the jar has made it halfway around the table, it’s empty, and some guy is left without a cookie. However, he passes the jar on to the next guy, who puts his hand into the jar, and magically gets a cookie, even though the jar was in fact empty just a few seconds before.
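The cookie jar can be sketched as a toy model in Python. The `Replica` class below is entirely hypothetical and doesn’t mimic any particular database; it only assumes that two copies of the same data accept writes before they’ve had a chance to synchronise.

```python
class Replica:
    """A toy replica holding its own copy of the cookie count."""

    def __init__(self, cookies):
        self.cookies = cookies

    def take_cookie(self):
        # Each replica only checks its own, possibly stale, copy.
        if self.cookies > 0:
            self.cookies -= 1
            return True
        return False


# Both replicas start in sync: there is exactly one cookie left.
near = Replica(cookies=1)
far = Replica(cookies=1)

first_guest = near.take_cookie()   # succeeds, the jar is now empty here
# No synchronisation has happened yet, so `far` still believes
# there is one cookie in the jar.
second_guest = far.take_cookie()   # also succeeds, on stale data

handed_out = int(first_guest) + int(second_guest)
```

One cookie existed, two were handed out. Real systems reconcile the replicas sooner or later, but by then both guests have already eaten.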
To translate this into a more relevant data use case, imagine you’re creating a bank system. One of your clients is sharing his account with his wife. They have two cards, both connected to the same bank account, and both cards can be used at ATMs to withdraw money. As your client’s wife goes to the ATM and empties the account, your client does the same, and they both succeed in withdrawing the maximum amount of money that was in the account. You’ve now handed the client twice as much money as he had in the account. Welcome to NoSQL and “availability”. You’ve basically created a banking software system that allows your clients to legally steal your money, because you drank the Kool-Aid about “performance” from vendors delivering you some NoSQL database.
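For contrast, here is a minimal sketch of how a consistent SQL database lets you guard the withdrawal, again using Python’s built-in `sqlite3`. The `accounts` table, the `withdraw` helper, and the amounts are assumptions made up for this example. The two withdrawals also run one after the other on a single connection, so this shows the guarded update itself, not a full concurrency test.

```python
import sqlite3

# Hypothetical single-account bank, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 1000)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Deduct amount only if the balance covers it; return True on success."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        # rowcount is 1 if the guarded update matched the row, 0 otherwise.
        return cur.rowcount == 1


first_card = withdraw(conn, 1, 1000)    # empties the account
second_card = withdraw(conn, 1, 1000)   # refused, nothing left to take
balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0]
```

The check and the deduction happen in the same statement, so there is no window where the second card can read an old balance. The `CHECK (balance >= 0)` constraint is a second line of defence if some other code path forgets the guard.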
At Twitter or Google consistency is irrelevant. Who cares if somebody gets to read your tweet 2 seconds before some dude in China gets to read it? However, for everything else consistency is king. There are use cases for NoSQL, but the question you need to ask yourself is as follows.
Am I OK with garbage data, stale data, and lack of consistency?
If the answer is yes, go for NoSQL. If it’s not, go for SQL. SQL versus NoSQL is a question of data quality versus availability. Every time you read consistency, mentally translate it in your head to “data quality”. If you can handle garbage data, NoSQL is OK. If you need high quality, go for SQL.
And for the record, I don’t care if you’re the Emperor of Rome or the CEO of JP Morgan. YOU need to make the decision about which database to use. If you let the developers make choices like this, you’re putting too much faith in their ability to choose correctly for you. NoSQL versus SQL is not a technical question, it’s a question about data quality!
NoSQL versus SQL is a decision the CEO of the company should be making! Even if that CEO has 500,000 employees, he still needs to make that decision himself!