NoSQL and MongoDB

What is NoSQL?

NoSQL means "Not Only SQL". The term came about because relational databases were once thought to be the solution to every problem. Database vendors added the ability to store blobs (images, documents, and other types of unstructured files), and most relational database systems gained data types such as large text fields and support for XML documents. Unfortunately, most databases store this data inefficiently and offer only limited APIs for searching and querying it. As more unstructured data was inserted, databases bloated in size and performance suffered substantially. Scaling out these systems was difficult and extremely expensive. Supporting unstructured data in transactions was also difficult and caused transaction log bloat. Insert, update, delete and select operations became inefficient, and memory was used poorly because large objects pushed frequently used data out of the buffer cache. Mixing these workloads just didn't work!

Only recently have database vendors such as Oracle and Microsoft started including columnar storage types and table compression options. Although this helped with efficient storage and retrieval, the problems of performance, mixed workloads and limited APIs were not resolved. Database vendors were slow to respond to the performance demands of unstructured data. Meanwhile, other options that solve these scalability problems have existed for more than a decade. Unstructured data is stored more efficiently in a storage structure designed for it than in a relational database system, and smart companies adopted these specialised systems for performance reasons.

Another issue companies have to deal with is the volume of data they need to process. Raw data such as web logs stored in file systems is difficult to manage and query. A solution was required to store billions of unstructured log files and documents spanning several petabytes of storage. Such volume was almost impossible to store in a normal relational database without investing millions of dollars in expensive SAN (Storage Area Network) storage systems. With NoSQL, you can use low-end servers with relatively cheap storage and still get the performance of these large multi-million dollar systems.

So NoSQL is the term for using unstructured database systems alongside relational database systems. It does not mean that relational databases are no longer used. In fact, relational databases are here to stay, and no organisation that deals with transactions (which includes any company that makes money) will ever get rid of a structured database that supports ACID properties. Even if you are a small company that uses PayPal for processing orders, your transactions will be stored in a relational database system (with PayPal, they are stored in MySQL databases).

Why should I use NoSQL?

NoSQL databases are optimised for unstructured data and complement existing traditional database systems. They were designed with scalability and document storage in mind. Because they don't need to support all the ACID (Atomicity, Consistency, Isolation and Durability) properties of a normal database system, they are easier to implement and perform much better for unstructured workloads. Scaling out is also easier: you can create clusters of hundreds of nodes with very little effort. The best thing is that most NoSQL databases are open source and free to use.

Several large companies such as Facebook, LinkedIn, Google, Netflix, Twitter, and many others have open-sourced many of their projects, including the NoSQL databases they have developed or enhanced internally. Apache also has a number of open source NoSQL databases available which have been tried and tested for many years and have hundreds or thousands of developers contributing to them. These systems are extremely reliable, with redundancy built in, and replication is handled transparently by the system, so experience in networking and sophisticated storage replication technology is not required.

In summary, NoSQL databases are proven unstructured database systems, started mostly by large organisations that ran into performance bottlenecks with relational databases. They allow you to store petabytes of data with ease, and most are available at no cost. They are easy to implement and to develop against, with simple APIs. If you have unstructured data in your organisation, then NoSQL is the best place to store your documents, with many benefits that you will not get by storing them in the file system.

What solutions are available?

A number of NoSQL solutions are available. They vary in terms of features, ease of use, scalability and performance. Higher performance systems tend to have a limited feature set, while those with extensive features tend to be slower because of the internal data structures needed to support those features. One of the leading NoSQL databases is MongoDB, which strikes a good balance between performance and scalability, with support for scaling out, redundancy, indexing, and many functions such as map-reduce built into the product.
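To give a feel for what a map-reduce function does, here is a minimal sketch in plain Python. It is not MongoDB's API: the `map_reduce` helper, the `emit_status` and `sum_values` functions and the sample `LOGS` documents are all hypothetical names made up for illustration. The idea is that a mapper emits key/value pairs for each document, the values are grouped by key, and a reducer collapses each group into a single result.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Document = Dict[str, Any]


def map_reduce(
    documents: Iterable[Document],
    mapper: Callable[[Document], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Group every (key, value) pair emitted by the mapper, then reduce each group."""
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


def emit_status(doc: Document) -> Iterable[Tuple[Any, Any]]:
    """Mapper: emit one count per document, keyed by HTTP status."""
    yield doc["status"], 1


def sum_values(key: Any, values: List[Any]) -> Any:
    """Reducer: add up the counts emitted for one key."""
    return sum(values)


# Hypothetical web log documents, invented for this example.
LOGS: List[Document] = [
    {"path": "/", "status": 200},
    {"path": "/missing", "status": 404},
    {"path": "/about", "status": 200},
]

hits_per_status = map_reduce(LOGS, emit_status, sum_values)
```

In a real cluster the map and reduce steps would run in parallel across many nodes; this single-process version only shows the shape of the computation.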

For basic large scale key/value storage with limited features, you can use solutions such as HBase (Apache), DynamoDB (Amazon), Cassandra (DataStax), VoltDB, Redis, Riak, or Couchbase. These products have fewer features than MongoDB and may have better performance for some workloads. Most of the APIs for these databases are simple get/put operations, with some allowing a batched version of these commands for better performance. Not all of these databases can create secondary indexes; some of them only allow simple operations against keys. Some of them support versioning of data (such as HBase).
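The get/put style of API described above can be sketched with a toy in-memory store. This is an illustration only, not the client API of HBase or any other product listed here: the `VersionedKeyValueStore` class and its `put`, `put_many` and `get` methods are hypothetical names. It shows single puts, a batched put, and keeping older versions of a value.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class VersionedKeyValueStore:
    """Toy in-memory key/value store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Each key maps to a list of (version, value) pairs, oldest first.
        self._data: Dict[str, List[Tuple[int, Any]]] = defaultdict(list)

    def put(self, key: str, value: Any) -> int:
        """Store a new version of the value and return its version number."""
        versions = self._data[key]
        version = len(versions) + 1
        versions.append((version, value))
        return version

    def put_many(self, items: Dict[str, Any]) -> None:
        """Batched put: write many keys in one call."""
        for key, value in items.items():
            self.put(key, value)

    def get(self, key: str, version: Optional[int] = None) -> Any:
        """Return the latest value, or a specific version if one is requested."""
        versions = self._data.get(key)
        if not versions:
            raise KeyError(key)
        if version is None:
            return versions[-1][1]
        for stored_version, value in versions:
            if stored_version == version:
                return value
        raise KeyError(f"{key} has no version {version}")


store = VersionedKeyValueStore()
store.put("user:1", {"name": "Ann"})
store.put("user:1", {"name": "Ann", "city": "Leeds"})
store.put_many({"user:2": {"name": "Bob"}, "user:3": {"name": "Cy"}})
```

Note that every lookup goes through the key. Finding all users in a given city would need a secondary index, which is exactly the feature that not every key/value store provides.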

For memory caching, Memcached is one of the leading NoSQL solutions, available free of charge with mature APIs for almost any language. Alternatives to Memcached include CacheDB and vendor solutions such as Infinispan from Red Hat/JBoss and ElastiCache from Amazon.
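As a rough picture of how a memory cache sits in front of a slower data source, here is a small sketch in plain Python. It does not use the Memcached client API: the `LRUCache` class, its `get_or_load` method and the `slow_lookup` function are hypothetical names, and the least-recently-used eviction policy is an assumption made for the example. On a miss the value is loaded from the slow source and kept in memory; on a hit it is served straight from memory.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, List


class LRUCache:
    """Small in-process cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        """Return the cached value, calling loader(key) only on a miss."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = loader(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value


# Stand-in for a slow database or file system read (hypothetical).
calls: List[str] = []


def slow_lookup(key: str) -> str:
    calls.append(key)
    return key.upper()


cache = LRUCache(capacity=2)
first = cache.get_or_load("a", slow_lookup)   # miss: loaded from the slow source
second = cache.get_or_load("a", slow_lookup)  # hit: served from memory
```

A networked cache such as Memcached plays the same role, but the cached data lives in a separate server process shared by many application servers.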

How can we help?

You need to assess your requirements before selecting the right NoSQL solution. Dewlock can help you select the right NoSQL solution and implement it in your organisation to meet your cost, performance and business requirements. Contact us for a free quote. We also have several staff who are ready to provide MongoDB services, and we are a MongoDB ready partner.