Tuesday, November 3, 2015

Comparison of SQL and NoSQL databases

The following notes are derived from slides presented at the Metadata Open Forum.
NoSQL databases are known to be horizontally scalable and provide an alternative to relational databases. We explore the differences between the two in terms of standards, characteristics and examples.
In SQL databases, data is stored in tables made up of rows and columns. Relationships are represented by primary and foreign keys. There is a language for data manipulation and data definition. Transactions are supported, and there is generally an abstraction from the physical layer, which means that the physical layer can change without requiring changes in the application. Applications specify what they want, not how to get it. Processing a query generally involves the following stages:
Parsing – The query string is converted into an abstract syntax tree.
Binding – Column and table names are resolved to the identifiers recorded for them in the metadata.
Optimization – A set of tree transformations is applied to the tree so that the best possible execution plan is produced.
Code generation – The generated tree may be stored as is, or transformed partly into code (scalar expressions) or fully into byte code, and saved in the plan cache.
Query execution – The generated code may be run directly, or parts of the plan may be converted into physical operators.
Cleanup – Transactions may be rolled back, locks released, log records written as appropriate, and memory reclaimed.
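The declarative style and the abstraction from the physical layer described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the author and post tables, their columns, and the index name are hypothetical, and SQLite stands in for any SQL engine. The same query text is run before and after an index is added, and EXPLAIN QUERY PLAN shows the plan the optimizer chose.

```python
import sqlite3

# In-memory database; table, column and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        title TEXT NOT NULL
    );
    INSERT INTO author (id, name) VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO post (id, author_id, title) VALUES
        (10, 1, 'On engines'), (11, 1, 'On notes'), (12, 2, 'On kernels');
""")

# The query states what is wanted (posts per author), not how to compute it.
QUERY = """
    SELECT author.name, COUNT(post.id)
    FROM author JOIN post ON post.author_id = author.id
    GROUP BY author.name
    ORDER BY author.name
"""

rows = conn.execute(QUERY).fetchall()

# Ask the engine which execution plan its optimizer selected.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()

# Changing the physical layer (adding an index) requires no change to the
# query text, and the result stays the same.
conn.execute("CREATE INDEX post_author_idx ON post(author_id)")
rows_after_index = conn.execute(QUERY).fetchall()

print(rows)
print(rows_after_index == rows)
```

The primary and foreign keys carry the relationship between the two tables, and the application never mentions indexes or join strategies in its query.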
A NoSQL database, on the other hand, is non-relational and distributed by nature. Its characteristics include a schema-free representation of data, easy replication support, a simple API, and a different transaction model called BASE (basically available, soft state, eventually consistent). NoSQL databases can handle warehouse-scale data because they scale by adding nodes, not merely by horizontal or vertical partitioning. Inserts and updates are asynchronous. Data is versioned, which enables optimistic concurrency. Queries start returning answers quickly because they may approximate answers, return stale data, put availability first, and provide best effort; the approach is aggressive, simpler and faster.
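Versioning and optimistic concurrency can be illustrated with a toy in-memory key-value store. This is a minimal sketch and not the API of any particular NoSQL product: the VersionedStore class, the VersionConflict exception, and the "cart:42" key are all hypothetical. A write succeeds only if the writer read the latest version; otherwise the writer re-reads and retries.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """Toy key-value store (hypothetical); each value carries a version."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        # A missing key is reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.put("cart:42", ["book"], expected_version=0)

# Two clients read the same version of the cart.
version_a, cart_a = store.get("cart:42")
version_b, cart_b = store.get("cart:42")

# Client A writes first and succeeds.
store.put("cart:42", cart_a + ["pen"], expected_version=version_a)

# Client B's write is now stale; it detects the conflict, re-reads, retries.
try:
    store.put("cart:42", cart_b + ["lamp"], expected_version=version_b)
    conflict = False
except VersionConflict:
    conflict = True
    version_b, cart_b = store.get("cart:42")
    store.put("cart:42", cart_b + ["lamp"], expected_version=version_b)

print(conflict, store.get("cart:42"))
```

No locks are held between the read and the write, which is what makes the scheme optimistic: conflicts are detected at write time instead of being prevented up front.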
Although consistency is weak, the aim is for all nodes to converge on the same data and for the client to perceive a set of operations as having occurred all at once. In other words, operations are atomic but error tolerant, and they can safely be called multiple times. Node failures do not prevent surviving nodes from continuing to operate. Every operation must terminate in an intended response, and operations complete even if individual components are unavailable.
NoSQL databases come in many forms, including column stores, document stores and key-value stores, but fundamentally the map-reduce operations are similar across them. A key-value store keeps a hash table of keys, with a value stored against each key, and provides fast access to small data values. The Map-Reduce operation consists of two phases:
Map – extracts sets of key-value pairs from the underlying data, potentially in parallel across multiple nodes.
Reduce – merges and sorts the sets of key-value pairs; the results may be useful for other searches.
Map-Reduce operations are implemented by application developers, not by the database.
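The two phases can be sketched as a small word count in Python. The example is illustrative only: the PARTITIONS data and the function names are hypothetical, each inner list stands in for the data held on one node, and a thread pool stands in for running the map phase on several nodes in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list plays the role of one node's records.
PARTITIONS = [
    ["sql stores rows", "nosql stores documents"],
    ["nosql scales out"],
]


def map_partition(lines):
    """Map phase: emit a (word, 1) pair for every word in one partition."""
    return [(word, 1) for line in lines for word in line.split()]


def reduce_pairs(mapped_partitions):
    """Reduce phase: merge pairs by key, sort the keys, and sum the values."""
    grouped = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            grouped[key].append(value)
    return {key: sum(values) for key, values in sorted(grouped.items())}


# Run the map phase over the partitions in parallel.
with ThreadPoolExecutor() as pool:
    mapped = list(pool.map(map_partition, PARTITIONS))

word_counts = reduce_pairs(mapped)
print(word_counts)
```

Both functions are written by the application, which matches the point above that the database supplies the data and the distribution but not the map and reduce logic.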

