Extending NoSQL Databases with User-Defined Document Types and Key Values
The structure of NoSQL databases and the operations they support make it easy
to query against values. For example, if we have documents in XML, we can
translate them to JSON and store them as documents or key-values in a NoSQL
database. A query might then look like the following:
db.inventory.find({ type: "snacks" })
This returns all the documents whose type field is "snacks". The corresponding
map-reduce function may look like this:
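db.inventory.mapReduce(
    // map: emit each document's id together with its calories
    function() { emit(this.id, this.calories); },
    // reduce: sum the calories emitted for a key
    function(key, values) { return Array.sum(values); },
    {
        query: { type: "snacks" },   // query: only consider snacks
        out: "snack_calories"        // output: collection that receives the result
    }
)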
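To make the data flow of the map-reduce call above concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how the database implements the call; `mapReduceInMemory` and the sample `inventory` documents are hypothetical names and data introduced only for illustration, and `emit` is passed in explicitly instead of being a shell global.

```javascript
// Hypothetical sample documents; the field names follow the query above.
const inventory = [
  { id: "chips",   type: "snacks", calories: 150 },
  { id: "chips",   type: "snacks", calories: 160 },
  { id: "pretzel", type: "snacks", calories: 110 },
  { id: "soda",    type: "drinks", calories: 140 },
];

// A tiny stand-in for the query -> map -> group -> reduce pipeline.
function mapReduceInMemory(docs, query, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs
    // query: keep documents whose fields equal every field in the query
    .filter((doc) => Object.keys(query).every((k) => doc[k] === query[k]))
    // map: let the caller emit key/value pairs for each matching document
    .forEach((doc) => map(doc, emit));
  // reduce: collapse the values collected under each key
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

const snackCalories = mapReduceInMemory(
  inventory,
  { type: "snacks" },
  (doc, emit) => emit(doc.id, doc.calories),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(snackCalories);
```

The drinks document is filtered out by the query step, so only snack ids appear as keys in the result.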
This works well for JSON data types and values. However, we are not restricted
to the built-in types. We can extend the key-values with user-defined types and
values; they are simply marked differently from the built-in types. When the
mapper encounters data like this, it loads the associated code to interpret the
user-defined types and values. That code applies the same query operators, such
as equality and comparison against values, that the mapper would have applied
had the data been in native JSON format. This delegation of interpretation and
execution allows NoSQL databases to be extended in forms such as computed keys
and computed values.
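The delegation described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `$type` marker, the `interpreters` registry, the `temperature` type and the `matches` helper are hypothetical and do not correspond to any particular database's API.

```javascript
// Hypothetical registry: type tag -> code that interprets a raw user-defined value.
const interpreters = {
  // Assumed example type: a temperature stored as "<number><unit>", e.g. "68F".
  // It is normalised to degrees Celsius so it can be compared like a number.
  temperature: (raw) => {
    const value = parseFloat(raw);
    return raw.endsWith("F") ? ((value - 32) * 5) / 9 : value;
  },
};

// Native JSON values pass through unchanged; values marked with `$type`
// are handed to the interpreter registered for that type.
function resolve(fieldValue) {
  if (fieldValue !== null && typeof fieldValue === "object" && "$type" in fieldValue) {
    const interpret = interpreters[fieldValue.$type];
    if (!interpret) throw new Error("no interpreter loaded for " + fieldValue.$type);
    return interpret(fieldValue.raw);
  }
  return fieldValue;
}

// The same operators apply to native and user-defined values alike.
const operators = {
  eq: (a, b) => a === b,
  gt: (a, b) => a > b,
  lt: (a, b) => a < b,
};

function matches(doc, field, op, operand) {
  return operators[op](resolve(doc[field]), operand);
}

const docs = [
  { id: 1, storage: 4 },                                    // native JSON number
  { id: 2, storage: { $type: "temperature", raw: "68F" } }, // user-defined value
];
const warm = docs.filter((d) => matches(d, "storage", "gt", 10)).map((d) => d.id);
console.log(warm);
```

The point of the design is that `matches` never needs to know how a user-defined value is encoded; it only sees the comparable value returned by the interpreter.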
Let us take the above example, but suppose the calories have to be computed
from the ingredients. In this case, the code would look like the following:
function (ingredients, calories) {
    var total_calories = 0;
    // calories[index] holds the calories contributed by ingredients[index]
    ingredients.forEach(function (ingredient, index) {
        total_calories += calories[index];
    });
    return total_calories;
}
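As a usage sketch, the computed value can stand in for a stored calories field during the map step. The documents below are hypothetical, and the layout (parallel `ingredients` and `calories` arrays on each document) is an assumption based on the function's parameters.

```javascript
// Computed value: total calories derived from parallel arrays.
function totalCalories(ingredients, calories) {
  let total = 0;
  ingredients.forEach((ingredient, index) => {
    total += calories[index];
  });
  return total;
}

// Hypothetical documents that carry per-ingredient calories instead of a total.
const snacks = [
  { id: "trail-mix", ingredients: ["nuts", "raisins"], calories: [170, 85] },
  { id: "granola", ingredients: ["oats", "honey", "almonds"], calories: [110, 60, 80] },
];

// In the map step, the computed value replaces a direct read of a stored field.
const emitted = snacks.map((doc) => [doc.id, totalCalories(doc.ingredients, doc.calories)]);
console.log(emitted);
```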
While this logic for computed key-values can be written outside the database
as map-reduce jobs, it can also stay close to the data it operates on and
consequently be stored in the database itself.
Moreover, the logic can be expressed for different runtimes, and each runtime
can be loaded and unloaded as needed to execute the logic.
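One way to picture runtimes being loaded and unloaded on demand is the sketch below. The `runtimeFactories` table, the `aggregate` runtime and the `execute`/`unload` helpers are all hypothetical; a real database would manage this lifecycle, and sandbox stored code, in its own way.

```javascript
// Hypothetical runtimes, each created only when first needed.
const runtimeFactories = {
  // Runs logic stored as a JavaScript function body that receives `input`.
  javascript: () => ({
    run: (logic, input) => new Function("input", logic)(input),
  }),
  // A minimal second runtime: the stored logic is just the name of an aggregate.
  aggregate: () => ({
    run: (logic, input) => {
      if (logic === "sum") return input.reduce((a, b) => a + b, 0);
      if (logic === "max") return Math.max(...input);
      throw new Error("unknown aggregate: " + logic);
    },
  }),
};

const loadedRuntimes = new Map();

// Load the runtime on first use, then execute the stored logic with it.
function execute(runtimeName, logic, input) {
  if (!loadedRuntimes.has(runtimeName)) {
    const factory = runtimeFactories[runtimeName];
    if (!factory) throw new Error("unknown runtime: " + runtimeName);
    loadedRuntimes.set(runtimeName, factory());
  }
  return loadedRuntimes.get(runtimeName).run(logic, input);
}

// Release a runtime that is no longer needed; returns true if it was loaded.
function unload(runtimeName) {
  return loadedRuntimes.delete(runtimeName);
}

const count = execute("javascript", "return input.length;", [170, 85]);
const total = execute("aggregate", "sum", [170, 85]);
const wasLoaded = unload("aggregate");
console.log(count, total, wasLoaded, [...loadedRuntimes.keys()]);
```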
One advantage of having a schema for some of the data is that structured
queries can be applied seamlessly to that specific data. As an example, we can
even store the XML data itself and run XPath queries on it. Although we will
load an XML parsing runtime for this data, it behaves the same as any other
data type as far as the overall map-reduce is concerned.
Another example of a user-defined data type is the tuple. Tuples are easier to
understand both in terms of representation and search. Let us use an example
here:
We have a tuple called 'Alias' for data about a person. This tuple consists of
(known_alias, use_alias_always, alternate_alias). The first part is text, the
second is a Boolean and the third is a map<text, text>.
The person data consists of id, name, friends and status.
We could still insert data for the person using JSON as follows:
[{"id": "1",
  "name": {"firstname": "Berenguer", "surname": "Blasi",
           "alias_data": {"known_alias": "Bereng", "use_alias_always": true}},
  "friends": [{"firstname": "Sergio", "surname": "Bossa"},
              {"firstname": "Maciej", "surname": "Zasada"}]}]
However, when we search, we can refer to the fields of the user-defined type
explicitly, just as natively as the fields of the JSON.
Standard query operators such as where, select, join, intersect, distinct,
contains and SequenceEqual can be applied to these tuples.
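A few of these operators can be approximated over a list of tuples with plain JavaScript array methods. This is only a sketch: the repeated friend record is hypothetical data added so that distinct has something to remove, and `sequenceEqual` is a hand-written helper, not a built-in.

```javascript
// Friends as a list of tuples; the last entry is a hypothetical duplicate.
const friends = [
  { firstname: "Sergio", surname: "Bossa" },
  { firstname: "Maciej", surname: "Zasada" },
  { firstname: "Sergio", surname: "Bossa" },
];

// where: keep the tuples that satisfy a predicate on a field
const bossas = friends.filter((f) => f.surname === "Bossa");

// select: project each tuple onto one of its fields
const firstnames = friends.map((f) => f.firstname);

// distinct: drop repeated values
const distinctFirstnames = [...new Set(firstnames)];

// contains: test membership of a value
const hasMaciej = firstnames.includes("Maciej");

// SequenceEqual: same length and equal elements in the same order
const sequenceEqual = (a, b) => a.length === b.length && a.every((x, i) => x === b[i]);
const sameOrder = sequenceEqual(distinctFirstnames, ["Sergio", "Maciej"]);

console.log(bossas.length, distinctFirstnames, hasMaciej, sameOrder);
```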
Tuples are easier to understand because each field can be qualified with dot
notation, and the entire data can be exploded into its individual fields with
this notation as follows:
<fields>
<field indexed="true" multiValued="false" name="id" stored="true" type="StrField"/>
<field indexed="true" multiValued="true" name="friends" stored="true" type="TupleField"/>
<field indexed="true" multiValued="false" name="friends.firstname" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="friends.surname" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="friends.alias_data" stored="true" type="TupleField"/>
<field indexed="true" multiValued="false" name="friends.alias_data.known_alias" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="friends.alias_data.use_alias_always" stored="true" type="BoolField"/>
:
:
</fields>
The above example is taken from DataStax, and it serves to highlight the
seamless integration of tuples or user-defined types (UDTs) in NoSQL databases.
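The way nested tuple fields explode into dot-qualified names can be sketched with a small recursive function. This is an illustration in plain JavaScript rather than anything the search schema runs; `flatten` and the sample `friend` value are hypothetical.

```javascript
// Recursively turn nested objects into a flat map of dot-qualified field names.
function flatten(value, prefix = "", out = {}) {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? prefix + "." + key : key, out);
    }
  } else {
    // Leaves (text, booleans, numbers, arrays) are stored under the full path.
    out[prefix] = value;
  }
  return out;
}

// Hypothetical friend tuple that carries nested alias data.
const friend = {
  firstname: "Sergio",
  surname: "Bossa",
  alias_data: { known_alias: "Serg", use_alias_always: false },
};

const fields = flatten({ friends: friend });
console.log(Object.keys(fields));
```

The resulting keys follow the same naming pattern as the leaf field names in the schema, such as friends.firstname and friends.alias_data.known_alias.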
In addition, tuples/UDTs are read and written as a single block, not on a
per-field basis. Tuples/UDTs can also participate in a map-like data model
although they are not exactly map values. For example, each entry in a
collection of tuples/UDTs has a type field that represents what would have been
the map key. We just have to declare a UDT that includes tuples as well, and
for this UDT we specify the search the same way as in a map-like data model but
using dot notation for the fields. For example, we can search with:
{!UDT}alias.type:Bereng AND alias.use_alias_always:True
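The effect of such a dot-notation search with AND can be imitated over an in-memory collection. This is a sketch of the matching semantics only, not of the search engine's query parser; `getPath`, `search` and the sample `rows` are hypothetical.

```javascript
// Follow a dot-qualified path such as "alias.type" into a nested object.
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((current, key) => (current == null ? undefined : current[key]), obj);
}

// Keep the rows for which every path/value condition holds (logical AND).
function search(rows, conditions) {
  return rows.filter((row) =>
    Object.entries(conditions).every(([path, expected]) => getPath(row, path) === expected)
  );
}

// Hypothetical rows; `type` plays the role of what would have been the map key.
const rows = [
  { id: "1", alias: { type: "Bereng", use_alias_always: true } },
  { id: "2", alias: { type: "Bereng", use_alias_always: false } },
  { id: "3", alias: { type: "Sergi", use_alias_always: true } },
];

const hits = search(rows, { "alias.type": "Bereng", "alias.use_alias_always": true });
console.log(hits.map((row) => row.id));
```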
#coding question
Determine the maximum gradient in a sequence of sorted numbers
int GetMaxGradient(int[] numbers)
{
    // The gradient is the absolute difference between adjacent elements.
    int max = 0;
    for (int i = 1; i < numbers.length; i++) {
        int grad = Math.abs(numbers[i] - numbers[i - 1]);
        if (grad > max) max = grad;
    }
    return max;
}