Tuesday, July 23, 2013

Technical overview of OneFS file system continued

Each cluster creates a single namespace and file system, which means the file system is distributed across all nodes; there is no partitioning and no need for volume creation. The file system can be accessed through any node in the cluster. Data and metadata are striped across the nodes for redundancy and availability. The single file tree can grow with node additions without any interruption to users. Services take care of the associated management routines such as tiering and replication. Files are transferred in parallel, regardless of the depth and breadth of the tree.
This design is different from taking separate namespaces or volumes and aggregating them so that they appear as a single namespace. In that approach the administrator has to lay out the tree, move files between volumes, and absorb the additional purchase cost, power, and cooling.
OneFS uses physical pointers and extents for metadata and stores file and directory metadata in inodes. B-trees are used extensively in the file system. Data and metadata are redundant, and the amount of redundancy can be configured by the administrator. Metadata takes up only about 1% of the system. Metadata access and locking are collectively managed in a peer-to-peer architecture.
OneFS blocks are accessed by multiple devices simultaneously, so the addressing scheme is indexed by a tuple of {node, drive, offset}. Data is protected by erasure coding and is labeled by the number of simultaneous failures it can tolerate, such as "N+2", which can withstand two failures while the corresponding metadata is mirrored 3x.
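As a rough illustration of this addressing scheme and protection labeling, the sketch below models a block address as a {node, drive, offset} tuple together with a protection policy. The struct names and field widths are assumptions made for this sketch, not OneFS internals.

#include <cstdint>
#include <iostream>

// Hypothetical {node, drive, offset} block address; field widths are assumed.
struct BlockAddress {
    uint16_t node;     // which node in the cluster holds the block
    uint16_t drive;    // which drive within that node
    uint64_t offset;   // offset of the block on that drive
};

// Hypothetical protection policy, e.g. "N+2" tolerates two simultaneous failures
// while the corresponding metadata is mirrored.
struct ProtectionPolicy {
    int dataUnits;        // N: number of data stripe units
    int parityUnits;      // number of erasure-code units, e.g. 2 for N+2
    int metadataMirrors;  // e.g. 3 for 3x-mirrored metadata
};

int main() {
    BlockAddress addr = { 3, 7, 4096 };
    ProtectionPolicy n2 = { 8, 2, 3 };
    std::cout << "block at node " << addr.node << ", drive " << addr.drive
              << ", offset " << addr.offset << " protected as N+"
              << n2.parityUnits << std::endl;
    return 0;
}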
OneFS controls the placement of files directly, down to the sector level on any drive. This allows for optimized data placement and I/O patterns and avoids unnecessary read-modify-write operations. As opposed to dedicating entire RAID volumes to a particular performance type and protection setting, the file system is homogeneous, and spindle usage can be optimized with configuration changes made at any time and online.
Every node that participates in I/O has a stack that comprises two layers. The top half is the initiator, which serves client-side protocols such as NFS, CIFS, iSCSI, HTTP, and FTP; the bottom half is the participant, which comprises the NVRAM. When a client connects to a node to write a file, it connects to the top half, where the file is broken into smaller logical chunks called stripes before being written. Failure-safe buffering using a write coalescer ensures that the writes are efficient.
OneFS stripes the data across all nodes and not simply across all disks.
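To make the striping idea concrete, here is a toy sketch that splits a write buffer into fixed-size stripe units and places them round-robin across nodes (and across drives within a node). The chunk size and the round-robin placement are illustrative assumptions, not how OneFS actually lays out data.

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stripe unit: which node and drive receive this chunk of the file.
struct StripeUnit {
    int node;
    int drive;
    std::string data;
};

// Split a buffer into fixed-size chunks and scatter them across nodes first,
// so the file is striped across the cluster rather than across one node's disks.
std::vector<StripeUnit> StripeAcrossNodes(const std::string& buffer,
                                          int nodeCount, int drivesPerNode,
                                          size_t chunkSize) {
    std::vector<StripeUnit> units;
    for (size_t off = 0, i = 0; off < buffer.size(); off += chunkSize, ++i) {
        StripeUnit u;
        u.node = static_cast<int>(i % nodeCount);
        u.drive = static_cast<int>((i / nodeCount) % drivesPerNode);
        u.data = buffer.substr(off, chunkSize);
        units.push_back(u);
    }
    return units;
}

int main() {
    std::vector<StripeUnit> units = StripeAcrossNodes("abcdefghijklmnop", 3, 2, 4);
    for (size_t i = 0; i < units.size(); ++i)
        std::cout << "chunk " << i << " -> node " << units[i].node
                  << ", drive " << units[i].drive << std::endl;
    return 0;
}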
OneFS file system overview:
The EMC OneFS file system is a distributed file system that runs on a storage cluster. It combines three layers of traditional storage architecture, namely a file system, a volume manager, and data protection.
It scales out because it relies on intelligent software, commodity hardware, and distributed architecture.
OneFS works with a cluster that consists of multiple nodes and starts out with as few as three nodes. In the cluster, nodes could provide different ratios of throughput and input/output operations per second (IOPS). OneFS combines these into a whole: RAM is grouped together into a single coherent cache, NVRAM is grouped together for high-throughput writes, and spindles and CPUs are combined to increase throughput, capacity, and IOPS.
There are two types of networks associated with a cluster: internal and external. Node-to-node communication within the cluster is performed using a proprietary unicast protocol, over an IP over InfiniBand network with redundant switches.
Clients connect to the cluster using Ethernet connections; each node provides its own ports. File system protocols such as NFS, CIFS, HTTP, iSCSI, FTP, and HDFS are supported.
OneFS runs on a BSD-based operating system that supports both Windows and Linux/Unix semantics such as hard links, delete-on-close, atomic rename, ACLs, and extended attributes.
To communicate with clients, a variety of protocols, as mentioned above, are supported. The I/O subsystem is split into two halves: the top half is the initiator and the bottom half is the participant. Any node that the client connects to acts as the initiator.
Cluster operations involve checking and maintaining the health of the cluster. These jobs are run through a job engine and have a priority associated with them. Jobs might include balancing free space in the cluster, scanning for viruses, reclaiming disk space, associating a path and its contents with a domain, rebuilding and re-protecting the file system, gathering file system information, scrubbing disks for media-level errors, and reverting an entire snapshot to disk.
The granularity of the jobs ensures that OneFS performs adequately and appropriately for every impact interval in the customer's environment.
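As a rough sketch of how a prioritized job engine might be organized, the snippet below queues maintenance jobs by priority; the job list and priority values are illustrative assumptions (the real job engine also throttles work by impact interval and schedule).

#include <iostream>
#include <queue>
#include <string>
#include <vector>

// Hypothetical maintenance job: a lower priority value means more urgent.
struct Job {
    std::string name;
    int priority;
};

struct ByPriority {
    bool operator()(const Job& a, const Job& b) const {
        return a.priority > b.priority;  // makes the priority_queue a min-heap on priority
    }
};

int main() {
    std::priority_queue<Job, std::vector<Job>, ByPriority> engine;
    Job jobs[] = {
        {"rebuild and re-protect the file system", 1},
        {"revert an entire snapshot to disk", 2},
        {"balance free space in the cluster", 4},
        {"scrub disks for media-level errors", 8}
    };
    for (size_t i = 0; i < sizeof(jobs) / sizeof(jobs[0]); ++i)
        engine.push(jobs[i]);

    while (!engine.empty()) {
        std::cout << "run: " << engine.top().name << std::endl;
        engine.pop();
    }
    return 0;
}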


Monday, July 22, 2013

Barry Wise on five progressive steps to database normalization
We start out with an example where we store a user's name, company, company address, and personal URLs, say url1 and url2.
The zero form is when all of this is in a single table and no normalization has occurred.
The first normal form is achieved by
1) eliminating repeating groups in individual tables
2) creating a separate table for each set of related data
3) identifying each set of related data with a primary key
This yields a table where the user information is repeated for each URL, so the limitation on the number of URL fields is removed.
The second normal form is achieved by
1) creating separate tables for sets of values that apply to multiple records
2) relating these tables with a foreign key
Basically, we break the URL values into a separate table so we can add more in the future.
The third normal form is achieved by
1) eliminating fields that do not depend on the key
Company name and address have nothing to do with the user id, so they are broken off into their own table
The fourth and higher normal forms depend on data relationships involving one-to-one, one-to-many, and many-to-many relationships.
The fourth normal form states that
1) in a many-to-many relationship, independent entities cannot be stored in the same table.
To relate many users to many URLs, we define a url_relations table where the user id and url id are paired.
The next normal form is the fifth normal form, which suggests that
1) it must be possible to reconstruct the original table from the tables into which it has been broken down. This is a way to check that no new columns have been added; a small sketch of the resulting tables follows.
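To make the decomposition concrete, here is a small sketch of the resulting tables written as C++ structs standing in for rows. The identifiers (user_id, company_id, url_id) and the sample data are illustrative choices, not a prescribed schema.

#include <string>
#include <vector>

// Third normal form: company attributes depend on the company, not on the user.
struct Company     { int company_id; std::string name; std::string address; };

// The user row holds only attributes that depend on the user key.
struct User        { int user_id; std::string name; int company_id; };

// URLs get their own table, so any number can be added (no url1, url2 columns).
struct Url         { int url_id; std::string address; };

// Fourth normal form: the many-to-many relationship between users and URLs
// lives in its own table of (user_id, url_id) pairs.
struct UrlRelation { int user_id; int url_id; };

int main() {
    std::vector<Company> companies;
    std::vector<User> users;
    std::vector<Url> urls;
    std::vector<UrlRelation> url_relations;

    Company abc = {1, "ABC Corp", "1 Work Lane"};  // hypothetical data
    User joe = {1, "Joe", 1};
    Url u1 = {1, "abc.com"};
    Url u2 = {2, "xyz.com"};
    UrlRelation r1 = {1, 1};  // Joe -> abc.com
    UrlRelation r2 = {1, 2};  // Joe -> xyz.com

    companies.push_back(abc);
    users.push_back(joe);
    urls.push_back(u1);
    urls.push_back(u2);
    url_relations.push_back(r1);
    url_relations.push_back(r2);
    return 0;
}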
As always, remember that denormalization has its benefits as well.
Litt's tips additionally mention the following:
1) create a table for each list. More than likely every list will have additional information
2) create non-meaningful identifiers.
This is to make sure that business rule changes do not affect the primary identifier 

Sunday, July 21, 2013

Database normalization and denormalization rules are discussed here.
Codd described the objectives of normalization as follows:
1) To free the collection of relations from undesirable insertion, update and deletion dependencies.
2) To reduce the need for restructuring the collection of relations as new types of data are introduced, thus increasing the lifespan of application programs.
3) To make the relational model more informative to users.
4) To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
The undesired side effects of insert, update or delete may include the following:
- Multiple rows for the same information that are updated independently and get out of sync
- missing rows, such as when a new user is added but not yet assigned to anything
- inconsistent deletes, such as when a deletion in one table implies and requires a deletion in a completely different table.
If the addition of new data requires changes to the existing structure, such changes can cause regressions.
Tables, when normalized, correspond directly to real-world concepts and their relationships.
The normalized tables are suited for general querying across any set of tables.
Some common normalization terms:
Functional dependency (FD): X -> Y, where attribute Y has a functional dependency on a set of attributes X if and only if each X value is associated with one and only one Y value (a small code check of this definition appears after this list of terms).
Trivial functional dependency: an FD of an attribute on a superset of itself.
Full functional dependency: when an attribute is functionally dependent on X but not on any proper subset of X.
Transitive dependency: if X -> Y and Y -> Z, then Z depends transitively on X (X -> Z).
Multivalued dependency: the presence of certain rows implies the presence of certain other rows.
Join dependency: the table can be recreated by joining its projections.
Superkey: a combination of attributes that can be used to uniquely identify a database record.
Candidate key: a minimal superkey, such as the Social Security Number.
A non-prime attribute is one that does not occur in any candidate key; a prime attribute is one that occurs in some candidate key.
A candidate key may be designated as the primary key, but it is usually not treated specially with respect to the other candidate keys.
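As a small worked example of the functional dependency definition above, the sketch below checks whether X -> Y holds over a handful of in-memory rows; the row type (employee -> department) and the sample data are made up for illustration.

#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical rows: does the dependency {employee} -> {department} hold?
struct Row { std::string employee; std::string department; };

// X -> Y holds if and only if every X value is associated with exactly one Y value.
bool HoldsFD(const std::vector<Row>& rows) {
    std::map<std::string, std::string> seen;  // employee -> the department seen so far
    for (size_t i = 0; i < rows.size(); ++i) {
        std::map<std::string, std::string>::iterator it = seen.find(rows[i].employee);
        if (it == seen.end())
            seen[rows[i].employee] = rows[i].department;
        else if (it->second != rows[i].department)
            return false;  // the same employee is associated with two departments
    }
    return true;
}

int main() {
    std::vector<Row> rows;
    Row r1 = {"alice", "sales"};
    Row r2 = {"bob", "support"};
    Row r3 = {"alice", "sales"};  // a repeated association does not violate the FD
    rows.push_back(r1);
    rows.push_back(r2);
    rows.push_back(r3);
    std::cout << (HoldsFD(rows) ? "employee -> department holds" : "violated") << std::endl;
    return 0;
}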
Normal forms include the following:
1) First normal form - the table faithfully represents a relation: it has at least one candidate key and no repeating groups (attribute values are atomic)
2) Second - no non-prime attribute is functionally dependent on a proper subset of any candidate key
3) Third - every non-prime attribute is non-transitively dependent on every candidate key in the table (no transitive dependencies are allowed)
4) EKNF - every non-trivial FD is either the dependency of an elementary key or a dependency on a superkey
5) BCNF - every non-trivial FD in the table is a dependency on a superkey
6) Fourth - every non-trivial multivalued dependency in the table is a dependency on a superkey
7) Fifth - every non-trivial join dependency in the table is implied by the superkeys of the table
8) Domain-key - every constraint on the table is a logical consequence of the table's domain constraints and key constraints
9) Sixth - the table has no non-trivial join dependencies
Denormalization: OLAP systems are denormalized compared with OLTP systems; the redundant data is carefully controlled during ETL. A normalized snowflake schema becomes a denormalized star schema. A non-first normal form (NF²) is obtained by nesting 1NF tables, and the reverse operation is unnesting.
The KMP algorithm for string pattern matching proceeds like this:

#include <iostream>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>
using namespace std;
int* PreProcess(string pattern) {
    int patternLength = pattern.length();
    if (patternLength == 0) return 0;
    // next[i] holds the length of the longest proper prefix of pattern[0..i-1]
    // that is also a suffix of it (the KMP failure function).
    int* next = new int[patternLength + 1];
    if (next == 0) return 0;
    next[0] = -1;  // sentinel to set up the loop below; unused by KMP itself

    int i = 0;
    while (i < patternLength) {
        next[i + 1] = next[i] + 1;
        while (next[i + 1] > 0 &&
               pattern[i] != pattern[next[i + 1] - 1])
            next[i + 1] = next[next[i + 1] - 1] + 1;
        i++;
    }
    return next;
}
void KMP(string pattern, string text, vector<int> *positions) {
    int patternLength = pattern.length();
    int textLength = text.length();
    int* next = PreProcess(pattern);
    if (next == 0) return;
    int i = 0;   // current state: number of pattern characters matched so far
    int j = 0;   // current position in the text
    while (j < textLength)
    {
        while (true)
            if (text[j] == pattern[i])     // the next character matches
            {
                i++;                       // advance to the next state
                if (i == patternLength)    // the whole pattern has been matched
                {
                    // record the starting position of this match
                    positions->push_back(j - (i - 1));
                    i = next[i];           // fall back so overlapping matches are also found
                }
                break;
            }
            else if (i == 0) break;        // mismatch in state 0: nothing to fall back to
            else i = next[i];              // mismatch: fall back to the longest proper border
        j++;
    }
    delete [] next;                        // PreProcess allocated this with new[]
}
char CharCode(char chr) {
    // Map each character to a representative of its class (lowercase letter,
    // uppercase letter, digit, or punctuation) so the pattern matches classes.
    if ( 'a' <= chr && chr <= 'z' )
        return 'a';
    if ( 'A' <= chr && chr <= 'Z' )
        return 'A';
    if ( '0' <= chr && chr <= '9' )
        return '0';
    if ( chr == '.' || chr == '?' || chr == '!' || chr == ',' || chr == ':' || chr == ';' || chr == '-' )
        return '.';
    return chr;  // leave any other character (e.g. a space) unchanged
}
string CodeText(string text) {
    string ret = text;
    for (int i = 0; i < text.length(); ++i) {
        ret[i] = CharCode(text[i]);
    }
    return ret;
}
void FancyOutput(string pattern, string code, vector<int> *positions) {
    // Only called when at least one match was found.
    cout << "Matched positions: ";
    for (int i = 0; i < (int)positions->size() - 1; ++i)
        cout << (*positions)[i] + 1 << ", ";
    cout << (*positions)[positions->size()-1] + 1 << "." << endl;

    cout << "Text: " << code.c_str() << endl;
    for (int i = 0; i < (int)positions->size(); ++i) {
        printf("%5d ", i+1);
        // "Text: " and the "%5d " prefix are both six characters wide, so
        // padding by the match offset aligns the pattern under the text above.
        for (int j = 0; j < (*positions)[i]; ++j) cout << " ";
        cout << pattern.c_str() << endl;
    }
}
int main()
{
    string pattern, text, code;
    char pattext[1024];
    char txttext[1024];
    cout << "Input pattern:" << endl;
    cin.getline(pattext, 1024, '\n');
    pattern.assign(pattext);

    cout << "Input text:" << endl;
    cin.getline(txttext, 1024, '\n');
    text.assign(txttext);
    cout << endl;

    code = CodeText(text);
    cout << "Processed text:" << endl;
    cout << code.c_str() << endl;
    cout << endl;

    vector<int> positions;              // match start positions (0-based)
    KMP(pattern, code, &positions);

    if ( positions.size() )
        cout << "Y" << endl;
    else
        cout << "N" << endl;

    if ( positions.size() )
        FancyOutput(pattern, code, &positions);
    return 0;
}
 

Saturday, July 20, 2013

Kimball architecture

The Kimball architecture is based on dimensional modeling. The following rules of thumb are observed in this modeling:
1) Load detailed atomic data into dimensional structures, i.e., do not load summarized data into the dimensional tables.
2) Structure dimensional models around the business processes. These business processes have performance metrics that typically translate into facts; combined metrics from multiple processes can become additional, consolidated fact tables.
3) Ensure that every fact table has an associated date dimension table. The business processes and performance metrics mentioned above are associated with measurement events that occur at points in time, and the date dimension typically carries calendar attributes such as holiday indicators.
4) Ensure that all facts in a single fact table are at the same grain or level of detail, such as transactional, periodic snapshot, or accumulating snapshot.
5) Resolve many-to-many relationships in fact tables. The events stored in a fact table are inherently associated with many places on many days, and these foreign key fields should never be null. Sometimes a dimension can take on multiple values for a single measurement event, in which case a many-to-many, dual-keyed bridge table is used in conjunction with the fact table.
6) Resolve many-to-one relationships in dimension tables. Hierarchical, fixed-depth many-to-one relationships are typically collapsed into a flattened dimension table. Do not normalize or snowflake an M:1 relationship; denormalize it into the dimension table instead.
7) Store report labels and filter domain values in dimension tables. The codes, decodes, and descriptors used for labeling and querying should be captured in dimension tables. Again, such attributes should have no nulls.
8) Make certain that dimension tables use a surrogate key. Meaningless, sequentially assigned surrogate keys help produce smaller fact tables, smaller indexes, and improved performance.
9) Create conformed dimensions to integrate data across the enterprise. Conformed dimensions, also referred to as common, master, standard, or reference dimensions, are defined once in the ETL system; they deliver consistent descriptive attributes across dimensional models and support the ability to drill across and integrate data from multiple business processes.
10) Continuously balance requirements and realities to deliver a DW/BI solution that is accepted by business users and that supports their decision making. User requirements and the underlying realities of the associated source data need to be reconciled.
Dimensional modeling, project strategy, technical ETL/BI architectures, and deployment/maintenance all require balancing acts. A minimal schema sketch illustrating a few of these rules follows.
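The sketch below shows a star schema that reflects several of the rules above (a date dimension, meaningless surrogate keys, flattened dimensions, and atomic facts with non-null foreign keys). The table and column names are illustrative assumptions, not taken from Kimball's examples.

#include <string>
#include <vector>

// Dimension rows carry surrogate keys plus descriptive attributes (rules 7 and 8).
struct DateDim    { int date_key; std::string calendar_date; bool is_holiday; };
struct ProductDim { int product_key; std::string name; std::string category; };  // flattened hierarchy (rule 6)
struct StoreDim   { int store_key; std::string name; std::string region; };

// Fact rows are atomic and at a single grain, with non-null foreign keys
// pointing at the dimensions (rules 1, 4 and 5).
struct SalesFact {
    int date_key;
    int product_key;
    int store_key;
    double sales_amount;  // additive measure
    int units_sold;
};

int main() {
    std::vector<DateDim> dates;
    std::vector<ProductDim> products;
    std::vector<StoreDim> stores;
    std::vector<SalesFact> sales;

    DateDim d = {1, "2013-07-20", false};
    ProductDim p = {1, "Widget", "Hardware"};
    StoreDim s = {1, "Downtown", "West"};
    SalesFact f = {1, 1, 1, 19.99, 2};

    dates.push_back(d);
    products.push_back(p);
    stores.push_back(s);
    sales.push_back(f);
    return 0;
}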

Friday, July 19, 2013

Corporate Information Factory Architecture

CIF operational systems topology:
The CIF architecture has the data warehouse and the operational data store at its core. This core is surrounded by a metadata management plane where the operational systems reside. Operational systems can include external, distribution, product, account, and customer operational systems. These operational systems add data into the data management core through transformation and integration. The exploration warehouse, data marts, and data delivery also extract information from the data management core. The decision support interface may use these data marts, while the transaction interface may use the operational data store directly. CIF consumers acquire the information produced via data delivery, manipulate it in data marts, and assimilate it in their own environments. Outside this metadata management plane sit Information Services as well as Operations and Administration.
Information Services comprises groups of items such as the library and toolbox as well as the workbench. Operations and Administration involve systems management, data acquisition management, service management, and change management.
Producers are the first link in the information food chain. They synthesize the data into raw information and make it available for consumption across the enterprise.
Operational systems are the core of the day-to-day operations and are organized by the product they support. However, businesses are more customer-oriented than product-oriented so that they can differentiate their offerings. CIF provides facilities to define how this data relates to a customer using rules that form the metadata.
Operational data usually stands alone and has not been integrated. CIF synthesizes, cleanses, and integrates the data before it is usable. CIF also acts as a history store for the enterprise.
Next, integration and transformation consist of the processes to capture, integrate, transform, cleanse, re-engineer, and load the source data into the data warehouse. Typically data is pulled from the operational systems rather than pushed. Configuration management and scheduling play a large role here. These packages can be written with knowledge of the source data and, once written, do not change much, so they are good candidates for scheduled jobs.
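As a toy sketch of this capture, integrate, transform, cleanse and load flow, the snippet below pulls a batch of records, normalizes the customer name, and loads the result into an in-memory stand-in for the warehouse. The record layout and cleansing rules are made-up assumptions, not a real ETL package.

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical record pulled from an operational system.
struct SourceRecord    { std::string customer; std::string product; double amount; };
// Cleansed, integrated record ready to load into the warehouse.
struct WarehouseRecord { std::string customer; std::string product; double amount; };

static char ToUpperChar(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Extract step: in a real package this would pull from the source system.
std::vector<SourceRecord> Extract() {
    std::vector<SourceRecord> batch;
    SourceRecord a = {"  ACME corp ", "widget", 10.0};
    SourceRecord b = {"Acme Corp", "widget", 12.5};
    batch.push_back(a);
    batch.push_back(b);
    return batch;
}

// Transform/cleanse step: trim and normalize names so the records integrate cleanly.
WarehouseRecord Transform(const SourceRecord& in) {
    std::string name = in.customer;
    name.erase(0, name.find_first_not_of(' '));
    name.erase(name.find_last_not_of(' ') + 1);
    std::transform(name.begin(), name.end(), name.begin(), ToUpperChar);
    WarehouseRecord out = {name, in.product, in.amount};
    return out;
}

int main() {
    std::vector<SourceRecord> batch = Extract();
    std::vector<WarehouseRecord> warehouse;  // stand-in for the load target
    for (size_t i = 0; i < batch.size(); ++i)
        warehouse.push_back(Transform(batch[i]));
    std::cout << "loaded " << warehouse.size() << " rows" << std::endl;
    return 0;
}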
Next, the data warehouse plays the big role in CIF. It is often described as a "subject-oriented, integrated, time-variant (temporal) and non-volatile collection of data." Sizes of up to terabytes of data are not uncommon for a data warehouse.
Next, data management extends the data warehouse with archival/restoration, partitioning, and movement of data based on triggering and aggregation. These are often in-house development tasks.
Finally, the consumers extract information from this collection of data. The data marts support these analytic requirements and are made available via the decision support interface.
Also, the metadata enables metrics for measuring support.
Courtesy : Imhoff