Cluster computing: August 2015

Monday, August 31, 2015

We continued reviewing the book Rookie Smarts by Liz Wiseman. We read the descriptions of the different rookie smart mindsets: the backpacker, the hunter-gatherer, the firewalker and the pioneer. The author notes that these do not eliminate the role played by a savvy veteran mindset, and that we can toggle between the veteran mindset in unclear, murky situations and the rookie mindset in changing times. This, she notes, is the perpetual rookie mode.

Let us take a look at these mindsets in more detail. The backpackers are the ones who see new possibilities and explore new terrain. Like a hiker who trudges obliviously past a snake lying quietly in the bush, rookies simply walk past the obstacles they don’t see, and because they don’t know what they are supposed to be looking for, they can end up seeing what others fail to notice. They ask fundamental questions because they are unencumbered by rules and rituals. They spot patterns and logical flaws in conventional wisdom. They explore new terrain by taking shorter paths, making bigger asks, and acting erratically. They act wholeheartedly because, without a reputation to protect, rookies can operate without ego or fear of failing. They act boldly, recover quickly and work passionately.

The hunter-gatherer emphasizes venturing outward, harvesting game and foraging for resources to bring back to the tribe. Hunter-gatherers scan the environment and connect the dots, and in this hunger mode they seek out experts to teach and guide them. Seeking out experts is such a strong trait that if you ask a smart rookie, you get access to a team of experts. They also mobilize ideas and resources. In complex environments, winning organizations tap into the greatest number of brains. Crowdsourcing is growing as a viable and efficient alternative to in-house or expert-led solutions to complex problems.

The firewalkers demonstrate a trait that’s both cautious and quick. They see a glaring, gaping hole between what they have previously done and what they must now do. They don’t necessarily lack self-confidence. The author says this rookie’s anxiety is not the clinical kind but a productive paranoia and an urge to break out of the comfort zone. Their way is to take small, calculated steps by testing the waters. Since they operate in the dark, they use every available sense to navigate and they calibrate their movements to minimize the risks. They deliver quickly and tend to operate in bursts. They sometimes get off track. They solve problems quickly, but that doesn’t mean they know which problems are the most important. They seek feedback and coaching. In fact, smart managers provide a continuous stream of feedback and regularly receive vital information in return. Where these rookies lack confidence, feedback helps them translate it into intelligence.

The pioneers are the last of the categories enumerated by the author. Their practice involves building new tools and structures because they are in a perpetual survival mode. They improvise because they don’t have the resources or the skills they need. They work relentlessly because they lack another choice and are motivated to push forward.

Finally, the author calls out that a perpetual rookie is one who is curious, humble, playful and deliberate. She considers how each of these qualities contributes to rookie smarts. Some of this is straightforward, but what the author shows us is that even when we are not rookies, exercising these traits brings about the same results. The rookie revival, as she calls it, is about learning to relearn.

We can use rookie smarts at the personal, team and organizational levels.

// Counts the nodes in a binary tree.
int GetCount(Node root)
{
    if (root == null) return 0;
    int lc = GetCount(root.left);
    int rc = GetCount(root.right);
    return lc + rc + 1;
}

Sunday, August 30, 2015

A software tool called Reminders:

An alarm clock and a calendar are very useful tools to remind us of a task and to execute it at the scheduled time. They are great for planning and very personal. But the personal part is what we want to change when we work on a list of resources loaned out to many people. If we had software that could send out emails based on templates for the various resources currently used by others, the chore of maintenance, reuse and reclamation would become much easier. But this is just one of the uses.
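As a minimal sketch of the template idea, assuming a hypothetical list of loans with a resource name, a borrower and a due date, the reminder text for each overdue loan could be rendered from a single template. The field names, the template wording and the sample data are all assumptions made for illustration, and actually sending the email is left out.

```python
from datetime import date
from string import Template

# Hypothetical template; the placeholder names are assumptions for illustration.
REMINDER = Template(
    "Hi $borrower, the resource '$resource' was due back on $due. "
    "Please return it or renew the loan."
)

def render_reminders(loans, today):
    """Return (recipient, message) pairs for loans due on or before `today`."""
    messages = []
    for loan in loans:
        if loan["due"] <= today:
            body = REMINDER.substitute(
                borrower=loan["borrower"],
                resource=loan["resource"],
                due=loan["due"].isoformat(),
            )
            messages.append((loan["email"], body))
    return messages

# Made-up sample data.
loans = [
    {"resource": "vm-lab-01", "borrower": "Ann", "email": "ann@example.com",
     "due": date(2015, 8, 30)},
    {"resource": "vm-lab-02", "borrower": "Bob", "email": "bob@example.com",
     "due": date(2015, 9, 15)},
]

for recipient, body in render_reminders(loans, today=date(2015, 8, 31)):
    print(recipient, "->", body)
```

Keeping the rendering separate from the delivery means the same messages could go out by email, be printed on a label, or be handed to a scheduler.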

A software utility like cron is also a time-based job scheduler, but a non-interactive one for Unix-like operating systems. People who set up and maintain software environments use cron to schedule jobs, commands or shell scripts to run periodically at fixed times, dates or intervals. Cron users, however, cannot easily work with it visually in a web browser that can be reached from a variety of devices. How then can they use it when they are not near a machine?

Somewhere between these two extremes, a highly interactive but personal productivity tool and a business-oriented, non-interactive background job scheduler, there is a niche for software I like to call Reminders, which can also print labels that can be slapped onto anything we want to remind ourselves about.

First, this software can send reminders in a form as familiar to us as email, so that we have a variety of information and the context to complete the task. The reminders can include hyperlinks and many other features. It works along the principles of a birthday alarm but provides a placeholder for a variety of reminders.

Secondly, this software can send reminders not just to ourselves but to millions of others. Hence it’s not just a little note for ourselves but one that can be our butler for informing many others. This provides the kind of white-glove treatment that sets the tone not just for an event, as some calendars do, but for adding the kind of information that anyone, and not just us, can work with at their own pace.

Thirdly, this software can schedule any kind of job that can be run without intervention. Just as cron schedules jobs in the background, this can serve a variety of tasks.

Fourth, it prints labels that can be attached to just about anything we want to be reminded about. Given the label, we can look up an inventory of reminders to see what the purpose and tasks are. The labels are little more than barcodes and can therefore be embedded in a variety of rich information.

Such is the description of software that can be personal, professional and efficient all at once, with fun tags or labels to attach to resources, tasks, calendars or even visiting cards, improving productivity, reuse and value discovery in cycles.

Saturday, August 29, 2015

Today we review the book Rookie Smarts by Liz Wiseman. She argues that in the game of work, learning beats knowing and there is no age limit for it. Even if we are underqualified or novices, we can be at our best. Similarly, even after years of experience, we can show the same zeal and dare to dream as children do.

The question is whether this is really true in organizations that take pride in the ways of the world, where the lion eats his share, where fresh blood is demanded on the floor and many exit because they couldn’t adapt to the culture. The book does not address any of those, but it does make the case that there are different modes of the rookie mindset and that they can be very valuable on the path to success.

These mindsets she calls Backpacker, Hunter-Gatherer, Firewalker and Pioneer. Rookie Smarts addresses the questions every experienced professional faces: will my skill sets no longer be valued, and will I keep up with the changing times?

She invites us to see some rookie smarts in action and then ask whether or not we practice them. The author says we seldom find the right people in the right job at the right time. More often than not, we face a situation outside our comfort zone. As we look at what we have to meet the challenge, do we pay attention to exploring and opening avenues for ourselves with the curiosity that drives someone fresh out of college? Yes, even the most experienced might see something new in this rookie state.

Rookie smarts are about living on the learning curve. We see it on the athletic field and in the workplace. With 400 workplace scenarios studied, the author draws these four conclusions:

First, rookies are strong performers. They outperform veterans in innovation and time to completion.

Second, rookies have a unique success profile. The highest-performing rookies connected the dots, experimented and learned from mistakes.

Third, rookies aren’t always what they seem. They aren’t clueless and bumbling, and they are not always high risk takers. They actually bite off only as much as they can chew.

Finally, the advantage rookies have over their more experienced counterparts is that the latter usually have blind spots.

So let us take a look at the categorizations of the rookie mindset the author presents us with.

First, the backpacker is the rookie who has nothing to weigh them down and nothing to lose. Their mindset is unencumbered, as opposed to the protective mindset of someone resting on their laurels.

Second, the hunter-gatherer mindset is demonstrated when rookies don’t know the lay of the land and are forced into a sense-making mode. They are alert and seeking.

Third, the firewalker mindset is displayed when we have to close a knowledge or performance gap quickly. This mindset is cautious and quick.

The pioneer mindset also comes with uncharted territory and is one of hunger and relentless pursuit.

The author argues that with so many things changing so fast around us, we have to be in a perpetual rookie mode.

Friday, August 28, 2015

In continuation of the previous post on the overfished ocean strategy from the book by Nadya Zhexembayeva, we describe the five principles mentioned by the author.

Principle one : line to circle

The circular economy is “generative by design”, says the Ellen MacArthur Foundation: nothing is wasted and everything goes around in a circle. As an example, GameStop, originally a software and video game retailer, started accepting discarded hardware and refurbishing it for resale. This reverse-engineering practice became a strong skill set for GameStop, and sales of the rebuilt devices were up to $200 million.

There are some practicalities to be considered, says the author. For example, not everything can be done at once. And so the options for consideration are reuse, refurbish, and recycle.

Reuse has been around for a long time. The example given here is cotton, which moves from textiles to fiber-fill in upholstery to insulation for construction.

Refurbish is where the product is made just like new. This is usually cost effective.

Recycle is where the product changes into something else. Moving from line to circle changes the fundamental way you do business; it is the most important shift needed to survive and thrive in the resource-deprived world.

Principle two : vertical to horizontal

Companies like Apple are trained to satisfy their customers and, in addition, to watch their direct competitors. When we look downstream, tracking the flow, competition is the vertical cut in the chain. But this is the first thing we should leave behind in the strategy we are considering. As the mining companies have discovered, the use of Caterpillar in-loaders produces a core that makes the castings needed for the engine blocks of a tractor; the spent sand from this process goes to increase the yield of corn, which is sold to a company that makes biofuel, which in turn powers the Caterpillar in-loaders.

Principle three : growth to growth

In the context of a collapsing linear economy, moving from products to all-round solutions becomes the easiest way to find and secure future growth. An effort to sell services rather than products is a shift in perspective along this principle. This addresses sustainability issues and becomes an opportunity for efficiency and profit.

The author argues that the first three principles of the Overfished Ocean strategy have focused on the factors and forces outside the company. But in order to be successful, we must have a principle internally to facilitate such a change.

Principle four : plan to model

The convention for launching something is to formulate a detailed plan, often looking five years into the future. This is about how to get from point A to point B. A model, on the other hand, is about the vehicle you use for traveling. The American company Safechem transformed its revenue from one accrued by the volume of cleaning products sold to one that depends on the volume of surface cleaned, thereby giving rise to a much more resource-savvy business model.

Principle five : department to mindset

Historically, departments were seen as lending organizational efficiency, but they became so compartmentalized that they failed against the rising trend of the last two decades, which is change management. Similarly, this strategy is not a department but a mindset, one that is built on a range of distinct capabilities involving:

Systems thinking – this focuses on how things interact.

Stakeholder management – people and relationships that we never knew existed come into the picture.

Design thinking – this one is new and is the go-to capability for managers using this strategy. It’s about creating hard-to-devise alternatives that make the choice easy, as opposed to making hard choices between tradeoffs.

The author concludes with scenario planning to come up with that one compelling story. This planning begins by asking “what if” questions and then projecting the long-term trends and changes involved. Scenario planning asks you to define the most material of the risks, brainstorm the possible implications and imagine proactive responses to the future that might be. As we explore everybody’s future, we might find a way to create one. The author asks, given that the collapse of the linear economy is certain, how far we are from taking recourse in this strategy.

#codingexercise
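// Average of all node values: total sum divided by node count.
// Averaging the subtree averages would weight unequal subtrees incorrectly.
// GetCount is the exercise from the August 31 post.
double GetAvg(Node root)
{
    if (root == null) return 0;
    return (double) GetSum(root) / GetCount(root);
}

int GetSum(Node root)
{
    if (root == null) return 0;
    return GetSum(root.left) + root.val + GetSum(root.right);
}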

Thursday, August 27, 2015

Today we review the book Overfished Ocean Strategy: Powering Up Innovation for a Resource Deprived World by Nadya Zhexembayeva

This is a book that puts the spotlight straight on resource scarcity: demand is high and supply is low. Scholar and entrepreneur Nadya Zhexembayeva teaches how to discover resource-conserving efficiencies up and down the value chain and rapidly try out and define new business models.

The extraction of resources, their transformation and the creation of products are slowing because resources are drying up. How do we innovate in a resource-deprived world? This book describes the five principles that define the Overfished Ocean strategy.

She actually begins with the dwindling fisheries in the ocean. A research study cited in the book shows that this amounts to nearly fifty billion in economic losses, enough to make the case that this is indeed a grave situation. Earlier we used to grow the population, increase consumption and keep cutting prices, all at the same time. Now commodity prices have increased by 147 percent and three billion more middle-class consumers are expected to join the global economy, all putting new pressure on resource demand.

Our throwaway economy assumed production is easy. But environmentalists have shown that “waste equals food”. Mining, growing or raising something have been our only options for raw materials. At the other end of this rainbow of businesses are the ones consuming waste, which is itself a lucrative business. In other words, we have the entire global economy as one giant supply chain. The author argues that this chain is linear, throwaway, and collapsing.

Therefore the five new rules of the trade begin with just that:

From line to circle: Waste of one process becomes food for another. This is the “Cradle to Cradle” approach.

From vertical to horizontal: Mainstream strategic thought invites us to pay attention to all five forces in the business (competitors, consumers, suppliers, new entrants, and substitutes), but in reality most dig into the competition. It’s the potential of the untapped forces that we could add to our repertoire.

From growth to growth: Most managers will vouch that growth comes from selling more. Growth is now also possible by creating more with less, and therefore it’s a question of what we want to grow.

From plan to model: Plans become obsolete in no time. The only way to make the new reality work is to constantly adapt your business to it. Hence the advocacy for modeling.

From department to mindset: Generally, department heads can become scapegoats. Instead, the new market reality demands a new mindset, a new way for the entire company to look for value where previously none was available.

With the above principles, the Overfished Ocean strategy helps us find value cycles.

Wednesday, August 26, 2015

How do we search the object store ?

There have been a couple of mentions in my previous post but not a solution. Let us face it, the S3 APIs only permit search on the prefix-included name of objects. Even that is limited to a PCRE regular expression search on the key names. Consequently, a lot of time may be spent coming up with naming conventions. The trouble with naming conventions is that they are static and may even require renaming objects when an attribute changes or a conflict arises.

On the other hand, metadata for each object is available on an iterator basis. This means that we can iterate over the objects one after another and match each one’s metadata to that of the query. For example, if we want to find all objects created by a certain owner, we scan all the objects and match the owner or created_by field of each to the value in the query.
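The scan described above can be sketched as follows. This is only an illustration under stated assumptions: the listing is a hypothetical in-memory list of dictionaries standing in for whatever the store's iterator returns, and `find_by_metadata` and the `team` field are names made up for the example, not part of any SDK.

```python
def find_by_metadata(objects, **criteria):
    """Yield the keys of objects whose metadata matches every criterion."""
    for obj in objects:
        metadata = obj.get("metadata", {})
        if all(metadata.get(name) == value for name, value in criteria.items()):
            yield obj["key"]

# Hypothetical listing; in practice each entry would come from the store's iterator.
listing = [
    {"key": "logs/log1.txt", "metadata": {"created_by": "alice"}},
    {"key": "logs/log2.txt", "metadata": {"created_by": "bob"}},
    {"key": "docs/a.pdf", "metadata": {"created_by": "alice", "team": "ops"}},
]

matches = list(find_by_metadata(listing, created_by="alice"))
print(matches)
```

The cost is one pass over every object for each query, which is what motivates keeping the names and metadata somewhere cheaper to search.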

Arguably, the name and the metadata of an object are smaller than the average object. In other words, we could keep a mirror of the object store with an empty file for each corresponding object in the object store. We would then have the name and metadata of each.

Another way to do this would be to create an index over the prefix names. A SQL table with the object identifier and its metadata attributes as columns, or with a relation to a key-value table, would suffice. The table will have an index on the prefix and even on the metadata values for those fields that are common to all objects.

With an index on the prefix name, searching and sorting in T-SQL becomes far easier. A clustered sequential index on the name or object key would even help reduce disk access.

Moreover, adding a table for the name and metadata lends itself to standard query operators. Operators like Select, Join, Union, Intersect, Except, Distinct, Range, SequenceEqual, Skip, SkipWhile, Where etc. can be seamlessly performed on the object keys, which makes it easier to come up with a final result set of the objects of interest. Aggregate operations can also be performed, in addition to different kinds of positional access.
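A small sketch of such an index, using an in-memory SQLite database as a stand-in for the SQL Server table the text has in mind. The table name, column names and sample rows are assumptions made for illustration; the point is only that prefix search, metadata lookup and aggregation become ordinary queries once keys and metadata sit in a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (
        object_key TEXT PRIMARY KEY,   -- full key, prefix included
        created_by TEXT,
        size_bytes INTEGER
    );
    CREATE INDEX idx_objects_created_by ON objects (created_by);
""")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [
        ("logs/log1.txt", "alice", 120),
        ("logs/log2.txt", "bob", 340),
        ("docs/a.pdf", "alice", 9000),
    ],
)

def keys_with_prefix(prefix):
    """Sorted keys starting with `prefix`, expressed as a range over the key index."""
    # Upper bound: the prefix followed by the highest Unicode code point.
    rows = conn.execute(
        "SELECT object_key FROM objects "
        "WHERE object_key >= ? AND object_key < ? ORDER BY object_key",
        (prefix, prefix + "\U0010ffff"),
    )
    return [row[0] for row in rows]

def keys_by_owner(owner):
    """Sorted keys whose created_by column equals `owner`."""
    rows = conn.execute(
        "SELECT object_key FROM objects WHERE created_by = ? ORDER BY object_key",
        (owner,),
    )
    return [row[0] for row in rows]

def total_size_by_owner():
    """Aggregate example: total bytes stored per owner."""
    return dict(conn.execute(
        "SELECT created_by, SUM(size_bytes) FROM objects GROUP BY created_by"
    ))
```

The table has to be kept in step with the store on every put and delete, which is the price paid for not scanning the objects on each query.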

Lastly, the object store allows such an index to be an object itself within the store. So we don’t need to keep another database for this purpose.

#codingexercise
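// Returns the node with the largest value, assuming a binary search tree
// (the maximum is the rightmost node); returns null for an empty tree.
Node GetMax(Node root)
{
    if (root == null) return null;
    while (root.right != null)
        root = root.right;
    return root;
}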

Tuesday, August 25, 2015

In the previous post, I discussed that the notion of buckets as containers for similar items is erroneous. Although we are trained to think of a bucket as a collection of folders (which is itself a misnomer and should be read as prefixes), a bucket is essentially a potpourri of objects.

Objects can have prefixes in their keys and name-value pairs in their metadata. Hence there is no organizational structure needed for them and they appear as a flat list within the bucket. This means that we can retrieve collections of objects based on their prefix or metadata. Note that the absence of a hierarchical organizational data structure is more than made up for by the conventions used in the prefix and in the metadata.

Let us take a look at retrieving a dynamic collection of objects by search or metadata query. This is usually done with the help of an iterator.

First is a prefix based search which we have seen from the browsing of folders. This is possible because we narrow down the objects retrieved by specifying the prefix.

For example, to get all objects using PHP we say:

$iterator = $client->getIterator('ListObjects', array(
    'Bucket' => $bucket
));

foreach ($iterator as $object) {
    echo $object['Key'] . "\n";
}

To get all objects in a folder, we say:

$iterator = $client->getIterator('ListObjects', array(
    'Bucket' => $bucket,
    'Prefix' => 'foo'
));

foreach ($iterator as $object) {
    echo $object['Key'] . "\n";
}

Limit and page_size can also be specified with the iterator. The prefix limits results to only those keys that begin with the specified prefix, while the delimiter causes the list to roll up all keys that share a common prefix into a single summary list result.

The purpose of the prefix and delimiter parameters is to help you organize and then browse your keys hierarchically. To do this, first pick a delimiter for your bucket, such as slash (/), that doesn't occur in any of your anticipated key names. Next, construct your key names by concatenating all containing levels of the hierarchy, separating each level with the delimiter.
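The roll-up behavior can be illustrated with a small local sketch. It mimics what a listing with a prefix and a delimiter returns, working on a plain list of key names rather than calling any storage API; `make_key` and `list_level` are hypothetical helper names introduced only for this example.

```python
def make_key(*levels, delimiter="/"):
    """Build a key name by joining hierarchy levels with the delimiter."""
    return delimiter.join(levels)

def list_level(keys, prefix="", delimiter="/"):
    """Split the keys under `prefix` into direct objects and rolled-up common prefixes."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key lies deeper in the hierarchy: report only its next level once.
            rolled = prefix + head + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes

keys = [
    make_key("logs", "2015", "log1.txt"),
    make_key("logs", "2015", "log2.txt"),
    make_key("logs", "readme.txt"),
    make_key("docs", "a.pdf"),
]

print(list_level(keys))            # top level of the bucket
print(list_level(keys, "logs/"))   # one level down
```

Browsing a "folder" is then just repeating the call with one of the returned common prefixes as the new prefix.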

For access control, visit: http://docs.aws.amazon.com/AmazonS3/latest/dev/s3-access-control.html

The way to do a metadata search is by listing keys. Note that we are not listing objects. When adding an object to a bucket, you can attach custom headers to the key. This lets you fetch the key names and headers without fetching the contents. Then, as you retrieve all the keys sequentially, you can filter.

There is also a built-in PCRE regular expression pattern search available on the keys:

// Instantiate the class
$s3 = new AmazonS3();
$bucket = 'my-bucket' . strtolower($s3->key);

// Get all filenames that match this pattern
$response = $s3->get_object_list($bucket, array(
    'pcre' => '/pdf/i'
));

// Success?
var_dump(gettype($response) === 'array');

For details, refer here: http://docs.aws.amazon.com/AWSSDKforPHP/latest/index.html#m=AmazonS3/get_object_list

Monday, August 24, 2015

In the world of cloud computing and cloud-based object storage, how do we use buckets? Are they the same as folders? How does object storage differ from a file system?

Buckets cannot be simplified to folders, although objects can be organized with folders. A bucket cannot contain another bucket. A folder can nest another folder. A folder is a file system artifact. A bucket is not. A bucket can have many different objects within it. A bucket is an object storage artifact. Let us see what this means.

If we take the example of a folder called logs that has say log1.txt, log2.txt and log3.txt, then they exist in the bucket as three different objects with a prefix “logs/” and key names “logs/log1.txt”, “logs/log2.txt”, “logs/log3.txt”. The folder we see in our console is merely for logical organizational convenience within the bucket. It doesn’t have a data structure per se unlike the folder in a file system. Think prefixes when working with an object store directly instead of folders.

Objects can be moved from one folder to another but not directly from one bucket to another. If we are creating too many buckets, we are introducing artificial restrictions beforehand. It’s the same as partitioning a disk multiple times when a single undivided partition would do. Moreover, a single bucket would eliminate the need to copy or migrate data from one bucket to another. A single seamless bucket is also better positioned to meet future needs than a dozen of them, all else remaining the same. There is no loss of organization whether there is one bucket or many.

Buckets are wide and deep. They sprawl over several thousand storage devices and span datacenters. In the world of OpenStack, they power file systems. For example, users of Ceph may have no idea that the data they are putting in a filesystem is actually stored in the cloud with the RADOS object store. The filesystem meets the users’ convenience and habits. We organize based on files and folders, and that habit is difficult to change. Moreover, that habit serves us well across paradigms such as this. When using an object store directly, we should be mindful of the rich metadata associated with objects and of metadata-based access to objects.

As with metadata, permissions are also fine grained and available at the object level as well as the folder level. Folders can be made public or private. All the objects that appear within a public folder are available for viewing or downloading to anyone on the internet. If you want to browse a folder that is made public, you will get access denied because the folder is just a naming prefix for an object or group of objects. It doesn’t exist. A folder can be made public but it cannot be reverted to private. Instead, you can individually mark the objects within a public folder as private. Buckets have access permissions and versioning status, and you can specify which region a bucket should reside in. Logging, versioning and event subscriptions can be turned on for a bucket. Cost allocation tags (key-value pairs) can be assigned at the bucket level. You could create many buckets, but you should first exhaust the possibilities with something more granular than a bucket. For example, you could use naming conventions for objects.

Many different objects can exist within a bucket. They can be located with the autocomplete-like jump feature of bucket browsing instead of paging through millions of items. Moreover, metadata attributes are available that can be used to label the objects in different ways.

One of the founding principles of an object store is to reduce the storage admin’s task of provisioning and maintaining storage. Anytime we increase those tasks, we are not doing something right.

Some more resources on object storage include the following:

A Beginner's Guide To Next Generation Object Storage - DDN

http://blogs.aws.amazon.com/security/post/Tx1P2T3LFXXCNB5/-Writing-IAM-span-class-matches-policies-span-Grant-access-to-user-specific-fold

The caveat is that even the AWS documentation doesn't explain this usage and goes so far as to suggest that buckets be kept homogeneous, when in fact they are designed to hold heterogeneous content. Their blogs, on the other hand, are far more revealing.

#codingexercise
Node GetMin(Node root)
{
Node cur = root;
while (cur && cur.left)
cur = cur.left;
return cur;
}

Sunday, August 23, 2015

RADOS and ViPR

In continuation of the document I wrote about Object store here, I want to bring up two emerging technologies and their role. RADOS is an object store which also allows block I/O (iSCSI) and NAS gateways. Together with Ceph for OpenStack, it eliminates the storage silos that usually form around different protocols. In addition, it can support the S3 and Swift APIs. This unified approach also helps with maintainability. Essentially, the nature of any object store is to export servers’ local disk space as a single large datastore. The software that maintains this functionality saves data in binary form. As in cluster computing, the number of nodes participating is arbitrary, hence the perception of seamlessly scalable storage. This can support billions of objects that are identified by their ID.

ViPR addresses a different point of unification. It brings the legacy gear typical of a datacenter into a single unified storage. It provides the kind of management convenience that VMware is known for. Behind its façade, it connects different legacy servers that maintain their protocol-centric behavior. By protocols, I mean FTP, HTTP, CIFS, NFS, etc. Centralized automated management reduces storage provisioning tasks against each of these protocols.

How does RADOS work?

RADOS stands for reliable autonomic distributed object store. It consists of three different layers:

Object Storage Device (OSD): An OSD in RADOS is always a folder within an existing filesystem. Together the OSDs form the object store. RADOS generates binary objects from the files to be stored and stashes them in the store.

Monitoring servers (MON): These form the interface to the RADOS store and support access to the objects within the store.

Metadata servers (MDS): These provide POSIX metadata for objects in the RADOS object store to Ceph clients.

Ceph is the file-system that accesses the object store in the background. Ceph is closer to the user while RADOS powers it. Without Ceph the data is not accessible to the user. However RADOS is not limited to being the backend of a filesystem. It can work with others.

ViPR can work with a variety of systems and a variety of services. It’s a one stop management utility for software as a service. While it is widely recognized that each storage platform can handle specific workloads, each comes with its own specific APIs, management and monitoring tools. ViPR eliminates this overhead and provides a common automatic provisioning and management portal.

ViPR enables automatic storage management for use by VMware, OpenStack and Microsoft clients. It provides access via unified open REST APIs across a dozen types of virtual storage arrays. Moreover, its multivendor support is extensible.

Both ViPR and RADOS seem to provide seamless unlimited storage. While RADOS provides a new form of storage, ViPR can transform the existing storage infrastructure into a simple and extensible platform.

// Returns true if the target node is present in the tree rooted at root.
bool hasNode(Node root, Node target)
{
    if (root == null || target == null) return false;
    if (root == target) return true;
    return hasNode(root.left, target) || hasNode(root.right, target);
}