Sunday, July 28, 2013

Let us look at an implementation for a map that shows the coffee stores in your neighborhood along with the friends of yours who have visited those same stores in the past day, month or year.
The store information for your neighborhood comes from the Starbucks Location API. The friends information comes from the Facebook API. The location updates by your friends are posted to Facebook through automatic push notifications from applications, including the Starbucks application proposed here.
Different applications post to their customers' Facebook walls using the permissions requested from them. These posts show up under location updates on the users' profiles, and the same updates can be queried across a user's Facebook friends for a given location. A list of the friends who have the given location on their wall is collected, and this list should be rendered along with the store in question.
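As a rough sketch of how that join might look in Python, assuming the stores have already been fetched from the Location API and that the application holds permission to read friends' location data. The Graph API fields used here are illustrative assumptions, not the exact contract:

import requests

GRAPH_URL = "https://graph.facebook.com"

def friends_seen_at(store_name, fb_access_token):
    """Return names of friends whose location updates mention the store.

    The 'fields' value is illustrative; the exact fields exposed for
    location updates depend on the permissions the application holds.
    """
    resp = requests.get(f"{GRAPH_URL}/me/friends",
                        params={"access_token": fb_access_token,
                                "fields": "name,location"})
    resp.raise_for_status()
    friends = resp.json().get("data", [])
    return [f["name"] for f in friends
            if store_name in str(f.get("location", ""))]

def annotate_stores(stores, fb_access_token):
    """Pair each store with the friends who reported visiting it, ready
    to be rendered alongside the store on the map."""
    return [{"store": store,
             "friends": friends_seen_at(store["name"], fb_access_token)}
            for store in stores]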
The Starbucks API provides the following services:
1) OAuth API: provides access tokens acquired through one of several methods: a) password, b) client id and user id, c) authorization code, d) client credentials. Refresh tokens are also provided.
2) Loyalty API: provides licensees with a way to integrate with their Siebel loyalty systems.
3) Location API: provides methods to list all stores, stores near a geographic point, stores by specific ids, and stores within a region (see the sketch after this list).
4) Account API: creates US and non-US Starbucks accounts, validates and recovers credentials, creates account profiles, and updates social profile data.
5) Card API: provides a customer's card information to keep track of their purchases. This gives information such as the card URL, card details and card balances, along with the ability to reload a card.
6) Payment Methods API: enables collection of payments from the customer using a payment method such as a credit card or PayPal.
7) Rewards API: provides rewards summary and history information for the application user, such as points needed for the next level, points needed for the next free reward, points needed for re-evaluation, etc.
8) Barcode API: generates barcodes to communicate with in-store scanners and applications.
9) Content API: used to retrieve localized content for any web client. Content is stored in and retrieved from Sitecore.
10) Transaction History API: used to retrieve the history data for a specified user, across all cards.
11) eGift API: used to retrieve the themes associated with eGifts.
12) Device Relationship API: used to insert or update a device and to report device actions, such as touch-to-pay, which displays the barcode for scanning, and touch-when-done, which reports when the barcode scan is complete.
13) Product API: to browse food, beverages, coffee, eGifts, etc.
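As a concrete illustration of the Location API item above, here is a minimal sketch. The base URL, endpoint paths, parameter names and response fields are assumptions for illustration only; the published Location API reference defines the exact contract:

import requests

# Assumed base URL and paths, for illustration only.
BASE_URL = "https://api.starbucks.com/v1"

def stores_near(lat, lng, radius_miles, access_token):
    """List stores near a geographic point via the Location API."""
    resp = requests.get(f"{BASE_URL}/stores/nearby",
                        params={"latlng": f"{lat},{lng}",
                                "radius": radius_miles},
                        headers={"Authorization": f"Bearer {access_token}"})
    resp.raise_for_status()
    return resp.json().get("stores", [])

def stores_by_ids(store_ids, access_token):
    """Fetch specific stores by their ids."""
    resp = requests.get(f"{BASE_URL}/stores",
                        params={"ids": ",".join(store_ids)},
                        headers={"Authorization": f"Bearer {access_token}"})
    resp.raise_for_status()
    return resp.json().get("stores", [])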


Saturday, July 27, 2013

API Explorer for Starbucks and a Graph API implementation

Starbucks APIs are OAuth-enabled. This means they don't just grant access based on API keys but require an access token issued by an OAuth provider. Starbucks APIs are available through Mashery, which redirects to the Starbucks authorization endpoint, and this is where API users get their access tokens. OAuth enables one of four different workflows to obtain access tokens.
Implicit Grant - such as when a mobile application gets an access token directly from the authorization endpoint based on client id and user id.
Authorization Code Grant - such as when a user logs in to an IIS-hosted site and the user's browser is redirected to the Starbucks authorization endpoint to get a one-time, short-lived authorization code. The client can then exchange the code for an access token.
Credentials Grant - such as when a user provides his or her username and password for a token.
Client Credentials Grant - such as when an application on a secured kiosk or site provides its own context regardless of the user.
In building an explorer for the Starbucks APIs, we will need to get an access token to make the API calls. Since this application, which we call the API explorer, enables API users to try out the different APIs with their input parameters and responses, we will choose either the client credentials grant or the implicit grant to retrieve an access token on push-button demand. Both XML and JSON responses can be displayed in the text area panel of the API explorer. This is conceived to be very similar to the Graph API Explorer from Facebook.
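A minimal sketch of the client credentials exchange. The token endpoint URL and field names are assumptions, since Mashery-fronted APIs vary in the exact paths they expose:

import requests

# Assumed token endpoint for illustration; the actual URL is published
# with the API documentation.
TOKEN_URL = "https://api.starbucks.com/oauth/token"

def client_credentials_token(client_id, client_secret):
    """Fetch an access token with the client credentials grant, suitable
    for the explorer's push-button, no-user-context scenario."""
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    resp.raise_for_status()
    body = resp.json()
    return body["access_token"], body.get("expires_in")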

Another application of the Starbucks APIs could be a deeper integration with Facebook's location data. For example, Starbucks customers would like to know which of their Facebook friends frequented the same Starbucks store on the same day as the one they are at currently. The Starbucks mobile application today maintains card history and rewards. If it could push Facebook location updates for the purchases it tracks at the store being visited, then Facebook friends could see where each other have been on a given day. This could encourage more sales at the Starbucks store as friends try to catch up with each other, and at the very least it provides useful knowledge to the Starbucks coffee customer of who else has been doing the same at this store. Additionally, the Starbucks mobile application need not take the user to their Facebook page to view or post this data, but could offer a tip or balloon notification of which of the application user's friends had been at this store and when, if any. Such tips are non-invasive and information-only, and they enable the coffee experience to be an avenue for social networking. Interested users could be taken to a map that displays not just the stores but the Facebook friends that have visited each store in the past day, week or month.

Localization and globalization testing of websites

Usually referred to by the notations L10N and I18N, locale-specific website rendering is a significant test concern, both in terms of the resources required for the testing and the time it consumes. The primary considerations for this testing are the linguistic, cosmetic and basic functionality issues in displaying information in a culture-specific manner. Some languages such as German require around 30% more space, while Chinese, for instance, requires around 30% less. Moreover, right-to-left languages such as Arabic and Hebrew require alignments, proper indentation and layout. Since UI resources for a website are typically collected and stored in resx files, their collation and translation are made easy with tools such as resgen.exe. However, correct content alone does not guarantee an appropriate website rendering, hence additional testing is required. As with any variation of the website, a full test pass using functionality tests and load tests is incurred. These sites also require significant environment resources to be allocated, including culture-specific domain name registrations and associated servers. Each such resource requires setup, maintenance and constant tracking in various measurement and reporting systems. Such tasks increase the matrix of the web testing. Fundamentally, this testing is rigorous, end to end, and repeated for each locale.
What would be desirable is to unify the testing for the common content and factor out the testing specific to the locale. By unifying the tests upstream for much of the content and its display, there are significant savings in the test cost. Consider the steps involved in the culture-specific testing today as depicted below. Each of them is a full iteration over common content with repeated functionality and load tests, even though the locale-specific testing is focused on linguistic translation and cosmetics.
test-en-us : setup->deployment->functionality testing->load testing->translation and cosmetics->completion
test-en-gb : setup->deployment->functionality testing->load testing->translation and cosmetics->completion
test-de-de : setup->deployment->functionality testing->load testing->translation and cosmetics->completion
If there were a solution that enabled a common test bed for much of this redundancy, it would look more like the following:
test-neutral: setup->deployment->functionality testing->load testing -> linguistic and translation tests
                                                                     -> layout, width, height and indentation checks from static resource checking
                                                                     -> variations of locale for repeating the above
This way, the redundancies are removed, and testing is more streamlined and focused on explicitly culture-specific tasks.
Moreover, in the earlier model, test failures in one locale environment could differ from those in another on a case-by-case basis. By unifying the resources and the operations, much of this triage and variation can be avoided. The blog posts on the Pseudoizer can be very helpful here.
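For instance, a pseudo-localization pass in the spirit of the Pseudoizer can be folded into the neutral test bed: expand every resource string by roughly 30% (the German worst case) and wrap it in markers, so truncation, clipping and hard-coded strings surface in a single pass. A minimal sketch:

# Pseudo-localize a resource string: swap in accented vowels, pad by
# ~30% to simulate German-length text, and add brackets so clipped or
# hard-coded strings stand out in the UI.
ACCENTED = str.maketrans("AEIOUaeiou", "ÀÉÎÕÛàéîõû")

def pseudoize(s, expansion=0.3):
    padding = "·" * max(1, int(len(s) * expansion))
    return "[" + s.translate(ACCENTED) + padding + "]"

print(pseudoize("Add to cart"))   # [Àdd tõ càrt···]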
                                                                               

Friday, July 26, 2013

Technical overview OneFS continued

Software upgrade of the Isilon cluster is done in one of two methods:
Simultaneous upgrade - This method installs the software updates and reboots all the nodes at the same time. It does cause a temporary interruption of service in serving data to clients, but that is typically kept under two minutes. The benefit is that system-wide changes can be made without any data operations in flight, which can be considered safer even though the service is interrupted, albeit temporarily.
Rolling upgrade - This method upgrades and restarts each node in the cluster sequentially. The cluster remains online and there is no disruption of service to the customer. This is ideal for minor revisions, but for major revisions, say of the OneFS code, it may be better to perform a simultaneous upgrade so that version incompatibilities are avoided.
The same holds true for an upgrade. Additionally, a pre-verification script is run to ensure that only supported configurations are permitted to upgrade. If the checks fail, instructions on troubleshooting the issues are typically provided. Upgrades can be invoked through the administrative interfaces mentioned earlier, such as the CLI or the web admin UI. After the upgrade completes, the cluster is verified with a health status check.
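The shape of such a pre-verification script might look like the following sketch; the individual checks and the attributes on the cluster object are illustrative assumptions, not the actual OneFS script:

# Hypothetical set of versions from which an upgrade is supported.
SUPPORTED_FROM = {"6.5", "7.0"}

def pre_upgrade_checks(cluster):
    """Run go/no-go checks before an upgrade; raise with hints on failure."""
    checks = [
        ("supported upgrade path", cluster.version in SUPPORTED_FROM),
        ("all nodes healthy", all(n.healthy for n in cluster.nodes)),
        ("quorum present",
         sum(n.online for n in cluster.nodes) > len(cluster.nodes) // 2),
    ]
    failures = [name for name, ok in checks if not ok]
    if failures:
        raise RuntimeError("upgrade blocked: " + "; ".join(failures))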
Among the various services for data protection and management in OneFS, some are listed below:
InsightIQ: This is a performance management service. It maximizes the performance of your Isilon scale-out storage system with innovative performance monitoring and reporting tools. A backend job called FSAnalyze gathers the file system analytics data used in conjunction with InsightIQ.
SmartPools is a resource management service which implements a highly efficient, automated tiered storage strategy. It keeps the single file system tree intact while tiering the aged data. Recall that SmartPools subdivides the large set of homogeneous nodes into smaller Mean Time to Data Loss (MTTDL)-friendly disk pools. By subdividing a node's disks into multiple, separately protected pools, nodes are also significantly more resilient to multiple disk failures.
SmartQuotas: is a data management service that assigns and manages quotas which seamlessly partition the storage into easily managed segments at the cluster, directory and sub-directory levels.
SmartConnect: is a data access service that enables client connection load balancing and dynamic NFS failover and failback of client connections. Connections target different nodes to optimize the use of cluster resources.
SnapshotIQ is a data protection service that takes near-instantaneous snapshots while incurring little or no performance overhead. Recovery is equally fast with near-immediate, on-demand snapshot restores. Snapshot revert and delete are separate services.
Cloud management, such as Isilon for vCenter, is a software service that manages Isilon functions from vCenter. vCenter also comes with its own automatable framework.
SyncIQ is a data replication service that replicates and distributes large, mission-critical data sets asynchronously to one or more alternate clusters. Replication can be targeted to a wide variety of sites and devices, and this helps disaster recovery. The replication has a simple push-button operation.
SmartLock is a data retention service that protects critical data against accidental, premature or malicious alteration or deletion. It is also compliant with security standards.
Aspera for Isilon is a content delivery service that provides high-performance wide area file and content delivery.

Thursday, July 25, 2013

Technical overview OneFS continued

OneFS is designed to scale out, as opposed to some storage systems that scale up. We can seamlessly increase the existing file system or volumes by adding more nodes to the cluster. The administrator does this in three easy steps:
1) adding another node into the rack
2) attaching the node to the Infiniband network
3) instructing the cluster to add the additional node
The data in the cluster is moved across to the new node by the AutoBalance feature in an automatic, coherent manner, such that the new node does not become a hot spot and existing data benefits from the additional performance capabilities. This works transparently, so storage can grow from TB to PB without any administration overhead.
The storage system is designed to work with all kinds of workflows - sequential, concurrent or random. OneFS provides for all these workflows because throughput and IOPS scale linearly with the number of nodes present in the system. Balancing plays a large role in keeping performance linear with capacity. Each node is treated the same as it is added, making the cluster homogeneous. Since every node has a balanced data distribution, and there is automatic rebalancing and distributed processing, each additional CPU, network port and memory module is utilized as the system scales.
Administrators have a variety of interfaces to configure the OneFS.
The Web administration User Interface ("WebUI")
The command line interface, which operates over SSH or an RS232 serial connection
The LCD panel on the nodes themselves for simple add/remove functions.
RESTful platform API for programmatic control of cluster configuration and management.
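For example, cluster configuration can be read programmatically over the platform API. The resource path below is an assumption for illustration; the exact paths depend on the OneFS release:

import requests

def get_cluster_config(host, user, password):
    """Read cluster configuration through the RESTful platform API.

    /platform/1/cluster/config is used here as an assumed example path.
    verify=False accommodates the self-signed certificate a cluster
    typically ships with (do not do this outside a lab).
    """
    resp = requests.get(f"https://{host}:8080/platform/1/cluster/config",
                        auth=(user, password), verify=False)
    resp.raise_for_status()
    return resp.json()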
Files are secured by a variety of techniques :
Active Directory (AD)
LDAP (Lightweight Directory Access Protocol)
Network Information Service (NIS)
Local users and groups.
Active Directory, which is a directory service for network resources, is integrated with the cluster by joining the cluster to the domain. The nodes of the cluster are then reachable via DNS, and users can be authenticated based on their membership in Active Directory.
LDAP provides a protocol to reach out to other directory service providers, so many more platforms can be targeted.
NIS, often referred to as the yellow pages, is another protocol that provides a way to authenticate users.
And finally the local users and groups of a node can be used to grant permission to that node.
Cluster access is partitioned into access zones. Access zones are logical divisions comprising
cluster network configuration
file protocol access
authentication
Zones are associated with a set of SMB/CIFS shares and one or more authentication providers for access control.




Technical overview of OneFS continued

OneFS manages protection of its data directly, by allocating data during normal operations and rebuilding data after a failure. It does not rely on hardware RAID levels. OneFS determines which files are affected by a failure in constant time. Files are repaired in parallel. As the cluster size increases, its resiliency increases.
Systems that use a "hot spare" drive use it to replace a failed drive. OneFS avoids the use of hot spare drives and instead uses available free space to recover from failure. This is referred to as virtual hot spare and guarantees that the system can self-heal.
Data protection is applied at the file level and not the system level, enabling the system to focus on only those files affected by failure. Reed-Solomon erasure codes are used for data, but metadata and inodes are protected by mirroring only.
Further, the data protection is configurable and can be applied dynamically and online. For file protection involving N data blocks protected by M error-correction blocks across b file stripes, the protection level is N+M/b. When b = 1, M members can fail simultaneously while the file still remains 100% available. As opposed to the double-failure protection of RAID-6, this system can provide up to quadruple-failure protection.
OneFS also does automatic partitioning of nodes to improve Mean Time to Data Loss (MTTDL). If an 80-node cluster at a +4 protection level is partitioned into four twenty-node pools at +2, then the protection overhead is reduced, space is better utilized, and there is no net addition to management overhead.
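A small worked example of the arithmetic, under the assumption that stripe width is capped (16 data units is used here purely for illustration). The cap is what makes the partitioned layout cheaper: a wide cluster at +4 still stripes at the cap, so it pays four FEC units per stripe where each twenty-node pool at +2 pays only two:

# Illustrative figures only; the cap below is an assumption.
MAX_DATA_UNITS = 16

def protection_overhead(m_fec, nodes):
    """Fraction of a stripe consumed by FEC units at +M on this many nodes."""
    n_data = min(nodes - m_fec, MAX_DATA_UNITS)
    return m_fec / (n_data + m_fec)

print(protection_overhead(4, 80))  # +4, one 80-node pool -> 4/20 = 0.20
print(protection_overhead(2, 20))  # +2, each 20-node pool -> 2/18 ≈ 0.11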
Automatic provisioning subdivides the nodes into pools of twenty nodes each, with six drives per node. Furthermore, a node's disks are now subdivided into multiple, separately protected pools, and they are significantly more resilient to multiple disk failures than previously possible.
Supported protocols for client access to create, modify and read data include the following:
NFS : Network File System, used by unix/linux based computers
SMB/CIFS : Server Message Block / Common Internet File System
FTP : File Transfer Protocol
HTTP : Hypertext Transfer Protocol
iSCSI : Internet Small Computer System Interface
HDFS: Hadoop distributed file system
REST API : Representational state transfer Application Programming Interface.
By default only SMB/CIFS and NFS are enabled. The root for all file data is /ifs, the Isilon OneFS file system. The SMB/CIFS protocol exposes an ifs share and NFS exposes an /ifs export.
Changes made through one protocol are visible to all others because the underlying file data is common.

Wednesday, July 24, 2013

Technical overview of OneFS continued

Locks and concurrency in OneFS are implemented by a lock manager that marshals locks across all nodes in a storage cluster. Multiple and different kinds of locks, referred to as lock "personalities", can be acquired. File system locks as well as cluster-coherent protocol-level locks, such as SMB share-mode locks or NFS advisory-mode locks, are supported. Even delegated locks such as CIFS oplocks and NFSv4 delegations are supported.
Every node in a cluster is a coordinator for locking resources, and a coordinator is assigned to lockable resources based upon an advanced hashing algorithm. Usually the coordinator is different from the initiator. When a lock is requested, such as a shared lock for reads or an exclusive lock for writes, the call sequence proceeds something like this:
1) Let's say Node 1 is the initiator for a write, and Node 2 is designated the coordinator. Node 3 and Node 4 are shared readers.
2) The readers request a read lock from the coordinator at the same time.
3) The coordinator checks whether an exclusive lock has been granted for the file.
4) If no exclusive locks exist, the coordinator grants shared locks to the readers.
5) The readers begin their read operations on the requested file.
6) An exclusive lock for the same file that is being read by the readers is now requested by the writer.
7) The coordinator checks whether the locks can be reclaimed.
8) The writer is made to block/wait while the readers are reading.
9) The exclusive lock is then granted by the coordinator and the writer begins writing to the file.
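The sequence above is essentially a distributed readers-writer lock. Here is a toy, single-process model of the coordinator's logic, ignoring the hashing that picks a coordinator and all failure handling:

import threading
from collections import defaultdict

class LockCoordinator:
    """Toy model: shared locks for readers, exclusive locks for writers,
    writers blocking until in-flight readers drain (steps 3-9 above)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = defaultdict(int)   # file -> count of shared locks
        self._writers = set()              # files with an exclusive lock

    def acquire_shared(self, path):
        with self._cond:
            # Steps 3/4: grant shared locks only if no exclusive lock exists.
            while path in self._writers:
                self._cond.wait()
            self._readers[path] += 1

    def release_shared(self, path):
        with self._cond:
            self._readers[path] -= 1
            self._cond.notify_all()

    def acquire_exclusive(self, path):
        with self._cond:
            # Steps 7/8: the writer waits until the readers finish.
            while self._readers[path] > 0 or path in self._writers:
                self._cond.wait()
            self._writers.add(path)

    def release_exclusive(self, path):
        with self._cond:
            self._writers.discard(path)
            self._cond.notify_all()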
When the files are large and the number of nodes is large, high throughput and low latency become important. In such cases multi-writer support is made available by dividing the file into separate regions and providing granular locks for each region.
Failures such as a power loss are tolerated. A journal is maintained to record changes to the file system, and this enables fast, consistent recovery from a power loss or other outage. No file scan or disk check is required with a journal. The journal is maintained on a battery-backed NVRAM card. When a node boots up, it checks its journal and replays the transactions. If the NVRAM is lost or the transactions were not recorded, the node will not mount the file system.
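A toy model of that replay at mount time: only fully committed transactions are reapplied, and an uncommitted trailing transaction is discarded, which is what makes recovery consistent without a file scan. The entry format is invented for illustration:

def replay_journal(journal_entries, apply):
    """Reapply committed transactions from a journal; drop partial ones."""
    txn = []
    for entry in journal_entries:
        if entry["op"] == "begin":
            txn = []
        elif entry["op"] == "write":
            txn.append(entry)
        elif entry["op"] == "commit":
            for e in txn:                      # reapply only committed work
                apply(e["block"], e["data"])
            txn = []
    # An uncommitted trailing transaction is simply discarded.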
In order for the cluster to function, a quorum of nodes must be active and responding. This can be a simple majority, where one more than half the nodes are functioning. A node that is not part of the quorum is in a read-only state. The simple majority helps avoid split-brain conditions when the cluster temporarily splits into two. The quorum also dictates the number of nodes required in order to move to a given data protection level: for an N+M protection level, 2*M+1 nodes must be in quorum.
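The quorum arithmetic is simple to state in code:

# To move to an N+M protection level, 2*M + 1 nodes must be in quorum.
def nodes_required(m_fec):
    return 2 * m_fec + 1

print(nodes_required(2))  # +2 protection needs at least 5 nodes in quorum
print(nodes_required(4))  # +4 protection needs at least 9 nodes in quorum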
The global cluster state is available via a group management protocol that guarantees a consistent view across the entire cluster of the state of the other nodes. When one or more nodes become unreachable, the group is split and all nodes resolve to a new consistent view of their cluster. In the split state, the file system is reachable, and for the group with the quorum it is modifiable. The node that is down is rebuilt using the redundancy stored in the cluster. If the node becomes reachable again, a "merge" occurs where the two groups are brought back into one. The nodes can rejoin the cluster without being rebuilt and reconfigured. If the protection group changes during the merge, files may be restriped for rebalance. When a cluster splits, some blocks may become orphaned because they are re-allocated on the quorum side. Such blocks are collected through a parallelized mark-and-sweep scan.