Friday, July 19, 2013

Corporate Information Factory Architecture

CIF operational systems topology:
The CIF architecture has the data warehouse and the operational data store at its core. This core is surrounded by a metadata management plane where the operational systems reside. Operational systems can include external, distribution, product, account, and customer operational systems. These operational systems feed data into the data management core through transformation and integration. The exploration warehouse, data marts, and data delivery also extract information from the data management core. The decision support interface may use these data marts, while the transaction interface may use the operational data store directly. CIF consumers acquire the information produced via data delivery, manipulate it in data marts, and assimilate it in their own environments. Outside this metadata management plane sit Information Services as well as Operations and Administration.
Information Services comprises groups of items such as the library and toolbox as well as the workbench. Operations and Administration involves systems management, data acquisition management, service management, and change management.
Producers are the first link in the information food chain. They synthesize the data into raw information and make it available for consumption across the enterprise.
Operational systems are the core of day-to-day operations and are typically organized by the product they support. Businesses, however, are oriented more toward customers than products so that they can differentiate their offerings. CIF provides facilities to define how this data relates to a customer, using rules that form the metadata.
Operational data usually stands alone and has not been integrated. CIF synthesizes, cleans, and integrates the data before it is usable. CIF also acts as a history store for the enterprise.
Next, integration and transformation consist of the processes to capture, integrate, transform, cleanse, re-engineer, and load the source data into the data warehouse. Typically, data is pulled from the operational systems rather than pushed. Configuration management and scheduling play a large role here. These packages are usually written with knowledge of the source data and, once written, don't change much, so they are good candidates for scheduled jobs.
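As a rough illustration, here is a minimal extract-transform-load job in Python. The table names, column names, and cleansing rules are hypothetical; a real CIF load would be driven by metadata and a scheduler rather than hard-coded.

```python
import sqlite3

def extract(conn):
    # Pull raw rows from a hypothetical operational system table.
    return conn.execute("SELECT cust_id, name, amount FROM ops_orders").fetchall()

def transform(rows):
    # Cleanse and integrate: drop unlinkable rows, normalize names and types.
    for cust_id, name, amount in rows:
        if cust_id is None:
            continue  # re-engineer: reject records that cannot relate to a customer
        yield cust_id, name.strip().upper(), float(amount)

def load(conn, rows):
    # Append into the warehouse; the warehouse is non-volatile, so history
    # is inserted rather than updated in place.
    conn.executemany(
        "INSERT INTO dw_orders (cust_id, name, amount) VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    source = sqlite3.connect("operational.db")   # hypothetical source system
    warehouse = sqlite3.connect("warehouse.db")  # hypothetical warehouse
    load(warehouse, transform(extract(source)))
```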
Next, the data warehouse plays the big role in CIF. It is often described as a "subject-oriented, integrated, time-variant (temporal), and non-volatile collection of data." Sizes of up to terabytes of data are not uncommon for a data warehouse.
Next, data management extends the data warehouse with archival/restoration, partitioning, and movement of data based on triggering and aggregation. These are often in-house development tasks.
Finally, the consumers extract information from this collection of data. The data marts support these analytic requirements and are made available via the decision support interface.
Also, the metadata enables metrics for measuring support.
Courtesy: Imhoff
OAuth

OAuth is a feature by which users can sign in with a third-party intermediary authorization. For example, if you were at an online printing store and wanted to print photos from an album on another website, the printing store could retrieve the photos from that website on your behalf once you sign in to that website, and you could continue with your print order. It differs from a plain login in that you don't have to repeatedly provide a username and password to different websites. The sign-in on one website can be reused for others.
The way this works is by granting users access tokens based on a third-party referral. The token grants a user access to resources. The token may expire after a few minutes, so a refresh token can be requested to obtain a new one. Admins of the OAuth provider can see which users have registered from which clients. They can also revoke access for users and clients.
OAuth has four different workflows for granting access:
Implicit Grant – such as when a mobile application follows a redirect to the client application.
Credentials Grant – such as when the user provides a username and password in exchange for a token (the resource owner password credentials flow).
Client Credentials Grant – such as when admin applications on secured kiosks provide context regardless of the user.
Refresh Grant – a client can retrieve a new access token by exchanging a previously issued refresh token.
An access token request consists of the following:
- a grant type (here, authorization_code)
- the authorization code
- the redirect URI
- the client id
while the response consists of:
- a return type (data or redirect)
- the access token
- the token type
- the number of seconds until the token expires
- a refresh token
- the scope
The authorization request consists of a response type, client id, scope, and state. Its response consists of a code and state.
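As a rough sketch, here is how a client might exchange an authorization code for an access token, and later refresh it, in Python. The token endpoint URL, client id, and secret are hypothetical placeholders; a real provider's endpoint and required parameters may differ.

```python
import requests

TOKEN_URL = "https://provider.example.com/oauth/token"  # hypothetical endpoint

def exchange_code(code, client_id, client_secret, redirect_uri):
    # Access token request: grant type, code, redirect URI, client id.
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    })
    resp.raise_for_status()
    # Response carries access_token, token_type, expires_in, refresh_token, scope.
    return resp.json()

def refresh(refresh_token, client_id, client_secret):
    # Refresh grant: trade a previously issued refresh token for a new access token.
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })
    resp.raise_for_status()
    return resp.json()
```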

Thursday, July 18, 2013


This post talks about some business intelligence practices:
OLTP and OLAP processing run on different systems due to several differences:
The first difference is that online transaction processing is one in which response time is critical and operations touch existing data. Online analytical processing, on the other hand, is based on data accumulated over time, such as for decision support.
The second difference is that the latter requires a schema with a star, multi-level star, or snowflake design.
The third difference is that the latter requires very different data structures, such as bitmaps and conjunctive filters, while the former uses B+ trees (a toy example follows below).
The fourth difference is that databases in the latter require fast loads of large data sets periodically.
The fifth difference is that joins on the data are very costly to perform, so views need to be computed and persisted until the next update.
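To illustrate the third difference, here is a toy bitmap index with a conjunctive filter, using Python integers as bitsets. The column names and values are invented; real OLAP engines use compressed bitmaps, but the idea is the same: an AND filter becomes a single bitwise operation.

```python
# Toy bitmap index: one bitmap per attribute value, rows are bit positions.
rows = [
    {"region": "east", "status": "open"},
    {"region": "west", "status": "open"},
    {"region": "east", "status": "closed"},
    {"region": "east", "status": "open"},
]

def build_bitmaps(rows, column):
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps

region = build_bitmaps(rows, "region")
status = build_bitmaps(rows, "status")

# Conjunctive filter: region = 'east' AND status = 'open' is one bitwise AND.
match = region["east"] & status["open"]
print([i for i in range(len(rows)) if match >> i & 1])  # -> [0, 3]
```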

OLAP systems could be implemented with a CIF architecture or a Star-Kimball model: 
The CIF architecture uses entity relationships and database normalization rules to build a normalized data model in its data warehouse. In the Star-Kimball model, a dimensional model is used where the transactions are split into facts and dimensions and arranged in a star-like schema, or in a multi-level star-like schema if they are hierarchical.
The advantage of the dimensional model is that it is easy to understand and use. The joins are simpler, and users don't need to know the source of the data or the data structures but can work off materialized views, although the operational systems have to handle the complex transformations needed to maintain the dimensions. A small sketch of a star-schema join follows.
The advantage of the normalized model is that it holds a lot of information and is flexible to changing business needs.
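To make the dimensional model concrete, here is a minimal star schema sketched with sqlite3 in Python. The fact and dimension tables and their columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One fact table surrounded by dimension tables: the "star".
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    amount      REAL
);
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO dim_product  VALUES (1, 'Prints'), (2, 'Albums');
INSERT INTO fact_sales   VALUES (1, 1, 1, 9.99), (2, 2, 2, 24.99), (3, 1, 2, 24.99);
""")

# A typical OLAP query: aggregate the fact table, joining out to dimensions.
for row in conn.execute("""
    SELECT c.name, p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON f.customer_id = c.customer_id
    JOIN dim_product  p ON f.product_id  = p.product_id
    GROUP BY c.name, p.name
"""):
    print(row)
```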

A day-to-day task may involve preventing orphaned data. We prevent this by adding constraints and checks to the data. For example, in a table that maintains employees and their managers, the column that denotes the manager points back to another employee in the table. The person at the top of the organization usually has no manager; we represent this by making that person their own manager. This is useful because no employee can be orphaned when all the data in the manager column is checked for consistency. In general, we use constraints, checks, primary and foreign keys, and surrogate and natural keys to enforce the integrity of the data.
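As a small sketch of that self-referencing constraint (table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite enforces FKs only when enabled
conn.execute("""
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    manager_id INTEGER NOT NULL REFERENCES employee(emp_id)
)
""")
# The head of the organization is their own manager.
conn.execute("INSERT INTO employee VALUES (1, 'CEO', 1)")
conn.execute("INSERT INTO employee VALUES (2, 'Engineer', 1)")

# This insert is rejected: manager 99 does not exist, so no orphans are possible.
try:
    conn.execute("INSERT INTO employee VALUES (3, 'Orphan', 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```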

Wednesday, July 17, 2013

A quick look at some of the web technologies
web crawling: start with a list of URLs, visit them according to a policy, find more URLs, and add them to the list.
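A minimal sketch of that loop in Python; the policy here is just a visited-set and a page cap, while real crawlers add robots.txt checks, rate limits, and URL canonicalization.

```python
import re
from urllib.request import urlopen

LINK_RE = re.compile(r'href="(https?://[^"]+)"')

def crawl(seeds, max_pages=50):
    frontier, seen = list(seeds), set()
    while frontier and len(seen) < max_pages:
        url = frontier.pop(0)
        if url in seen:
            continue  # policy: visit each URL only once
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable pages are skipped
        # find more URLs and add them to the list
        frontier.extend(LINK_RE.findall(html))
    return seen
```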

Tuesday, July 16, 2013

Memory Manager

The memory manager components include a set of executive system services for allocating, deallocating, and managing virtual memory, a translation-not-valid and access fault trapper for hardware exceptions, and a set of kernel mode system thread routines as listed below:
The working set manager to handle global policies such as trimming, aging, and modified page writing.
The process/stack swapper that handles inswapping and outswapping.
The modified page writer that writes dirty pages in mapped files to disk.
The dereference segment thread, which is responsible for system cache and page file growth and shrinkage.
The zero page thread that zeroes out pages on the free list.
The memory manager provides reserving and committing pages, locking memory, allocation granularity, shared memory and mapped files, protecting memory, copy on write, heap functions and address windowing extensions. A look at these in brief:
reserving and committing: reserved pages are not mapped to any storage. This is useful for a large contiguous buffer. For example, when a thread is created, a stack is reserved. Committed pages are pages that, when accessed, ultimately translate to valid pages in physical memory. Committed pages are either private and not shareable, or mapped to a view of a section (see the sketch at the end of this section).
locking memory: Pages can be locked in memory in two ways:
1) Many pages can be requested by, say, a device driver and locked until the driver releases them.
2) User mode applications can lock pages in their process working set.
Allocation granularity: this defines the granularity at which the system reserves address space, so that the risks associated with assumptions about allocation alignment are reduced.
Shared memory and mapped files: each process maintains its private memory area in which to store private data, but program instructions and unmodified data pages can be shared. Shared memory is implemented via section objects.
Memory is protected first by having all systemwide data structures and pools accessible only to kernel-mode system components, which user threads can't access.
Second, each process has a separate private address space that others can't access.
Third, in addition to implicit protection, some form of hardware-supported protection is also applied.
And finally, shared memory section objects are protected via standard access-control lists.
Copy-on-write page protection is an optimization the memory manager uses to conserve physical memory. Each process that writes to a shared page gets its own private copy.
A heap is a region of one or more pages of reserved address space that can be subdivided and allocated in smaller chunks by the heap manager.
Address windowing extensions work by allocating the physical memory to be used, creating a region of virtual address space to act as a window, and mapping views of the physical memory into that window.
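As a small illustration of reserving versus committing, here is a Python sketch that calls the Win32 VirtualAlloc API through ctypes (Windows only). The two-step reserve-then-commit pattern is the point; the sizes are arbitrary.

```python
import ctypes
from ctypes import wintypes

kernel32 = ctypes.windll.kernel32  # Windows only

MEM_RESERVE    = 0x2000
MEM_COMMIT     = 0x1000
MEM_RELEASE    = 0x8000
PAGE_READWRITE = 0x04

kernel32.VirtualAlloc.restype = ctypes.c_void_p
kernel32.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                  wintypes.DWORD, wintypes.DWORD]
kernel32.VirtualFree.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                 wintypes.DWORD]

# Reserve a large contiguous range: address space only, no storage yet.
size = 16 * 1024 * 1024
base = kernel32.VirtualAlloc(None, size, MEM_RESERVE, PAGE_READWRITE)

# Commit just the first page; only now can it translate to physical memory.
page = kernel32.VirtualAlloc(base, 4096, MEM_COMMIT, PAGE_READWRITE)
ctypes.memset(page, 0, 4096)  # touching the committed page is now valid

kernel32.VirtualFree(base, 0, MEM_RELEASE)
```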
AWS implements REST API security with API keys and encrypted parameters.

Monday, July 15, 2013

WCF and WPF Fundamentals review

This post mentions the important contracts for each aspect of the communication:
1. A service has three main parameters: Address, Binding, and Contract. Bindings can be of three different types: TCP/IP binding, HTTP binding, and net MSMQ binding.
2. Use a MEX endpoint for metadata exchange. API versioning uses a standard workflow resulting in different actions taken.
3. Hosting can be of three types: IIS, a Windows service, and the Windows Activation Service.
4. Use attributes such as ServiceContractAttribute, DataContractAttribute, and OperationContractAttribute. Define exception handling via FaultContract attributes.
5. Use appcmd.exe to determine which instance is hosting your code.
6. Use appcmd.exe to configure a site, application, virtual directory, or URL.
7. Use TransactionScope for reliability.
8. Use MSMQ for queued calls. Use poison messages and dead-letter queues to handle retries.
9. Security can be none, transport, or message.
10. The role-based security features of .NET can be reused for services.
11. Use PrincipalPermission and UseWindowsGroups for Windows role-based security.
12. Use certificates for encryption. Certificates have to be in the right store and must not have expired.

ASP.Net page life cycle events
The following are the page events during the life cycle:
1. PreInit: check the IsPostBack property here.
2. Init: raised after all the controls have been initialized.
3. InitComplete: signals the end of initialization.
4. PreLoad: raised after the page loads view state for itself and all controls.
5. Load: the page object calls the OnLoad method on itself and then for each child control.
6. Control events: handle individual control events here.
7. LoadComplete: raised at the end of the event-handling stage.
8. PreRender: raised after the page object has created all the controls.
9. PreRenderComplete: data binding occurs here.
10. SaveStateComplete: raised after view state and control state have been saved for the page and for all controls.
11. Render: the page object calls this method on each control.
12. Unload: raised for each control and then for the page. In controls, use this event to do final cleanup for specific controls.

Reviewed from MSDN