Saturday, February 28, 2015


We review some of the tools and technologies in the stack used by DevOps Engineers. These include the following: 
  • 1) Linux - This operating system is favored both for its composability and its many flavors, as well as for the support from vendors. Today distributions such as Ubuntu and Debian are used in much open-source development, and virtual machines ship with this operating system as the base for almost all end users. 
  • 2) Apache – This is a web server that has made it easy to serve applications. For a long time, LAMP (Linux, Apache, MySQL and PHP/Python) was considered the base for most web development. 
  • 3) MySQL - This is a database server. Although it has been a staple, its popularity among DevOps engineers grew with its integration with Python. 
  • 4) ZooKeeper - This is an Apache project meant to aid the configuration, synchronization and naming of large distributed systems. 
  • 5) HDFS - This is a distributed file system meant to run on thousands of nodes using commodity servers. It tolerates hardware failures and is oriented toward batch processing, providing high throughput and streaming data access. 
  • 6) HBase - This is a NoSQL database typically used when hashing is more important than indexing. It is fault tolerant, flexible and supports near-realtime lookups. 
  • 7) Storm - This is yet another Apache free and open source project. It is used for distributed realtime computation because it processes streams of data that can be unbounded. 
  • 8) Kafka - This is also an Apache project and is used for publish-subscribe messaging. It can handle hundreds of megabytes of reads and writes per second (a brief producer/consumer sketch in Python follows this list). 
  • 9) Spark - This is used for large-scale data processing and is another Apache project. It can combine SQL, streaming and complex analytics. 
  • 10) Oozie - This is a workflow scheduler for Hadoop; it manages Apache Hadoop jobs, expressing a workflow essentially as a directed acyclic graph of jobs. 
  • 11) Cassandra - This is a database server often used when deploying across datacenters. It also supports column indexes. 
  • 12) OpenTSDB - This is a scalable time-series database that is very helpful for recording events as they flow into the system and for searching them later. It runs on Hadoop and HBase. 
  • 13) Nginx - This is another web server, helpful for high-traffic websites. It provides an HTTP server, an HTTP proxy and a load balancer. 
  • 14) Etcd - This is similar to ZooKeeper in that it is used for configuration and service discovery. It is considered highly available. 
  • 15) Puppet - This is also a configuration management utility, and is considered more popular for interoperability between Windows and Unix flavors. 
  • 16) Mesos - This abstracts CPU, memory and storage across machines and exposes APIs for resource management and scheduling. 
  • 17) Marathon - This is a cluster-wide initialization and control system for services that are deployed in containers such as Docker. 
  • 18) Docker - This provides container technology so that applications can be built, shipped and run. Dockerized applications are considered to be portable. 
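As a small illustration of the publish-subscribe model mentioned for Kafka above, here is a minimal producer/consumer sketch. It assumes the kafka-python client and a broker running at localhost:9092, and the topic name is made up; none of these come from the list above.
from kafka import KafkaProducer, KafkaConsumer

# Publish a message to a topic (broker address and topic name are placeholders).
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("deploy-events", b"build 42 finished")
producer.flush()

# Subscribe to the same topic and print whatever arrives, giving up after 5 seconds.
consumer = KafkaConsumer("deploy-events",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value)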

Friday, February 27, 2015

We continue our discussion on the WRL research report on the Swift Java compiler. We discussed building the IR. We now discuss the interprocedural analyses and optimizations. The Swift compiler has numerous such optimizations, which can be individually turned on or off. Customary analyses, including class hierarchy analysis, type propagation, type- and field-based analysis, and escape analysis, are applied, and in addition a new form of analysis called field analysis is also used. Recall that when building the SSA graph we already came across common subexpression elimination. Values that load the field of an object are candidates for CSE only if they access the same field of the same object and they have the same global store input. This simple rule was applied to reduce the number of nodes created. Besides the analyses, some optimizations include method resolution, method inlining, elimination of runtime checks, removal of unnecessary data dependence edges, stack allocation, and synchronization removal.
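A minimal sketch of that CSE rule, using a toy value-numbering table rather than Swift's actual IR; the representation of values as (op, inputs) pairs is an assumption made here purely for illustration.
# Toy value numbering: two values share a number only when they have the same
# operation and the same inputs; for a get_field that means the same object,
# the same field and the same global store input, mirroring the rule above.
def cse(nodes):
    # nodes: list of (op, inputs) tuples in program order, where inputs are the
    # value numbers of earlier nodes (or literals such as a field name).
    table, numbering = {}, []
    for op, inputs in nodes:
        key = (op, tuple(inputs))
        numbering.append(table.setdefault(key, len(table)))  # reuse an earlier value if seen
    return numbering

# v0 = param obj, v1 = current global store, v2/v3 = two loads of the same field:
nodes = [("param", ()), ("global_store", ()),
         ("get_field", (0, "count", 1)), ("get_field", (0, "count", 1))]
print(cse(nodes))   # [0, 1, 2, 2]: the second load reuses the first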
Several interprocedural optimizations are combined into a single pass, and they are applied repeatedly because one optimization can enable further optimizations. For example, by inlining a method we may be able to tell the type of the receiver of another method call, which in turn may enable that method to be inlined as well.
We first review class and method analysis. Swift contains modules that compute and cache useful pieces of information about classes and methods. The basic information about classes is the class hierarchy. It is built only as necessary to help resolve method calls. However, the Swift compiler can optionally use class hierarchy analysis (CHA) in resolving method calls. The compiler assumes that it is aware of the entire set of classes, and that when it encounters a virtual method call it can resolve it to an implementation in one of the known classes or subclasses. If CHA is turned on, all the known classes are loaded and a representation of the hierarchy is built. Then the compiler can scan all the subclasses of any class. If CHA is not used, the compiler can only determine that a class has no subclasses when it is declared with the keyword final.
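A rough sketch of how class hierarchy analysis can resolve a virtual call; the dictionaries used here are stand-ins for whatever representation Swift actually builds, and inherited (non-overriding) implementations are ignored for brevity.
# If exactly one implementation of the method is found in the receiver's class
# or its subclasses, the virtual call can be replaced by a direct call.
def all_subclasses(hierarchy, cls):
    # hierarchy: dict mapping a class name to its direct subclasses.
    found = [cls]
    for sub in hierarchy.get(cls, []):
        found.extend(all_subclasses(hierarchy, sub))
    return found

def resolve_virtual_call(hierarchy, methods, receiver_class, method_name):
    # methods: dict mapping a class name to the set of methods it defines or overrides.
    targets = [c for c in all_subclasses(hierarchy, receiver_class)
               if method_name in methods.get(c, set())]
    return targets[0] if len(targets) == 1 else None   # None means keep the virtual call

hierarchy = {"Shape": ["Circle"], "Circle": []}
methods = {"Circle": {"area"}}
print(resolve_virtual_call(hierarchy, methods, "Shape", "area"))   # -> Circle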
#codingexercise
Double GetAlternateEvenNumberRangeSumTenthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumTenthPowerRootSquare();
}

Thursday, February 26, 2015

We continue our discussion on the WRL research report on the Swift Java compiler. We were listing the various analyses and optimizations performed by the Swift compiler. We now go through these one by one. Let us first take a look at building the IR.
Like other compilers, Swift generates the IR from a method's byte code. First the byte code is scanned to determine the number of basic blocks and the edges between the basic blocks.  Then the variables that require phi nodes in the basic blocks are determined. This logic is called the phi node placement algorithm. The byte codes of each of the basic blocks are executed via an abstract interpretation. This allows us to maintain an association between the JVM local variables and the SSA values. It also allows us to determine the appropriate inputs for the new values and build the SSA graph.
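The report does not spell out Swift's phi placement algorithm at this point; the classic dominance-frontier formulation is sketched below as a rough reference, with the dominance frontiers assumed to be precomputed.
# Iterated dominance frontier: a variable assigned in def_blocks needs a phi node
# in every block of the result; a placed phi counts as a new definition, so the
# frontier computation is repeated until it settles.
def place_phis(def_blocks, dominance_frontier):
    # dominance_frontier: dict mapping a block to the set of blocks in its frontier.
    phi_blocks = set()
    worklist = list(def_blocks)
    while worklist:
        block = worklist.pop()
        for frontier_block in dominance_frontier.get(block, set()):
            if frontier_block not in phi_blocks:
                phi_blocks.add(frontier_block)
                if frontier_block not in def_blocks:
                    worklist.append(frontier_block)
    return phi_blocks

# x is assigned in B1 and B2, which both flow into the join block B3:
print(place_phis({"B1", "B2"}, {"B1": {"B3"}, "B2": {"B3"}}))   # -> {'B3'}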
During the creation of the SSA graph itself, various simple optimizations are performed. These include replacing the array-length operation with the allocated size of the array, and eliminating a bounds check when the index and length are constant in the current method. Profile information generated from earlier runs can also be used to annotate the edges of the CFG to indicate the execution frequency. This is helpful for code layout and for the choice of traces by the trace scheduler. Without profile information, Swift has to guess these values by detecting loops.
Swift checks and maintains the invariant that there are no critical edges in the CFG. A critical edge is an edge whose source block has multiple successors and whose destination block has multiple predecessors. If such edges are not eliminated, they hinder register allocation. Eliminating them also gives global code motion the greatest freedom in placing values so as to minimize unnecessary computation in the final code. An empty block is placed between the source and the destination to remove a critical edge.
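A small sketch of that invariant on a toy CFG (successor lists only); this is an illustration of the rule stated above, not Swift's code.
# For every edge u -> v where u has several successors and v has several
# predecessors, insert an empty block on the edge so that it is no longer critical.
def split_critical_edges(succs):
    preds = {}
    for u, vs in succs.items():
        for v in vs:
            preds.setdefault(v, []).append(u)
    result = {u: list(vs) for u, vs in succs.items()}
    counter = 0
    for u, vs in succs.items():
        for i, v in enumerate(vs):
            if len(vs) > 1 and len(preds.get(v, [])) > 1:
                empty = "split%d" % counter     # the new empty block
                counter += 1
                result[u][i] = empty
                result[empty] = [v]
    return result

# A -> C is critical (A has two successors, C has two predecessors) and gets split:
print(split_critical_edges({"A": ["B", "C"], "B": ["C"], "C": []}))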
#codingexercise
Double GetAlternateEvenNumberRangeSumEighthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumEighthPowerRootSquare();
}

Wednesday, February 25, 2015

We will continue our reading of the WRL research report on the Swift Java compiler. We now look at the organization of Swift and its optimizations, specifically the sequence of passes that convert Java bytecode to Alpha machine code. The IR for a method is built from the bytecode and annotated with the available profile information. A variety of interprocedural optimizations such as method inlining are applied. This is followed by machine-independent optimizations. A tree-matching algorithm is then used to convert many operations to a lower and/or machine-dependent form and to do peephole optimizations. Further optimizations, mostly global CSE and code motion, are then applied to the machine-dependent form. Finally, Alpha code is generated by a sequence of passes that do instruction scheduling, register allocation and code generation. All of these passes operate on the representation of the method in Swift IR.
To summarize, the method bytecode is translated to IR and annotated with profile information; then, with the information available from other methods and classes, interprocedural analyses and optimizations are carried out; the machine-independent and machine-dependent optimizations follow in that order, including global CSE and code motion; and finally instruction scheduling, register allocation and code generation in terms of Alpha code are done. As we discuss each of these passes in detail below, it helps to enumerate the optimizations in each stage so we know which pass, and at what stage, a given optimization belongs to.
Interprocedural analyses: alias analysis, class hierarchy analysis, escape analysis, field analysis, type propagation.
Interprocedural optimizations: method resolution, method inlining, method splitting, object inlining, stack allocation, synchronization removal.
Intraprocedural optimizations: bounds check removal, branch removal, conditional constant propagation, dead code elimination, global CSE, global code motion, loop peeling, null check removal, peephole optimizations, strength reduction, type test elimination.
Machine-dependent passes: conversion to machine-dependent operations, peephole optimizations, sign-extension elimination, trace scheduling, register allocation (live range splitting, biased graph coloring with rematerialization), block layout and code generation.

Tuesday, February 24, 2015

We will continue our reading of this WRL research report. SSA improves the effectiveness of peephole optimizations because the SSA form exposes all the users of a value, not just a limited window. The Swift compiler therefore looks for patterns and applies peephole optimizations where the peephole is the whole method. The values in the pattern can be spread anywhere in the control flow. As an example, the length property of an array can be replaced by the size argument to the allocation operation. This optimization can be applied no matter where the allocation and the length operations occur in the method. When the array is allocated at the beginning of the method and the length operations are generated for bounds checks, this optimization is quite effective.
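A tiny sketch of that particular pattern as a whole-method peephole, again on a toy value table rather than Swift's IR:
# Wherever an arraylength value reads an array produced by new_array, the length
# can be replaced by the size value that fed the allocation, no matter how far
# apart the two values sit in the method.
def fold_array_length(values):
    # values: dict id -> (op, input_ids); returns {arraylength_id: replacement_id}.
    replacements = {}
    for vid, (op, inputs) in values.items():
        if op == "arraylength":
            src_op, src_inputs = values[inputs[0]]
            if src_op == "new_array":
                replacements[vid] = src_inputs[0]   # reuse the allocation's size value
    return replacements

vals = {0: ("constant", (10,)), 1: ("new_array", (0,)), 2: ("arraylength", (1,))}
print(fold_array_length(vals))   # {2: 0}: uses of value 2 can be rewired to value 0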
Let us now look at what constraints the SSA form may put on the Swift compiler. First and most importantly, the register allocator must be quite effective and general purpose. It should minimize copies and spills, because they can overshadow the gains from SSA. Similarly, the passes that do the final placement of values in blocks and the scheduling within blocks are quite important. The SSA form gives a lot of freedom to do this final schedule, so that freedom must be exploited.
Lastly, one limitation mentioned by some groups is that the SSA form is difficult to use for restructuring optimizations, since it may be difficult to add the required phi nodes correctly as code is restructured and new join points are created. However, the authors did not encounter any such difficulty for their optimizations, including many that modify the CFG, such as method inlining, method splitting and conditional constant propagation. Loop peeling, loop unrolling or loop interchange may be the only exceptions where it is difficult.
#codingexercise

Double GetAlternateEvenNumberRangeSumSixthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumSixthPowerRootSquare();
}

Monday, February 23, 2015

We will continue our reading of this WRL research report. We now discuss the results of the Swift IR. Among the several optimizations facilitated by the Swift IR, the rule for determining whether values are equivalent, and can therefore be eliminated as a common subexpression, works for all operations. Identical operations with equivalent inputs are equivalent because the global store inputs conservatively represent the memory dependences and all other true data dependences are explicit in the inputs to the values. Together these rules imply that common subexpression elimination (CSE) can be applied to all values, including memory reads. Values are candidates for CSE only if they access the same field and they have the same global store input. This principle is applied even at the time of building the SSA graph, and these rules are applied over and over again in many passes of the compiler.
The SSA graph makes all data flow and ordering dependences explicit. This is a tremendous advantage because optimizations now operate on a single data structure. Although anti-dependences are not explicitly represented and must be honored during scheduling passes, the Swift IR enforces them via explicit checks instead. In addition, the Swift IR pins various operations to their basic blocks. All other values are explicitly moved around subject to scheduling constraints. Like CSE, scheduling is still simple enough that several passes in Swift do a limited form of scheduling to see if some other optimization would be useful.
Method inlining is applied by establishing the appropriate dependences between values in the inlined method and values in the calling method; it just produces a slightly larger SSA graph. All the code optimizations such as CSE, code motion and dead-code elimination apply after method inlining without any artificial restrictions.
One of the findings from this study is that SSA improves the effectiveness of peephole optimizations.

#codingexercise
How can we determine whether a given email alias belongs to an individual or a distribution list?
import ldap

def validateByLDAP(email, password, alias=None):
    if not alias:
        raise ValueError("Invalid email alias")
    ldap.set_option(ldap.OPT_X_TLS_REQUIRE_CERT, ldap.OPT_X_TLS_NEVER)
    # The original snippet never shows how the connection object was created;
    # the server URI below is a placeholder.
    ld = ldap.initialize("ldap://ldap.example.com")
    ld.set_option(ldap.OPT_REFERRALS, 0)
    ld.set_option(ldap.OPT_PROTOCOL_VERSION, 3)
    ld.set_option(ldap.OPT_X_TLS, ldap.OPT_X_TLS_DEMAND)
    ld.set_option(ldap.OPT_X_TLS_DEMAND, True)
    ld.set_option(ldap.OPT_DEBUG_LEVEL, 255)
    ld.simple_bind_s(email, password)
    base_dn = "cn=users,dc=abcnet,dc=global,dc=abc,dc=com"
    search_filter = "(samaccountname=" + alias + ")"
    results = ld.search_s(base_dn, ldap.SCOPE_SUBTREE, search_filter, ['sn'])
    print repr(results)
    return results

This returns an empty result for an alias that does not belong to an individual; otherwise it returns a list with the match found.

Sunday, February 22, 2015

Today we continue our discussion on method calls from the WRL research report on the Swift Java compiler. The Swift IR has operations to represent the different method invocations in Java, such as class method calls, instance method calls, and interface calls. Class method calls are like procedural calls. Values representing method calls all take the method arguments as inputs, as well as the current global store. A null check on the receiver object is included before any instance method invocation.
Most method calls produce a program result as well as a new global store. In order to follow the rule that each value in the SSA graph produces a single result, method calls produce a tuple that consists of the method result and the new global store. The two components are individually projected as required, so values representing method calls are followed by one or two select values in the SSA graph.
If a method can cause an exception that is caught in the current method, the call actually produces a distinct global store for each normal and exception successor edge, since the state of memory may be different in each case. In the most common case, when no exception of the method call is caught, no extra store is required since the exception exit will not access the global store.
There are arguments in the entry block for the incoming global store of a method call and in the case of an instance method, the first argument is the receiver. The compiler therefore automatically uses the null-check argument when a null-check of the receiver is required so there is no machine code corresponding to this check.
The handling of the store and null-check arguments above is conducive to method inlining. To inline a call, the compiler builds the IR of the original method and of the called method. The CFG and the SSA graph of the called method are inserted into the CFG and the SSA graph of the calling method at the point of the call, with the arguments of the method body of the inlined method replaced by the corresponding arguments to the method call. These arguments include the global store and the null check of the receiver object. This matching process establishes the correct dependences so that all the values within the inlined SSA graph can now be moved anywhere allowed by the usual dependence rules.
If a method has no write operations, then the operations that were below the call will use the store from above the call, because the method returns the original store in its return operation. Optimizations therefore apply across calls to methods that have no side effects.

#codingexercise
Double GetAlternateEvenNumberRangeSumFourthPowerRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumFourthPowerRootSquare();
}

Saturday, February 21, 2015

Today we continue our discussion on the WRL research report on the Swift Java compiler. To handle stores correctly, there are a number of other modifications to make. The return value at the end of the method takes two inputs: the actual return value and the latest global store. This ensures that all the stores are correctly kept alive, especially during dead-code elimination passes. Also, a throw value takes a value to throw and a global store as input. A method call takes a global store as input and produces a new store as output.
To synchronize an object, monitor_enter and monitor_exit operations are used. Similarly, reads of volatile fields can be kept ordered by requiring that they produce a new store as output. In fact, the only time the original store is reused and a new store is not produced is for the set of operations that store into an object or an array, for example put_field, put_static and arr_store. This was done mainly because there is a control dependence on the immediately preceding control operation that determines whether the operation is executed, such as an 'if' operation. Instead of representing these control dependences, a choice was made to pin these operations to their original block.
When depicting a full method, there may be too many edges and nodes to show. Instead, in the examples all the values are labeled and the inputs to the values are listed as named arguments. The exception path is not shown; instead, a block is shaded if it has an exception path. This is a convenience for readability.

Thursday, February 19, 2015

Today we continue our discussion on the WRL research report on the Swift Java compiler. We saw the use of blocks with SSA graphs. We now look at the Swift type system. The type of each value in the graph is computed as the SSA graph is built from the bytecode. However, it is not always possible to recover the exact types in a Java program. Instead, it is always possible to assign a consistent set of types to the values such that the effect of the method represented by the SSA graph is the same as that of the original method. For some operations, the actual value types further specify the operation. For each type T, Swift can keep track of these additional properties:
1) the value is known to be an object of exactly class T
2) the value is an array with a particular constant size
3) the value is known to be non-null
By incorporating these properties into the type system, we can describe the properties of any value in the SSA graph by its type. Moreover, there can now be properties for different levels of recursive types, such as arrays.
We now review representing and ordering memory operations. Local variables are strongly scoped within the current method. Java has strong typing and no pointers, hence these variables cannot be altered by another method in the same thread or by another thread. The SSA form is well suited to represent these. Values in the SSA graph can be considered to be temporary variables stored in a large set of virtual registers. It is only near the end of the compilation process that the register allocator chooses which values are stored in actual registers and which are spilled to the stack.
On the other hand, reads or writes of global variables or locations allocated from the heap must be represented explicitly as memory operations, since there may be many hidden dependencies among these operations.
Regardless of local or global, the compiler may have to ensure that the original ordering among memory operations is maintained even though they may not be connected on the SSA graph. As an example, a store followed by a load may not have any scheduling dependence if the store does not produce a result that the load reads; however, their ordering may still need to be preserved. In such cases, the ordering is enforced by the Swift IR by having the store actually produce a result. This is called a global store and it represents the current contents of global memory. Since a store operation modifies the contents, it executes a copy-on-write. Additionally, each load operation takes the global store as an extra input. These data dependences between memory operations allow many optimizations to immediately generalize to memory operations. The exception is that the SSA store does not include the required dependences, called anti-dependences, between load and store operations. The Swift compiler enforces these dependences via explicit checks during scheduling.

#codingexercise
Double GetAlternateEvenNumberRangeSumSquareRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumSquareRootSquare();
}

We continue to look at the improvements introduced by the Swift compiler. One of the explicit checks that the Swift compiler adds is, for example, that a load operation which takes a global store S as input is not moved down past a store operation that modifies S. These checks were chosen instead of representing anti-dependences directly: anti-dependences would require a new kind of edge in the SSA graph, and avoiding them reduces the number of edges in the graph.
If the current global store is different on two incoming edges of a block, a phi node is inserted into the graph to merge the two global stores into a new current store. Similarly, a phi node of the global store inserted at the top of a loop ensures that memory operations do not incorrectly move out of the loop. The addition of phi nodes to the SSA graph increases the number of nodes in the graph, but not as drastically as the increase in the number of edges discussed earlier. Moreover, the benefit of adding a phi node is that memory operations are treated like all other values in the SSA graph.

Sunday, February 15, 2015

Today we read from the WRL research report on the Swift Java compiler.
Swift is an optimizing Java compiler for the Alpha architecture. Swift translates Java bytecodes to optimized Alpha code, and uses static single assignment form for its intermediate representation. The IR allows for all standard scalar optimizations. The Swift compiler also implements method resolution and inlining, interprocedural alias analysis, elimination of Java runtime checks, object inlining, stack allocation of objects and synchronization removal. Swift is written completely in Java and installs its generated code into a high-performance JVM that provides all of the necessary run-time facilities.
In this paper the use of SSA form, which is central to the Swift design, was beneficial to many optimizations, and there were hardly any that were difficult to implement in SSA. The paper also discusses properties of the intermediate representation, some optimization passes, and several overall performance results. The Swift design is quite effective in allowing efficient and general implementations of numerous interesting optimizations.
Java is known for its object-oriented programming and type-safe design. In addition, it has automatic memory management. Commercial Java systems do just-in-time compilation, which can only do limited optimizations, but there are more expensive optimizations that can improve the relative efficiency of code generated from Java. Java presents many challenges for a compiler attempting to generate efficient code, because runtime checks and heap allocations introduce considerable overhead. Many routines in the standard Java library include synchronization operations, but these operations are unnecessary if the associated object is accessed by a single thread. Virtual method calls are quite expensive if they cannot be translated to a direct procedural call; moreover, they need to be resolved and there can be exceptions at runtime. The possibility of these exceptions further constrains many optimizations.
#codingexercise
Double GetAlternateEvenNumberRangeSumCubeRootSquare(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumCubeRootSquare();
}

We continue our discussion on the Swift Java compiler. We were discussing the challenges for a Java compiler, but Java also provides type safety. This simplifies much of the analysis: for example, local variables can only be changed by explicit assignment, and a field of an object cannot be modified by an assignment to a different field. The static single assignment graph also makes most or all of the dependencies in a method explicit. Swift makes use of SSA and is written in Java. It installs into a high-performance JVM that provides most of the necessary run-time facilities.

First we look at the intermediate representation. A method is represented by a static single assignment graph embedded in a control flow graph. The SSA graph consists of nodes, called values, that represent individual operations. Each value has several inputs from previous operations and a single output, which is why the node can be identified by this output. In addition, each node indicates the kind of operation that the value represents as well as any extra information required for the result. There are about 55 basic operations, which include the usual arithmetic, bitwise logical and comparison operators; operations that manipulate program flow, merge values, access the contents of arrays and objects, and allocate new objects and arrays from the heap; object-oriented operations such as instanceof computations, virtual method calls and interface calls; and Java-specific operations corresponding to bytecodes such as lcmp and fcmpl.

The Swift IR breaks out the various runtime checks such as null checks, bounds checks and cast checks. These operations cause a runtime exception if the check fails. Since these checks are distinct operations, the Swift compiler can apply optimizations to them, such as using common subexpression elimination on two null checks of the same array. Swift also has a value named init_ck that is an explicit representation of the class initialization check that must precede some operations.

We will continue our discussion on this WRL research report. Swift will eliminate redundant initialization checks during optimization. In addition, Swift's underlying JVM replaces an initialization check with a NOP after the first time it is encountered. The static single assignment form enables more of these checks to be removed than would previously have been possible.

#codingexercise
Double GetAlternateEvenNumberRangeCubeRootSumCube(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeCubeRootSumCube();
}

Today we continue our discussion on this WRL research report. The Swift IR has about 100 machine-dependent operations which can be mapped to the Alpha instruction set. Swift makes two passes over these: the first converts the machine-independent operations into machine-dependent ones, and the second does instruction scheduling, register allocation and code generation, operating directly on the SSA graph.

The SSA graph represents the use-def chains for all variables in a method, since each value explicitly specifies the values used in computing its result. When building the SSA graph, the def-use chains are formed as well and kept up to date. Thus an optimization can, at any stage, directly access the users of a particular value.

Each method is an SSA graph embedded within a control flow graph. Each value is located in a specific basic block of the CFG, although various optimizations may move values among blocks or even change the CFG. Each method's CFG has a single entry block, a single normal exit block, and a single exceptional exit. Parameter information is in the entry block and the return value is in the exit block. A block ends whenever an operation is reached that has two or more control exits. The use of extended basic blocks, although common in some Java compilers, is discouraged because we do not want special processing for these blocks; moreover, they can become quite complex.

#codingexercise
Double GetAlternateOddNumberRangeCubeRootSumCube(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeCubeRootSumCube();
}

Saturday, February 14, 2015

Today we continue reading from the WRL research report. This time we review the benchmark that opens and closes a particular file. The tests were performed with a flat file and a nested-path file. The UNIX derivatives were consistently faster than Sprite. The results also showed that in the remote case Sprite is faster than SunOS but slower than Ultrix. In any case, this benchmark did not explain the previously discussed anomaly. Another benchmark that was studied involved create-delete of a temporary file. Data is written to the file on create and read from the file prior to delete. Three different sizes of data were transferred: 0, 10 kbytes and 100 kbytes. This benchmark highlighted a basic difference between Sprite and the UNIX derivatives. In Sprite, short-lived files can be created, used and deleted without any data ever being written to disk; information only goes to the disk after it has lived at least 30 seconds. In UNIX and its derivatives, the file system appears to be much more closely tied to the disk: even with no data written to the file, the UNIX derivatives incur disk activity just to create and delete it. This benchmark also explains the anomaly involving the poor performance of DS3100 Ultrix on the Andrew benchmark. The time for creating an empty file is 60% greater in DS3100-Ultrix-local than in 8800-Ultrix-local, and the time for a 100-Kbyte file in DS3100-Ultrix-Remote is 45 times as long as for DS3100-Sprite-Remote. This relatively poor performance may be attributed to slower disks and possibly to the NFS writing policy, which requires new data to be written through to the disk when the file is closed. Note that DS3100-Ultrix-Remote achieves a write bandwidth of only about 30 Kbytes/sec.
Lastly, SunOS 4.0 showed a surprising behavior. The time for a file with no data is 66 ms, but the time for 10 kbytes is 830 ms; it seems that when the file size jumps from 8 to 9 kbytes, the time jumps to the 800-ms range.
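As a rough modern approximation of the create-delete benchmark described above (not the paper's code, and without the synchronous-write behaviour that made the UNIX numbers what they were):
import os, time

def create_use_delete(path, size, iterations=1000):
    # Create a temporary file, write 'size' bytes, read them back, then delete it.
    data = b"x" * size
    start = time.time()
    for _ in range(iterations):
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            f.read()
        os.remove(path)
    return (time.time() - start) / iterations

for size in (0, 10 * 1024, 100 * 1024):
    ms = create_use_delete("/tmp/bench.tmp", size) * 1000
    print("%6d bytes: %.3f ms per create-use-delete" % (size, ms))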
This concludes our discussion on this WRL Research report.
#codingexercise
Double GetAlternateEvenNumberRangeSumSqrtCube(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumSqrtCube();
}

Friday, February 13, 2015

We continue our study of the WRL research report on why operating systems have not sped up as fast as the hardware has improved. We were evaluating one benchmark after another on a set of chosen RISC and CISC machines and their corresponding OS flavors. The last benchmark we were talking about was the read from file cache. This benchmark consists of a program that opens a large file and reads the file repeatedly in 16-kbyte blocks. The file size was chosen such that the kernel copied the data from the file cache to a buffer in the program's address space, but large enough that the data copied over was not resident in any hardware cache. Thus it measures the cost of entering the kernel and of each copy. The receiving buffer was the same buffer each time, so it was likely to stay in cache. The overall bandwidth of the data transfer was measured across a large number of kernel calls. The results indicated that the memory bandwidth was the same as in the previous benchmark involving the block copy. The M2000 RISC/os 4.0 and the 8800 Ultrix 3.0 had the highest throughput, and the Sun4 and Sun3 had the lowest. Faster processors did not translate into any increase in memory bandwidth; this was clearly visible by computing a metric that takes the uncached throughput and divides it by the MIPS rating. Memory-intensive applications do not scale on these machines; the relative memory-copying performance actually drops with faster processors, for both RISC and CISC machines. The only other observation from the results that was significantly different from the previous file-operation benchmark was that the Sun4 does relatively better due to its write-back cache.
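A simplified sketch of this read-from-file-cache benchmark (purely illustrative; the file path and sizes are arbitrary choices, not the paper's):
import os, time

PATH, FILE_SIZE, BLOCK = "/tmp/cachebench.dat", 8 * 1024 * 1024, 16 * 1024

with open(PATH, "wb") as f:                 # create the file and warm the file cache
    f.write(b"\0" * FILE_SIZE)

start, total = time.time(), 0
for _ in range(20):                         # many passes over the same cached file
    with open(PATH, "rb") as f:
        while True:
            buf = f.read(BLOCK)             # each read copies 16 kbytes out of the kernel
            if not buf:
                break
            total += len(buf)
elapsed = time.time() - start
print("%.1f Mbytes/sec" % (total / (1024.0 * 1024.0) / elapsed))
os.remove(PATH)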
The next benchmark is the modified Andrew benchmark. This benchmark involves copying a directory hierarchy containing the source code for a program, stat-ing every file in the new hierarchy, reading the contents of every copied file, and finally compiling the code in the copied hierarchy. In this case the program was compiled with the same compiler, for a machine called SPUR, and then run on the target systems. The benchmark runs in two phases: the copy phase, which comprises the copying and scanning operations, and the compilation phase. Since this tests the file system, it matters whether the disk is local or remote, hence the test was run in both configurations. In each of the remote cases, the server was the same kind of machine as the client. The file system protocol for remote file access was NFS.
The results from the benchmark runs showed several notable trends. First, no operating system scaled to match the hardware speedups. Second, Sprite comes out consistently faster than Ultrix or SunOS for remote access. NFS based RISC workstations slowed down 50% relative to local access.  Sometimes remote access was faster for some machines than local although this could not be explained. The DS3100-Ultrix combination was also somewhat slower than would have been expected.
#codingexercise

Double GetAlternateOddNumberRangeSumSqrtCube(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeSumSqrtCube();
}

Thursday, February 12, 2015

Today we continue to read from WRL Research report.
The fourth benchmark uses the bcopy (block copy) procedure to transfer large blocks of data from one area of memory to another. This doesn't necessarily involve the operating system, but each system may have its own implementation of the procedure. Machines generally differ in cache organization and memory bandwidth, so this is a good way to evaluate their relative performance. The tests were run on configurations with two different block sizes. In the first case, the blocks were large enough and aligned properly to use bcopy in the most efficient way, but small enough that both the source and the destination fit in the cache. In the second case, the transfer size was bigger than the cache size, so cache misses would occur continuously. In each case, several transfers were made between the same source and destination, and the average bandwidth of copying was measured.
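A rough analogue in a high-level language (illustrative only; the paper measured the C library's bcopy directly):
import time

def copy_bandwidth(size, repeats=50):
    # Copy 'size' bytes between two buffers 'repeats' times and report Mbytes/sec.
    src, dst = bytearray(size), bytearray(size)
    start = time.time()
    for _ in range(repeats):
        dst[:] = src                        # the block copy itself
    elapsed = time.time() - start
    return size * repeats / (1024.0 * 1024.0) / elapsed

print("cached:   %.1f Mbytes/sec" % copy_bandwidth(64 * 1024))          # fits in cache
print("uncached: %.1f Mbytes/sec" % copy_bandwidth(64 * 1024 * 1024))   # exceeds cache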
The results showed that the cached bandwidth was largest for the M2000 RISC/os and the 8800 Ultrix, and it progressively decreased for the Sun4 and Sun3 machines. The uncached bandwidth shows the same gradation. This implies that faster processors did not come with correspondingly better memory bandwidth, so memory-intensive applications are not likely to scale on these machines. In fact, the relative performance of memory copying drops with faster processors across both RISC and CISC machines.
The next benchmark we consider is the read from file cache. The benchmark consists of a program that opens up a large file and reads the file repeatedly in 16-kbyte blocks. For each configuration, a file size was chosen such that it would fit in the main-memory file cache. Thus the benchmark measures the cost of entering the kernel and copying the data from the kernel's file cache back to a buffer in the benchmark's address space.

We will continue with this post shortly.

Wednesday, February 11, 2015

Today we continue to read from the WRL research report on why operating systems weren't getting faster with improvements in hardware. We were comparing the M2000, DS3100, Sun4, 8800, Sun3 and MVAX2, and the operating systems were Ultrix, SunOS, RISC/os and Sprite.
The first benchmark measures the cost of entering and exiting the kernel. By making repeated getpid calls, it was found that the Sun3 and MVAX2 Ultrix took the most time, while the M2000 RISC/os and the DS3100 flavors took the least. There was also a metric called the 'MIPS relative speed'. What this metric indicates is how well the machine performed on the benchmark, relative to its MIPS rating and to the Sun3 time for kernel entry-exit. A rating of 1.0 on this scale means that the machine ran as expected; a rating less than 1.0 means that the machine ran the benchmark slower than expected, and vice versa. The RISC machines turned out to have this rating lower than 1 while the others were close to 1.
The second benchmark is called cswitch. It measures the cost of context switching, plus the time for processing small pipe reads and writes. The benchmark operates by forking a child process and then passing one byte back and forth between parent and child using pipes. Each round trip costs two context switches plus one kernel read and one kernel write operation. Here too the RISC machines did not perform well, except for the DS3100/Ultrix combination.
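A sketch of the cswitch idea (Unix-only, illustrative; the original benchmark was written in C):
import os, time

ROUND_TRIPS = 10000
p2c_read, p2c_write = os.pipe()     # parent -> child
c2p_read, c2p_write = os.pipe()     # child -> parent

pid = os.fork()
if pid == 0:                        # child: echo every byte straight back
    for _ in range(ROUND_TRIPS):
        os.write(c2p_write, os.read(p2c_read, 1))
    os._exit(0)

start = time.time()
for _ in range(ROUND_TRIPS):        # each round trip: two context switches, one read, one write
    os.write(p2c_write, b"x")
    os.read(c2p_read, 1)
elapsed = time.time() - start
os.waitpid(pid, 0)
print("%.1f microseconds per round trip" % (elapsed / ROUND_TRIPS * 1e6))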
The third benchmark exercises the select kernel call. It creates a number of pipes, places data in some of those pipes, and then repeatedly calls select to determine how many of the pipes are readable. A zero timeout is used in each select call so that the kernel call never waits. Three configurations were used: a single pipe with no data, ten pipes all empty, and ten pipes all full with data. Again the RISC machines did not perform well. RISC/os's emulation of the select kernel call seemed faulty, causing a 10-ms timeout even when the calling program specified an immediate timeout.

Tuesday, February 10, 2015

Today we are going to read from the WRL research report "Why aren't operating systems getting faster with hardware?" The WRL research laboratory tests its ideas by designing, building and using real systems. This paper was written in 1989. It evaluated several hardware platforms and operating systems using a set of benchmarks that test memory bandwidth and various operating system features such as kernel entry/exit and file systems. The benchmarks are mostly micro-benchmarks in the sense that each one measures a particular hardware or operating system feature. Such benchmarks can suggest the strengths and weaknesses of a system but not indicate how good a system is overall. However, one benchmark was used to assess the overall speed of the operating system by testing a variety of file system features. The author finds that the different hardware platforms show significant speedup, but the operating system performance of RISC workstations does not keep up. The benchmarks used in this study suggested two possible factors: one is memory bandwidth, which does not seem to scale with processor speed, and the other is the file system, some of which require synchronous disk I/Os in common situations; the processor gets faster but the disks don't.
The hardware platforms used for the benchmarks are listed with an abbreviation for each platform, an approximate MIPS rating, and an indication of whether the machine is based on RISC or CISC processors. The hardware used were the MIPS M2000 (M2000), DECstation 3100 (DS3100), Sun-4/280 (Sun4), VAX 8800 (8800), Sun-3/75 (Sun3) and MicroVAX II (MVAX2). The first three are RISC machines and the next three are CISC machines. The operating systems used were Ultrix, SunOS, RISC/os and Sprite. Ultrix and SunOS are the DEC and Sun derivatives of Berkeley's 4.2 BSD Unix. RISC/os was developed for the M2000, and Sprite was an experimental operating system at UC Berkeley. Some of the differences between SunOS 4.0 and 3.5 used with the Sun machines included a major restructuring of the virtual memory system and file system.
With these operating systems and hardware, a set of benchmarks was studied, the first of which was kernel entry/exit. This measures the cost of entering and leaving the operating system kernel. It does this by repeatedly invoking the getpid kernel call, which returns the caller's process identifier. The average time for this benchmark ranged from 18 microseconds on the M2000 to 207 microseconds on the MVAX2. The Sun machines also showed higher average times than the DS3100 and 8800. The hardware performance was also expressed as a number relative to the MIPS rating, obtained by taking the ratio of the Sun3 time to the particular machine's time and dividing that by the ratio of the machine's MIPS rating to the Sun3's MIPS rating.

Monday, February 9, 2015

Today I would like to post about Django applications and SSO integration. In SSO integration, there's usually a SAML exchange between an identity provider (federated) and a service provider (the Django app). The SAML assertions are authentication assertions and attribute assertions. There's usually a protocol (SOAP over HTTP) and a binding (how messages are exchanged) that define the way SAML asks for and gets assertions. In the Django implementation, there are a few things we need to configure to get it to talk to the IdP.
First it must ask the idP for a metadata configuration. This is usually a JSON script.
The SP must implement both metadata URLs and SSO login URLs. The former describes the service provider's SAML configuration, and the latter implements the callback/relay where the login completes.
The Django app implements two methods in its view - one to call the auth.login() and the other to handle the callback from OKTA.
Note however that the Django app implemented this way handles SSO but not OAuth.
By the way, the integration of SSO with a Django application is something I've learned from a colleague at my workplace.
Typically OAuth is used for APIs and SSO for portals.
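A skeletal version of those two views might look like the sketch below. The OKTA URL is a placeholder, and extract_email stands in for whatever SAML library the real application uses to validate and parse the assertion; neither is a real API from the setup described above.
from django.contrib.auth import login
from django.contrib.auth.models import User
from django.http import HttpResponseRedirect

OKTA_SSO_URL = "https://example.okta.com/app/sso/saml"   # placeholder, not a real URL

def sso_login(request):
    # Redirect the browser to the identity provider to start the SAML exchange.
    return HttpResponseRedirect(OKTA_SSO_URL)

def sso_callback(request):
    # The IdP posts the SAML assertion back to this URL once the user has authenticated.
    email = extract_email(request.POST.get("SAMLResponse", ""))
    user, _ = User.objects.get_or_create(username=email, defaults={"email": email})
    user.backend = "django.contrib.auth.backends.ModelBackend"
    login(request, user)                                  # establish the Django session
    return HttpResponseRedirect("/")

def extract_email(saml_response):
    # Placeholder: a real implementation would verify the signature and parse the
    # assertion with a SAML library rather than raise.
    raise NotImplementedError("use a SAML library here")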
#codingexercise
Double GetOddNumberRangeSumPower(Double[] A, int n, int m)
{
    if (A == null) return 0;
    return A.OddNumberRangeSumPower(n, m);
}
We now look at an API gateway. This is both a central authenticator and an aggregator, and it is typically deployed in a demilitarized zone. There are different layers and different places at which the gateway can apply. For example, it can apply to different protocols such as FTP, POP and HTTP, and it can be applied to a web stack hosted locally or by a third party. An HTTP proxy is a typical example of an API gateway for a REST-based web stack. An API gateway can sit inside a CDN, filtering traffic before it reaches the servers. It can be a proxy hosted by a third party before the traffic reaches the web service. It can be a dedicated machine in our own cloud or infrastructure. It can even live within the application stack, if the gateway is a module that can be invoked within the application.

Wednesday, February 4, 2015

Today we read the paper on the Gold project by Barbara, Molina and Mehrotra. The purpose of this project was to develop a set of tools that interoperate with unstructured and semi-structured data and services. It is mailer software that allows users to send and receive messages using different media (e-mail, faxes, phones), efficiently store and retrieve these messages, and access a variety of sources of other useful information. It addresses the problems of information overload, organization of messages, and multiple interfaces. The mailer doesn't provide relational-style query facilities, but it does provide a database-system interface, which makes querying convenient to program against. The reason e-mail was chosen as the domain for these tools is that e-mail is ubiquitous and provides a convenient form of data. Moreover, messages need to be stored or filed; most mailers have primitive filing facilities, and stored messages are seldom found again. By their sheer volume, variations in size and number, and organization, e-mails are often not easily retrievable. This paper was written prior to the indexing and full-text search that is now common in operating systems and mailing software. The software uses a mail database: each stored message is viewed as a tuple with fixed fields and treated as structured and semi-structured sequences of words. The paper mentions that the unstructured model is more appropriate for message searches, and that it can also encompass a wide variety of message formats, even unknown formats. The authors also improved the search by introducing operations that are different from SQL and relational operators. Another goal was to allow the use of existing interfaces for electronic mail. The project also provided a graphical user interface, but the added convenience was not limited to search and UI. By abstracting the messages and putting them in a store, a user can now search based on words, and the results appear as a set of virtual messages. When the user responds to any of these virtual cards, the corresponding e-mail address is looked up and the mail is dispatched by the mailer. The mailer also provides a flexible model for the stored objects; this is another reason why the authors favor the unstructured approach as opposed to looking messages up based on customary fields. The paper details the indexing issues encountered in designing the mailer.
#codingexercise
Double GetAlternateOddNumberRangeSumPower(Double[] A, int n, int m)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeSumPower(n, m);
}
We continue our discussion on the mailer. The architecture of the mailer is described as follows:
Users interact with a GUI that issues commands to read new mail messages and retrieve others from the database. When a user decides to store a new message, the user interface passes it to a preprocessor, which breaks it up into tokens and gives it to the index engine. The preprocessor can also receive messages directly for parsing and subsequent indexing. It produces a canonical file that is used by the index engine to create the appropriate data structures. Messages are stored in an object store or database. The user interface also submits queries to the index engine, which results in a list of matching file identifiers; the user interface then retrieves the objects for display. The index engine can receive queries and storage requests from many different user interfaces and hence implements concurrency control. An indexed message has headers and a body. The user may see it as a bag of words, or alternatively as a semi-structured object with separate fields. This distinction enables different types of queries: the first allows queries involving relative position anywhere in the message, while the second restricts matches to the same segment.
#codingexercise
Double GetAlternateOddNumberRangeProductPower(Double[] A, int n, int m)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductPower(n, m);
}

We will continue our discussion with the challenges encountered during indexing. We noticed that the user interface allows messages to flow to the preprocessor, which tokenizes the words in the message; in addition, news feeds can flow directly into the preprocessor. The preprocessor then gives these messages to the indexing engine, which interacts with the user interface for querying and retrieving based on user searches. The index engine also maintains its index structures so that querying and retrieving can be efficient. In addition, the front end has access to an object store. The messages themselves comprise a header and a body. The Gold mailer handles structured fragments within a document as well as treating the document as a bag of words. The index engine maintains attributes for each keyword encountered. This is represented in a relational table as keyword, document_id, position and header. The document_id points to the indexed object, while the position points to where the keyword occurs within the document. The header attribute references the header if present. This table is used only by the indexer; the query engine has no knowledge of this internal data structure. The values in the cells of this table do not have to be discrete or atomic, which comes in useful to the indexer. At the same time, the query engine can differentiate between words by their position as hinted in the query. For example, if the engine sees subject, lunch and time as three words to be searched that must appear consecutively, it can do so by requiring the positional difference between adjacent words to be zero. This differentiates this particular query from other queries where the three words may co-occur at different positions in the document.
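A toy in-memory version of that keyword table and the zero-positional-difference check (the header column is omitted, and the data structures are simplifications, not the Gold mailer's):
from collections import defaultdict

index = defaultdict(list)                # keyword -> list of (document_id, position)

def add_document(doc_id, text):
    for position, word in enumerate(text.lower().split()):
        index[word].append((doc_id, position))

def consecutive_match(doc_id, words):
    # True when the words occur in this document at consecutive positions.
    positions = [set(p for d, p in index[w.lower()] if d == doc_id) for w in words]
    return any(all(start + i in positions[i] for i in range(len(words)))
               for start in positions[0])

add_document(1, "Subject lunch time moved to noon")
print(consecutive_match(1, ["subject", "lunch", "time"]))   # True: the words are adjacent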
#codingexercise
Double GetAlternateOddNumberRangeProductCube(Double[] A, int n)
{
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductCube(n);
}

Furthermore, the query engine can select from the table mentioned above, restricting the header field to the subject, to look for the occurrence of the three keywords. A mail message can be considered a collection of labeled fragments: the preprocessor reports the word subject as the label for those words and provides byte offsets of the words within the fragments. No matter whether the structured fields are used or not, the indexing engine works the same way. The index engine stores a variety of documents, and a user may be interested only in documents of a specific type. 

The Gold mailer supports the feature of message annotation. When a new message is to be stored, the user interface allows the user to enter additional words for indexing. For instance, the user may want to associate the words budget and sales with a message even though these two words don't appear in the message itself. The user interface appends these added keywords to the end of the message, and the index engine does not treat them differently. Mailers usually provide the capability of defining folders, and messages flow into folders for organization. The Gold mailer uses annotation to implement folders. This means that a virtual folder can be created via message annotations, but it's not the same as an actual folder: querying the system for all messages that contain the keyword database retrieves messages that are annotated with the keyword as well as those that contain the keyword somewhere in the text. Folders and annotations give the user enough flexibility to organize their mail.
Next we look at their usage. Users may choose to use folders for redirecting e-mails into folders that are classified appropriately. This gives them the flexibility to look for e-mails in narrower domains, but it is merely a convenience for the user. The annotations, on the other hand, are used as labels, which allows both the system and the user to search.
Not all messages have the same format. The Gold mailer deals with messages that come from fax and scan. These are run through the optical character recognition server. This software with the help of an online dictionary recognizes as many words from the electronic document as possible and creates a file with the words.  Even a handful of keywords are enough to be recognized for indexing. The keyword file and the image are both stored in a fax directory that's owned by the addressee. The same folder can be shared. The Gold mailer also sends out faxes using the same media. 
Messages can be composed and received using both text and images and through a variety of media.
The query language was developed with a simple and flexible message model; the goal was to keep queries easy to specify. For example, the command mail friend Princeton would search through the database for objects using the words friend and Princeton.
The Gold mailer expects most queries not to involve proximity searches. If proximity is required, it can be specified by grouping together the terms involved. The user interface allows the user to incorporate new messages, send messages using the information stored in address cards, edit address cards, and pose queries to the database to retrieve messages.

Tomorrow we wrap up our discussion on this project. 

The information stored in the index engine is viewed, as mentioned,  as a tuple of keyword, document identifier, position and header. The index engine supports the following three operations : 
insert document : this inserts all the records pertaining to a specific document 
delete document : this deletes all the records pertaining to a specific document
retrieve query : this command instructs the engine to find all documents that satisfy the query and return the list of names to the mailer. 
This structure and operations enable efficient evaluation of queries which is critical to the mailer.

The index engine cannot be treated as a warehouse where insertions are batched during periods of low load and deletions are never done. Instead, the mailer makes newly inserted documents available immediately. Index reconstruction is considered maintenance and is not part of the normal workflow.

The file organization required for the mailer is dependent on the types of queries and their space requirements.  The queries are keyword based lookups and don't involve ranges between keywords.  Therefore the keywords are indexed and the records are stored as a hashed file. For each keyword, a list of positions where the keyword appears is maintained resulting in efficient disk access as opposed to keeping a record one for each position. If this reminds us of the BigData principles, it would behoove us to note that this paper was published in 1993.

The records corresponding to a document in a  hashed file are chained together forming a list, referred to as the next key list for the document.  This enables easier deletions.

When the records are inserted, they are not directly placed in the hashed files because the number of keywords in a document can be arbitrary. Instead an overflow list is maintained one per keyword in a block or more of main memory and the records are inserted at the tail. The records of the document therefore appear contiguously in these lists. Compare this to the buckets where the records are kept contiguously based on keywords and not the same document. When documents are deleted the records are not deleted one by one, instead blocks can be returned. This way the mailer avoids having to request n blocks for n keywords. 

Concurrency control has to accommodate the fact that the insert and delete operations may not just be user-invoked but may also be background tasks. This implies that operations cannot simply be serialized, because a user operation might then have to wait until a background job completes. Therefore locks are introduced, and they are taken at the page level. Failures can also occur in processes and, as with concurrency, we cannot roll back all operations, since background jobs may then lose a substantial amount of work. This is overcome by restarting the jobs where they were prior to the failure. Logging is supported both for physical page-level operations and for logical inserts and deletes.


This system shows how to organize the mail messages in a way that retrieval is efficient as opposed to user having to browse the previously created folders. As an extension, it could be made to index not only mail messages but library catalogs and databases. This mailer software also intended to investigate improvements in accessing data from other information sources such as file systems.
Today I will continue to discuss the AWS security mechanisms. We were discussing server side encryption. An S3 object can be passed to the server with a per-object key over SSL and encrypted on the server side before being put into the S3 bucket. Client side encryption is also possible, as long as the client maintains its own keys and encrypts the objects without sending or sharing those keys.
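As a sketch with the AWS SDK for PHP used elsewhere in this post (the bucket, key and file names are placeholders, and $s3 is assumed to be an S3Client constructed as in the upload example further below):

  // Ask S3 to encrypt the object at rest with 256-bit AES (SSE-S3).
  $result = $s3->putObject(array(
      'Bucket'               => 'my-bucket',       // placeholder
      'Key'                  => 'reports/q1.pdf',  // placeholder
      'SourceFile'           => '/tmp/q1.pdf',     // placeholder
      'ServerSideEncryption' => 'AES256',
  ));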
Even data maintained on file systems or in databases can be encrypted; both can provide transparent encryption, and whole-disk or file-level encryption is possible. With AWS, volumes can be encrypted too, and snapshots of encrypted volumes can also be taken. Optionally, access logging can be enabled on S3 buckets. Hardware Security Modules in a Virtual Private Cloud let customers manage AWS-provisioned HSMs, and customers can be single tenants in these VPCs. If we are deploying a service, we can isolate it, simplify its security groups and simplify forensics. We can also connect two VPCs in the same region, bridged by a routing table. Services that have APIs can also use logging, which is enabled with CloudTrail. CloudTrail records who made the API call, when the call was made, what the call was, and which resources were acted upon. CloudTrail event collection can be configured in JSON, just like a policy, and CloudTrail partners with some of the major log reading, parsing and indexing vendors.

To summarize the security recommendations: turn on multi-factor authentication for the root account, create IAM users, groups and policies, never use the root account API keys, and scope-limit the policies. In addition, if there are any geographical requirements, such as the whitepapers on auditing, logging, risk, compliance and security for, say, Australia, then those should be followed as well.

Lastly, IAM groups and security are one thing; users managing access to their own resources is another. The tools available to the user are to grant cross-account access to their resources, or to enable signed access to grant specific people outside the defined users and groups access. In the latter case, the user is not sharing access keys and at the same time is not granting public read or write access. This method is also favored when an S3 resource needs to be downloadable over HTTP. Signed access can also be time-constrained, so that access is revoked automatically when the specified duration expires. This is convenient for revocation and avoids having to keep track of the recipients.
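As a sketch, a time-limited signed link can be produced with the same SDK (bucket and key are placeholders):

  // Generate a pre-signed URL that stops working 15 minutes after it is created.
  $url = $s3->getObjectUrl('my-bucket', 'reports/q1.pdf', '+15 minutes');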

Tuesday, February 3, 2015

Today we will discuss AWS Security as read from the slides of James Bromberger at the AWS Summit. In it he describes not only the security considerations for users of AWS but also what AWS itself undertakes. He says that AWS's primary concern is compliance, while the audience's is account management, service isolation, visibility and auditing. AWS concerns itself with securing the physical assets such as facilities, infrastructure, network and virtualization. The customer secures the operating system, applications, security groups, OS firewall, network configuration and account management. AWS compliance information is available for everyone to see on their website; it showcases certifications and the approving industry organizations. On the other hand, customers have primarily been interested in securing accounts, which in a way are the keys to the kingdom. Identity and Access Management consequently provides users and groups, unique security credentials, temporary security credentials, policies and permissions, roles, as well as multi-factor authentication (MFA). Recommendations for account security include: securing our master account with MFA, creating an IAM group for our Admin team, creating IAM users for our Admin staff as members of our Admin group, and even turning on MFA for these users. Enhanced password management with expiry, reuse checks, and change on next log-in is also available.
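A sketch of scripting those recommendations with the SDK's IAM client (assuming the v2-era IamClient and that credentials come from the environment; the group and user names are placeholders):

  use Aws\Iam\IamClient;

  $iam = IamClient::factory();

  // Create an Admin group, an IAM user for a staff member, and add the user to the group.
  $iam->createGroup(array('GroupName' => 'Admins'));
  $iam->createUser(array('UserName' => 'alice'));
  $iam->addUserToGroup(array('GroupName' => 'Admins', 'UserName' => 'alice'));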
Next we look at temporary credentials. These are used for running an application. We remove hardcoded credentials from scripts and config files, create an IAM role and assign a restricted policy, and then launch the instance with that role. The AWS SDKs transparently pick up the temporary credentials.
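For example, when an instance is launched with a role, the client can be constructed without any keys and the SDK falls back to the temporary credentials served by the instance metadata (a sketch in the same v2-era style as the other snippets here):

  use Aws\S3\S3Client;

  // No 'key'/'secret' supplied: the SDK picks up the temporary, auto-rotated
  // credentials of the IAM role attached to the instance at launch.
  $s3 = S3Client::factory(array('region' => 'us-west-2'));
  $buckets = $s3->listBuckets();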
IAM policies should be written with least privilege in mind. This means the resources are qualified explicitly and the actions are granted incrementally. Policies can have conditions which restrict access even further, and this is good practice. AWS has a policy generator tool which can generate policies, and there's even a policy simulator tool that can be used to test them.
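As an illustration of scoping a policy down (the bucket name and IP range are placeholders; only the policy document itself is shown, however it ends up being attached):

  // Least privilege: read-only access to a single bucket, and only from one address range.
  $policyDocument = json_encode(array(
      'Version'   => '2012-10-17',
      'Statement' => array(array(
          'Effect'    => 'Allow',
          'Action'    => array('s3:GetObject', 's3:ListBucket'),
          'Resource'  => array(
              'arn:aws:s3:::my-bucket',
              'arn:aws:s3:::my-bucket/*'
          ),
          'Condition' => array('IpAddress' => array('aws:SourceIp' => '203.0.113.0/24'))
      ))
  ));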
Another set of credentials we use is the access key and secret key. These are used to sign requests and to make API calls. They are generally issued once; when they need to be replaced, a new pair is reissued. SSL is required for all traffic because data is exchanged and we want it encrypted. Even database connections and the data transferred over them are to be encrypted.
In addition, AWS provides server side encryption using 256-bit AES that is transparent to customers. Data keys are generated, encrypted and then stored using a master key, and the generated key is used to encrypt the data.
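The master-key-plus-data-key idea (envelope encryption) can be sketched with plain PHP OpenSSL calls; this only illustrates the concept and is not how AWS implements it internally:

  // Envelope encryption sketch: a fresh data key encrypts the data, and the master key
  // encrypts (wraps) the data key so it can be stored alongside the ciphertext.
  $plaintext  = file_get_contents('/tmp/data.bin');       // placeholder input
  $masterKey  = openssl_random_pseudo_bytes(32);          // stands in for a managed master key
  $dataKey    = openssl_random_pseudo_bytes(32);          // per-object data key
  $ivData     = openssl_random_pseudo_bytes(16);
  $ivKey      = openssl_random_pseudo_bytes(16);
  $ciphertext = openssl_encrypt($plaintext, 'aes-256-cbc', $dataKey, OPENSSL_RAW_DATA, $ivData);
  $wrappedKey = openssl_encrypt($dataKey, 'aes-256-cbc', $masterKey, OPENSSL_RAW_DATA, $ivKey);
  // Persist $ciphertext, $wrappedKey and the IVs; only the master key can unwrap the data key.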

Sunday, February 1, 2015

Command line tools for object storage such as S3cmd and awscli provide almost all the functionality required to interact with objects and storage. However, the use of an SDK enables integration with different types of user interfaces. For example, if we want to get the ACLs, we can use:
  // Return the ACL grants for an object (when $key is given) or for the bucket itself.
  public function get($bucket, $key = null, $accessKey = null, $accessSecret = null)
  {
      if ($bucket == null) return array('Grants' => array());
      $client = $this->getInstance($accessKey, $accessSecret);
      if ($key != null) {
          // Object-level ACL
          $acls = $client->getObjectAcl(array(
              'Bucket' => $bucket,
              'Key'    => $key));
          if ($acls == null) $acls = array('Grants' => array());
          return $acls;
      } else {
          // Bucket-level ACL
          $acls = $client->getBucketAcl(array(
              'Bucket' => $bucket));
          if ($acls == null) $acls = array('Grants' => array());
          return $acls;
      }
  }
If we want to set the ACLs we could use:
  // Canned ACLs: 'private', 'public-read', 'public-read-write', 'authenticated-read',
  // 'bucket-owner-read', 'bucket-owner-full-control'
  // Apply a canned ACL to an object.
  public function set($bucket, $key, $acl, $accessKey = null, $accessSecret = null)
  {
      $client = $this->getInstance($accessKey, $accessSecret);
      // putObjectAcl needs only the canned ACL, bucket and key; it takes no 'Body' parameter.
      $result = $client->putObjectAcl(array(
          'ACL'    => $acl,
          'Bucket' => $bucket,
          'Key'    => $key
      ));
      return $result;
  }
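A hypothetical usage of these two helpers (the wrapper class name is assumed; only get() and set() come from the snippets above):

  $acl = new S3AclHelper();                                  // hypothetical wrapper class
  $grants = $acl->get('my-bucket', 'photos/cat.jpg');        // inspect the current grants
  $acl->set('my-bucket', 'photos/cat.jpg', 'public-read');   // apply a canned ACL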

#codingexercise
Double GetAlternateEvenNumberRangeSumPower(Double[] A, int n, int m)
{
    if (A == null) return 0;
    return A.AlternateEvenNumberRangeSumPower(n, m);
}
Today we cover object storage vs block storage.
The rate of adoption of object storage can be very exciting both for an IT admin and for a user. Here are some use cases where it may prove more beneficial than, say, block storage:
  1. If you have static content of varying sizes and you cannot label the workload as anything but miscellaneous, then you can use object storage. It lets you add metadata to each piece of content, which is now treated as an object with some context around the data. It doesn’t split files into raw blocks of data; the entire content is treated as an object. Whether it's a few photos, uploaded music videos, backup files or just other data from your PC, they can now be archived and retrieved at will.
  2. No matter how many objects there are or how large they grow, each object can be uniquely and efficiently retrieved by its ID.
  3. When the basket of miscellaneous objects grows to a few hundred terabytes or even a few petabytes, storage systems that rely on adding block storage cannot keep up. Object storage does not require you to mount drives, manage volumes or remap volumes. Besides, object stores keep multiple copies of the data, which improves availability and durability. Whether it's S3, Swift or Atmos, most vendors give this assurance.
  4. Object storage can work with NAS and commodity nodes, where scaling out is just the addition of new compute rather than new storage.
  5. That brings up the point that if your data sees heavy read-write traffic, as with, say, databases, then block storage such as a SAN will be more helpful.
  6. You can sign a link to an object and share it with others to download with their web browser of choice.
If you have access keys to the object storage, you can upload objects this way:

       // Construct the client with explicit credentials and endpoint details.
       $s3 = S3Client::factory(array(
            'base_url' => $host,
            'key'    => $access,
            'secret' => $secret,
            'region' => $region,
            'ssl.certificate_authority' => $ssl_certificate_authority,
        ));

        // Upload the file as an object, attaching user-defined metadata.
        $result = $s3->putObject(array(
            'Bucket'     => $bucket,
            'Key'        => $key,
            'SourceFile' => $file,
            'Metadata'   => array(
                'source' => 'internet',
                'dated' => 'today'
            )
        ));

        // Poll until the object is visible in the bucket.
        $s3->waitUntilObjectExists(array(
            'Bucket' => $bucket,
            'Key'    => $key,
        ));

This is written using the S3 API via the AWS SDK for PHP.
Most S3-compatible object storage vendors maintain their own sets of access keys, so you cannot use one vendor's access keys against another vendor's endpoint or storage.