Today we continue our discussion on system design. This time we cover Splunk.
1) Splunk – Splunk is a scalable time series database arranged as events in buckets – hot/warm, cold or thawed. These are stored as index files together with some metadata. There are special buckets called introspection. The architecture consists of lightweight forwarders, indexers and search heads, each with its own topology and built to scale. The forwarders are the collection agents that gather machine data from customers. The indexers receive the events and handle the bulk of the operations. The search heads present analysis tools, charts and management interfaces. Splunk has recently added analysis features based on machine learning. Previously most of the search features were based on Unix-like command operators, which became quite popular and boosted Splunk's adoption as the IT tool of choice, among other usages. There are a variety of charting tools; the frontend is based on JavaScript while the middle tier is based on Django. The indexers are written in C++ and come with robust capabilities. It is important to note that their database, unlike conventional relational or NoSQL stores, was designed primarily for specific usages. If they moved their database to commodity or platform options in the public cloud, they could evolve their frontend beyond a single enterprise-based or localhost-based instance, provide isolated cloud-based storage per customer on a subscription basis, and offer Splunk as a cloud and browser-based service.
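To make the bucket and pipeline terminology concrete, here is a minimal, purely illustrative C# sketch of how events might flow from a forwarder into an indexer's hot bucket, age into a warm bucket, and later be searched. The type names (LogEvent, Bucket, Indexer) and methods are hypothetical and are not Splunk's actual APIs.

using System;
using System.Collections.Generic;

enum BucketState { Hot, Warm, Cold, Thawed }

class LogEvent
{
    public DateTime Time;
    public string Raw;
}

class Bucket
{
    public BucketState State = BucketState.Hot;
    public List<LogEvent> Events = new List<LogEvent>();
}

class Indexer
{
    private readonly List<Bucket> buckets = new List<Bucket> { new Bucket() };

    // Receive an event (as sent by a forwarder) and append it to the current hot bucket.
    public void Index(LogEvent e) => buckets[buckets.Count - 1].Events.Add(e);

    // Roll the hot bucket to warm and open a new hot bucket, mimicking bucket aging.
    public void Roll()
    {
        buckets[buckets.Count - 1].State = BucketState.Warm;
        buckets.Add(new Bucket());
    }

    // A search head would fan out a query like this across all indexers and merge the results.
    public IEnumerable<LogEvent> Search(Func<LogEvent, bool> predicate)
    {
        foreach (var b in buckets)
            foreach (var e in b.Events)
                if (predicate(e)) yield return e;
    }
}

class Demo
{
    static void Main()
    {
        var indexer = new Indexer();
        indexer.Index(new LogEvent { Time = DateTime.UtcNow, Raw = "ERROR disk full" });
        indexer.Roll();
        indexer.Index(new LogEvent { Time = DateTime.UtcNow, Raw = "INFO started" });
        foreach (var e in indexer.Search(ev => ev.Raw.Contains("ERROR")))
            Console.WriteLine(e.Raw);
    }
}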
Next, we cover Consistent Hashing – this is a notion that is quietly finding its way into several distributed systems and services. Initially we had a cache that was distributed among n servers as hash(o) modulo n. This had the nasty side effect that when one or more servers went down or were added to the pool, nearly every object in the cache would map to a different server because n changed. Consistent hashing instead accommodates new servers and takes old servers offline by arranging the hashes around a circle with cache points. When a cache point is removed or added, only the objects whose hashes fall on the affected arc of the circle move clockwise to the next cache point; everything else stays where it was. It also introduced "virtual nodes", which are replicas of cache points on the circle. Since the distribution of objects across caches could otherwise be non-uniform, each cache is represented by several virtual nodes spread around the circle, which evens out the load.
public class ConsistentHash<T>
{
    private SortedDictionary<int, T> circle = new SortedDictionary<int, T>(); // a sorted map (Java's TreeMap) keeps the hash positions ordered around the ring
}
Since a number of replicas (virtual nodes) are maintained per cache point, the replica number may be appended to the string representation of the node before it is hashed, giving each replica its own position on the circle. A good example is memcached, whose client libraries commonly use consistent hashing.
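Filling out the fragment above, here is a sketch of the full ring. It assumes a simple home-grown string hash (a production implementation would use a stronger hash such as MD5 or MurmurHash), and the method names AddNode, RemoveNode and GetNode are chosen here purely for illustration.

using System;
using System.Collections.Generic;
using System.Linq;

public class ConsistentHashRing<T>
{
    private readonly SortedDictionary<int, T> circle = new SortedDictionary<int, T>();
    private readonly int replicas;

    public ConsistentHashRing(int replicas = 100)
    {
        this.replicas = replicas;
    }

    // Each node is placed on the ring at several positions (virtual nodes)
    // by hashing "node:replicaIndex".
    public void AddNode(T node)
    {
        for (int i = 0; i < replicas; i++)
            circle[Hash($"{node}:{i}")] = node;
    }

    public void RemoveNode(T node)
    {
        for (int i = 0; i < replicas; i++)
            circle.Remove(Hash($"{node}:{i}"));
    }

    // Walk clockwise from the key's hash to the first cache point; wrap around if needed.
    public T GetNode(string key)
    {
        if (circle.Count == 0) throw new InvalidOperationException("ring is empty");
        int h = Hash(key);
        foreach (var kv in circle)          // SortedDictionary enumerates keys in ascending order
            if (kv.Key >= h) return kv.Value;
        return circle.First().Value;        // wrapped past the largest position, back to the start
    }

    // Simple deterministic string hash for illustration only.
    private static int Hash(string s)
    {
        unchecked
        {
            int h = 17;
            foreach (char c in s) h = h * 31 + c;
            return h & int.MaxValue;        // keep it non-negative
        }
    }
}

When a node is added or removed, only the keys that hash between that node's positions and their clockwise neighbours move; everything else stays put, which is exactly the property described above.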
#codingexercise
Find if there is a subset of numbers in a given integer array that, when ANDed with the given number, results in zero.
// requires: using System.Collections; using System.Collections.Generic;
static bool IsValid(List<int> items, int Z)
{
    // ANDing in more numbers can only clear bits, so the smallest possible result
    // is Z ANDed with every element. If that is non-zero, no subset can reach zero;
    // if it is zero, the whole array is such a subset.
    var res = new BitArray(new int[] { Z });
    foreach (var item in items)
    {
        var b = new BitArray(new int[] { item });
        res = res.And(b);
    }
    int[] result = new int[1];
    res.CopyTo(result, 0);
    return result[0] == 0;
}
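A quick sanity check (values chosen arbitrarily; assumes the usual using System; directive):

Console.WriteLine(IsValid(new List<int> { 3, 5 }, 12)); // True: 3 & 5 = 1, and 12 & 1 == 0
Console.WriteLine(IsValid(new List<int> { 7, 15 }, 5)); // False: 7 & 15 = 7, and 5 & 7 = 5 != 0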