Thursday, January 19, 2017

Today we continue to read up on how Google protects data. Google has incredible expertise in this area. Security is an integral part of their operations. They scan for security threats using tools, and they use penetration testing, quality assurance processes, security reviews and external audits. They have a rigorous incident management process for security events that affect the confidentiality, integrity or availability of systems or data. They apply a defense-in-depth perspective to securing their custom-designed servers, proprietary operating systems and geographically distributed data centers. Data is encrypted in transit, at rest and on backup media, whether it is moving over the internet or traveling between data centers. They build redundancy into their server design, data storage, network and internet connectivity, and even the software services. Their audits cover compliance standards such as ISO 27001, ISO 27017, ISO 27018, SOC 2 and SOC 3 as well as the FedRAMP program. We saw earlier what these certifications mean.
Google takes efforts to secure data at rest and in transit. It does not sell the data, and it isolates data between customers even on the same server. It provides administrative access controls to keep data private and secure. Approvals to grant access follow workflows that are audited. Such access includes requests from law enforcement officials, which are also published in a transparency report. Wherever third parties are involved, their risks are assessed and their roles are determined by contract. Customers may export their data in portable formats. Google has a Data Privacy Officer assigned for questions on its regulatory compliance. Such compliance can include regional directives, for example the EU Data Protection Directive, under which personal data may flow from parties subject to the Directive to providers outside the EU. Similarly, Google provides compliance with the Health Insurance Portability and Accountability Act (HIPAA), the US Family Educational Rights and Privacy Act (FERPA) and the Children's Online Privacy Protection Act (COPPA). The first pertains to health information, the second to student information and the third to children's information. Google empowers its users and administrators to improve security with its dashboard and admin consoles. Authentication features such as 2-step verification, single sign-on and email security policies can be easily configured. Single sign-on is enabled with SAML 2.0.
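The encryption-at-rest idea can be made concrete with a small sketch. This is illustrative only and not Google's implementation, which is proprietary: it simply shows a blob being encrypted with AES before it would be written to storage, with the key supplied by the caller rather than by a real key management service.
using System;
using System.IO;
using System.Security.Cryptography;

static class AtRestEncryptionSketch
{
    // Encrypts a blob with AES before it would be written to storage.
    // In a real system the key comes from a managed key service and is rotated;
    // here it is simply passed in by the caller.
    public static byte[] Encrypt(byte[] plaintext, byte[] key, out byte[] iv)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;          // 16, 24 or 32 bytes
            aes.GenerateIV();
            iv = aes.IV;
            using (var ms = new MemoryStream())
            using (var cs = new CryptoStream(ms, aes.CreateEncryptor(), CryptoStreamMode.Write))
            {
                cs.Write(plaintext, 0, plaintext.Length);
                cs.FlushFinalBlock();
                return ms.ToArray();
            }
        }
    }
}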

#codingexercise
Determine if a binary tree is balanced
bool IsBalanced(Node root)
{
    if (root == null) return true;
    int lh = Height(root.left);
    int rh = Height(root.right);
    // Balanced if the subtree heights differ by at most one and both subtrees are themselves balanced.
    return Math.Abs(lh - rh) <= 1 && IsBalanced(root.left) && IsBalanced(root.right);
}

int Height(Node root)
{
    if (root == null) return 0;
    return 1 + Math.Max(Height(root.left), Height(root.right));
}
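The check above recomputes subtree heights at every node, which is O(n^2) in the worst case. As an alternative sketch (assuming the same hypothetical Node type with data, left and right members used throughout these exercises), a single traversal can return the height and a sentinel for imbalance:
// Returns the subtree height, or -1 if any subtree below is unbalanced.
int CheckHeight(Node root)
{
    if (root == null) return 0;
    int lh = CheckHeight(root.left);
    if (lh == -1) return -1;
    int rh = CheckHeight(root.right);
    if (rh == -1) return -1;
    if (Math.Abs(lh - rh) > 1) return -1;
    return 1 + Math.Max(lh, rh);
}

bool IsBalancedLinear(Node root)
{
    return CheckHeight(root) != -1;
}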

Wednesday, January 18, 2017

Today we continue to read up on how Google protects data. Google has incredible expertise in this area. Security is an integral part of their operations. They scan for security threats using tools, and they use penetration testing, quality assurance processes, security reviews and external audits. They have a rigorous incident management process for security events that affect the confidentiality, integrity or availability of systems or data. They apply a defense-in-depth perspective to securing their custom-designed servers, proprietary operating systems and geographically distributed data centers. Data is encrypted in transit, at rest and on backup media, whether it is moving over the internet or traveling between data centers. They build redundancy into their server design, data storage, network and internet connectivity, and even the software services. Their audits cover compliance standards such as ISO 27001, ISO 27017, ISO 27018, SOC 2 and SOC 3 as well as the FedRAMP program.
Let us take a look at what each of these certifications means.
ISO 27001 is a widely recognized independent certification. It covers 114 controls in 14 groups and 35 control objectives. These include information security policies, organization of information security, human resource security, asset management, access control, cryptography, physical and environmental security, operations security, communications security, system acquisition, development and maintenance, supplier relationships, information security incident management, information security aspects of business continuity management, and compliance with internal requirements such as policies and with external requirements such as laws.
ISO 27017 is an international standard based on ISO 27002, which is part of the ISO 27000 series of standards; it is specifically for cloud services. ISO 27018 is a standard of practice for the protection of personally identifiable information in public cloud services.
SOC 2/3 (Service Organization Controls) is an audit framework for non-privacy principles that include security, availability, processing integrity, and confidentiality.
FedRAMP, the Federal Risk and Authorization Management Program, is a government-wide program that provides a standardized approach to security assessment, authorization and continuous monitoring for cloud products and services. This approach uses a "do once, use many times" framework that is intended to expedite US government agency security assessments and help agencies move to secure cloud solutions.
Aside from the industry standards, any company that maintains customer data has an understanding with the user that the data belongs to the customer but the company handles it. The company does not scan it for advertisements or sell it to third parties. The company takes efforts to protect customer data. If the customer deletes their data, the company should purge it from its systems within 180 days. Finally, the company must provide tools to let the customer take their data with them if they choose to stop using the services. Data must be kept private and secure, isolated for each customer even when it is stored on the same server. Access to data must be on a need-to-know basis. Approvals should be managed by workflow tools that keep audit records of all changes. What these certifications don't cover is how much of the traffic is configurable for encryption with an additional layer.
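As a toy model only (the actual workflow tools are not specified here), an auditable approval can be thought of as an append-only log of grant records:
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; not any vendor's real API.
class AccessApproval
{
    public string Requester;
    public string Resource;
    public string ApprovedBy;
    public DateTime ApprovedAt;
}

class ApprovalAuditLog
{
    private readonly List<AccessApproval> entries = new List<AccessApproval>();

    // Every grant is appended; entries are never edited in place, so the history stays auditable.
    public void RecordGrant(string requester, string resource, string approver)
    {
        entries.Add(new AccessApproval
        {
            Requester = requester,
            Resource = resource,
            ApprovedBy = approver,
            ApprovedAt = DateTime.UtcNow
        });
    }

    public IEnumerable<AccessApproval> All() { return entries; }
}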

#codingexercise
Remove half nodes (nodes with only one child) from a binary tree.
Node RemoveHalfNodes(Node root)
{
    if (root == null) return null;
    root.left = RemoveHalfNodes(root.left);
    root.right = RemoveHalfNodes(root.right);
    // A leaf is kept as it is.
    if (root.left == null && root.right == null)
        return root;
    // A half node is replaced by its only child.
    if (root.left == null)
        return root.right;
    if (root.right == null)
        return root.left;
    return root;
}
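A quick usage sketch, assuming the hypothetical Node type exposes public data, left and right members:
// Build a tree where node 2 has only one child, then prune it.
var root = new Node { data = 1 };
root.left = new Node { data = 2 };
root.left.right = new Node { data = 4 };   // node 2 is a half node
root.right = new Node { data = 3 };
root = RemoveHalfNodes(root);
// root.left now points directly at the node with data 4.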

Tuesday, January 17, 2017

Today we read up on how Google protects its data. Generally these practices apply to all Google Apps products, but we will compare them to the Adobe Cloud Services compliance overview wherever appropriate. As one of the foremost software development companies, Google has a strong security culture. Its employees are some of the best, all employees get security training, and there is a dedicated security and privacy team that includes internal audit and compliance specialists. They also collaborate with researchers and the research community in this area. Their claim includes the assertion that their cloud services' security is even better than many traditional on-premises solutions. They dogfood the same cloud services that they make available to their customers, which include over five million organizations worldwide.
Security is an integral part of their operations. First, they scan for security threats using tools, and they use penetration testing, quality assurance processes, security reviews and external audits. They also prevent, detect and eradicate malware using tools such as VirusTotal. Their monitoring program analyzes internal network traffic and thwarts botnet connections. They have a proprietary correlation system that also analyzes logs. They have a rigorous incident management process for security events that affect the confidentiality, integrity or availability of systems or data. Any possibility of data corruption is eliminated starting from the design board itself. This particular program follows the NIST guidance on handling incidents.
Google is notable for its innovative and custom-designed servers, proprietary operating systems and geographically distributed data centers. This embraces the "defense in depth" view. The data centers are state of the art. Access logs and activity records are maintained. Every critical component has alternate power while minimizing environmental impact with proprietary "free-cooling" techniques. Google's IP network consists of their own fiber, public fiber, and undersea cables. This reduces hops.
Traffic is routed through Google Front End servers to detect and stop malicious requests and Distributed Denial of Service attacks.
Data is encrypted in transit, at rest and on backup media, whether it is moving over the internet or traveling between data centers. They build redundancy into their server design, data storage, network and internet connectivity, and even the software services. Their recovery point objective (RPO) target is zero and their recovery time objective (RTO) target is also zero, achieved through live or synchronous replication.
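Synchronous (live) replication is what makes a zero RPO achievable: a write is acknowledged only after every replica has stored it. A minimal sketch of the idea, using a hypothetical IReplica interface rather than any real storage API:
using System.Collections.Generic;

// Hypothetical replica interface; a real system adds quorums, timeouts and failure handling.
interface IReplica
{
    bool Write(string key, byte[] value);
}

class SynchronousReplicator
{
    private readonly List<IReplica> replicas;

    public SynchronousReplicator(List<IReplica> replicas) { this.replicas = replicas; }

    // The write is acknowledged only if every replica stored it,
    // so no acknowledged data is lost when a single copy fails.
    public bool Write(string key, byte[] value)
    {
        foreach (var replica in replicas)
        {
            if (!replica.Write(key, value)) return false;
        }
        return true;
    }
}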
Their audits cover compliance standards such as ISO 27001, ISO 27017, ISO 27018, SOC 2 and SOC 3 as well as the FedRAMP program. Incidentally, these are also covered by Adobe. Google meets law enforcement data requests as long as they are issued as a subpoena or in another legal written form. They took a step forward by starting to regularly publish such data requests in their transparency reports.



#codingexercise

Print a binary tree vertically.

void PrintVertically(Node root)
{
    int min = 0; int max = 0;
    FindLimits(root, ref min, ref max, 0);
    // One pass per vertical line, from the leftmost horizontal distance to the rightmost.
    for (int i = min; i <= max; i++)
    {
        PrintVerticalLine(root, i, 0);
        Console.WriteLine();
    }
}

void FindLimits(Node root, ref int min, ref int max, int hd)
{
    if (root == null) return;
    if (hd < min) min = hd;
    if (hd > max) max = hd;
    FindLimits(root.left, ref min, ref max, hd - 1);
    FindLimits(root.right, ref min, ref max, hd + 1);
}

void PrintVerticalLine(Node root, int i, int hd)
{
    if (root == null) return;
    if (hd == i)
        Console.Write("{0} ", root.data);
    PrintVerticalLine(root.left, i, hd - 1);
    PrintVerticalLine(root.right, i, hd + 1);
}
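A small usage sketch, again with the hypothetical Node type used in these exercises:
var root = new Node { data = 1 };
root.left = new Node { data = 2 };
root.right = new Node { data = 3 };
root.left.right = new Node { data = 4 };
PrintVertically(root);
// Prints one vertical line per row: "2", then "1 4", then "3".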

Monday, January 16, 2017

We continue discussing the paper by Papadimitriou, "Algorithms, Games, and the Internet". We started with Nash equilibrium and then we discussed Internet equilibria. We viewed the latter in terms of a multi-commodity flow network where the maximum total flow in a subgraph is in the core of a coalition game. We saw that the definition of the core hints at an optimal set. Next we looked at markets, which are known for price equilibria, and then we discussed mechanism design problems.

Next we discussed privacy and clustering. Then we reviewed the web graph.
The web graph is considered as a directed graph with the documents as nodes and the hyperlinks as edges, but it is notoriously different from a classical graph. The deviations from classical graph models have prompted many other efforts to model the web graph. One such approach uses heavy-tailed distributions, which is interesting because it seems to have parallels in other domains. The population of the xth largest city of any country is about c·x^(-1), with c depending on the country.
The xth largest indegree or outdegree in a web graph is about c·x^(-alpha), where c and alpha are positive constants, instead of the Gaussian predicted by the law of large numbers.

#codingexercise
Problem: Given a sequence of words, print all anagrams together. 

Sort the letters within each word to get a canonical key.
Sort the words in the sequence by that key.
This brings the anagrams together in adjacent groups.
Since each word keeps track of its index in the original sequence, we can print the corresponding words with their original positions.
// Requires System, System.Collections.Generic and System.Linq.
void PrintAllAnagrams(List<string> words)
{
    // Pair each word with its original index and a key made of its sorted letters.
    var items = words
        .Select((word, index) => new { Key = String.Concat(word.OrderBy(c => c)), Word = word, Index = index })
        .OrderBy(x => x.Key)
        .ToList();
    // Anagrams share the same key, so they now appear next to each other.
    items.ForEach(x => Console.WriteLine("{0} at index {1}", x.Word, x.Index));
}
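For example, assuming the method above is in scope:
PrintAllAnagrams(new List<string> { "listen", "silent", "enlist", "google" });
// Prints "google at index 3" first, then the three anagrams of "listen"
// ("listen at index 0", "silent at index 1", "enlist at index 2") next to each other.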

Sunday, January 15, 2017

Aesthetics in design improve the appeal of the product. Indeed, top designers look to improve products based on some of the following virtues:
1) mastery of complex systems and self-assembly
2) passion for versatility
3) redesign in a positive and constructive way
4) knack for making information clear and engaging
5) metaphysical appreciation of bodily functions
6) infusing objects with elegance and intelligence
But how much do we really enhance a tool to make it appealing enough for daily use? We don’t think twice about using our toothbrush, while sophisticated toothbrushes like the sonic toothbrush have also become mainstream. Furthermore, from forks and Fitbits to cars, gadgets have been increasingly measuring every activity we do and becoming intelligent in finding efficiencies in our habits. With the Internet of Things (IoT), the term often used for internet connectivity in appliances and gadgets, these same measurements and data find their way from small devices to cloud-based services and eventually to our smartphones, giving us a rich experience of knowing even more about what we do and how to do it better.
But when does a HAPIfork that can monitor how quickly we are eating and signal us to slow down become enough that we reach for the drawer and get the old-fashioned fork out? In fact, user experience design is the art of demanding just the right amount of attention from the user while being polite and enticing, so that they can get their tasks done as easily as possible.
Social applications such as Facebook and Twitter, and their ecosystems, allow devices and applications to share information and tap into peer pressure for punishing and rewarding certain behaviors, taking devices from merely smart and connected to the next level of reaching out to your friends. The application BinCam, for example, was about making your recycling behavior visible to your circle to reward responsible behavior.
This is a new trend of re-inventing social engineering as product engineering and even, arguably, advertising. From smart glasses to smart forks, “smart” in Silicon Valley is shorthand for transforming the conventional, with a disdain for the poor souls who haven’t upgraded.
But what really is the balance between an old fork and a smart connected fork? Therein lies the feng shui of design, which can tip between good “smart” and bad “smart”. While most people can immediately relate to the two ends of the spectrum, one helpful and appreciated, the other overwhelming or a cause of premature balding, there is a far more troublesome and complicated story when all the perspectives come into the picture. From the designer, the advertiser and the business all the way to the end user, each one has a different meaning for what is good and what is bad. There doesn’t seem to be a panacea to strive for in the end product, but a common ground would almost always be welcome. There may still be concerns such as privacy and verbose information, but they can be alleviated with technology and standards borrowed from bigger realms such as computers and clouds.
Among all these concerns, perhaps the one that I find unequivocally alluring in almost all domains is the notion of quality assurance. A product, simple or sophisticated, can be a joy to use so long as it can earn the trust of its user. The reliability and accuracy come from quality control. Hand in hand with this working formula is the notion of minimalist design. Minimalist design embraces the idea that less is more. The adage goes that the most important thing for an application is not what it does, but what users think it should do. Creating not just a clear map between purpose and function but also a clever map that strikes a chord with the user then seems to be the right approach from a product design perspective, one that can withstand the forces exerted by the different players around the product, from business to consumer.
There’s a lot more debate, some of it hugely popular in TED talks, but I just present a short list of references together with my understanding above.
Reference:
1) http://www.intelligencesquaredus.org/debates/smart-technology-making-us-dumb
2) http://gizmodo.com/5918045/why-smart-people-are-actually-dumb
3) http://www.newyorker.com/tech/frontal-cortex/why-smart-people-are-stupid
4) https://www.fastcodesign.com/3027943/innovation-by-design/berg-wants-to-be-the-platform-to-make-any-dumb-appliance-smart
5) https://www.mirumagency.com/blog/when-dumb-design-smart-tech-companies-find-minimalism-leads-success-0

6) Are smart gadgets making us dumb? http://www.wsj.com/articles/SB10001424127887324503204578318462215991802


#codingexercise
List<int> AddTwoLists(List<int> a, List<int> b)
{
    // ToDouble and ToList are helper extensions that convert between a digit list and a number.
    var num1 = a.ToDouble();
    var num2 = b.ToDouble();
    var sum = num1 + num2;
    return sum.ToList();
}
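ToDouble and ToList above are not framework methods; they are assumed helper extensions that convert between a digit list (most significant digit first) and a number. One possible sketch:
using System.Collections.Generic;
using System.Linq;

static class DigitListExtensions
{
    // Interprets the list as decimal digits, most significant first.
    public static double ToDouble(this List<int> digits)
    {
        return digits.Aggregate(0.0, (acc, d) => acc * 10 + d);
    }

    // Converts a non-negative whole number back into a digit list.
    public static List<int> ToList(this double number)
    {
        var digits = new List<int>();
        long n = (long)number;
        if (n == 0) digits.Add(0);
        while (n > 0)
        {
            digits.Insert(0, (int)(n % 10));
            n /= 10;
        }
        return digits;
    }
}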
List<int> AddTwoListsDetailed(List<int> a, List<int> b)
{
    // The sum cannot have more digits than both inputs combined,
    // so a zero-filled result list of that size is sufficient.
    var ret = Enumerable.Repeat(0, a.Count + b.Count).ToList();
    int k = ret.Count - 1;
    int i = a.Count - 1;
    int j = b.Count - 1;
    int carryover = 0;

    while (i >= 0 && j >= 0)
    {
        var sum = a[i] + b[j] + ret[k] + carryover;
        ret[k] = sum % 10;
        carryover = sum / 10;
        i--;
        j--;
        k--;
    }
    while (i >= 0)
    {
        var sum = a[i] + ret[k] + carryover;
        ret[k] = sum % 10;
        carryover = sum / 10;
        i--;
        k--;
    }
    while (j >= 0)
    {
        var sum = b[j] + ret[k] + carryover;
        ret[k] = sum % 10;
        carryover = sum / 10;
        j--;
        k--;
    }
    while (carryover > 0)
    {
        ret[k] = carryover % 10;
        carryover /= 10;
        k--;
    }
    return ret;
}
It can be shown that the sum of two numbers cannot have more digits than both of them combined, which is why we allocate and initialize a list of zeros of that size to store the sum.
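A quick usage sketch of the digit-by-digit version:
var a = new List<int> { 9, 9, 9 };   // 999
var b = new List<int> { 1 };         // 1
var sum = AddTwoListsDetailed(a, b);
// sum is { 1, 0, 0, 0 }, i.e. 1000, since the result list is sized to a.Count + b.Count digits.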

Saturday, January 14, 2017

We continue discussing the paper by Papadimitriou, "Algorithms, Games, and the Internet". We started with Nash equilibrium and then we discussed Internet equilibria. We viewed the latter in terms of a multi-commodity flow network where the maximum total flow in a subgraph is in the core of a coalition game. We saw that the definition of the core hints at an optimal set. Next we looked at markets, which are known for price equilibria, and then we discussed mechanism design problems.

Next we discussed privacy and clustering. Then we reviewed the web graph.
The web graph is considered as a directed graph with the documents as nodes and the hyperlinks as edges, but it is notoriously different from a classical graph. The deviations from classical graph models have prompted many other efforts to model the web graph. One such approach uses heavy-tailed distributions, which is interesting because it seems to have parallels in other domains. The population of the xth largest city of any country is about c·x^(-1), with c depending on the country.
The xth largest indegree or outdegree in a web graph is about c·x^(-alpha), where c and alpha are positive constants, instead of the Gaussian predicted by the law of large numbers.

#codingexercise
/* Multiply contents of two linked lists */
double multiplyTwoLists (Node first, Node second)
{
    double num1 = 0;
    double num2 = 0;
    while (first != null || second != null)
    {
        if(first != null){
            num1 = num1*10 + first.data;
            first = first.next;
        }
        if(second != null)
        {
            num2 = num2*10 + second.data;
            second = second.next;
        }
    }
    return num1*num2;
}

double multiplyTwoLists(List<int> first, List<int> second)
{
    // Long multiplication: the product cannot have more digits than both operands combined.
    var ret = Enumerable.Repeat(0, first.Count + second.Count).ToList();
    int iteration = 0;
    for (int i = second.Count - 1; i >= 0; i--)
    {
        int multiplier = second[i];
        int carryover = 0;
        int j = ret.Count - 1 - iteration;
        for (int k = first.Count - 1; k >= 0; k--)
        {
            var product = first[k] * multiplier + ret[j] + carryover;
            ret[j] = product % 10;
            carryover = product / 10;
            j--;
        }
        // Deposit any remaining carry for this row before moving to the next one.
        while (carryover > 0)
        {
            var sum = ret[j] + carryover;
            ret[j] = sum % 10;
            carryover = sum / 10;
            j--;
        }
        iteration++;
    }
    // ToNumber is an assumed helper that folds the digit list back into a double.
    return ret.ToNumber();
}
List<int> multiplyTwoLists(List<int> first, List<int> second)
{
    double num1 = first.ToDouble();
    double num2 = second.ToDouble();
    double result = num1 * num2;
    return result.ToList();
}

Friday, January 13, 2017

We continue discussing the paper by Papadimitriou, "Algorithms, Games, and the Internet". We started with Nash equilibrium and then we discussed Internet equilibria. We viewed the latter in terms of a multi-commodity flow network where the maximum total flow in a subgraph is in the core of a coalition game. We saw that the definition of the core hints at an optimal set. Next we looked at markets, which are known for price equilibria, and then we discussed mechanism design problems.

Next we discussed privacy and clustering. Today we review the web graph. The web graph is considered as a directed graph with the documents as nodes and the hyperlinks as edges. However, it violates the principles of classical graph models. For example, the indegrees and outdegrees of its nodes follow a polynomially tailed distribution: the xth largest indegree or outdegree is about c·x^(-alpha), where c and alpha are positive constants, instead of the Gaussian predicted by the law of large numbers. Its giant strongly connected component covers only about 40% of the nodes instead of all or none. Also, there are scores of K3,3 subgraphs instead of none. A K3,3 graph is a complete bipartite graph on six vertices, usually illustrated as two subsets of three nodes with edges joining every vertex in one subset to every vertex in the other. These deviations from classical graph models have prompted many other efforts to model the web graph, although no model comes close to a satisfactory explanation. One model, based on heavy-tailed distributions, is interesting because it seems to have parallels in other domains. The population of the xth largest city of any country is about c·x^(-1), with c depending on the country. The market share of the xth largest manufacturer in any sector is about c·x^(-alpha). A document's indegree is determined by how interesting the document is and is intuitively analogous to market share, while outdegree depends on the entity's attention, which is not very different from income. Heavy-tailed distributions are also observed beyond the world wide web, for example in the degrees of routers and autonomous systems.
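As a small numeric illustration of the heavy-tailed law, the following sketch prints c·x^(-alpha) for the first few ranks; the values of c and alpha are made up for illustration, not measured from any crawl:
using System;

class HeavyTailSketch
{
    static void Main()
    {
        double c = 1000.0;   // assumed scale constant, purely illustrative
        double alpha = 2.1;  // assumed exponent; web measurements put it near 2
        for (int x = 1; x <= 10; x++)
        {
            double expected = c * Math.Pow(x, -alpha);
            Console.WriteLine("rank {0}: expected degree ~ {1:F1}", x, expected);
        }
    }
}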


#codingexercise
Yesterday we showed that the optimum way of clustering a demand curve into k segments is O(n^2).
Today we show that this can be done with dynamic programming. If the price is x, the ith customer will buy a quantity y = (b_i - a_i)x. The k segments are determined by increasing values of a_i/b_i.
int GetOptimumRevenueDP(int a, int b, int u, int n, int k, List<Customer> customers)
{
    // Revenue and minimum are assumed helpers following the referenced memo.
    if (n == 1)
    {
        assert(a <= u && u <= b);
        return minimum(Revenue(u, b) + Revenue(a, u));
    }
    return minimum(Revenue(u, b) + GetOptimumRevenueDP(a, b, u, n - 1, k, customers));
}


Courtesy: https://www.rand.org/content/dam/rand/pubs/research_memoranda/2008/RM2978.pdf

Palindrome partitioning
Find the minimum number of cuts needed to partition a string into palindromes.
int GetMinPartitions(string str, int i, int j)
{
    if (i == j) return 0;
    if (IsPalindrome(str.GetRangeBetween(i, j))) return 0;
    int min = int.MaxValue;
    // Try every split point and keep the cheapest combination of the two halves plus one cut.
    for (int k = i; k <= j - 1; k++)
    {
        int val = GetMinPartitions(str, i, k) + 1 + GetMinPartitions(str, k + 1, j);
        if (val < min)
            min = val;
    }
    return min;
}
Any non-palindrome has an upper limit on the number of cuts equal to the number of characters minus one, since every single character is itself a palindrome.
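The recursive version recomputes overlapping subproblems and runs in exponential time. A bottom-up dynamic programming sketch of the same recurrence, computing minimum cuts with O(n^2) states, could look like this:
using System;

class PalindromePartition
{
    // Returns the minimum number of cuts so that every piece is a palindrome.
    static int MinPartitions(string s)
    {
        int n = s.Length;
        if (n == 0) return 0;
        var isPal = new bool[n, n];
        var cuts = new int[n];           // cuts[j] = minimum cuts for s[0..j]
        for (int j = 0; j < n; j++)
        {
            int min = j;                 // worst case: cut after every character
            for (int i = 0; i <= j; i++)
            {
                // s[i..j] is a palindrome if its ends match and the inside is a palindrome.
                if (s[i] == s[j] && (j - i < 2 || isPal[i + 1, j - 1]))
                {
                    isPal[i, j] = true;
                    min = (i == 0) ? 0 : Math.Min(min, cuts[i - 1] + 1);
                }
            }
            cuts[j] = min;
        }
        return cuts[n - 1];
    }

    static void Main()
    {
        Console.WriteLine(MinPartitions("ababbbabbababa"));  // prints 3
    }
}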