Cluster computing

Friday, September 18, 2015

Today we discuss another Olympiad problem.
[IMO Shortlist 2009]
Five identical empty buckets of 2-liter capacity stand at the vertices of a regular pentagon. Cinderella and her wicked Stepmother go through a sequence of rounds: At the beginning of every round the Stepmother takes one liter of water from the nearby river and distributes it arbitrarily over the five buckets. Then Cinderella chooses a pair of neighboring buckets, empties them into the river, and puts them back. Then the next round begins. The Stepmother’s goal is to make one of these buckets overflow. Cinderella’s goal is to prevent this. Can the wicked Stepmother enforce a bucket overflow?

Let us take the volumes as V1, V2, V3, V4 and V5. Cinderella (C) can only empty adjacent buckets. If the alternate buckets are more than 1 liter, she will not be able to empty both. Therefore she prevents this condition. For each i = 1 to 5, C must ensure that Vi + Vi+2 is at most one on each of her turn. She maintains this 'good' state.

The good configuration enforces return to good state. For example, if two buckets are empty, V4=V5=0, Then V1 + V3 <= 1 and V2 <= 1 because V2 + V4 <= 1 After the step mothers turn, V1 + V3 + V4 + V5 <= 2. Therefore either V5 V3 <=1 or V4 + V1 <=1 So C empties both V1 and V2 and the new configuration is still good V4 <=1 and V5 + V3 <=1

Since at the end of each turn for C, we have a good configuration, C has a winning strategy.

#################### Paper Review #####################

Today we discuss "Ontology based text summarization - The case of Texminer".
This paper talks about Texminer which does summarization by a combination of summarization techniques.Before delving into the techniques, it might be a good reminder to go over the summarization techniques.

One technique is to use the structure of the discourse to generate abstracts. The Rhetorical Structure Theory, for example, attempts to identify the internal structure of the text and the relations of the discourse formed within it, giving priority to the nuclear components of these relations.On the other hand, Marcu segments the text into small units of discourse. He then proceeds to build a rhetorical structure in the form of a tree by analyzing the set of relations that exist between the units. Once the discourse structure has been created, he assigns weight and an order to each element of the structure. - the higher the element within the structure the greater its weight.

Different ways of using the text discourse structure has been shown. Some have used the rhetorical status of affirmations contained in documents to identify their internal structure. The main contribution of these authors lies in the algorithm that deals with the non-hierarchical structure : given seven fixed catagories (aim, textual, own, background, contrast, basis and other) it is capable of distributing the contents of the articles within each category.

Another way has been to use templates to generate summaries and retrieve information. This technique can only be applied when the text is previously structured. Information retrieval systems include Fies that extracts financial information from digital articles.

Mateo combined superficial with deep structure analysis to enhance the coherence and cohesion of the abstract. Alonso and Fuentes for example combine lexical chains plus the rhetorical and argumentative structure, derived using discourse markers.

Using combination of complex linguistic techniques, Aretoulaki developed a model for automatic summarization that selects sentences using content features of a pragmatic and rhetorical nature, obtained by means of superficial linguistic analysis such as the Theory of Speech Acts, the Rhetorical structure Theory and the theories centered on cohesion and coherence.

Automatic production of abstracts was initially based on statistical methods. in the wake of research by Luhn and now uses diverse methodologies. With the extraction of terms or strings of significant words, systems build discourse models. These systems have been used to identify rhetorical structures highly related with the content of the documents and their organizational scheme.

The growth of cognitive science has allowed the incorporation of semantic-conceptual models.Together with the use of knowledge bases, these models vastly improve the process of summarizing texts on a specific topics.

The demand for domain knowledge has seen exceptional growth recently. Models for extraction and disambiguation of text has changed recently. It may even have become harder to find tools to help with such narrow domain texts. The lack of specific dictionaries, the absence of a defined theory and the dire need for professionals in the sector to have summarization tools to carry out their work has become pervasive. Documentation, Terminology and Natural Language Processing (NLP) indicate that there is a demand here in this narrow-domain texts.

This is where TexMiner may help solve the problem. It is based on the conviction that summarizing texts within a specialized domain requires a model capable of processing its semantic and socio-cognitive components.

The socio-cognitive user paradigm aspect of TexMiner takes into account the historic, social and cultural factors. These techniques consider a domain as the mindshare in terms of concepts, terms and knowledge from a community.
It uses ontologies as algebraic descriptions of a conceptual network in which the elements are binary relations established through concepts.
Lastly it uses three agents for processing information.
The first is a reading agent that reads the documents and labels them in XML. Here the goal is to discern if a document belongs to a particular domain based on the count of words that fall in the domain.
The second is a summarization agent that carries out automatic tagging of the syntactic discourse structure of the text. The design of the tags is governed by
- the sequence in which the levels are tagged in practice
- the symbols used to denote cohesive elements of the text, and
- the presence in the text of geographic elements, verbs statistical data formulae and processes.
The last agent is the Information retrieval agent. Texminer allows for the searches to be made through the ontology, as well as the lexical databases developed to take advantage of the functionalities of summaries for the purpose of text retrieval.

Thursday, September 17, 2015

User API or Account Management

Most internal organizations maintain their registered users in a database. As an identity provider, this suffices to maintain the current users However, different applications and services may need read only access to the registered users, their id, name and email with or without the direct access to the database. This is typically because the applications work with a single database at any time not many databases at the same time. If the database, happens to be different some form of dependency injection may be required for the application to continue with the assumption that it can reach the list of registered users.

Active Directory is a superset of all such users and is the final authority for knowing if the user is a valid entity or not. To check if the registered user is still current, we can defer to the AD. However, AD does not come with its own API other than LDIF.

Companies often have API wrappers around the AD to facilitate such function as creating and deleting groups. But users listing is not necessarily provided by an API.

Therefore an API access to registered users should be no surprise and a way to facilitate the addition or deletion of users may very well be required in certain cases.

It may even be helpful to separate out read-only access from read-write access to this users list.

As an example,

class UserViewSet(generics.ListAPIView):

    serializer_class = UserSerializer

    queryset = User.objects.all()

The Read-Write access can include checking for existing users and adding a new user or deleting an existing user. The attributes for user information typically involve status, email, full name, password, created, modified etc. There may be additional qualifications such as type or group, comments, last login time etc. It must be noted that we assume the registered users table to be unique for the application or organization we are considering. That is why such a centralized table requires an API for programmatic access. If the table can be different for different applications, then there is no need to write an API for each and instead import it directly into the database. Said another way, we are enforcing the sharing because we want to keep one master copy up to date instead of relying on copies, replicate, merge and sync between databases.

APIs in service oriented architecture model are very useful for exporting such data to be used in different applications or services.

[IMO Shortlist 2004]
Problem: A and B take turns writing a number as follows. Let N be a fixed positive integer. First A writes the number 1, and then B writes 2. Hereafter, in each move, if the current number is k, then the player whose turn it is can either write k + 1 or 2k, but no player can write a number larger than N. The player who writes N wins. For each N, determine who has a winning strategy.

Solution:

Step 1) if N is odd, A can win. This is because A can always write an odd number after which B has to write an even number and N becomes a P position

Step 2) All even numbers greater than N/2 are P-positions. This is because until N/2, we have the ability to double the number but not beyond that otherwise it will exceed N. After N/2 both players will have to increment the number by 1.

Step 3) If N = 4K or N = 4K + 2, then K is a P-position. This is because if X writes k, Y must write k+1 or 2K Then X writes 2k + 2 if Y writes k and X writes 4k if Y writes 2K. X has thus written an even number greater than N/2 and by step 2, X wins. X can be either A or B and Y is the other of the two.

Step 4) If X has a winning strategy for N = k, then X has a winning strategy for N = 4k and N = 4k + 2
Proof: Consider a game where N = 4K or 4K + 2. Based on the previous step, the goal can now be modified to write K first. How can player Y prevent X from writing K ? The answer is to jump over K. After k/2, the number can be doubled. But X can double the number again resulting in an even number that is at least equal to 2(k + 1) > N /2. So X wins by step 2.

The recursive method for defining the answer for even N is as follows:
The answer for N is the same as that for floor(N/4). To convert this recursion into an explicit answer, write N in base 4. The floor(N/4) is the same as removing the last digit when N is written in base 4. We keep removing the last digit and the resulting numbers will all be winning for the same player by the same recursion. If at some point the number obtained is odd, then A wins for this number and hence A wins for N. If the N has only 0s and 2s in its base 4 representation, then with recursion we end up with number 2. B wins in this case and therefore for N.

The moves for A involve :
Write 1 at the beginning
check if B's move has exceeded N
if N is odd, write the next odd number
if N is even and equal to 4K or 4K+2, recurse floor(N/4) as say c till you get c as odd or 0 or 2
if c is odd, then arrive at c by playing odd
if c is 0, then keep playing odd
if c is 2, then declare B winner

The moves for B similarly involve
Write 2 at the beginning
If N is odd, write the current number + 1 or current number * 2
if N is even, follow the same strategy as A

B wins only when the recursion stops at 2. Otherwise A wins with winning strategy as above.

Wednesday, September 16, 2015

Saint Petersburg 2001

The number 1000000 is written on a board. A and B take turns, each turn consisting of replacing the number n on the board with n-1 or floor((n+1)/2). The player who writes the number 1 wins. Who has the winning strategy ?

Answer:

The intuition here is that A wins for any even number as the numbers are replaced by something smaller.

Take n = 2, it can be replaced by one of the two given transformations 2 -1 or floor((2+1)/2) in the first move itself by A. Take n = 4 and it can be replaced as 3 by A given that 2 is the desirable state for A. We call the position 2 as N-position because the player whose turn it is to play can win. We call the position 3 as P-position because the player who has just played it can force a win. In this case, A having played 3 forces B to write 2 either by 3-1 or floor((3+1)/2) and in both cases A wins. Similarly we can show that A wins for n = 6.

In other words, all even numbers are positions where A can play first and force a win. To formalize this, we can experiment incrementally for the first few multiples of 2 . At some point 2k when we are satisfied, we say that all the even numbers less than 2k are N-positions.

With this assumption , we now have to show that 2K is also an N-position. If we are able to do so, we can progressively claim all 2K positions going forward are also going to be N-positions and therefore prove that all even numbers are N-positions. So how do we prove 2k is an N-position. We can show that 2K is an N-position because k is either a P position or an N position. For example k can be odd as when k=3 and A can write k directly and win. If k is a N position, A can write 2k-1 in the first move. B is then forced to choose either 2k-2 or floor((2k+1)/2) both of which results in k and A wins. Thus we have proved that all even numbers are N positions.

At this point the given input is an even number and as A goes first in the turns, A has the winning strategy.
If A removes kn stones where k < n and B removes jn where j < n then N - kn - jn is a P position because we assumed B wins. But A could have removed kn +jn so Acould have won contradicting our assumption

Russia 2011 adapted
There are N > n^2 stones on a table. A and B play a game. A begins,
and then they alternate. In each turn a player can remove k stones,
where k is a positive integer that is either less than n or a multiple
of n. The player who takes the last stone wins. Prove that A has a
winning strategy.

Here either A or B can have a N or a P position which by the description is a winning strategy for turn of the next player or the one having just now played respectively
Let us suppose to the contrary that B has a winning strategy.
However it is not sufficient to show that B cannot win in a few cases. We have to show it also in cases where B and A respond to each others moves. Thankfully these are also similar in nature to what we described.
Let A remove kn stones in the first move. Then B removes f (k) stones in response. Then N - kn - f (k) is a P position. Now kn can be the same as jn for eliciting the same response from B because the distribution covers the same range uniformly between 1 and n-1. Therefore, N-jn-f(k) is also a P position. But suppose k < j and A removes kn stones followed by B who removes f (k) stones then A removes (j-k)n stoned leaving N-jn-f (k) and that would be a P position as seen earlier. In that case A would win and it would contradict the assumption we made.

Tuesday, September 15, 2015

2014 IMO problem

Let n >= 2 be an integer. Consider an n x n chessboard consisting of n ^ 2unit squares. A configuration of n rooks on this board is ‘peaceful’ if every row and every column contains exactly one rook. Find the greatest positive integer k such that, for each peaceful configuration of n rooks, there is a k x k square which does not contain a rook on any of its k ^ 2 squares.

Let us suppose that k = [sq.rt(n)] – 1 where [n] is the ceiling function of n.

Each n x n chessboard with a peaceful configuration of n rooks contains a valid k x k square. How do we prove this ? Take a rook and place it at the top most row. This column can be part of k consecutive columns, only one of which contains rook R. The consecutive columns can be chose including those on the left of R or to the right of R or both. There are n-k+1 of such consecutive columns and therefore as many k x k squares.

Now, n-k+1 = n- [sq.rt(n)] +2

= n-sq.rt(n)+2

> n – sq.rt(n) + 1

= sq.rt(n)(sq.rt(n) - 1) + 1

> (sq.rt(n)-1)(sq.rt(n)-2)+1

= k(k-1) + 1

From the above expression, from these many in the k x k squares in C, there exists a k x k square not covered by any rook. We say this because we can distribute those many squares above such that we leave the k x k square unoccupied. And so our supposition is a valid choice.

Now we have to mutually exclude this region and show that its still possible to have a chess board as required. Without this (k+1) times (k+1) square, we can still arrange the chess board if we proceed this way. First place a rook at the upper left corner of the board. Next place a rook at one square to the right and k+1 spaces down of the second rook. Repeat until the position is outside the bounds of the chessboard in which case wrap around to the top and continue placing the other rooks. Indeed such a board exists and the arrangement of the rooks is clearly peaceful

Because (k+1)^2 > n, it has the property that any (k+1)x(k+1) square in the board will be occupied by at least one rook. This complete the proof.

[Lithuania 2010]
Problem: In an m × n rectangular chessboard there is a stone in the lower leftmost square. A and B move the stone alternately, starting with A. In each step one can move the stone upward or rightward any number of squares. The player who moves it into the upper
rightmost square wins. Find all (m, n) such that A has a winning strategy.
Solution: Note that in this problem, the moves possible are either one up or to the right. To move diagonally, you need to exchange turns.
Therefore the winning strategy would be to occupy the diagonals ourselves. Moreover, the diagonal is only for the square portion of the chessboard of size say m x m not the rectangular portion of the m x n. In the rectangular portion, the moves will be linear since we would have already reached the top row and the squares will be equally and alternatively occupied.

Monday, September 14, 2015

Today we discuss some more Olympiad problems:
Italy TST 2009
A and B play the following game. First A writes a permutation of the numbers from 1 to n, where n is a fixed positive integer greater than 1. In each player’s turn, he or she must write a sequence of numbers that has not been written yet such that either:
a) The sequence is a permutation of the sequence the previous player wrote, OR
b) The sequence is obtained by deleting one number from the previous player’s sequence
For example, if A first writes 4123, B could write 3124 or 413. The player who cannot write down a sequence loses. Determine who has a winning strategy.
Let us focus on the endgame. when n = 2, B wins after A's first move B deletes the number 2 and is left with the sequence {1}. Then A has no move.
With this end game, we now create a progression using induction Suppose B wins n = k, then we want a strategy for n = k + 1 B's aim is to make A be the first player to delete a number from the sequence. Then from this point the game is reduced to a game with k numbers. How to do this, for every sequence that A writes for k+1 numbers, there will be a permutation that A has not written because permutations are factorial(k+1) which is even. For example, one such sequence would be to write it backwards.
Another Olympiad question:
The Y2K Game is played on a 1 × 2000 grid as follows. Two players in turn write either an S or an O in an empty square. The first player who produces three consecutive boxes that spell SOS wins.
If all boxes are filled without producing SOS then the game is a draw. Prove that the second player has a winning strategy.
Let us say an empty square is bad when it leads to the formation of the given pattern SOS.
The claim here is that an empty square is bad only when it occurs in between S empty empty S formation that we denote by S__S.
The claim follows from the configuration that an empty square is bad when it is flanked by an S on one side and an empty on the other. Clearly both the empty squares above is bad.

Sunday, September 13, 2015

Today we discuss a Forrester report title Cloud Platforms that will turn Enterprise Architecture Inside Out

The report says that cloud services have been adopted in an adhoc fashion, driven by the opportunity and expediency more than by strategy, hence this report provides a template to develop cloud architecture to balance the needs to be agile while managing risks.

Cloud services are now being added under the mandate to provide competitive edge. This introduces risks to security and business models, the potential for vendor-lockin and other challenges. The trouble with cloud services are that unless they are specifically designed, they get used arbitrarily. Architects therefore spend time developing clear guidelines for cloud adoption and use. And they must develop them differently from the on-premise centric world. This report discusses the risks associated with the explosion of the cloud services. It proposes the use of architecture zones to manage the cloud agility and risk. Moreover, cloud is an opportunity for Enterprise Architect (EA) leaders to reinvent enterprise architecture.
Let us take a look at some of these challenges and risks:
Public cloud service providers have unstable pricing models. They regularly decrease the prices of highly visible services but also add new services that increase the costs to enterprises. Architects previously were not concerned with software licensing.
Vendor lockin versus business flexibility - EA pros have always been concerned with vendor - lockin. There are very few examples that prevent this in the cloud as compared with on-premise Standard and open-source products like Cloud Foundry and OpenStack help promise to insulate customers from lock-in but are still untested.
The trouble with cloud services is that if you bend it to your needs, you have more pain than the on-premise world. Customization is not what the cloud services provider was pushing and if you do decide to go with it, the CSP was probably not a truly offering cloud services to begin with.
Security and confidentiality concerns - even though major applications and services are building on public cloud, architects are rarely involved from the start to determine the security policies, procedures and controls.
If you design your own private cloud, guidelines for security may have to be explained even more.
The right approach for the EA depends on the CSP business models.
Hosting companies that focus on infrastructure as a service and hybrid is one such model. Most CSPs make their money by renting infrastructure capacity. They often invest more in capital equipment, infrastructure administration, and software management. For example RackSpace differentiated itself early on as an IaaS provider.
Integrators that focus on delivering the solution and not the means is another such model. This is typically the consulting company at the core. They have strongest background in managing, customizing, and integrating commercial off-the-shelf COTS software. The Accenture duck-creek suite, supported on Microsoft Azure is a good example.
Software companies that put their solution front and center. Independent software vendors (ISV) led cloud services providers have their own applications, middleware, tools or other code. Their model centers on licensing and renting of this IP and the expansion of their software portfolios.
Pure cloud CSPs that focus on subscription growth is another such model. They shift the focus to low entry points and pay-per-use business models, but long term customer commitments are driving these vendors. Leading SaaS vendors are massively multi-tenant.
This paper proposes the use of architecture zones. A simple template with well defined zones can help EA professionals to provide business related guidance.
The core business zone maintains the on-premise status quo - they provide the administrative backbone of the organization. This is considered the systems of record.
The intermediate zone is where change is less dynamic. Systems of automation and systems of design are appropriate for an intermediate zone. This is considered the systems of automation.
The business agility zone is where the change is the norm. Innovative new capabilities using cloud technologies enters this agility zone. This is considered the system of insight.