Cluster computing

Sunday, September 20, 2015

There are 2n people seated around a circular table, and m cookies are distributed among them. The cookies can be passed under the following rules:

(a) Each person can only pass cookies to his or her neighbors

(b) Each time someone passes a cookie, he or she must also eat a cookie
Let A be one of these people. Find the least m such that no matter how m cookies are distributed initially, there is a strategy to pass cookies so that A receives at least one cookie.

Let us denote the people with symbols such as A-n, A-n+1, A-n+2, ...A-1, A0, A1, ... An where A-n and An are the same person since it is a circular table.

A weight 1/2^abs(i) is assigned to each cookie held by a person Ai. Thus for example if A3 passes a cookie to A2, the cookie's weight increases from 1/8 to 1/4. Since A3 must also eat a cookie of weight 1/8 in this step, we see in this case, the sum of the weights of all the cookies has remained the same. The sum of the weight of all the cookies has remained the same. If Ai has ai cookies for each i, then the total weight of all cookies is Sum of i from -n+1 to n is ai / 2^ abs(i).

Whenever a cookie is passed towards A0, (from Ai to Ai-1 or the reverse direction), one cookie is eaten and another cookie doubles its weight, so the total weight remains invariant. If a cookie has passed away from A0, then the total weight decreases. Thus the total weight is indeed a monovariant.

If m >= 2^n, we can always ensure that the A0 gets a cookie. Any of the directions could be chosen but it should not pass A0 because the weight would reduce. In each step therefore a cookie progresses towards A0 from either side of the diametrically opposite end. We use a new quantity to indicate the direction. Let W+ be the sum of the weights of the cookies held by A0, A1 ... An and let W be the sum of the weights of cookies held by A0, A-1, A-2 ... and we can suppose W+ >= W=. Then this suggests An can pass cookies to only An-1 and we use only this semi-circle containing non-negative indices, since this is the semi-circle having more weight. In each step, as m any cookies are passed as possible to An-1 and similarly forwarded. This works only if and only if W+ is > 1 which is necessary as W+ is a monovariant.
To show that this is sufficient, we note that the algorithm leaves the W+ an invariant. The algorithm terminates when we cannot pass anymore cookies from any of the Ai with i positive, and A0 does not have any cookies.A1, A2, ... An all have at most one cookie at the end. If they had more, they would eat one and pass one and the algorithm would not have terminated. Then W+ would sum upto 1/2 + 1/4 + ... + 1/2 ^ n < 1, contradicting the fact that W+ is an invariant and >= 1. Thus W+ >= 1 is a sufficient condition for the algorithm to work. Finally we prove that we indeed have W+ > 1 We assumed W+ > W-. Now simply note that each cookie contributes at least 1 / 2 ^(n-1) to the sum(W+ and W-), because each cookie has weight at least 1/2^(n-1) except for cookies at An. However, cookies at An are counted twice since they contribute to both W+ and W-, so they also contribute 1/ 2^n-1 to the sum. Since we have at least 2 ^ n cookies, W+ and W- >= 2, so W+ >= 1 and have proved that this is both necessary and sufficient.

Note that the use of a geometric progression ensures that the cookie can be consumed and passed along and using the property of the sum of this progression we bound it at the diametrically opposite end of the table because that suffices to have everyone get a cookie.

The criteria for weights on a semi-circle to be more than 1 is therefore of a consequence of the above.

Saturday, September 19, 2015

IMO Shortlist 1994

Peter has 3 accounts in a bank, each with an integral number of dollars. He is only allowed to transfer money from one account to another so that the amount of money in the latter is doubled. Prove that Peter can always transfer all his money into two accounts. Can he always transfer all his money into one account?

Let A, B, C be the number of dollars in the account 1, account 2 and account 3 respectively and A <= B <= C. If A = 0, we are done. At any time we want to reduce min(A,B,C) to the point where it is 0. The values of A, B and C can keep changing

Euclidean theorem monotonically reduces a number by using form B = qA + r with 0 <= r < A. We use this form. With this form if we can reduce B to r then we are done. Since r < A, we would have reduced min(A,B,C) which was our aim.

The question however says we have to double so we use binary representations. Let

q = 2M^k + ... + 2M1 + M0 be the binary representation of q and where Mi is 0 or 1. To reduce B to r, in step i of the algorithm, we transfer money to account 1. The transfer is from account 2 if Mi-1 = 1 and from account 3 if Mi-1 = 0. The number of dollars in the first account starts with A and keeps doubling in each step. Thus we transfer Aq dollars from account 2 to account 1, and we are left with B-Aq = r dollars in account 2. At this point we have reduced min

The answer to the last question is that it is not possible when the number is odd.

Lets try this out with sample numbers

We have accounts with 2, 5, 16

B = 2* A + 1

so q =2 and it can be written as 0 + 2 * 1 for the binary representation of 2

We transfer Aq = 2 * 2 = 4 dollars to 1 and now have 6, 1,16

Now A, B, C has changed and we have A = 1, B= 6, C = 16
now B has q=6 and r = 0 and since m1 was 1 and m2 is 1, we have the transfer again from B.
This leaves us with A=0, B=7 and C=16.

Note that we cannot reduce any further.

Friday, September 18, 2015

Today we discuss another Olympiad problem.
[IMO Shortlist 2009]
Five identical empty buckets of 2-liter capacity stand at the vertices of a regular pentagon. Cinderella and her wicked Stepmother go through a sequence of rounds: At the beginning of every round the Stepmother takes one liter of water from the nearby river and distributes it arbitrarily over the five buckets. Then Cinderella chooses a pair of neighboring buckets, empties them into the river, and puts them back. Then the next round begins. The Stepmother’s goal is to make one of these buckets overflow. Cinderella’s goal is to prevent this. Can the wicked Stepmother enforce a bucket overflow?

Let us take the volumes as V1, V2, V3, V4 and V5. Cinderella (C) can only empty adjacent buckets. If the alternate buckets are more than 1 liter, she will not be able to empty both. Therefore she prevents this condition. For each i = 1 to 5, C must ensure that Vi + Vi+2 is at most one on each of her turn. She maintains this 'good' state.

The good configuration enforces return to good state. For example, if two buckets are empty, V4=V5=0, Then V1 + V3 <= 1 and V2 <= 1 because V2 + V4 <= 1 After the step mothers turn, V1 + V3 + V4 + V5 <= 2. Therefore either V5 V3 <=1 or V4 + V1 <=1 So C empties both V1 and V2 and the new configuration is still good V4 <=1 and V5 + V3 <=1

Since at the end of each turn for C, we have a good configuration, C has a winning strategy.

#################### Paper Review #####################

Today we discuss "Ontology based text summarization - The case of Texminer".
This paper talks about Texminer which does summarization by a combination of summarization techniques.Before delving into the techniques, it might be a good reminder to go over the summarization techniques.

One technique is to use the structure of the discourse to generate abstracts. The Rhetorical Structure Theory, for example, attempts to identify the internal structure of the text and the relations of the discourse formed within it, giving priority to the nuclear components of these relations.On the other hand, Marcu segments the text into small units of discourse. He then proceeds to build a rhetorical structure in the form of a tree by analyzing the set of relations that exist between the units. Once the discourse structure has been created, he assigns weight and an order to each element of the structure. - the higher the element within the structure the greater its weight.

Different ways of using the text discourse structure has been shown. Some have used the rhetorical status of affirmations contained in documents to identify their internal structure. The main contribution of these authors lies in the algorithm that deals with the non-hierarchical structure : given seven fixed catagories (aim, textual, own, background, contrast, basis and other) it is capable of distributing the contents of the articles within each category.

Another way has been to use templates to generate summaries and retrieve information. This technique can only be applied when the text is previously structured. Information retrieval systems include Fies that extracts financial information from digital articles.

Mateo combined superficial with deep structure analysis to enhance the coherence and cohesion of the abstract. Alonso and Fuentes for example combine lexical chains plus the rhetorical and argumentative structure, derived using discourse markers.

Using combination of complex linguistic techniques, Aretoulaki developed a model for automatic summarization that selects sentences using content features of a pragmatic and rhetorical nature, obtained by means of superficial linguistic analysis such as the Theory of Speech Acts, the Rhetorical structure Theory and the theories centered on cohesion and coherence.

Automatic production of abstracts was initially based on statistical methods. in the wake of research by Luhn and now uses diverse methodologies. With the extraction of terms or strings of significant words, systems build discourse models. These systems have been used to identify rhetorical structures highly related with the content of the documents and their organizational scheme.

The growth of cognitive science has allowed the incorporation of semantic-conceptual models.Together with the use of knowledge bases, these models vastly improve the process of summarizing texts on a specific topics.

The demand for domain knowledge has seen exceptional growth recently. Models for extraction and disambiguation of text has changed recently. It may even have become harder to find tools to help with such narrow domain texts. The lack of specific dictionaries, the absence of a defined theory and the dire need for professionals in the sector to have summarization tools to carry out their work has become pervasive. Documentation, Terminology and Natural Language Processing (NLP) indicate that there is a demand here in this narrow-domain texts.

This is where TexMiner may help solve the problem. It is based on the conviction that summarizing texts within a specialized domain requires a model capable of processing its semantic and socio-cognitive components.

The socio-cognitive user paradigm aspect of TexMiner takes into account the historic, social and cultural factors. These techniques consider a domain as the mindshare in terms of concepts, terms and knowledge from a community.
It uses ontologies as algebraic descriptions of a conceptual network in which the elements are binary relations established through concepts.
Lastly it uses three agents for processing information.
The first is a reading agent that reads the documents and labels them in XML. Here the goal is to discern if a document belongs to a particular domain based on the count of words that fall in the domain.
The second is a summarization agent that carries out automatic tagging of the syntactic discourse structure of the text. The design of the tags is governed by
- the sequence in which the levels are tagged in practice
- the symbols used to denote cohesive elements of the text, and
- the presence in the text of geographic elements, verbs statistical data formulae and processes.
The last agent is the Information retrieval agent. Texminer allows for the searches to be made through the ontology, as well as the lexical databases developed to take advantage of the functionalities of summaries for the purpose of text retrieval.

Thursday, September 17, 2015

User API or Account Management

Most internal organizations maintain their registered users in a database. As an identity provider, this suffices to maintain the current users However, different applications and services may need read only access to the registered users, their id, name and email with or without the direct access to the database. This is typically because the applications work with a single database at any time not many databases at the same time. If the database, happens to be different some form of dependency injection may be required for the application to continue with the assumption that it can reach the list of registered users.

Active Directory is a superset of all such users and is the final authority for knowing if the user is a valid entity or not. To check if the registered user is still current, we can defer to the AD. However, AD does not come with its own API other than LDIF.

Companies often have API wrappers around the AD to facilitate such function as creating and deleting groups. But users listing is not necessarily provided by an API.

Therefore an API access to registered users should be no surprise and a way to facilitate the addition or deletion of users may very well be required in certain cases.

It may even be helpful to separate out read-only access from read-write access to this users list.

As an example,

class UserViewSet(generics.ListAPIView):

    serializer_class = UserSerializer

    queryset = User.objects.all()

The Read-Write access can include checking for existing users and adding a new user or deleting an existing user. The attributes for user information typically involve status, email, full name, password, created, modified etc. There may be additional qualifications such as type or group, comments, last login time etc. It must be noted that we assume the registered users table to be unique for the application or organization we are considering. That is why such a centralized table requires an API for programmatic access. If the table can be different for different applications, then there is no need to write an API for each and instead import it directly into the database. Said another way, we are enforcing the sharing because we want to keep one master copy up to date instead of relying on copies, replicate, merge and sync between databases.

APIs in service oriented architecture model are very useful for exporting such data to be used in different applications or services.

[IMO Shortlist 2004]
Problem: A and B take turns writing a number as follows. Let N be a fixed positive integer. First A writes the number 1, and then B writes 2. Hereafter, in each move, if the current number is k, then the player whose turn it is can either write k + 1 or 2k, but no player can write a number larger than N. The player who writes N wins. For each N, determine who has a winning strategy.

Solution:

Step 1) if N is odd, A can win. This is because A can always write an odd number after which B has to write an even number and N becomes a P position

Step 2) All even numbers greater than N/2 are P-positions. This is because until N/2, we have the ability to double the number but not beyond that otherwise it will exceed N. After N/2 both players will have to increment the number by 1.

Step 3) If N = 4K or N = 4K + 2, then K is a P-position. This is because if X writes k, Y must write k+1 or 2K Then X writes 2k + 2 if Y writes k and X writes 4k if Y writes 2K. X has thus written an even number greater than N/2 and by step 2, X wins. X can be either A or B and Y is the other of the two.

Step 4) If X has a winning strategy for N = k, then X has a winning strategy for N = 4k and N = 4k + 2
Proof: Consider a game where N = 4K or 4K + 2. Based on the previous step, the goal can now be modified to write K first. How can player Y prevent X from writing K ? The answer is to jump over K. After k/2, the number can be doubled. But X can double the number again resulting in an even number that is at least equal to 2(k + 1) > N /2. So X wins by step 2.

The recursive method for defining the answer for even N is as follows:
The answer for N is the same as that for floor(N/4). To convert this recursion into an explicit answer, write N in base 4. The floor(N/4) is the same as removing the last digit when N is written in base 4. We keep removing the last digit and the resulting numbers will all be winning for the same player by the same recursion. If at some point the number obtained is odd, then A wins for this number and hence A wins for N. If the N has only 0s and 2s in its base 4 representation, then with recursion we end up with number 2. B wins in this case and therefore for N.

The moves for A involve :
Write 1 at the beginning
check if B's move has exceeded N
if N is odd, write the next odd number
if N is even and equal to 4K or 4K+2, recurse floor(N/4) as say c till you get c as odd or 0 or 2
if c is odd, then arrive at c by playing odd
if c is 0, then keep playing odd
if c is 2, then declare B winner

The moves for B similarly involve
Write 2 at the beginning
If N is odd, write the current number + 1 or current number * 2
if N is even, follow the same strategy as A

B wins only when the recursion stops at 2. Otherwise A wins with winning strategy as above.

Wednesday, September 16, 2015

Saint Petersburg 2001

The number 1000000 is written on a board. A and B take turns, each turn consisting of replacing the number n on the board with n-1 or floor((n+1)/2). The player who writes the number 1 wins. Who has the winning strategy ?

Answer:

The intuition here is that A wins for any even number as the numbers are replaced by something smaller.

Take n = 2, it can be replaced by one of the two given transformations 2 -1 or floor((2+1)/2) in the first move itself by A. Take n = 4 and it can be replaced as 3 by A given that 2 is the desirable state for A. We call the position 2 as N-position because the player whose turn it is to play can win. We call the position 3 as P-position because the player who has just played it can force a win. In this case, A having played 3 forces B to write 2 either by 3-1 or floor((3+1)/2) and in both cases A wins. Similarly we can show that A wins for n = 6.

In other words, all even numbers are positions where A can play first and force a win. To formalize this, we can experiment incrementally for the first few multiples of 2 . At some point 2k when we are satisfied, we say that all the even numbers less than 2k are N-positions.

With this assumption , we now have to show that 2K is also an N-position. If we are able to do so, we can progressively claim all 2K positions going forward are also going to be N-positions and therefore prove that all even numbers are N-positions. So how do we prove 2k is an N-position. We can show that 2K is an N-position because k is either a P position or an N position. For example k can be odd as when k=3 and A can write k directly and win. If k is a N position, A can write 2k-1 in the first move. B is then forced to choose either 2k-2 or floor((2k+1)/2) both of which results in k and A wins. Thus we have proved that all even numbers are N positions.

At this point the given input is an even number and as A goes first in the turns, A has the winning strategy.
If A removes kn stones where k < n and B removes jn where j < n then N - kn - jn is a P position because we assumed B wins. But A could have removed kn +jn so Acould have won contradicting our assumption

Russia 2011 adapted
There are N > n^2 stones on a table. A and B play a game. A begins,
and then they alternate. In each turn a player can remove k stones,
where k is a positive integer that is either less than n or a multiple
of n. The player who takes the last stone wins. Prove that A has a
winning strategy.

Here either A or B can have a N or a P position which by the description is a winning strategy for turn of the next player or the one having just now played respectively
Let us suppose to the contrary that B has a winning strategy.
However it is not sufficient to show that B cannot win in a few cases. We have to show it also in cases where B and A respond to each others moves. Thankfully these are also similar in nature to what we described.
Let A remove kn stones in the first move. Then B removes f (k) stones in response. Then N - kn - f (k) is a P position. Now kn can be the same as jn for eliciting the same response from B because the distribution covers the same range uniformly between 1 and n-1. Therefore, N-jn-f(k) is also a P position. But suppose k < j and A removes kn stones followed by B who removes f (k) stones then A removes (j-k)n stoned leaving N-jn-f (k) and that would be a P position as seen earlier. In that case A would win and it would contradict the assumption we made.

Tuesday, September 15, 2015

2014 IMO problem

Let n >= 2 be an integer. Consider an n x n chessboard consisting of n ^ 2unit squares. A configuration of n rooks on this board is ‘peaceful’ if every row and every column contains exactly one rook. Find the greatest positive integer k such that, for each peaceful configuration of n rooks, there is a k x k square which does not contain a rook on any of its k ^ 2 squares.

Let us suppose that k = [sq.rt(n)] – 1 where [n] is the ceiling function of n.

Each n x n chessboard with a peaceful configuration of n rooks contains a valid k x k square. How do we prove this ? Take a rook and place it at the top most row. This column can be part of k consecutive columns, only one of which contains rook R. The consecutive columns can be chose including those on the left of R or to the right of R or both. There are n-k+1 of such consecutive columns and therefore as many k x k squares.

Now, n-k+1 = n- [sq.rt(n)] +2

= n-sq.rt(n)+2

> n – sq.rt(n) + 1

= sq.rt(n)(sq.rt(n) - 1) + 1

> (sq.rt(n)-1)(sq.rt(n)-2)+1

= k(k-1) + 1

From the above expression, from these many in the k x k squares in C, there exists a k x k square not covered by any rook. We say this because we can distribute those many squares above such that we leave the k x k square unoccupied. And so our supposition is a valid choice.

Now we have to mutually exclude this region and show that its still possible to have a chess board as required. Without this (k+1) times (k+1) square, we can still arrange the chess board if we proceed this way. First place a rook at the upper left corner of the board. Next place a rook at one square to the right and k+1 spaces down of the second rook. Repeat until the position is outside the bounds of the chessboard in which case wrap around to the top and continue placing the other rooks. Indeed such a board exists and the arrangement of the rooks is clearly peaceful

Because (k+1)^2 > n, it has the property that any (k+1)x(k+1) square in the board will be occupied by at least one rook. This complete the proof.

[Lithuania 2010]
Problem: In an m × n rectangular chessboard there is a stone in the lower leftmost square. A and B move the stone alternately, starting with A. In each step one can move the stone upward or rightward any number of squares. The player who moves it into the upper
rightmost square wins. Find all (m, n) such that A has a winning strategy.
Solution: Note that in this problem, the moves possible are either one up or to the right. To move diagonally, you need to exchange turns.
Therefore the winning strategy would be to occupy the diagonals ourselves. Moreover, the diagonal is only for the square portion of the chessboard of size say m x m not the rectangular portion of the m x n. In the rectangular portion, the moves will be linear since we would have already reached the top row and the squares will be equally and alternatively occupied.