Cluster computing

Monday, September 21, 2015

Example 10 [China 2010, Problem 5]
There are some (finite number of) cards placed at the points A1, A2, …, An and O, where n ≥ 3. We can perform one of the following operations in each step:

(1) If there are more than 2 cards at some point Ai, we can remove 3 cards from this point and place one each at Ai-1, Ai+1 and O (here A0 = An and An+1 = A1)

(2) If there are at least n cards at O, we can remove n cards from O and place one each at A1, A2, …, An.

Show that if the total number of cards is at least n^2+3n+1, we can make the number of cards at each vertex at least n + 1 after finitely many steps.

Solution: The total number of cards stays the same throughout. We make four observations.

(a) We should aim to balance the cards across the Ai's: if some point ends up with a disproportionate share, another point will fall short.
(b) We can make each of the Ai's have 0, 1 or 2 cards: apply operation (1) as long as some Ai has at least 3 cards. This must terminate, because every application adds a card to O and the total is fixed.
(c) From observation (b), we can make each of the Ai's have 1, 2 or 3 cards: start from the stage where they have 0, 1 or 2 cards and apply operation (2) once. This is allowed because O then holds at least (n^2+3n+1) - 2n >= n cards.
(d) Suppose x of the Ai's have 1 card, y of the Ai's have 2 cards and z of the Ai's have 3 cards. The number of cards at O is at least (n^2+3n+1) - (x+2y+3z). Since x+y+z = n, we have x+2y+3z = 2n + (z-x), which is <= 2n if x >= z. In that case O has at least n^2+n+1 cards and we can apply operation (2) n times. Every Ai then has at least n+1 cards, O has at least n^2+n+1-n^2 = n+1 cards, and we are done.

Based on observation (d), it suffices to start from a position where every point on the perimeter has 1, 2 or 3 cards and reach a position where every point still has 1, 2 or 3 cards but the number of points with 3 cards is no more than the number with 1 card. We can do this by ensuring that between any two points with three cards there is a point with one card.

First, remove adjacent 3's: applying operation (1) to every point in a run of consecutive 3's transforms (x, 3, ..., 3, y) into (x+1, 1, 2, ..., 2, 1, y+1), where x and y are not 3, so no adjacent 3's remain in that run. Next, suppose there are two 3's with only 2's between them, as in (x, 3, 2, 2, ..., 2, 3, y). Apply operation (1) to the first 3, which converts its neighbor to a 3; apply it to that neighbor, and so on along the chain up to and including the last 3. The result is (x+1, 1, 1, ..., 1, y+1). We repeat these two steps as long as some pair of 3's has no 1 between them. This cannot go on forever, because the number of cards at O increases with every operation.
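The two operations and the chain transformation can be checked with a small simulation. This is only a sketch: the function names `apply_op1`, `apply_op2` and `settle` are made up for illustration, the list `a` holds the card counts at A1..An (0-indexed), and `o` is the count at O.

```python
def apply_op1(a, o, i):
    """Operation (1): take 3 cards from A_i, give one to each neighbour and one to O."""
    n = len(a)
    if a[i] < 3:
        raise ValueError("operation (1) needs at least 3 cards at A_i")
    a[i] -= 3
    a[(i - 1) % n] += 1  # indices wrap around the circle
    a[(i + 1) % n] += 1
    return o + 1


def apply_op2(a, o):
    """Operation (2): take n cards from O and give one to every A_i."""
    n = len(a)
    if o < n:
        raise ValueError("operation (2) needs at least n cards at O")
    for i in range(n):
        a[i] += 1
    return o - n


def settle(a, o):
    """Observation (b): apply operation (1) until every A_i holds 0, 1 or 2 cards."""
    while max(a) >= 3:
        o = apply_op1(a, o, a.index(max(a)))
    return o


# A run of three consecutive 3's between x = 1 and y = 1 (n = 6).
cards = [1, 3, 3, 3, 1, 2]
at_o = 0
for i in (1, 2, 3):
    at_o = apply_op1(cards, at_o, i)
print(cards, at_o)  # [2, 1, 2, 1, 2, 2] 3
```

The run (1, 3, 3, 3, 1) becomes (2, 1, 2, 1, 2), matching the pattern (x+1, 1, 2, ..., 2, 1, y+1), and O gains one card per operation.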

Sunday, September 20, 2015

IMO 2007
In a mathematical competition some competitors are friends; friendship is always mutual. Call a group of competitors a clique if each two of them are friends. The number of members in a clique is called its size. It is known that the size of the largest clique(s) is even. Prove that the competitors can be arranged in two rooms such that the size of the largest cliques in one room is the same as the size of the largest cliques in the other room.
Let M be one of the cliques of largest size. The largest size is given to be even, so abs(M) = 2m. Put all members of M in room A and everyone else in room B. Let c(A) and c(B) denote the sizes of the largest cliques in rooms A and B at a given point in time. Since M is a clique of the largest size, we initially have c(A) >= c(B).
Our task is to balance the two, so we move members from room A to room B one at a time. Room A holds only members of M at this stage, so each move decreases c(A) by exactly 1, while c(B) increases by at most 1. Therefore, while c(A) is greater than c(B), keep sending one member to room B.
The balancing stops when c(A) <= c(B) <= c(A) + 1, that is, when the two are exactly the same or differ by one. At this point room A must still hold at least m members. Suppose not: then at most m-1 members of M remain in A and at least m+1 are in B, implying c(B) - c(A) >= (m+1) - (m-1) = 2. This contradicts the bound we just established.
If c(A) = c(B), we are done.
What remains is the case where the clique sizes in the two rooms differ by one.
Say c(A) = k and c(B) = k + 1. If there is a competitor in B who is in M but is missing from some clique of size k + 1 in B, then sending her to A raises c(A) to k + 1 without affecting c(B), and again we are done.
Now suppose there is no such competitor, so every clique of size k + 1 in B contains the whole intersection of B and M. While c(B) = k + 1, take a clique of size k + 1 in B and send one of its members who is not in M to A. Such a member exists because the intersection of B and M has at most m <= k members. Each move lowers c(B) by at most 1, so we stop with c(B) = k. How do we guarantee that c(A) has not grown beyond k? Every competitor now in A is either in M or came from a clique containing the intersection of B and M, so each of them is friends with everyone in that intersection. Any clique Q in A can therefore be extended by the intersection of B and M to a bigger clique, whose size cannot exceed abs(M). Hence abs(Q) <= abs(M) minus the number of members of M in B, which is the number of members of M in A, namely k. So c(A) = c(B) = k.
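The room-splitting procedure can be tried on small graphs with a brute-force sketch. The names `cliques`, `largest_clique` and `split_rooms` are made up for illustration, `adj` is assumed to be a dict mapping each competitor to the set of their friends, and the clique search is exponential, so this is only meant for a handful of vertices.

```python
from itertools import combinations


def cliques(vertices, adj):
    """Yield every clique (as a frozenset) among `vertices`, by brute force."""
    vs = sorted(vertices)
    for size in range(1, len(vs) + 1):
        for group in combinations(vs, size):
            if all(v in adj[u] for u, v in combinations(group, 2)):
                yield frozenset(group)


def largest_clique(vertices, adj):
    """Return one clique of maximum size among `vertices` (empty if none)."""
    return max(cliques(vertices, adj), key=len, default=frozenset())


def split_rooms(adj):
    """Return rooms (A, B) whose largest cliques have equal size.

    Assumes the largest clique of the whole graph has even size.
    """
    everyone = set(adj)
    M = set(largest_clique(everyone, adj))
    A, B = set(M), everyone - M

    def c(room):
        return len(largest_clique(room, adj))

    # Balance: move members of M from A to B while A is ahead.
    while c(A) > c(B):
        B.add(A.pop())
    if c(A) == c(B):
        return A, B

    k = c(A)  # here c(B) == k + 1
    top = [C for C in cliques(B, adj) if len(C) == k + 1]
    # Case 1: a member of M in B that is missing from some (k+1)-clique of B.
    for x in B & M:
        if any(x not in C for C in top):
            B.remove(x)
            A.add(x)
            return A, B

    # Case 2: move non-members of M out of the (k+1)-cliques of B.
    while c(B) == k + 1:
        x = next(iter(largest_clique(B, adj) - M))
        B.remove(x)
        A.add(x)
    return A, B
```

For example, with a 4-clique on competitors 0..3 and a fifth competitor who is friends with 0 and 1, `split_rooms` returns two rooms whose largest cliques have the same size.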

There are 2n people seated around a circular table, and m cookies are distributed among them. The cookies can be passed under the following rules:

(a) Each person can only pass cookies to his or her neighbors

(b) Each time someone passes a cookie, he or she must also eat a cookie
Let A be one of these people. Find the least m such that no matter how m cookies are distributed initially, there is a strategy to pass cookies so that A receives at least one cookie.

Label the people A-n, A-(n-1), ..., A-1, A0, A1, ..., An, where A is A0 and where A-n and An are the same person since it is a circular table.

A weight 1/2^abs(i) is assigned to each cookie held by a person Ai. For example, if A3 passes a cookie to A2, the cookie's weight increases from 1/8 to 1/4. Since A3 must also eat a cookie of weight 1/8 in this step, the sum of the weights of all the cookies remains the same. If Ai has ai cookies for each i, then the total weight of all cookies is the sum over i from -n+1 to n of ai/2^abs(i).

Whenever a cookie is passed towards A0 (from Ai to Ai-1 for positive i, or from Ai to Ai+1 for negative i), one cookie is eaten and another cookie doubles its weight, so the total weight remains invariant. If a cookie is passed away from A0, the total weight decreases. Thus the total weight never increases: it is indeed a monovariant.

We claim the least m is 2^n. If m >= 2^n, we can always ensure that A0 gets a cookie. Cookies may travel towards A0 along either semicircle, but never away from A0, because that would reduce the weight. We use a new quantity to choose the direction. Let W+ be the sum of the weights of the cookies held by A0, A1, ..., An and let W- be the sum of the weights of the cookies held by A0, A-1, A-2, ..., A-n. We can suppose W+ >= W-. Then An passes cookies only to An-1 and we use only the semicircle with non-negative indices, since it carries at least as much weight. In each step, as many cookies as possible are passed to An-1 and forwarded in the same way. This works if and only if W+ >= 1. The condition is necessary because W+ is a monovariant and a cookie at A0 has weight 1.
To show that it is sufficient, note that the algorithm leaves W+ invariant. Suppose the algorithm terminates, so that no Ai with i positive can pass any more cookies, and A0 still has no cookie. Then A1, A2, ..., An each have at most one cookie; anyone with more could eat one and pass one, and the algorithm would not have terminated. So W+ would be at most 1/2 + 1/4 + ... + 1/2^n < 1, contradicting the fact that W+ is invariant and >= 1. Thus W+ >= 1 is a sufficient condition for the algorithm to work. Finally we prove that W+ >= 1 indeed holds. Each cookie contributes at least 1/2^(n-1) to the sum W+ + W-: every cookie has weight at least 1/2^(n-1) except for cookies at An, and cookies at An are counted twice since they contribute to both W+ and W-, so they also contribute 1/2^(n-1) to the sum. Since we have at least 2^n cookies, W+ + W- >= 2, and because W+ >= W- this gives W+ >= 1. Conversely, 2^n - 1 cookies all placed at An have total weight less than 1, so A0 can never receive one. Hence m = 2^n is both necessary and sufficient.

Note that the geometric progression of weights is what lets one cookie be eaten and another passed along without changing the total. The sum of this progression bounds the weight of a semicircle in which nobody can pass any further, and the worst case is all cookies at the diametrically opposite end of the table.

The criterion that the weight on one semicircle be at least 1 is therefore a consequence of the above.
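The weight argument can be illustrated on one semicircle with exact fractions. In this sketch the list `c` holds the cookie counts at A0, A1, ..., An; the names `w_plus` and `pass_towards_a0` are made up for illustration.

```python
from fractions import Fraction


def w_plus(c):
    """Total weight of the cookies at A0..An, where a cookie at Ai weighs 1/2^i."""
    return sum(Fraction(count, 2 ** i) for i, count in enumerate(c))


def pass_towards_a0(c):
    """While A0 has no cookie and some Ai (i >= 1) holds two, Ai eats one and passes one to A(i-1)."""
    c = list(c)
    while c[0] == 0 and any(count >= 2 for count in c[1:]):
        # Serve the farthest person who can still pass.
        i = max(j for j in range(1, len(c)) if c[j] >= 2)
        c[i] -= 2
        c[i - 1] += 1
    return c


# n = 3: all 2^3 = 8 cookies start at the opposite seat A3.
print(pass_towards_a0([0, 0, 0, 8]))  # [1, 0, 0, 0]
# With only 7 cookies at A3 the weight is 7/8 < 1 and A0 gets nothing.
print(pass_towards_a0([0, 0, 0, 7]))  # [0, 1, 1, 1]
```

In both runs `w_plus` is unchanged by the passing (1 and 7/8 respectively), which is the invariance the proof relies on.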

Saturday, September 19, 2015

IMO Shortlist 1994

Peter has 3 accounts in a bank, each with an integral number of dollars. He is only allowed to transfer money from one account to another so that the amount of money in the latter is doubled. Prove that Peter can always transfer all his money into two accounts. Can he always transfer all his money into one account?

Let A, B, C be the number of dollars in account 1, account 2 and account 3 respectively, ordered so that A <= B <= C. If A = 0, we are done. Otherwise we want to reduce min(A,B,C) repeatedly until it is 0, relabeling the accounts as the values of A, B and C change.

The Euclidean algorithm steadily reduces a number by using the form B = qA + r with 0 <= r < A. We use this form. If we can reduce B to r then, since r < A, we have reduced min(A,B,C), which was our aim.

The problem, however, only allows transfers that double the receiving account, so we use binary representations. Let

q = m0 + 2 m1 + ... + 2^k mk be the binary representation of q, where each mi is 0 or 1 and mk = 1. To reduce B to r, in step i of the algorithm (i = 1, ..., k+1) we transfer money to account 1. The transfer is from account 2 if mi-1 = 1 and from account 3 if mi-1 = 0. The number of dollars in the first account starts at A and doubles in each step, so step i moves 2^(i-1) A dollars. In total we transfer Aq dollars from account 2 to account 1, and we are left with B - Aq = r dollars in account 2. Account 3 never runs short, since it gives up less than 2^k A <= B <= C dollars. At this point we have reduced min(A,B,C), and we repeat until one account is empty.

The answer to the last question is no: when the total number of dollars is odd, it cannot all end up in one account, because the final transfer would have to double an account to reach the total.

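Let's try this out with sample numbers.

We have accounts with 2, 5, 16, so A = 2, B = 5, C = 16.

B = 2 * A + 1

so q = 2, r = 1, and q can be written as 0 + 2 * 1 for the binary representation of 2, that is m0 = 0 and m1 = 1.

Step 1 (m0 = 0): transfer 2 dollars from account 3 to account 1, giving 4, 5, 14. Step 2 (m1 = 1): transfer 4 dollars from account 2 to account 1, giving 8, 1, 14. In all, Aq = 2 * 2 = 4 dollars have left account 2.

Now A, B, C have changed; relabeling, we have A = 1, B = 8, C = 14.
Now q = 8 and r = 0, with m0 = m1 = m2 = 0 and m3 = 1, so the first three transfers (of 1, 2 and 4 dollars) come from C and the last (of 8 dollars) comes from B.
This leaves us with A = 16, B = 0 and C = 7.

Note that we cannot reduce any further: the total, 23, is odd, so the money cannot all be moved into one account.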
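The binary transfer scheme is short enough to run directly. This is a sketch under the assumption that all three balances are non-negative integers; the name `empty_one_account` is made up for illustration.

```python
def empty_one_account(accounts):
    """Apply doubling transfers until one of the three accounts is empty."""
    a = list(accounts)
    while min(a) > 0:
        # lo, mid, hi are the indices of the smallest, middle and largest balance.
        lo, mid, hi = sorted(range(3), key=lambda i: a[i])
        q = a[mid] // a[lo]
        while q > 0:
            # Binary digit 1: pay from the middle account; digit 0: pay from the largest.
            src = mid if q & 1 else hi
            a[src] -= a[lo]
            a[lo] *= 2  # the receiving account is doubled
            q >>= 1
    return a


print(empty_one_account((2, 5, 16)))  # [0, 16, 7]
```

Starting from 2, 5, 16 the balances pass through 8, 1, 14 and end with one account empty and 7 and 16 dollars in the other two.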

Friday, September 18, 2015

Today we discuss another Olympiad problem.
[IMO Shortlist 2009]
Five identical empty buckets of 2-liter capacity stand at the vertices of a regular pentagon. Cinderella and her wicked Stepmother go through a sequence of rounds: At the beginning of every round the Stepmother takes one liter of water from the nearby river and distributes it arbitrarily over the five buckets. Then Cinderella chooses a pair of neighboring buckets, empties them into the river, and puts them back. Then the next round begins. The Stepmother’s goal is to make one of these buckets overflow. Cinderella’s goal is to prevent this. Can the wicked Stepmother enforce a bucket overflow?

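Let us take the volumes as V1, V2, V3, V4 and V5, with indices taken modulo 5. Cinderella (C) can only empty adjacent buckets. If two non-adjacent buckets each hold more than 1 liter after the Stepmother's turn, C will not be able to empty both, and the Stepmother can then overflow the one that is left. Therefore she prevents this condition. For each i = 1 to 5, C must ensure that Vi + Vi+2 is at most 1 after each of her turns. More precisely, she maintains the following 'good' state: two adjacent buckets are empty, the two buckets next to that pair hold at most 1 liter together, and the remaining bucket holds at most 1 liter.

From a good configuration C can always return to a good configuration. For example, say the two empty buckets are V4 = V5 = 0, so that V1 + V3 <= 1 and V2 <= 1. After the Stepmother's turn, V1 + V3 + V4 + V5 <= 2. Therefore either V5 + V3 <= 1 or V4 + V1 <= 1. In the first case C empties both V1 and V2, and the new configuration is still good because V4 <= 1 and V5 + V3 <= 1. In the second case she empties V2 and V3, and the configuration is good because V5 <= 1 and V4 + V1 <= 1.

The starting configuration, with all buckets empty, is good, and in a good configuration no bucket holds more than 1 liter, so no bucket can exceed 2 liters after the Stepmother's turn. Since we have a good configuration at the end of each turn for C, C has a winning strategy and the Stepmother cannot enforce an overflow.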
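Cinderella's strategy can be exercised against a randomly playing Stepmother. A random opponent is not an adversarial one, so this sketch only illustrates the invariant and proves nothing; the names `is_good`, `cinderella_move` and `play` are made up, buckets are indexed 0..4, and exact fractions avoid rounding issues.

```python
import random
from fractions import Fraction


def is_good(v):
    """Two adjacent empty buckets, their two neighbours total <= 1, the fifth bucket <= 1."""
    for e in range(5):
        if (v[e] == 0 and v[(e + 1) % 5] == 0
                and v[(e - 1) % 5] + v[(e + 2) % 5] <= 1
                and v[(e + 3) % 5] <= 1):
            return True
    return False


def cinderella_move(v):
    """Empty an adjacent pair that leaves a good configuration."""
    for i in range(5):
        w = list(v)
        w[i] = w[(i + 1) % 5] = Fraction(0)
        if is_good(w):
            return w
    raise AssertionError("no good move found")


def play(rounds, seed=0):
    """Return the fullest any bucket ever gets over the given number of rounds."""
    rng = random.Random(seed)
    v = [Fraction(0)] * 5
    fullest = Fraction(0)
    for _ in range(rounds):
        # The Stepmother splits 1 liter at random, in thousandths of a liter.
        cuts = sorted(rng.randint(0, 1000) for _ in range(4))
        parts = [b - a for a, b in zip([0] + cuts, cuts + [1000])]
        v = [x + Fraction(p, 1000) for x, p in zip(v, parts)]
        fullest = max(fullest, max(v))
        v = cinderella_move(v)
    return fullest


print(play(500) <= 2)  # True
```

`cinderella_move` simply tries all five adjacent pairs; the argument above guarantees that at least one of them restores a good configuration.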

#################### Paper Review #####################

Today we discuss "Ontology based text summarization - The case of Texminer".
This paper describes Texminer, which performs summarization by combining several summarization techniques. Before delving into them, it is worth reviewing the existing summarization techniques.

One technique is to use the structure of the discourse to generate abstracts. Rhetorical Structure Theory, for example, attempts to identify the internal structure of the text and the discourse relations formed within it, giving priority to the nuclear components of these relations. Marcu, on the other hand, segments the text into small units of discourse. He then builds a rhetorical structure in the form of a tree by analyzing the set of relations that exist between the units. Once the discourse structure has been created, he assigns a weight and an order to each element of the structure: the higher the element within the structure, the greater its weight.

Different ways of using the text discourse structure have been shown. Some authors have used the rhetorical status of affirmations contained in documents to identify their internal structure. Their main contribution lies in the algorithm that deals with the non-hierarchical structure: given seven fixed categories (aim, textual, own, background, contrast, basis and other), it is capable of distributing the contents of the articles within each category.

Another way has been to use templates to generate summaries and retrieve information. This technique can only be applied when the text is previously structured. Information retrieval systems of this kind include Fies, which extracts financial information from digital articles.

Mateo combined superficial with deep structure analysis to enhance the coherence and cohesion of the abstract. Alonso and Fuentes, for example, combine lexical chains with the rhetorical and argumentative structure, derived using discourse markers.

Using a combination of complex linguistic techniques, Aretoulaki developed a model for automatic summarization that selects sentences using content features of a pragmatic and rhetorical nature. These are obtained by means of superficial linguistic analysis informed by the Theory of Speech Acts, Rhetorical Structure Theory and the theories centered on cohesion and coherence.

Automatic production of abstracts was initially based on statistical methods, in the wake of research by Luhn, and now uses diverse methodologies. By extracting terms or strings of significant words, systems build discourse models. These systems have been used to identify rhetorical structures closely related to the content of the documents and their organizational scheme.

The growth of cognitive science has allowed the incorporation of semantic-conceptual models. Together with the use of knowledge bases, these models vastly improve the process of summarizing texts on a specific topic.

The demand for domain knowledge has grown exceptionally in recent years, and models for extraction and disambiguation of text have changed with it. Yet it may even have become harder to find tools that help with narrow-domain texts. The lack of specific dictionaries, the absence of a defined theory and the dire need of professionals in the sector for summarization tools to carry out their work have become pervasive. Documentation, Terminology and Natural Language Processing (NLP) all indicate a demand for tools aimed at such narrow-domain texts.

This is where TexMiner may help solve the problem. It is based on the conviction that summarizing texts within a specialized domain requires a model capable of processing its semantic and socio-cognitive components.

The socio-cognitive user paradigm of TexMiner takes into account historic, social and cultural factors. It treats a domain as the mindshare of a community, expressed in its concepts, terms and knowledge.
It uses ontologies as algebraic descriptions of a conceptual network in which the elements are binary relations established through concepts.
Lastly, it uses three agents for processing information.
The first is a reading agent that reads the documents and labels them in XML. Here the goal is to discern whether a document belongs to a particular domain, based on the count of words that fall in the domain.
The second is a summarization agent that carries out automatic tagging of the syntactic discourse structure of the text. The design of the tags is governed by
- the sequence in which the levels are tagged in practice,
- the symbols used to denote cohesive elements of the text, and
- the presence in the text of geographic elements, verbs, statistical data, formulae and processes.
The last agent is the information retrieval agent. Texminer allows searches to be made through the ontology, as well as through the lexical databases developed to take advantage of the functionalities of summaries for the purpose of text retrieval.

Thursday, September 17, 2015

User API or Account Management

Most organizations maintain their registered users internally in a database. For an identity provider, this suffices to maintain the current users. However, different applications and services may need read-only access to the registered users (their id, name and email), with or without direct access to the database. This is typically because an application works with a single database at any time, not many databases at once. If the users database happens to be a different one, some form of dependency injection may be required for the application to continue with the assumption that it can reach the list of registered users.

Active Directory is a superset of all such users and is the final authority for knowing whether the user is a valid entity or not. To check if a registered user is still current, we can defer to the AD. However, AD is typically reached through LDAP, with LDIF as the import and export format, rather than through a web API of its own.

Companies often have API wrappers around the AD to facilitate functions such as creating and deleting groups. But a listing of users is not necessarily provided by such an API.

Therefore an API access to registered users should be no surprise and a way to facilitate the addition or deletion of users may very well be required in certain cases.

It may even be helpful to separate out read-only access from read-write access to this users list.

As an example,

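from django.contrib.auth.models import User  # assumption: Django's built-in user model
from rest_framework import generics

# Assumption: UserSerializer is defined elsewhere and exposes id, name and email.
class UserViewSet(generics.ListAPIView):

    serializer_class = UserSerializer

    queryset = User.objects.all()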

The read-write access can include checking for existing users and adding a new user or deleting an existing user. The attributes for user information typically include status, email, full name, password, created, modified etc. There may be additional qualifications such as type or group, comments, last login time etc. Note that we assume the registered users table to be unique for the application or organization we are considering. That is why such a centralized table requires an API for programmatic access. If the table can be different for different applications, then there is no need to write an API for each; the users can instead be imported directly into each application's database. Said another way, we are enforcing the sharing because we want to keep one master copy up to date instead of relying on copies that have to be replicated, merged and synced between databases.

APIs in a service-oriented architecture are very useful for exporting such data to be used in different applications or services.

[IMO Shortlist 2004]
Problem: A and B take turns writing a number as follows. Let N be a fixed positive integer. First A writes the number 1, and then B writes 2. Hereafter, in each move, if the current number is k, then the player whose turn it is can either write k + 1 or 2k, but no player can write a number larger than N. The player who writes N wins. For each N, determine who has a winning strategy.

Solution:

Call a number k a P-position if the player who writes k has a winning strategy from there.

Step 1) If N is odd, A can win. A can always write an odd number, after which B has to write an even number (k + 1 and 2k are both even when k is odd), so B can never write N.

Step 2) If N is even, all even numbers greater than N/2 are P-positions. Up to N/2 we have the ability to double the number, but beyond that doubling would exceed N. After N/2 both players have to increment the number by 1, so parity decides who writes N.

Step 3) If N = 4k or N = 4k + 2 with k >= 1, then k is a P-position. If X writes k, Y must write k + 1 or 2k. Then X writes 2k + 2 if Y writes k + 1, and X writes 4k if Y writes 2k. Either way X has written an even number greater than N/2 and by step 2, X wins. X can be either A or B and Y is the other of the two.

Step 4) If X has a winning strategy for N = k, then X has a winning strategy for N = 4k and N = 4k + 2.
Proof: Consider a game where N = 4k or 4k + 2. By the previous step, the goal can now be modified to writing k first, and X follows the winning strategy for the game with target k. How can player Y prevent X from writing k? The answer is to jump over k: once the current number j exceeds k/2, Y can double it to 2j > k. But then X can double the number again, giving 4j, an even number that is at least 2(k + 1) > N/2 and at most N. So X wins by step 2.

The recursive method for defining the answer for even N is as follows:
The answer for N is the same as that for floor(N/4). To convert this recursion into an explicit answer, write N in base 4. Taking floor(N/4) is the same as removing the last digit when N is written in base 4. We keep removing the last digit, and the resulting numbers are all winning for the same player by the same recursion. If at some point the number obtained is odd, then A wins for this number and hence A wins for N. If N has only 0s and 2s in its base 4 representation, then the recursion ends at the number 2. B wins in this case and therefore for N.

The moves for A involve:
Write 1 at the beginning
Never write a number larger than N
If N is odd, always write the current number + 1, which is odd
If N is even, replace N by floor(N/4) repeatedly, calling the result c, until c is odd or c is 2
If c is odd, play the odd strategy to write c, then use steps 3 and 4 to reach 4c or 4c + 2, and so on back up to N
If c is 2, B has the winning strategy

The moves for B similarly involve:
Write 2 at the beginning
If N is odd, write the current number + 1 or the current number * 2; both are even, and B cannot win
If N is even and the recursion stops at 2, follow the same strategy as A, starting from the 2 that B has written

B wins only when the recursion stops at 2. Otherwise A wins with the winning strategy above.
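The base 4 rule can be checked against an exhaustive search for small N. In this sketch `winner_by_rule` and `winner_by_search` are made-up names: the first strips base 4 digits as in the recursion, the second computes directly which numbers are winning for the player who writes them.

```python
def winner_by_rule(n):
    """A wins unless every base-4 digit of n is 0 or 2."""
    while n > 0:
        if n % 4 in (1, 3):  # an odd number appears while stripping digits
            return "A"
        n //= 4
    return "B"


def winner_by_search(n):
    """Backward induction: p[k] is True if the player who writes k can force a win."""
    p = [False] * (n + 1)
    p[n] = True
    for k in range(n - 1, 0, -1):
        moves = [m for m in (k + 1, 2 * k) if m <= n]
        # Writing k wins exactly when every reply loses for the opponent.
        p[k] = not any(p[m] for m in moves)
    return "A" if p[1] else "B"


for n in (2, 6, 8, 10, 12):
    print(n, winner_by_rule(n), winner_by_search(n))
# 2 B B
# 6 A A
# 8 B B
# 10 B B
# 12 A A
```

For instance 8 and 10 are 20 and 22 in base 4, so B wins, while 6 and 12 are 12 and 30, so A wins.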