Thursday, November 16, 2017

We continue our discussion on modeling. A model articulates how a system behaves quantitatively. Models use numerical methods to examine complex situations and come up with predictions. The most common techniques for building a model include statistical methods, numerical methods, matrix factorization, and optimization.
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data. Values for the parameters are obtained from estimation techniques, generally through an iterative process that minimizes the average difference between the model predictions and the data. The quality of the inverse model is evaluated using well-known mathematical techniques as well as intuition.
The steps for inverse modeling of data include:
1) selecting an appropriate mathematical model, using, say, polynomial or other functions
2) defining an objective function that measures the agreement between the data and the model
3) adjusting model parameters to get the best fit, usually by minimizing the objective function
4) evaluating the goodness of fit; the fit will not be perfect because of measurement noise
5) estimating the accuracy of the best-fit parameter values
6) determining whether a much better fit is possible, which may be necessary if the optimization has settled in a local minimum

There are two ways to select an appropriate model. The first is by observing trend lines that correspond to some well-known mathematical formula. The second is by observing the underlying physical processes that contribute to the system; these physical interpretations inform the model parameters. In order to fit the data points, a model may use the least squares of the errors. The errors, called residuals, may be positive or negative, so their simple sum is an inaccurate measure of fit. Instead, the sum of the squares of the errors can be minimized to give a better fit.
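As a minimal sketch of that idea, the method below fits a straight line y = a + b*x to observed points using the closed-form least-squares estimates and reports the sum of squared residuals as the objective value; the method name and the data arrays are hypothetical and only for illustration.

// Sketch: fit y = a + b*x by least squares and report the sum of squared residuals.
(double intercept, double slope, double sse) FitLine(double[] x, double[] y)
{
    int n = x.Length;
    double sumX = 0.0, sumY = 0.0;
    for (int i = 0; i < n; i++) { sumX += x[i]; sumY += y[i]; }
    double meanX = sumX / n;
    double meanY = sumY / n;

    // Closed-form least-squares estimates for slope and intercept.
    double sxy = 0.0, sxx = 0.0;
    for (int i = 0; i < n; i++)
    {
        sxy += (x[i] - meanX) * (y[i] - meanY);
        sxx += (x[i] - meanX) * (x[i] - meanX);
    }
    double slope = sxy / sxx;
    double intercept = meanY - slope * meanX;

    // The residuals may be positive or negative; squaring them gives the objective value.
    double sse = 0.0;
    for (int i = 0; i < n; i++)
    {
        double residual = y[i] - (intercept + slope * x[i]);
        sse += residual * residual;
    }
    return (intercept, slope, sse);
}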
#codingexercise
Find the next greater number using the same digits as the given number. If no such number is possible, return the original.
int GetNextGreater(int n)
{
    // Split the number into its digits, most significant first.
    var digits = new List<int>();
    foreach (char c in n.ToString())
        digits.Add(c - '0');
    if (digits.Count <= 1) return n;

    // Scan from the right for the first position whose digit exceeds its predecessor.
    int i = digits.Count - 1;
    while (i > 0 && digits[i] <= digits[i - 1])
        i--;

    // The digits are in non-increasing order, so no greater permutation exists.
    if (i == 0) return n;

    // Find the smallest digit to the right of the pivot that is still larger than it.
    int pivot = i - 1;
    int min = i;
    for (int j = i + 1; j < digits.Count; j++)
        if (digits[j] > digits[pivot] && digits[j] < digits[min])
            min = j;

    // Swap it with the pivot and sort the suffix ascending to get the smallest tail.
    int temp = digits[pivot];
    digits[pivot] = digits[min];
    digits[min] = temp;
    digits.Sort(i, digits.Count - i, null);

    // Reassemble the digits into an integer.
    int result = 0;
    foreach (int d in digits)
        result = result * 10 + d;
    return result;
}

There is an alternative to computing the number as above. It simply rolls the number forward, one increment at a time, until a candidate is found that has the same count of each digit as the given number.
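A minimal brute-force sketch of that roll-forward idea follows; it compares digit multisets by sorting the digit characters, and it is far slower than the approach above when the next permutation is far away. The method name is hypothetical.

// Brute-force sketch: increment until the candidate uses the same multiset of digits.
int GetNextGreaterByRolling(int n)
{
    char[] target = n.ToString().ToCharArray();
    Array.Sort(target);

    for (long candidate = (long)n + 1; candidate <= int.MaxValue; candidate++)
    {
        char[] current = candidate.ToString().ToCharArray();
        if (current.Length > target.Length) break;   // more digits than the original
        Array.Sort(current);
        if (new string(current) == new string(target)) return (int)candidate;
    }
    return n;   // no larger permutation of the same digits exists
}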

Wednesday, November 15, 2017

We continue our discussion on modeling. A model articulates how a system behaves quantitatively. Models use numerical methods to examine complex situations and come up with predictions. The most common techniques for building a model include statistical methods, numerical methods, matrix factorization, and optimization.
A forward model is a mathematical model detailed enough to include the desired level of real-world behaviour or features. It is used to simulate realistic experimental data which, under the right constraints, can be used to test hypotheses. While it may be too complicated to fit to experimental data, it can be used to generate synthetic data sets for evaluating parameters.
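As a small illustration, the sketch below uses a hypothetical forward model (an exponential decay with two parameters) to generate a synthetic data set with added Gaussian noise; the model form, the noise level, and the method name are all assumptions made for demonstration.

// Sketch: generate synthetic observations from an assumed forward model y = a * exp(-b * t)
// with Gaussian measurement noise, for use in testing a fitting procedure.
List<double> SimulateForwardModel(double a, double b, double[] times, double noiseSigma)
{
    var rng = new Random();
    var observations = new List<double>();
    foreach (double t in times)
    {
        double ideal = a * Math.Exp(-b * t);

        // Box-Muller transform to draw a standard normal sample.
        double u1 = 1.0 - rng.NextDouble();
        double u2 = rng.NextDouble();
        double gaussian = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);

        observations.Add(ideal + noiseSigma * gaussian);
    }
    return observations;
}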
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data. Values for the parameters are obtained from estimation techniques, generally through an iterative process that minimizes the average difference between the model predictions and the data. The quality of the inverse model is evaluated using well-known mathematical techniques as well as intuition.
Forward-inverse modeling is a process that combines data simulation with model fitting so that all parameters can be evaluated for robustness, uniqueness and sensitivity. This is very powerful for improving data analysis and understanding its limitations.
A good inverse model should fit well and describe the data adequately so that some insights may follow. Its parameters are unique, and their values are consistent both with the hypothesis and with the way the experimental data changes in response to alterations in the system.
The steps for inverse modeling of data include:
1) selecting an appropriate mathematical model, using, say, polynomial or other functions
2) defining an objective function that measures the agreement between the data and the model
3) adjusting model parameters to get the best fit, usually by minimizing the objective function
4) evaluating the goodness of fit; the fit will not be perfect because of measurement noise
5) estimating the accuracy of the best-fit parameter values
6) determining whether a much better fit is possible, which may be necessary if the optimization has settled in a local minimum rather than the global minimum.
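For that last step, one simple (if crude) check is to restart the fit from several different initial guesses and compare the resulting objective values. The sketch below assumes caller-supplied fitting and objective delegates; neither is a real library API.

// Sketch: run the fit from multiple starting points and keep the best objective value.
// fitFrom and objective are assumed delegates supplied by the caller.
double[] FitWithRestarts(Func<double[], double[]> fitFrom, Func<double[], double> objective,
                         List<double[]> startingPoints)
{
    double[] best = null;
    double bestValue = double.MaxValue;
    foreach (var start in startingPoints)
    {
        var candidate = fitFrom(start);        // iterative minimization from this start
        double value = objective(candidate);   // e.g., sum of squared residuals
        if (value < bestValue)
        {
            bestValue = value;
            best = candidate;
        }
    }
    // If different starts give noticeably different minima, the fit likely has local minima.
    return best;
}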
#codingexercise
Given an array and an integer k, find the maximum for each and every contiguous subarray of size k.
// Sliding-window maximum in O(n) using a double-ended queue of indices.
// Deque<int> is assumed to support AddLast, PopLast, PopFirst, PeekFirst, PeekLast and IsEmpty.
List<int> GetMaxInSubArrayOfSizeK(List<int> A, int k)
{
    var ret = new List<int>();
    var q = new Deque<int>();

    // Process the first window, keeping indices in decreasing order of their values.
    for (int i = 0; i < k; i++)
    {
        while (q.IsEmpty() == false && A[i] >= A[q.PeekLast()])
            q.PopLast();
        q.AddLast(i);
    }

    for (int i = k; i < A.Count; i++)
    {
        // The front of the deque holds the index of the maximum of the previous window.
        ret.Add(A[q.PeekFirst()]);

        // Drop indices that have fallen out of the current window.
        while (q.IsEmpty() == false && q.PeekFirst() <= i - k)
            q.PopFirst();

        // Drop smaller elements from the back; they can never be the maximum again.
        while (q.IsEmpty() == false && A[i] >= A[q.PeekLast()])
            q.PopLast();

        q.AddLast(i);
    }

    // The maximum of the last window.
    if (q.IsEmpty() == false)
        ret.Add(A[q.PeekFirst()]);
    return ret;
}


Tuesday, November 14, 2017

We resume our discussion on modeling. A model articulates how a system behaves quantitatively. Models use numerical methods to examine complex situations and come up with predictions. The most common techniques for building a model include statistical methods, numerical methods, matrix factorization, and optimization.
Sometimes we relied on experimental data to corroborate the model and tune it. Other times, we simulated the model to see the predicted outcomes and whether they matched up with the observed data. There are some caveats with this form of analysis. A model is merely a representation of our understanding based on our assumptions; it is not the truth. The experimental data is closer to the truth than the model. Even the experimental data may be tainted by how we question nature rather than by nature itself. This is what Heisenberg and Covell warn against. A model that is inaccurate may not be reliable in prediction. Even if the model is closer to the truth, garbage in may result in garbage out.
Any model has a test measure to determine its effectiveness. Since the observed and the predicted values are both known, a suitable test metric may be chosen. For example, the sum of squares of errors or the F-measure may be used to compare and improve models.
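As one concrete example of such a metric, the sketch below computes the F1 score (the harmonic mean of precision and recall) from observed and predicted binary labels; the boolean label representation and the method name are assumptions for illustration.

// Sketch: F1 score for binary predictions, where true denotes the positive class.
double FMeasure(List<bool> observed, List<bool> predicted)
{
    int truePositives = 0, falsePositives = 0, falseNegatives = 0;
    for (int i = 0; i < observed.Count; i++)
    {
        if (predicted[i] && observed[i]) truePositives++;
        else if (predicted[i] && !observed[i]) falsePositives++;
        else if (!predicted[i] && observed[i]) falseNegatives++;
    }
    if (truePositives == 0) return 0.0;   // avoids division by zero below

    double precision = truePositives / (double)(truePositives + falsePositives);
    double recall = truePositives / (double)(truePositives + falseNegatives);

    // F1 is the harmonic mean of precision and recall.
    return 2.0 * precision * recall / (precision + recall);
}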
A forward model is a mathematical model detailed enough to include the desired level of real-world behaviour or features. It is used to simulate realistic experimental data which, under the right constraints, can be used to test hypotheses. While it may be too complicated to fit to experimental data, it can be used to generate synthetic data sets for evaluating parameters.
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data. Values for the parameters are obtained from estimation techniques, generally through an iterative process that minimizes the average difference between the model predictions and the data. The quality of the inverse model is evaluated using well-known mathematical techniques as well as intuition.
Forward-inverse modeling is a process that combines data simulation with model fitting so that all parameters can be evaluated for robustness, uniqueness and sensitivity. This is very powerful for improving data analysis and understanding its limitations.
A good inverse model should fit well and describe the data adequately so that some insights may follow. Its parameters are unique, and their values are consistent both with the hypothesis and with the way the experimental data changes in response to alterations in the system.

#codingexercise
Given an array and an integer k, find the maximum for each and every contiguous subarray of size k.
// Brute-force O(n*k): scan each window of size k for its maximum.
List<int> GetMaxFromSubArraysOfSizeK(List<int> A, int k)
{
    var ret = new List<int>();
    for (int i = 0; i <= A.Count - k; i++)
    {
        int max = A[i];
        for (int j = 1; j < k; j++)
        {
            if (A[i + j] > max)
                max = A[i + j];
        }
        ret.Add(max);
    }
    return ret;
}

Monday, November 13, 2017

We were discussing our recommender software yesterday and the day before. The recommender might get the geographical location of the user, the time of day and search terms from the owner. These are helpful to predict the activity the owner may take. The recommender does not need to rebuild the activity log for the owner, but it can perform correlations for the window it operates on. If it helps to build the activity log for the year to date, then the correlations become easier, translating into queries and data mining over the activity log.
The activity log has a natural snowflake schema and works well for warehouse purposes as well. Additions to the activity log may be very granular or coarse or both, and these may be defined and labeled based on information found in the past or input from the user. The activity log has a progressive order, as in a time-series database, and allows standard query operators over date ranges. By virtue of allowing the user to accrue this log from anywhere and on any device, this database is well suited to the cloud. Public cloud databases or virtual data warehouses are well suited for this purpose. When the recommender performs correlations for the owner, it discovers activities by the owner, and these activities are recorded in this database. If the recommender needs to search over date ranges, it can quickly use the activity log it helped build. The activity log for the owner gives the most information about the owner and consequently helps with recommendations.
Since the recommender queries many data sources, it is not limited to the Activity Log but it eventually grows the Activity Log to be the source of truth about the owner.
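To make the date-range querying concrete, here is a minimal sketch over a hypothetical ActivityRecord type; the record fields and the in-memory list stand in for whatever store actually backs the activity log.

// Hypothetical activity log record and a date-range query over it.
class ActivityRecord
{
    public DateTime Timestamp;
    public string Activity;
    public string Location;
}

List<ActivityRecord> QueryByDateRange(List<ActivityRecord> log, DateTime from, DateTime to)
{
    // The log is kept in time order, so a simple filter corresponds to a range scan.
    var results = new List<ActivityRecord>();
    foreach (var record in log)
    {
        if (record.Timestamp >= from && record.Timestamp <= to)
            results.Add(record);
    }
    return results;
}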
#codingexercise
Segregate and Sort even and odd integers: 
List<int> SegregateAndSort(List<int> input)
{
    // Separate even and odd values, sort each group, and list the evens first.
    var even = input.Where(x => x % 2 == 0).OrderBy(x => x).ToList();
    var odd = input.Where(x => x % 2 != 0).OrderBy(x => x).ToList();
    even.AddRange(odd);
    return even;
}
and test for our k-means classifier: https://github.com/ravibeta/cexamples/blob/master/classifiertest.c

Sunday, November 12, 2017

We were discussing the personal recommender yesterday. The recommender has access to more data sources than conventional web applications and can perform more correlation than ever before. When integrated with a social networking application such as Facebook or Twitter, the recommender finds information about the friends of the owner. Places that they have visited or activities that they have posted become relevant to the current context for the owner. In this case, the recommender superimposes what others have shared with the owner. The context may be a place or activity to be used in the search query to pull data from these social networking applications. This is not intrusive to others and does not raise privacy issues. Similarly, it does not instigate movements or flash mobs, because the will to act on the analysis still rests with the owner. The level of information made available by the social networking applications is a setting in that application and is independent of the recommender. There are no HIPAA violations, and whether a user shares his or her visit to a pharmacy or hospital is entirely up to that user. It does provide valuable insight to the owner of the recommender when she decides to find a doctor.
The recommender does not have to take any actions. Whether the owner chooses to act on the recommendations or publish them on Facebook is entirely her choice. This feedback loop may be appealing to her friends and their social networking applications, but it is an opt-in.
The recommender is a personal advisor who can use intelligent correlation algorithms. For example, it can perform collaborative filtering using the owner's friends as data points. In this technique, the algorithm finds people with tastes similar to the owner's and evaluates a list to determine the collective rank of items. While standard collaborative filtering uses viewpoints from a distributed data set, the recommender may adapt it to include only the owner's friends.
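A minimal sketch of that friends-only collaborative filtering follows; the rating dictionaries, the cosine similarity measure, and the method names are all assumptions made for illustration, not the recommender's actual implementation.

// Sketch: rank items for the owner using only her friends' ratings, weighted by
// how similar each friend's tastes are to the owner's (cosine similarity).
Dictionary<string, double> RecommendFromFriends(
    Dictionary<string, double> ownerRatings,                       // item -> rating
    Dictionary<string, Dictionary<string, double>> friendRatings)  // friend -> (item -> rating)
{
    var scores = new Dictionary<string, double>();
    foreach (var friend in friendRatings)
    {
        double similarity = Cosine(ownerRatings, friend.Value);
        foreach (var item in friend.Value)
        {
            if (ownerRatings.ContainsKey(item.Key)) continue;   // already known to the owner
            if (!scores.ContainsKey(item.Key)) scores[item.Key] = 0.0;
            scores[item.Key] += similarity * item.Value;        // similarity-weighted vote
        }
    }
    return scores;
}

double Cosine(Dictionary<string, double> a, Dictionary<string, double> b)
{
    double dot = 0.0, normA = 0.0, normB = 0.0;
    foreach (var pair in a)
    {
        normA += pair.Value * pair.Value;
        if (b.ContainsKey(pair.Key)) dot += pair.Value * b[pair.Key];
    }
    foreach (var pair in b) normB += pair.Value * pair.Value;
    if (normA == 0 || normB == 0) return 0.0;
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}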
The recommender might get the geographical location of the user, the time of day and search terms from the owner. These are helpful to predict the activity the owner may take. The recommender does not need to rebuild the activity log for the owner, but it can perform correlations for the window it operates on. If it helps to build the activity log for the year to date, then the correlations become easier, translating into queries and data mining over the activity log.

Saturday, November 11, 2017

The personal recommender
We want a guide, a local expert, when we visit a new place. If we are doing something routine, we don't need any help. If we are in a new place or we are experiencing something new, we appreciate information. We do this today by flipping through brochures, maps and ads. We use a scratch pad and a pen to put together a list. We know our tastes and we know how to find a match in the new environment. Booking a travel itinerary involves a recommender from the travel website. We get choices for our flight searches and hints to add a hotel and a car. Some such websites involve intelligent listings based on past purchases, reward program memberships and even recommendations based on a collaborative filtering technique from other travelers. We appreciate these valuable tips on the otherwise mundane listing of flights and hotels ordered by price.
Therefore information is valuable when we explore. It is similar to wearing Google eyeglasses or anything else that hones our senses. Maps do this in the form of layers overlaid over the geographic plot, but the data is merely spatial and improved with annotations. It is not temporal, as in what events are happening at a venue next to a hotel. It also does not give time-based activity recordings that we can rewind and fast-forward to see peak and low traffic. Such data would be gigantic to store statically with maps. Besides, it may not stay relevant as businesses change.
A recommender, on the other hand, can query many data sources over the web. For example, it can query credit card statements to find patterns in spending and use them to highlight related attractions in a new territory. Wouldn't it be helpful to have our login pull up restaurants that match our eating habits instead of cluttering up the map with all those in the area? Similarly, wouldn't it be helpful to use our social graph to annotate venues that our friends visited? The recommender may be uniquely positioned to tie in a breadth of resources from public and private data sources. Many web applications such as deli.cio.us and kayak make data available for browsing and searching.
This recommender is empowered to add a personal touch to any map or dataset by correlating events and activities to the user's profile and history. By integrating sign-in credentials across banks, emails, bills, and other agencies, the recommender gets to know more about our habits and becomes more precise in its recommendations. Moreover, the recommender can keep learning more and more about us as it accrues habits and recommendations, and it improves them with a feedback loop.
The recommender is also enabled to query trends and plot charts to analyze data in both spatial and temporal dimensions. It could list not only the top 5 in any category but also the items that remained in the top 5 week after week. These and other such aggregation and analysis techniques indicate tremendous opportunity for the recommender to draw the most appealing and intuitive information layer over any map.
The recommender is also not limited to standard querying operators and can employ a variety of statistical models and machine learning algorithms to better judge on our behalf. By using different ways to determine correlations, a recommender comes closer to the truth.
The personal coder
Everybody could do with their own assistant. Speech-recognition-based software such as Alexa, Siri and Cortana is able to understand simple commands. While these assistants know how to locate an item of interest, they can only serve what is readily available, either in the context of the device or online from the world wide web. An instruction like a one-word genre of music is automatically translated into playing music from that genre in the collection available to them.
These assistants are being made smarter to understand the relevance of the instruction and to narrow down the execution of the command to improve satisfaction. Some of these techniques have utilized artificial intelligence and the abilities to group, sort, rank, learn with neurons and make recommendations. There are definitely a lot of improvements on the horizon here, given that we have only recently started making this territory mainstream.
However, I introduce the notion of a personal automation assistant. When we are not just looking for one item and we have more than one task to sequence, giving repeated instructions itself becomes a chore. Try saying "play violin" ten times, and you realize you need an integrator along with the command to play it once. Similarly, if you need the result of one task to be taken as the input for the second, you require a script that knows how to integrate the two actions. Expand the definition of instructions to cover assorted tasks, such as turning on the toaster or streaming music to the bathroom via heterogeneous systems, and you need a software solution integrator to write code to make that automation happen.
Fortunately, nowadays devices and applications can work independently while allowing scriptable programmatic access for remote invocation. Even the language of interaction has become standard and easy to invoke over what are called REST-based APIs. These APIs follow well-defined conventions, and executing a task merely translates to making one or more API calls in the correct form and order. These APIs can be further explored for their corresponding resources with the help of techniques that make them readable like a book. Therefore, with standard APIs available, their invocation is a straightforward task of mix and match, which can be folded into the portfolio of tasks that these assistants perform. Hence the notion of a personal coder.
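As a minimal illustration of such an invocation, the sketch below issues a single REST call with HttpClient; the endpoint URL and the JSON payload are hypothetical placeholders for whatever device or service the assistant is driving.

// Sketch: one REST call an automation assistant might make; the URL and body are made up.
async Task<string> InvokeTask()
{
    using (var client = new HttpClient())
    {
        var body = new StringContent("{ \"action\": \"play\", \"genre\": \"violin\" }",
                                     Encoding.UTF8, "application/json");
        var response = await client.PostAsync("https://device.example.com/api/v1/tasks", body);
        response.EnsureSuccessStatusCode();

        // The response can feed the next task in the sequence.
        return await response.Content.ReadAsStringAsync();
    }
}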
It may behoove us to note that coding is one of the natural and immensely expansive capabilities that can be added to the assistant. There are many more roles besides coding that can be folded into the repertoire of tasks the assistant may assume. Professions such as accounting, transcription and remote management are describable as automatable tasks and consequently as roles for the assistant. Finally, the assistant becomes irreplaceable.
The emphasis here is on making API calls over HTTP by picking the right API and supplying the right parameters, as an example of what the coding assistant does. Scripts that are as easily written as a single curl command are probably best suited for this kind of assistant. For complex operations, we naturally want manual intervention. Another way to improve the assistant's capabilities is to make more data sources available to its already smooth-running operations. For example, Google used to ship search appliances that worked on the customer's premises for their proprietary data. We could now use a similar concept for the voice-activated assistant.