Thursday, November 30, 2017

We resume our discussion of correlation versus regression. We saw that one of the best advantages of a linear regression is prediction with time as the independent variable. When the data points have many factors contributing to their occurrence, a linear regression gives an immediate ability to predict where the next occurrence may lie. This is far easier than coming up with a model that is a good fit for all the data points. It gives an indication of the trend, which is generally more helpful than the data points themselves. Also, a scatter plot shows only one dependent variable changing in conjunction with the independent variable, which lets us pick the dimension to fit the linear regression independently of the others. Lastly, the linear regression also indicates how closely the data adheres to the trend via the estimation of errors.
We also saw how model parameters for linear regressions are computed: the best values for the model parameters are the ones that minimize the sum of the squared residuals between the observations and the fitted line, which is the method of least squares.
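As a minimal sketch of that computation (a hypothetical helper, not from the original discussion, assuming using System.Linq is in scope), the least-squares slope and intercept for y = a + b*x fall out directly from the sample means:
// Minimal sketch: least-squares fit of y = a + b*x.
// slope b = S_xy / S_xx, intercept a = ybar - b*xbar,
// where S_xy = sum((x-xbar)*(y-ybar)) and S_xx = sum((x-xbar)^2).
static (double a, double b) FitLine(double[] x, double[] y)
{
    double xbar = x.Average();
    double ybar = y.Average();
    double sxy = 0, sxx = 0;
    for (int i = 0; i < x.Length; i++)
    {
        sxy += (x[i] - xbar) * (y[i] - ybar);
        sxx += (x[i] - xbar) * (x[i] - xbar);
    }
    double b = sxy / sxx;
    return (ybar - b * xbar, b);
}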
The correlation coefficient describes the strength of the association between two variables. If the two variables increase together, the correlation coefficient tends to +1. If one decreases as the other increases, it tends to -1. If they are not related to one another, the correlation coefficient stays near zero. In addition, the correlation coefficient can be related to the results of the regression. This is helpful because we now find a correlation not between parameters but between our notions of cause and effect. It also lets us compute a correlation between any x and y that are not necessarily independent and dependent variables. This follows from the fact that the correlation coefficient (denoted by r) is symmetric in x and y, which differentiates the coefficient from the regression.
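A small sketch of the computation (again a hypothetical helper, not from the original post) makes the symmetry concrete: swapping the x and y arguments leaves r unchanged.
// Minimal sketch: Pearson correlation coefficient.
// r = S_xy / sqrt(S_xx * S_yy); the formula is symmetric in x and y.
static double Pearson(double[] x, double[] y)
{
    double xbar = x.Average(), ybar = y.Average();
    double sxy = 0, sxx = 0, syy = 0;
    for (int i = 0; i < x.Length; i++)
    {
        double dx = x[i] - xbar, dy = y[i] - ybar;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / Math.Sqrt(sxx * syy);
}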
Non-linear equations can also be "linearized" by selecting a suitable change of variables. This is quite popular because it makes the analysis simpler. But transforming the variables is prone to distorting the error structure. It is an oversimplification of the model; it violates key assumptions and impacts the resulting parameter values. All of this contributes toward incorrect predictions and is best avoided. Non-linear least-squares analysis has well-defined techniques that are not too difficult with computing. Therefore it is better to do a non-linear least-squares analysis when dealing with non-linear inverse models.
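For illustration only, here is a hypothetical sketch of such a linearization for y = a * exp(b*x): taking logs gives ln(y) = ln(a) + b*x, which the FitLine helper sketched above can fit. The comment flags the distortion the paragraph warns about.
// Illustrative sketch: linearizing y = a * exp(b*x) with a log transform.
// Caution: fitting ln(y) by least squares weights errors multiplicatively,
// distorting the error structure of the original model (the pitfall above).
static (double a, double b) FitExponential(double[] x, double[] y)
{
    double[] logy = y.Select(Math.Log).ToArray();
    var (lna, b) = FitLine(x, logy); // FitLine as sketched earlier
    return (Math.Exp(lna), b);
}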
#codingexercise
Given three sorted arrays and a value, find the element from each array that is closest to the given value, one element per array.
For example:
A[] = {1, 4, 10}
B[] = {2, 15, 20}
C[] = {10, 12}
Given input: 10
Output: 10 15 10
10 from A, 15 from B and 10 from C
// requires: using System; using System.Collections.Generic; using System.Diagnostics;
List<int> GetClosestToGiven(List<int> A, List<int> B, List<int> C, int value)
{
    Debug.Assert(A.Count > 0 && B.Count > 0 && C.Count > 0);
    var ret = new List<int>();
    ret.Add(GetClosest(A, value)); // each call is a binary search, O(log n)
    ret.Add(GetClosest(B, value));
    ret.Add(GetClosest(C, value));
    return ret;
}
int GetClosest(List<int> items, int value)
{
    int start = 0;
    int end = items.Count - 1;
    int closest = items[start];
    while (start < end)
    {
        // the closest element seen so far is one of the two endpoints
        closest = Math.Abs(items[start] - value) < Math.Abs(items[end] - value)
                  ? items[start] : items[end];
        int mid = start + (end - start) / 2; // avoids overflow of start + end
        if (mid == start || mid == end) return closest;
        if (items[mid] == value)
        {
            return value;
        }
        if (items[mid] < value)
        {
            start = mid; // everything below mid is even farther from value
        }
        else
        {
            end = mid; // everything above mid is even farther from value
        }
    }
    return closest;
}
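A quick sanity check against the example above (hypothetical driver code, assuming the two methods are in scope):
var A = new List<int> { 1, 4, 10 };
var B = new List<int> { 2, 15, 20 };
var C = new List<int> { 10, 12 };
Console.WriteLine(string.Join(" ", GetClosestToGiven(A, B, C, 10))); // prints: 10 15 10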
