Today we resume the discussion on model fitting and error estimation by Kleinstein and Hershberg.
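If we would rather not model the measurement errors explicitly, we can attempt the bootstrap method. This method uses the actual data set, with its N points, to generate synthetic data sets with just as many points. Each synthetic set is made by drawing N points from the original with replacement, so it differs from the original in that a random fraction of the original points (about 1/e, or roughly 37%, on average) is replaced by duplicates of other original points. Since the order of the data points does not matter, each synthetic set can be fitted exactly as the original was, and it carries the actual measurement noise rather than an assumed noise model. We can use the chi-square merit function for each fit. The spread of the fitted parameters across the synthetic sets then indicates their uncertainty.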
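The resampling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are made up, the names bootstrap_resample, fit_mean and bootstrap_estimates are hypothetical, and the "model" is a single constant, for which the chi-square minimum with equal weights is simply the sample mean. A real model would replace fit_mean with its own chi-square minimization.

```python
import random

def bootstrap_resample(data, rng):
    """Draw len(data) points from data with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def fit_mean(sample):
    # Stand-in for a real fit: for a constant model with equal weights,
    # the value that minimizes chi-square is the mean of the sample.
    return sum(sample) / len(sample)

def bootstrap_estimates(data, fit, n_sets=1000, seed=0):
    """Fit every synthetic data set and collect the fitted parameter."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [fit(bootstrap_resample(data, rng)) for _ in range(n_sets)]

# Illustrative measurements only, not taken from any real experiment.
data = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.05]
estimates = bootstrap_estimates(data, fit_mean)
print(min(estimates), max(estimates))
```

The list of estimates is the raw material for the confidence intervals discussed next.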
Next we review confidence intervals and the accuracy of model parameters. A model may have M parameters, so its parameter space has M dimensions. Since each parameter can vary, the probability distribution of the fitted parameters is a function defined on this M-dimensional space. From this distribution we choose a region that contains a high percentage of the total probability around the selected model parameters. This region is called the confidence region, or the confidence interval when there is a single parameter. Depending on the fraction of the distribution enclosed, confidence regions are quoted at levels given as percentages. The shape of the region, such as an ellipsoid, may also be stated. Generally we pick a region that is reasonably compact and centered on the point of reference in the parameter space. For a single parameter, such a band can be described as y = prctile(x, [5,95]), which returns the 5th and 95th percentiles of x and so encloses the central 90% of the values.
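The prctile call above is MATLAB syntax. A rough Python equivalent is sketched below, assuming linear interpolation between sorted values; the function names percentile and central_interval are hypothetical, and the interpolation convention may differ slightly from the one prctile uses, so small samples can give slightly different numbers.

```python
def percentile(values, p):
    """p-th percentile (0 to 100), interpolating linearly between sorted values."""
    if not values:
        raise ValueError("percentile of an empty sequence")
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0   # fractional index into the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] + (s[hi] - s[lo]) * frac

def central_interval(values, level=90.0):
    """Band holding the central `level` percent of the values."""
    tail = (100.0 - level) / 2.0
    return percentile(values, tail), percentile(values, 100.0 - tail)

# Illustrative values standing in for a set of fitted parameter estimates.
x = list(range(101))
print(central_interval(x, 90.0))
```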
As we work with model parameters, some rules of thumb come into view. For example, if we define a progressive rate constant, it cannot turn out to be negative. Similarly, Poisson's ratio cannot turn out to exceed 0.5. This is the ratio of the proportional decrease in width to the proportional increase in length as a material is stretched. Other examples might put both an upper and a lower bound on a parameter. A fitted model should not violate these constraints.
#codingexercise
We talked about finding the closest elements to a given value across three arrays. We introduced the notion of projections to find non-overlapping regions. We also mentioned that projections can be initiated from any sentinel values to split the range in the three arrays. The sentinel values we choose do not have to be the start and the end of an array. Any range of interest can be projected onto all three arrays, splitting each one into the subarray that overlaps the given range and the array-specific subarrays that do not. This is very useful for shrinking the index ranges over which computations related to a given range are performed.
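A minimal sketch of such a projection follows. It assumes each array is sorted in ascending order, which the discussion above does not state explicitly, and it uses binary search from the standard bisect module. The function names project_range and split_by_range and the sample arrays are hypothetical.

```python
from bisect import bisect_left, bisect_right

def project_range(sorted_array, low, high):
    """Return (start, end) so that sorted_array[start:end] holds the values in [low, high]."""
    start = bisect_left(sorted_array, low)    # first index with value >= low
    end = bisect_right(sorted_array, high)    # first index with value > high
    return start, end

def split_by_range(sorted_array, low, high):
    """Split into the part below the range, the part inside it, and the part above it."""
    start, end = project_range(sorted_array, low, high)
    return sorted_array[:start], sorted_array[start:end], sorted_array[end:]

# Illustrative arrays; the range of interest is [4, 10].
a = [1, 4, 7, 10, 13]
b = [2, 3, 5, 8, 21]
c = [6, 9, 12]
for arr in (a, b, c):
    print(project_range(arr, 4, 10), split_by_range(arr, 4, 10))
```

Each projection costs two binary searches, so later computations for the range only need to touch the indices between start and end in each array.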
Also, note that range division is not the only benefit. We can also approximate computations in all three arrays by performing a similar operation within a projection.