Friday, November 10, 2017

We were discussing modeling. A model articulates how a system behaves quantitatively. Models use numerical methods to examine complex situations and come up with predictions. The most common techniques for building a model include statistical techniques, numerical methods, matrix factorizations and optimizations.
Sometimes we relied on experimental data to corroborate the model and tune it. Other times, we simulated the model to see the predicted outcomes and whether they matched the observed data. There are some caveats with this form of analysis. A model is merely a representation of our understanding based on our assumptions. It is not the truth. The experimental data is closer to the truth than the model. Even the experimental data may be tainted by how we question nature, rather than reflecting nature itself. This is what Heisenberg and Covell warn against. An inaccurate model may not be reliable in prediction. Even if the model is close to the truth, garbage in may result in garbage out.
Any model has a test measure to determine its effectiveness. Since the observed and the predicted values are both known, a suitable test metric may be chosen. For example, the sum of squares of errors or the F-measure may be used to compare and improve systems.
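To make the two metrics mentioned above concrete, here is a minimal sketch in C. The function names sum_squared_errors and f_measure are illustrative and not from any library. The F-measure shown is the balanced F1 variant and assumes binary labels encoded as 0 and 1.

```c
#include <stddef.h>

/* Sum of squared errors between observed and predicted values.
   Lower is better; 0 means the predictions match exactly. */
double sum_squared_errors(const double *observed, const double *predicted, size_t n)
{
    double sse = 0.0;
    for (size_t i = 0; i < n; i++)
    {
        double err = observed[i] - predicted[i];
        sse += err * err;
    }
    return sse;
}

/* F1 measure: harmonic mean of precision and recall.
   Assumption: labels are binary, with 1 as the positive class. */
double f_measure(const int *observed, const int *predicted, size_t n)
{
    size_t tp = 0, fp = 0, fn = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (predicted[i] == 1 && observed[i] == 1) tp++;       /* true positive  */
        else if (predicted[i] == 1 && observed[i] == 0) fp++;  /* false positive */
        else if (predicted[i] == 0 && observed[i] == 1) fn++;  /* false negative */
    }
    if (tp == 0) return 0.0; /* no true positives: define F1 as 0 */
    return (2.0 * tp) / (2.0 * tp + fp + fn);
}
```

The sum of squared errors suits numeric predictions, while the F-measure suits classification, where the observed and predicted values are labels.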
#codingexercise 
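Implement the fix-centroids step of k-means. Here each centroid is stored as an index into the data, so for every cluster the step picks the member with the smallest sum of squared cosine distances to the other members.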
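#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Defined elsewhere; signature inferred from the calls below. */
double get_cosine_distance(int dimension, double* a, double* b);

/* centroids[i] is the index into vectors of the representative of cluster i.
   Returns true if any centroid changed. */
bool fix_centroids(int dimension, double** vectors, int* centroids, int* cluster_labels, int size, int k)
{
    bool centroids_updated = false;
    int* new_centroids = (int*) malloc(k * sizeof(int));
    if (new_centroids == NULL) { fprintf(stderr, "Require more memory\n"); exit(1); }
    for (int i = 0; i < k; i++)
    {
        int label = i;
        /* Keep the current centroid unless a better member is found;
           otherwise this entry would be copied back uninitialized. */
        new_centroids[label] = centroids[label];

        /* Cost of the current centroid: sum of squared distances to its members. */
        double minimum = 0;
        double* centroid = vectors[centroids[label]];
        for (int j = 0; j < size; j++)
        {
             if (j != centroids[label] && cluster_labels[j] == label)
             {
                double cosd = get_cosine_distance(dimension, centroid, vectors[j]);
                minimum += cosd * cosd;
             }
        }

        /* Try every member of the cluster as a candidate centroid. */
        for (int j = 0; j < size; j++)
        {
             if (cluster_labels[j] != label) continue;
             double distance = 0;
             for (int m = 0; m < size; m++)
             {
                if (cluster_labels[m] != label) continue;
                if (m == j) continue;
                double cosd = get_cosine_distance(dimension, vectors[m], vectors[j]);
                distance += cosd * cosd;
             }

             if (distance < minimum)
             {
                 minimum = distance;
                 new_centroids[label] = j;
                 centroids_updated = true;
             }
        }
    }

    if (centroids_updated)
    {
        for (int j = 0; j < k; j++)
        {
            centroids[j] = new_centroids[j];
        }
    }
    free(new_centroids);
    return centroids_updated;
}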
