Saturday, May 20, 2017

We were discussing the MicrosoftML rxFastTrees algorithm.
The gradient boosting algorithm behind rxFastTrees is described by Friedman in his paper, and it can be instantiated with several loss functions, including the squared loss.
The algorithm for least squares regression can be written as:
1. Set the initial approximation to a constant; for squared loss this is the mean of the target values.
2. For a set of successive increments, or boosts, each based on the preceding iterations, do:
3. Calculate the new residuals between the targets and the current approximation.
4. Find the line of search by fitting the base learner to the residuals and minimizing the squared error.
5. Perform the boost along the line of search.

6. Repeat steps 3, 4 and 5 for each iteration of step 2 (a minimal sketch follows this list).
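
A minimal sketch of these steps in Python, assuming scikit-learn's DecisionTreeRegressor as the base learner; the function and parameter names here are illustrative, and the actual rxFastTrees (FastTree/MART) learner differs in its tree-construction details:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ls_boost(X, y, num_trees=100, learning_rate=0.2, max_leaves=20):
    # Step 1: for squared loss the initial approximation is the mean of the targets.
    f0 = float(np.mean(y))
    prediction = np.full(len(y), f0)
    trees = []
    for _ in range(num_trees):                          # Step 2: successive boosts
        residuals = y - prediction                      # Step 3: residuals = negative gradient of squared loss
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaves)
        tree.fit(X, residuals)                          # Step 4: fit the base learner to the residuals
        prediction += learning_rate * tree.predict(X)   # Step 5: boost along that direction, shrunk by the learning rate
        trees.append(tree)
    return f0, trees

def ls_boost_predict(f0, trees, X, learning_rate=0.2):
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)

With squared loss the per-leaf line search collapses into the tree fit itself, so this sketch applies only a shrinkage factor rather than an explicit step-size search.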

We now discuss the rxFastForest method:
rxFastForest is a fast random forest algorithm, also used for binary classification or regression; for example, it can be used for churn prediction. It builds an ensemble of decision trees using the regression tree learner in rxFastTrees. An aggregation over the resulting trees then finds a Gaussian distribution closest to the combined distribution of all trees in the model.
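
To make the aggregation concrete, here is a rough sketch in the same spirit, assuming bagged scikit-learn regression trees whose pooled predictions are summarized by a mean and spread; the Gaussian fit that rxFastForest performs internally is only loosely approximated here:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_forest(X, y, num_trees=100, max_leaves=20, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(num_trees):
        idx = rng.integers(0, len(X), size=len(X))      # bootstrap sample with replacement
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaves)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    per_tree = np.stack([tree.predict(X) for tree in trees])
    # Summarize the combined per-tree distribution by its mean and standard deviation.
    return per_tree.mean(axis=0), per_tree.std(axis=0)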

With rxFastTrees and rxFastForest available, we wonder if an rxFastGraph is available as well. It would help to notice that classification is inherently tree-based and mostly statistical.
The Naive Bayes classifier produces a simple tree model and relates to a logistic regression model. The Hidden Markov Model relates to a linear-chain Conditional Random Field. The general directed model relates to general Conditional Random Fields. These models are pictured as trees, linear chains and graphs respectively.
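
To make the first of these pairings concrete, the Naive Bayes joint distribution induces a conditional of exactly the logistic-regression (log-linear) form; the notation below is the standard generative-discriminative rewrite, not anything specific to MicrosoftML:

p(y, x) = p(y) \prod_k p(x_k \mid y)

p(y \mid x) = \frac{p(y) \prod_k p(x_k \mid y)}{\sum_{y'} p(y') \prod_k p(x_k \mid y')}
            = \frac{\exp\big(\lambda_y + \sum_k \lambda_{y,k}(x_k)\big)}{\sum_{y'} \exp\big(\lambda_{y'} + \sum_k \lambda_{y',k}(x_k)\big)},
  \quad \text{where } \lambda_y = \log p(y) \text{ and } \lambda_{y,k}(x_k) = \log p(x_k \mid y).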
A generative model specifies a joint distribution over the training sample, and the conditional distribution derived from that joint is what the discriminative counterpart models directly. In the case of the HMM, this conditional distribution is a linear-chain CRF with a particular choice of feature functions. With generalization, we move from a linear-chain factor graph to a more general factor graph, and the conditional distribution is now written as a normalized product of factors. The noteworthy considerations in a general CRF are the repeated structure and parameter tying. A few methods to specify the repeated structure include dynamic conditional random fields, which are sequence models that allow multiple labels at each time step, analogous to dynamic Bayesian networks, and relational Markov networks, which allow parameter tying based on a SQL-like syntax.
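
The normalized product of factors mentioned above can be written out in the usual general-CRF form (standard factor-graph notation, with factors \Psi_a over variable subsets y_a, x_a):

p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{a} \Psi_a(\mathbf{y}_a, \mathbf{x}_a),
\qquad Z(\mathbf{x}) = \sum_{\mathbf{y}} \prod_{a} \Psi_a(\mathbf{y}_a, \mathbf{x}_a),
\qquad \Psi_a(\mathbf{y}_a, \mathbf{x}_a) = \exp\Big(\sum_{k} \lambda_{ak}\, f_{ak}(\mathbf{y}_a, \mathbf{x}_a)\Big).

Parameter tying then means that factors instantiated from the same template share one set of weights \lambda, which is what the repeated structure refers to.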
