Tuesday, May 9, 2017

Machine learning is not limited to NoSQL or graph databases. SQL Server recently announced Machine Learning Services that run inside the database. We can now run Python in SQL Server using stored procedures or remote compute contexts. The Python package used for machine learning is the revoscalepy module, which provides a subset of the algorithms and compute contexts in RevoScaleR. Since data scientists have to experiment with different models over subsets of very large datasets, parallel creation and execution of different models is now enabled by the new rxExecBy function. This function accepts a dataset containing ungrouped and unordered data and lets us partition it by a single entity for training and model building. The result is multiple models, each trained on its own subset.
The R utilities are available in SQL Server 2017 and Microsoft R Server. They include supported compute contexts such as RxSpark and RxInSqlServer.
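To make the partition-and-train idea concrete, here is a minimal sketch in plain Python. It does not call revoscalepy or rxExecBy; the helper names (partition_by_key, exec_by, train_mean_model) and the sample rows are assumptions made up for illustration, and the "model" is just a per-group mean.

```python
from collections import defaultdict
from statistics import mean


def partition_by_key(rows, key):
    """Group unordered rows by the value of a single key column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def train_mean_model(rows, target):
    """Stand-in for real training: the 'model' is the mean of the target."""
    return mean(row[target] for row in rows)


def exec_by(rows, key, func, **kwargs):
    """Apply func to each partition and return one result per key value."""
    partitions = partition_by_key(rows, key)
    return {value: func(part, **kwargs) for value, part in partitions.items()}


# Hypothetical ungrouped, unordered input.
rows = [
    {"store": "A", "sales": 10.0},
    {"store": "B", "sales": 4.0},
    {"store": "A", "sales": 14.0},
    {"store": "B", "sales": 6.0},
]

models = exec_by(rows, "store", train_mean_model, target="sales")
print(models)  # {'A': 12.0, 'B': 5.0}
```

Each partition is independent of the others, which is what makes it possible to train the per-entity models in parallel.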
Not just the models and contexts but also who can use them can be controlled, by managing the permissions associated with the packages. Users can be assigned roles that grant the right to install their own packages or to share packages with other users. Users who belong to these roles can install and uninstall R packages on the SQL Server computer from a remote development client, without having to go through the database administrator each time.
The RevoScaleR package can also be upgraded from a previous SQL Server release. The upgrade is done by merely switching the server to the Modern Lifecycle policy. This takes advantage of the faster release cycle for R and automatically upgrades all R components.
Microsoft also publishes a specific R package called MicrosoftML. It brings speed, performance and scale to handling a large corpus of text data and high-dimensional categorical data in R models. It also includes five fast, highly accurate learners that are included in Azure Machine Learning. MicrosoftML now also includes new image and text featurization functions as well as support for predictive models with rxExecBy. The package provides machine learning transform pipelines in which we specify the transforms to apply to our data for featurization before training or testing. These include concat(), categoricalHash(), categorical(), selectFeatures(), featurizeText(), featurizeImage() and getSentiment(). concat() creates a single vector-valued column from multiple columns. categoricalHash() converts a categorical value into an indicator array using hashing. categorical() does the same using a dictionary. selectFeatures() selects features from the specified variables using one of two modes, count or mutual information.
featurizeText() produces a bag of counts of n-grams from a given text after performing language detection, tokenization, stopword removal, text normalization, feature generation and term weighting. featurizeImage() featurizes an image using the specified pre-trained deep neural network model. getSentiment() returns a sentiment score for the specified natural language text, without the need for any text processing. A value closer to 0 indicates a negative sentiment while a value closer to 1 indicates a positive sentiment.
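The transforms described above can be approximated in a few lines of plain Python to show what each one produces. This is only a sketch of the ideas, not the MicrosoftML implementation: the function names mirror the R ones loosely, the CRC32 hash, the 4-bit table size and the tiny stopword list are all assumptions, and steps such as language detection and term weighting are left out.

```python
import re
import zlib
from collections import Counter


def categorical(values):
    """Dictionary-based indicator arrays: one slot per distinct value."""
    vocab = {v: i for i, v in enumerate(dict.fromkeys(values))}
    encoded = []
    for v in values:
        vec = [0] * len(vocab)
        vec[vocab[v]] = 1
        encoded.append(vec)
    return vocab, encoded


def categorical_hash(value, num_bits=4):
    """Hash-based indicator array: the slot is chosen by hashing the value.

    CRC32 and num_bits=4 are illustrative choices only.
    """
    size = 1 << num_bits
    vec = [0] * size
    vec[zlib.crc32(value.encode("utf-8")) % size] = 1
    return vec


def concat(*columns):
    """Join several vector-valued columns into a single feature vector."""
    return [x for col in columns for x in col]


STOPWORDS = {"a", "an", "and", "is", "of", "the"}  # hypothetical list


def ngram_counts(text, n=2):
    """Bag of counts of 1..n-grams after lowercasing and stopword removal."""
    tokens = [t for t in re.findall(r"[a-z0-9']+", text.lower())
              if t not in STOPWORDS]
    counts = Counter()
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            counts[" ".join(tokens[i:i + size])] += 1
    return counts


vocab, encoded = categorical(["red", "green", "red"])
print(vocab)    # {'red': 0, 'green': 1}
print(encoded)  # [[1, 0], [0, 1], [1, 0]]
print(ngram_counts("The cat saw the cat"))
```

The dictionary approach needs to see every value up front, while the hashing approach has a fixed output size and tolerates unseen values at the cost of possible collisions.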
#codingexercise
Find the minimum number of moves in a Snake and Ladder game:
Consider each cell to be a vertex of a graph, where each cell connects to up to six other vertices depending on the roll of a die. We find the minimum number of moves using a breadth-first search.
For every cell we keep the predetermined next cell given by a ladder or a snake, or a default value of -1 if neither is present.
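// requires: using System.Collections.Generic;
// move[i] is the cell a ladder or snake at cell i leads to, or -1 if neither; n is the number of cells.
int GetMinMoves(List<int> move, int n)
{
    var visited = new bool[n];
    var queue = new Queue<(int Cell, int Dist)>();
    visited[0] = true;
    queue.Enqueue((0, 0));              // root: first cell, zero moves so far
    while (queue.Count > 0)
    {
        var (cell, dist) = queue.Dequeue();
        if (cell == n - 1)
            return dist;                // BFS reaches the last cell with the fewest moves
        // try each of the six dice outcomes that stay on the board
        for (int next = cell + 1; next <= cell + 6 && next < n; next++)
        {
            if (visited[next]) continue;
            visited[next] = true;
            // follow the ladder or snake at that cell, if any
            int landing = move[next] != -1 ? move[next] : next;
            queue.Enqueue((landing, dist + 1));   // one more move than the dequeued cell
        }
    }
    return -1;                          // the last cell cannot be reached
}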
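As a quick check of the routine above, here is the same breadth-first search as a small runnable Python sketch with a sample board. The 30-cell board and its ladder and snake positions are made up for illustration; cells are numbered from 0.

```python
from collections import deque


def get_min_moves(move, n):
    """Minimum dice throws from cell 0 to cell n-1, or -1 if unreachable."""
    visited = [False] * n
    visited[0] = True
    queue = deque([(0, 0)])  # (cell, moves taken to reach it)
    while queue:
        cell, dist = queue.popleft()
        if cell == n - 1:
            return dist
        # Try each of the six dice outcomes that stay on the board.
        for nxt in range(cell + 1, min(cell + 6, n - 1) + 1):
            if visited[nxt]:
                continue
            visited[nxt] = True
            # Follow the ladder or snake at that cell, if any.
            landing = move[nxt] if move[nxt] != -1 else nxt
            queue.append((landing, dist + 1))
    return -1


# Hypothetical board: -1 means the cell has neither a ladder nor a snake.
n = 30
move = [-1] * n
for start, end in {2: 21, 4: 7, 10: 25, 19: 28}.items():   # ladders
    move[start] = end
for start, end in {26: 0, 20: 8, 16: 3, 18: 6}.items():    # snakes
    move[start] = end

print(get_min_moves(move, n))  # 3
```

One shortest route here is a roll of 2 (ladder from 2 to 21), then a roll of 2 to reach 23, then a roll of 6 to reach 29.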
