Monday, May 10, 2021

Transfer Learning in Summarizing Text

Problem statement: Transfer learning is a machine learning technique where a model trained on one task is re-purposed for a second, related task. In deep learning, it only works when the features the model learned on the first task are general enough to suit both the base and the target tasks. This form of transfer is also called inductive transfer: the scope of the allowed hypotheses is narrowed in a beneficial way because the model has already been fit on a different but related task. When data is sparse or expensive to collect, this focus becomes especially valuable. This article describes how text summaries can be generated with transfer learning. 


Solution: spaCy is a newer natural language processing library compared to its predecessors, which were written in Java. It is written in Python, provides word vectors, and performs tokenization, sentence boundary detection, part-of-speech tagging, syntactic parsing, tokenization alignment, and named entity recognition reliably, accurately, and in a production-ready manner. Work begins by preparing a language object with a pipeline, where the pipeline carries the configuration needed to load the model data and weights. There are several pre-packaged pipelines to choose from, and each one returns a language object that is ready to try on the given text sample. This pre-trained model approach is an example of transfer learning: the model is used as a starting point on the given text and can also be tuned with different configuration options, as sketched below. 
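As a minimal sketch, assuming the small English pipeline en_core_web_sm has been installed (for example with python -m spacy download en_core_web_sm), loading it returns a language object that can be applied directly to a text sample:

```python
import spacy

# Load a pre-packaged pipeline; this pre-trained model is the transfer-learning
# starting point. Assumes en_core_web_sm has been downloaded beforehand.
nlp = spacy.load("en_core_web_sm")

text = ("spaCy ships pre-trained pipelines. "
        "They can be applied to new text right away.")
doc = nlp(text)

# The pipeline has already performed tokenization, sentence boundary detection,
# POS tagging, dependency parsing, and named entity recognition.
for sent in doc.sents:
    print("Sentence:", sent.text)

for token in doc[:5]:
    print(token.text, token.pos_, token.dep_)

for ent in doc.ents:
    print("Entity:", ent.text, ent.label_)
```

The same language object can be reconfigured, for instance by disabling pipeline components that are not needed for a given task.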


Word embeddings are already part of these pre-trained models. An embedding is a mapping of words to a high-dimensional continuous vector space in which different words with a similar latent meaning have similar vector representations. These embeddings are learned by neural networks trained on a large corpus of text (spaCy's pipelines, for instance, use convolutional network layers), and the more training data the model sees during that initial training, the more it learns. When such a pre-trained model is then run against the target data, it tends to show a higher start, a higher slope, and a higher asymptote. The start refers to the initial skill of the model before it has learned anything from the target dataset; a pre-trained model has a non-zero start compared to a model starting from scratch. The rate of learning is also higher for the pre-trained model because of the narrowed scope of the allowed hypotheses, and its convergence on the target dataset is better than that of a control model because of this improved learning. The only caution is that these benefits are general claims about pre-trained models; they do not mean that a particular model is the right choice for a given dataset. There may be alternatives to choose from, and the training regime and underlying corpus matter to the results. 
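As an illustrative sketch (not from the original post), the pre-trained word vectors can already drive a crude extractive summary: score each sentence by the cosine similarity of its vector to the whole document's vector and keep the top few. This assumes a vector-bearing pipeline such as en_core_web_md; the helper function name is hypothetical.

```python
import numpy as np
import spacy

# Assumes the medium English pipeline (which bundles word vectors) is installed.
nlp = spacy.load("en_core_web_md")

def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Keep the sentences whose vectors are closest to the document vector."""
    doc = nlp(text)
    doc_vec = doc.vector

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a.dot(b) / denom) if denom else 0.0

    # Score every sentence against the document as a whole.
    scored = [(cosine(sent.vector, doc_vec), sent) for sent in doc.sents]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:max_sentences]

    # Re-emit the chosen sentences in their original order.
    top_sents = sorted((sent for _, sent in top), key=lambda s: s.start)
    return " ".join(sent.text for sent in top_sents)

print(extractive_summary(
    "Some longer article text goes here. It has several sentences. "
    "Only the most representative ones are kept."))
```

This only reuses the learned embeddings; it does no further training, which is the simplest form of the transfer described above.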


Transformers also come with such pre-trained models. While spaCy is a general-purpose natural language processing library, the Transformers library is geared toward natural language understanding and generation. It comes with several general-purpose models, including BERT and its variants, and can be used in conjunction with spaCy; with the spacy-transformers package, a single pipeline (for example en_core_web_trf) can encapsulate both. 
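As a hedged sketch of the abstractive route, the Hugging Face Transformers library exposes a summarization pipeline backed by a pre-trained sequence-to-sequence model. The checkpoint named below (facebook/bart-large-cnn) is one commonly used choice and is an assumption here, not a requirement:

```python
from transformers import pipeline

# Load a pre-trained summarization model; the checkpoint name is an assumption,
# and any compatible sequence-to-sequence summarization model would work.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = ("Transfer learning reuses a model trained on one task as the starting "
           "point for a related task. With pre-trained language models, a summary "
           "can be generated without training a summarizer from scratch.")

result = summarizer(article, max_length=60, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```

Here the summary is generated, not extracted, yet no task-specific training was needed: the pre-trained model supplies both the language knowledge and the summarization skill.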


Conclusion: Transfer learning is not just a convenience. It is a requirement when the dataset is sparse and predictions are needed quickly. 

 

 
