Semi-Supervised Learning


In the paradigm of semi-supervised learning, both unlabeled examples from P(x) and labeled examples from P(x, y) are used to estimate P(y | x) or to predict y from x.

The most basic disadvantage of any supervised learning algorithm is that the dataset has to be hand-labeled, either by a machine learning engineer or a data scientist. This is a very costly process, especially when dealing with large volumes of data. The most basic disadvantage of unsupervised learning is that its application spectrum is limited.

To counter these disadvantages, the concept of semi-supervised learning was introduced. In this type of learning, the algorithm is trained on a combination of labeled and unlabeled data. Typically, this combination contains a very small amount of labeled data and a very large amount of unlabeled data.

Working of Semi-Supervised Learning

  • First, the model is trained on the small labeled dataset, just as in supervised learning. Training continues until the model gives accurate results.
  • Next, the trained model assigns pseudo-labels to the unlabeled dataset; at this stage the predictions may not all be accurate.
  • The labels from the labeled training data and the pseudo-labels are then linked together.
  • The input data from the labeled and unlabeled training sets are likewise linked.
  • Finally, the model is retrained on the new combined input as in the first step. This iteration reduces errors and improves the accuracy of the model (see the sketch after this list).
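The procedure above is essentially self-training with pseudo-labels. Below is a minimal sketch in Python with scikit-learn; the synthetic dataset, the logistic-regression base model, the 5% labeled fraction, and the 0.9 confidence threshold are illustrative assumptions, not part of the original text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data; pretend only ~5% of the labels are known (assumption).
X, y = make_classification(n_samples=1000, random_state=0)
rng = np.random.default_rng(0)
labeled = rng.random(len(y)) < 0.05
X_lab, y_lab = X[labeled], y[labeled]
X_unlab = X[~labeled]

# Step 1: train on the small labeled set, as in supervised learning.
model = LogisticRegression().fit(X_lab, y_lab)

# Step 2: pseudo-label the unlabeled set, keeping only confident predictions.
proba = model.predict_proba(X_unlab)
confident = proba.max(axis=1) >= 0.9  # illustrative threshold
pseudo_y = model.classes_[proba.argmax(axis=1)[confident]]

# Steps 3-4: link labeled data with the confidently pseudo-labeled data.
X_comb = np.vstack([X_lab, X_unlab[confident]])
y_comb = np.concatenate([y_lab, pseudo_y])

# Step 5: retrain the model on the combined input.
model = LogisticRegression().fit(X_comb, y_comb)
```

In practice this loop is often repeated, re-labeling the remaining unlabeled points with the retrained model until no new confident pseudo-labels appear.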

Sharing Parameters. Instead of building separate unsupervised and supervised components into the model, one can construct models in which a generative model of either P(x) or P(x, y) shares parameters with a discriminative model of P(y | x). The supervised criterion −log P(y | x) can then be traded off against the unsupervised or generative one (such as −log P(x) or −log P(x, y)).
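One common realization of this idea is an encoder shared by a classifier head (the discriminative term, −log P(y | x)) and a decoder whose reconstruction loss stands in for the generative term −log P(x). Below is a minimal PyTorch sketch; the layer sizes, the trade-off weight alpha, and the use of a reconstruction loss as the generative proxy are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedModel(nn.Module):
    def __init__(self, in_dim=20, hidden=32, n_classes=2):
        super().__init__()
        # The encoder's parameters are shared by both heads.
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_classes)  # discriminative head, P(y|x)
        self.decoder = nn.Linear(hidden, in_dim)        # generative proxy, reconstructs x

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(h), self.decoder(h)

model = SharedModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
alpha = 0.5  # weight trading off the generative term (assumption)

def training_step(x_lab, y_lab, x_unlab):
    logits, recon_lab = model(x_lab)
    _, recon_unlab = model(x_unlab)
    # Supervised criterion on labeled data plus the generative criterion
    # on both labeled and unlabeled data.
    loss = ce(logits, y_lab) + alpha * (mse(recon_lab, x_lab)
                                        + mse(recon_unlab, x_unlab))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random tensors standing in for a real batch.
x_lab = torch.randn(16, 20)
y_lab = torch.randint(0, 2, (16,))
x_unlab = torch.randn(64, 20)
training_step(x_lab, y_lab, x_unlab)
```

Because the unlabeled examples still contribute gradients through the shared encoder, they shape the representation that the classifier uses, which is the point of sharing parameters.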


Next Topic: Multi-Task Learning