While practicing Machine Learning, we run into many hurdles. One of them is overfitting/underfitting, and to stop it from happening we apply solutions such as the Bias Variance Tradeoff, Regularization, etc. You can find the recent blog on the Bias Variance Tradeoff at the end of this one. In this article, let's look more closely at Regularization.
Regularization is a technique that adjusts the weights assigned to the features so as to reduce the variance of the model caused by features that make it overfit. This is also called penalizing the features. By penalizing the features, regularization keeps them from being weighted too heavily when they would otherwise inflate to large values. This is done by adding the coefficient terms to the cost function; the optimizer then controls the coefficient values to minimize the cost function.
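As a rough sketch of what "adding the coefficient terms to the cost function" means (the arrays and the λ value below are made up purely for illustration, not from the article), here is a penalized least-squares cost computed by hand in NumPy:

import numpy as np

# Made-up toy data: 3 samples, 2 features, a candidate weight vector.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])   # features
y = np.array([3.0, 3.0, 7.0])                         # target
w = np.array([0.9, 1.1])                              # candidate weights
lam = 0.1                                             # regularization strength λ

mse = np.mean((y - X @ w) ** 2)        # unregularized cost
l1_penalty = lam * np.sum(np.abs(w))   # LASSO-style penalty (L1 norm)
l2_penalty = lam * np.sum(w ** 2)      # Ridge-style penalty (L2 norm)

print("MSE:", mse)
print("MSE + L1 penalty:", mse + l1_penalty)
print("MSE + L2 penalty:", mse + l2_penalty)

The optimizer minimizes the penalized cost, so large weights are only kept if they reduce the error by more than they add to the penalty.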
Some of the types of Regularization are:
LASSO Regression (L1 Norm)
Ridge Regression (L2 Norm)
Elastic Net Regression
Dropout
1. LASSO Regression (Least Absolute Shrinkage and Selection Operator)
Lasso regression penalizes the coefficients with the L1 norm. This constraint biases (reduces) the capacity of the learning algorithm: adding such a penalty forces the coefficients to be small, i.e. it shrinks them toward zero. The objective function to minimize becomes:

minimize over w:   ||y − Xw||² + λ ||w||₁
where, w = weights assigned,
y = dependent variable,
X = independent variable,
λ = regularization parameter
This penalty forces some coefficients to be exactly zero, providing a feature selection property.
Thus the Loss Function is:

Loss = Σi (yi − (β0 + β1·xi))² + λ |β1|

where, β0 = intercept
β1 = slope
LASSO tends to reduce some of the coefficients to exactly zero. Features with a coefficient value of zero can be treated as features that make no contribution to the model. LASSO can therefore also be used for feature selection: remove the features with zero coefficients, thereby reducing the number of features.
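A quick sketch with scikit-learn's Lasso (the toy data and the alpha value here are arbitrary) shows this sparsity in practice: features that do not contribute end up with coefficients of exactly zero.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Toy data: 10 features, but only 3 actually drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)   # alpha is scikit-learn's name for λ
lasso.fit(X, y)

print("Coefficients:", lasso.coef_.round(2))
print("Features kept:", (lasso.coef_ != 0).sum(), "out of", X.shape[1])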
2. Ridge Regression (L2 Norm)
Ridge regression prevents the weights from getting too large (as measured by the L2 norm). The larger the weights, the more complex the model and the greater the chance of overfitting.
Ridge regularization aims to alleviate this by constraining (biasing or reducing) the capacity of the learning algorithm in order to promote simple solutions. This regularization penalizes "large" solutions, forcing the coefficients to be small, i.e. shrinking them toward zero. The ridge term distributes (smooths) the coefficient values across all the features.
Thus the Loss Function is:

Loss = Σi (yi − (β0 + β1·xi))² + λ β1²
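A similar sketch with scikit-learn's Ridge (again on made-up toy data) shows the contrast with lasso: the coefficients shrink toward zero but stay non-zero.

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)    # alpha plays the role of λ

# Ridge shrinks every coefficient but rarely makes any exactly zero.
print("OLS coefficients:  ", plain.coef_.round(2))
print("Ridge coefficients:", ridge.coef_.round(2))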
3. Elastic Net Regression
Elastic Net regression combines both L1 and L2 regularizations to build a regression model.
The penalty becomes λ ( α Σ|βj| + (1 − α) Σ βj² ), where α is the mixing parameter between ridge (α = 0) and lasso (α = 1).
Now, there are two parameters to tune: λ and α.
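In scikit-learn's ElasticNet these correspond to the alpha and l1_ratio arguments (alpha plays the role of λ, l1_ratio the role of the mixing parameter α). The sketch below uses toy data and arbitrarily chosen values:

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# l1_ratio=1.0 is pure lasso, l1_ratio=0.0 is pure ridge; 0.5 mixes them equally.
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)
enet.fit(X, y)

print("Coefficients:", enet.coef_.round(2))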
4. Dropout
This technique is generally used in deep learning neural networks. During training, some number of layer outputs are randomly ignored or "dropped out." This has the effect of making the layer look like, and be treated like, a layer with a different number of nodes and connectivity to the prior layer. In effect, each update to a layer during training is performed with a different "view" of the configured layer.
By dropping a unit out, we mean temporarily removing it from the network, along with all its incoming and outgoing connections.
[Figure: dropout illustration — image credit: ai-pool.com]
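A minimal sketch in PyTorch (the layer sizes and the dropout rate are arbitrary) shows where a dropout layer sits in a network and how it is active only during training:

import torch
import torch.nn as nn

# A tiny network with a dropout layer after the hidden layer.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden unit is zeroed with probability 0.5
    nn.Linear(64, 1),
)

x = torch.randn(8, 20)

model.train()               # dropout is active: units are randomly dropped
train_out = model(x)

model.eval()                # dropout is disabled at inference time
with torch.no_grad():
    eval_out = model(x)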
Key Takeaways:
Regularization is done to make the learned model smoother and simpler, so that it generalizes rather than overfits.
LASSO drives the weights of non-contributing parameters to zero, introducing sparsity in the weights: it forces more weights to be exactly zero rather than merely reducing the average magnitude of all weights.
Ridge forces the β (slope/partial slope) coefficients to be lower, but not 0. It does not remove irrelevant features, but minimizes their impact.
Elastic Net is a combination of the L1 and L2 penalties used to adjust the weights of the parameters.
Dropout randomly turns the nodes of a deep learning model OFF and ON during training, which keeps the network from relying too heavily on any single node and helps avoid overfitting.
. . .
Connect with me on LinkedIn
Open to entry-level Data Scientist/Data Analyst roles. Please DM me on LinkedIn for my resume if you have any openings in the near future 🤗 🙏