Ensemble means "a group" or "a collection". Ensemble learning is a machine learning technique that combines a group of weak learning models to reach a final decision. This not only improves predictions but also helps reduce bias and variance.
“A less common strategy for manipulating the search space is to manipulate the input attribute set. Feature subset-based ensemble methods are those that manipulate the input feature set in order to create the ensemble members. The idea is simply to give each classifier a different projection of the training set.”
― Lior Rokach, Ensemble Learning: Pattern Classification Using Ensemble Methods
What are the various types of methods used in Ensemble Learning?
Simple: Max Voting, Averaging, Weighted Averaging
Advanced: Stacking, Bagging & Boosting
We'll walk through each of these types with small, simple illustrations:
Simple Ensemble Methods
1. Max Voting
The final result is the mode (the most frequent value) of all predictions.
| Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Result |
| ------- | ------- | ------- | ------- | ------- | ------ |
| 5       | 4       | 5       | 4       | 4       | 4      |
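A minimal sketch of max voting using the example values above; in practice, scikit-learn's VotingClassifier with voting='hard' does the same thing over fitted models.

```python
from statistics import mode

# Predictions from the five models above for a single example
predictions = [5, 4, 5, 4, 4]

# Max voting: the most frequent prediction (the mode) wins
result = mode(predictions)
print(result)  # 4
```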
2. Average Voting
The final result is the simple average (arithmetic mean) of all predictions.
| Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Result |
| ------- | ------- | ------- | ------- | ------- | ------ |
| 5       | 4       | 5       | 4       | 4       | 4.4    |
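The same example as a tiny sketch; for fitted regression models, scikit-learn's VotingRegressor provides the equivalent behaviour.

```python
# Predictions from the five models above for a single example
predictions = [5, 4, 5, 4, 4]

# Averaging: the simple mean of all predictions
result = sum(predictions) / len(predictions)
print(result)  # 4.4
```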
3. Weighted Averaging Method
Each model's prediction is multiplied by a weight that reflects its importance, and the weighted predictions are then summed.

Formula: Result = (w1 × P1) + (w2 × P2) + (w3 × P3) + (w4 × P4) + (w5 × P5)

where w = weight and P = prediction value.
|             | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Result |
| ----------- | ------- | ------- | ------- | ------- | ------- | ------ |
| Weights     | 0.2     | 0.18    | 0.25    | 0.22    | 0.21    |        |
| Predictions | 5       | 4       | 5       | 4       | 4       | 4.69   |
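A quick sketch reproducing the table above; the weights here are the illustrative values from the example, not tuned numbers.

```python
import numpy as np

# Weights and predictions from the table above
weights = [0.2, 0.18, 0.25, 0.22, 0.21]
predictions = [5, 4, 5, 4, 4]

# Weighted averaging: sum of weight * prediction
result = np.dot(weights, predictions)
print(round(result, 2))  # 4.69
```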
Advanced Ensemble Methods
1. Stacking Method
As the name suggests, models are stacked on top of each other: a set of base (weak) learners makes predictions, and those predictions are fed as input features to a meta-learner, which produces the final prediction.
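A minimal scikit-learn sketch of stacking; the Iris dataset and the particular base learners are just illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base (weak) learners whose predictions become features for the meta-learner
base_learners = [
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Logistic regression sits on top of the stack as the meta-learner
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```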
2. Bootstrap Aggregating Method (Bagging)
In this method, a number of weak learning models are trained in parallel, each on its own bootstrap sample of the original dataset (a random subset drawn with replacement, so the subsets may overlap). For classification, the label that receives the most votes is selected as the final prediction; a code sketch follows the list below.

Bagging Algorithms:
Bagging meta-estimator
Random Forest
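A small scikit-learn sketch of both bagging algorithms listed above; the Iris dataset is just a stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging meta-estimator: each base tree is trained on its own bootstrap sample
bagging = BaggingClassifier(n_estimators=50, random_state=42)

# Random Forest: bagging of decision trees plus random feature selection at each split
forest = RandomForestClassifier(n_estimators=100, random_state=42)

for name, model in [("Bagging", bagging), ("Random Forest", forest)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```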
3. Boosting Method
Weak learners are trained on the full dataset sequentially. Examples that the first model gets wrong are given higher weights before the next model is trained, so each subsequent model puts more emphasis on those hard cases. The process repeats until the ensemble's bias is sufficiently low, and the final prediction is a weighted vote over all the learners' results; a small sketch follows the list below.

Boosting Algorithms:
AdaBoost
GBM
XGBoost
LightGBM
CatBoost
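A small scikit-learn sketch of two of the boosting algorithms above (AdaBoost and GBM); the breast-cancer dataset is just a placeholder. XGBoost, LightGBM and CatBoost live in their own libraries but expose a similar fit/predict interface.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# AdaBoost: re-weights misclassified samples before training each new weak learner
ada = AdaBoostClassifier(n_estimators=50, random_state=42)

# GBM: each new learner fits the residual errors of the current ensemble
gbm = GradientBoostingClassifier(n_estimators=100, random_state=42)

for name, model in [("AdaBoost", ada), ("GBM", gbm)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```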
Key Takeaways:
Ensemble Learning combines many weak learners to produce a stronger final prediction.
In Stacking, the predictions of base (weak) learners are combined by a meta-learner.
In Bagging, weak learners are trained in parallel on bootstrap samples (random subsets drawn with replacement) of the data.
In Boosting, weak learners are trained sequentially on the same data, with higher weights given to the examples previous learners got wrong.
. . .
Connect with me on LinkedIn.
I'm open to entry-level Data Scientist/Data Analyst roles. Please DM me on LinkedIn for my resume if you have any openings coming up 🤗 🙏