Mayuresh Madiwale

The Ensemble Way of Machine Learning: What is it? How is it done?

Ensemble means "a group" or "a bunch". Ensemble learning is a machine learning technique that combines a bunch of weak learning models to make a final decision. This not only improves predictions but also helps reduce bias and variance.


“A less common strategy for manipulating the search space is to manipulate the input attribute set. Feature subset-based ensemble methods are those that manipulate the input feature set in order to create the ensemble members. The idea is simply to give each classifier a different projection of the training set.”

Lior Rokach, Ensemble Learning: Pattern Classification Using Ensemble Methods


What are the various types of ensemble methods?
  1. Simple: Max Voting, Averaging, Weighted Averaging

  2. Advanced: Stacking, Bagging & Boosting

We'll look at each of the above types with small, simple illustrations:



Simple Ensemble Methods


1. Max Voting


The final prediction is the mode (the most frequent value) of all the models' predictions.

Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Result
   5    |    4    |    5    |    4    |    4    |   4
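For instance, here is a minimal sketch of max voting in Python, using the hypothetical predictions from the table above:

```python
from statistics import mode

# Hard (max) voting: the most frequent prediction wins.
predictions = [5, 4, 5, 4, 4]  # outputs of Models 1-5 from the table

print(mode(predictions))  # -> 4
```

In scikit-learn, the same idea is available out of the box as VotingClassifier(voting="hard").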

2. Averaging


The final prediction is the average (mean) of all the models' predictions.

Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Result
   5    |    4    |    5    |    4    |    4    |  4.4
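The same computation as a one-liner sketch in Python:

```python
# Simple averaging: the mean of all model predictions.
predictions = [5, 4, 5, 4, 4]

print(sum(predictions) / len(predictions))  # -> 4.4
```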

3. Weighted Averaging


An average of all predictions, where each model's prediction is first multiplied by a weight that reflects how much we trust that model.


Formula: Result = ((w1*P1) + (w2*P2) + (w3*P3) + (w4*P4) + (w5*P5)) / (w1 + w2 + w3 + w4 + w5)

where, w = weight

P = prediction value

(If the weights sum to 1, the denominator equals 1 and can be dropped.)

            | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Result
Weights     |  0.20   |  0.18   |  0.25   |  0.22   |  0.21   |
Predictions |    5    |    4    |    5    |    4    |    4    |  4.42

Here, Result = (0.20*5 + 0.18*4 + 0.25*5 + 0.22*4 + 0.21*4) / (0.20 + 0.18 + 0.25 + 0.22 + 0.21) = 4.69 / 1.06 ≈ 4.42.
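A short sketch that reproduces the table; note the division by the weight total, since these particular weights sum to 1.06 rather than 1:

```python
# Weighted averaging: scale each prediction by its model's weight.
weights = [0.20, 0.18, 0.25, 0.22, 0.21]
predictions = [5, 4, 5, 4, 4]

weighted_sum = sum(w * p for w, p in zip(weights, predictions))  # 4.69
result = weighted_sum / sum(weights)                             # / 1.06

print(round(result, 2))  # -> 4.42
```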


Advanced Ensemble Methods


1. Stacking Method


As the name suggests, stacking means layering models on top of one another. A set of base (weak) learners is trained on the data first, and a meta-learner is then trained on their predictions to produce the final output.
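A minimal sketch with scikit-learn's StackingClassifier; the iris toy dataset and the particular base learners here are placeholders, not a recommendation:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # iris is just a stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base (level-0) learners are trained on the data; the meta-learner
# (level-1, logistic regression here) is trained on their predictions.
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```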




2. Bootstrap Aggregating Method (Bagging)


In this method, a group of weak learners is trained in parallel, each on its own bootstrap sample of the original dataset, i.e. a random subset drawn with replacement (so the subsets generally overlap). For classification, the label that receives the most votes is selected as the final prediction; for regression, the predictions are averaged.
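A minimal sketch using scikit-learn's BaggingClassifier; the dataset and hyperparameters are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # iris is just a stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 10 decision trees, each fit on its own bootstrap sample (drawn with
# replacement); their predictions are combined by majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        bootstrap=True, random_state=42)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))
```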




Bagging Algorithms:

  1. Bagging meta-estimator

  2. Random Forest


3. Boosting Method


A series of weak learners is trained on the complete dataset sequentially. The examples the first model gets wrong are given higher weights before the next model is trained, so that more emphasis is placed on them. The process continues until the error stops improving, yielding a low-bias prediction. Finally, a weighted vote (or weighted sum) over all the learners gives the final result.
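A minimal sketch with scikit-learn's AdaBoostClassifier, which by default boosts decision stumps (depth-1 trees); the dataset and settings are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # iris is just a stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each new weak learner focuses on the samples the previous ones got
# wrong (their sample weights are increased); the final prediction is
# a weighted vote across all 50 learners.
boost = AdaBoostClassifier(n_estimators=50, random_state=42)
boost.fit(X_train, y_train)
print(boost.score(X_test, y_test))
```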



Boosting Algorithms:

  1. AdaBoost

  2. GBM

  3. XGBoost

  4. LightGBM

  5. CatBoost


Key Takeaways:

  1. Ensemble learning combines many weak learners into one stronger model.

  2. In stacking, a meta-learner is trained on top of the predictions of the weak (base) learners.

  3. In bagging, weak learners are trained in parallel on bootstrap samples of the data (random subsets drawn with replacement).

  4. In boosting, weak learners are trained sequentially on the same data, with higher weights placed on previously misclassified examples.




. . .


Connect with me on LinkedIn


Open to entry-level jobs as a Data Scientist/Data Analyst. Please DM me on LinkedIn for my resume if you know of any openings in the near future 🤗 🙏
