September 27, 2023 • 4 min read

# Mastering LightGBM: Unravelling the Magic Behind Gradient Boosting

Written by Agathe Minaro

Welcome to the fascinating world of LightGBM, a lightning-fast gradient-boosting framework that has become a staple of data science. LightGBM, short for Light Gradient Boosting Machine, is built on decision-tree algorithms and can be used for all sorts of data science tasks such as ranking, regression, or classification. If you're a fan of Kaggle data science competitions and still rely on random forests, this algorithm may well be the key to your success.

Have you ever thought of LightGBM as a fascinating puzzle waiting to be unravelled? We're here to crack the code and reveal that this algorithm isn't as difficult as it may appear! Picture it like understanding a basketball game's tactics—both thrilling and logical. We'll break it down into simple pieces, making you the MVP of understanding LightGBM.

But that's not all... In a second article, we will guide you on how to maximise LightGBM's potential with a concrete example, so stay tuned!

Let's get into it and make LightGBM your best ally.

## Grasping the basics of a gradient-boosting model

LightGBM belongs to the family of gradient-boosted decision tree algorithms, which combine gradient optimisation with boosting techniques. If you are not familiar with these two concepts, don't worry: the following two parts will clarify them for you.

### Hiking is as easy as a gradient optimisation

Let’s first understand the idea of gradient descent!

Picture finding the lowest point in a valley by taking small steps downhill. Gradient descent is like using the slope to guide each step: the steeper the slope, the larger the step you take. You start anywhere on the hill, analyse the slope, and carefully move downward. With each step, you're closer to the lowest point.

Mathematically, gradient descent aims to minimise a cost function with respect to the model parameters. At each iteration, we adjust the parameters in the direction opposite to the gradient, which is the vector of partial derivatives of the cost function with respect to each parameter. The learning rate controls the size of each step taken during this iterative update. A higher learning rate means bigger steps, at the risk of overshooting the minimum. Conversely, a very small one results in tiny steps, slowing down convergence. Thus, gradient descent adjusts the model parameters in a direction that reduces the error, moving towards the best fit for accurate predictions.
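To make the update rule concrete, here is a minimal sketch on a toy one-dimensional cost function. The function `gradient_descent`, the cost f(x) = (x - 3)^2 and the chosen learning rates are assumptions made for this illustration only.

```python
def gradient_descent(grad, start, learning_rate=0.1, n_steps=100):
    """Minimise a function of one variable, given its derivative `grad`."""
    x = start
    for _ in range(n_steps):
        # Move in the direction opposite to the gradient.
        x = x - learning_rate * grad(x)
    return x


# Toy cost function f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
def grad_f(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, start=10.0)
print(round(x_min, 4))  # close to the true minimum at x = 3

# A learning rate that is too large overshoots and moves away from the minimum.
x_bad = gradient_descent(grad_f, start=10.0, learning_rate=1.1, n_steps=20)
print(abs(x_bad - 3.0) > abs(10.0 - 3.0))
```

With a learning rate of 0.1 the iterates settle near 3, while 1.1 makes every step jump further past the minimum than the previous one.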

It is not that complicated, is it?

### Gradient Boosting Algorithms: It's Simpler Than You Think

Now that you understand how gradient optimisation algorithms work, let’s focus on how gradient-boosting algorithms operate and how they leverage gradient optimisation:

1. A weak model, defined as one that is only slightly better than random guessing, is trained on the data.
2. A second model is built in an attempt to correct the errors of the first one. This is where gradient descent comes in: each new tree is fitted to the negative gradient of the loss with respect to the current predictions, so it compensates for the errors made previously without damaging the predictions that were already correct.
3. Other models are added until the predictions are accurate enough, or until the selected maximum number of models has been reached.
4. The final prediction is the sum of the predictions of all trees.
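The four steps above can be sketched in a few lines of plain Python. This is a simplified illustration rather than what LightGBM does internally: it assumes a squared-error loss (for which the negative gradient is simply the residual), a single feature, and trees of depth one. The names `fit_stump` and `gradient_boost` are hypothetical.

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree (a stump) on a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left)
        error += sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Boost stumps under a squared-error loss."""
    base = sum(ys) / len(ys)  # step 1: a very simple first model
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):  # step 3: keep adding models
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)  # step 2: correct earlier errors
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]

    def predict(x):
        # Step 4: sum the (scaled) contributions of all trees.
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 0.9, 1.1, 3.0, 3.2, 2.9]
model = gradient_boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(mse)
```

Scaling each tree by a learning rate is a common design choice: every tree only corrects part of the remaining error, which usually makes the ensemble more robust.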

## Key characteristics of LightGBM

Two characteristics of LightGBM deserve particular attention, as they differentiate it from other boosting models like AdaBoost: how each tree in the ensemble is grown, and how the splits of each tree are computed.

### Leaf-wise tree growth

When constructing trees in gradient boosting, you can employ two primary strategies: level-wise and leaf-wise.

In most gradient-boosting algorithms, trees are grown with the level-wise strategy: all nodes at a given depth are split before moving one level deeper, which gives priority to nodes closer to the tree's root. LightGBM, however, uses the leaf-wise strategy. At each step, it splits the leaf whose split yields the largest loss reduction. This leads to deeper, more expressive, and often unbalanced trees, with extra depth concentrated in high-loss regions. Because fewer nodes need to be expanded to reach a given loss, training is typically faster than with level-wise growth.

However, this efficiency may heighten the risk of overfitting, which can be mitigated with effective regularization techniques. These strategies will be covered in the next article.
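The leaf-wise strategy described above can be sketched with a priority queue that always hands back the leaf with the largest gain. This is a toy illustration on one feature with a squared-error criterion, not LightGBM's implementation; the helper names (`sse`, `best_split`, `grow_leaf_wise`) and the data are made up for the example.

```python
import heapq


def sse(values):
    """Sum of squared errors around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(points):
    """Return (gain, left, right) for the best split of sorted (x, y) points."""
    ys = [y for _, y in points]
    parent = sse(ys)
    best = (0.0, None, None)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal feature values
        gain = parent - sse(ys[:i]) - sse(ys[i:])
        if gain > best[0]:
            best = (gain, points[:i], points[i:])
    return best


def grow_leaf_wise(points, num_leaves):
    """Repeatedly split the leaf whose best split reduces the loss the most."""
    points = sorted(points)
    leaves = []  # leaves that cannot be split any further
    heap = []    # candidate leaves, ordered by gain (negated: heapq is a min-heap)
    counter = 0  # tie-breaker so the heap never has to compare lists

    def push(leaf):
        nonlocal counter
        gain, left, right = best_split(leaf)
        if left is None:
            leaves.append(leaf)
        else:
            heapq.heappush(heap, (-gain, counter, leaf, left, right))
            counter += 1

    push(points)
    while heap and len(heap) + len(leaves) < num_leaves:
        _, _, _, left, right = heapq.heappop(heap)
        push(left)
        push(right)
    leaves.extend(entry[2] for entry in heap)
    return sorted(leaves)


# Nearly flat targets for x = 1..4, strongly varying targets for x = 5..8.
data = [(1, 0.0), (2, 0.5), (3, 0.0), (4, 0.5),
        (5, 10.0), (6, 2.0), (7, 12.0), (8, 4.0)]
result = grow_leaf_wise(data, num_leaves=4)
for leaf in result:
    print([x for x, _ in leaf])
```

With four leaves allowed, the flat region `[1, 2, 3, 4]` stays in a single leaf while the high-loss region is split three levels deep. A level-wise tree with the same number of leaves would have spent one of its splits on the flat region.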

### Histogram-based algorithm

Thanks to its histogram-based algorithm, LightGBM speeds up the construction of decision trees. Indeed, instead of looking at each data point when deciding how to split a node in a tree, it groups data into bins or buckets, forming histograms. Each bin represents a range of values for a particular feature.

Here are the key stages of the algorithm:

• Data binning. The algorithm bins or groups the data into intervals based on the feature values. Each bin contains a certain range of values, which simplifies the computations.
• Histogram Construction. For each feature, it creates a histogram, which is a collection of bins with their associated statistics. These statistics typically include, for each bin, the sum of gradients and the sum of Hessians (the second-order derivatives of the loss).
• Histogram-Based Splitting. When building a tree node, instead of considering individual data points, the algorithm uses the histograms to make decisions on how to split the data. It computes the best-split points based on the statistics in the bins.

Thus, this approach speeds up the tree-building process because it reduces the number of operations needed to find the best split at each node.
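The three stages can be sketched as follows. Several simplifications are assumed here: equal-width bins, one feature, per-bin sums of gradients and of second-order derivatives (Hessians), and the usual second-order gain formula with an L2 term `reg_lambda`. LightGBM's actual binning and gain computation are more elaborate, and the function names are hypothetical.

```python
def build_histogram(feature, grads, hess, n_bins):
    """Bin one feature into equal-width bins and accumulate per-bin statistics."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant feature: everything falls into the first bin
    grad_sum = [0.0] * n_bins
    hess_sum = [0.0] * n_bins
    for x, g, h in zip(feature, grads, hess):
        b = min(int((x - lo) / width), n_bins - 1)
        grad_sum[b] += g
        hess_sum[b] += h
    # Upper boundary of each bin; bin b covers values below edges[b].
    edges = [lo + width * (b + 1) for b in range(n_bins)]
    return grad_sum, hess_sum, edges


def best_histogram_split(grad_sum, hess_sum, edges, reg_lambda=1.0):
    """Scan bin boundaries (not individual data points) for the best split."""
    total_g, total_h = sum(grad_sum), sum(hess_sum)

    def score(g, h):
        return g * g / (h + reg_lambda)

    best_gain, best_threshold = 0.0, None
    left_g = left_h = 0.0
    for b in range(len(edges) - 1):
        left_g += grad_sum[b]
        left_h += hess_sum[b]
        gain = (score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - score(total_g, total_h))
        if gain > best_gain:
            best_gain, best_threshold = gain, edges[b]
    return best_threshold, best_gain


# Squared-error gradients for targets 0,0,0,0,1,1,1,1 and a constant prediction of 0.5.
feature = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
grads = [0.5] * 4 + [-0.5] * 4
hess = [1.0] * 8
grad_sum, hess_sum, edges = build_histogram(feature, grads, hess, n_bins=4)
threshold, gain = best_histogram_split(grad_sum, hess_sum, edges)
print(threshold, gain)
```

The scan only visits three candidate boundaries instead of every distinct feature value, which is where the speed-up comes from. Here the best boundary is `5.0`, which separates the two groups of targets.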

Both of these techniques are also available in the famous XGBoost, so what sets LightGBM apart?

## Why is LightGBM unique?

Two key differentiating factors set LightGBM apart from other gradient-boosting algorithms like XGBoost: the Gradient-Based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) algorithms.

### Gradient-Based One-Side Sampling (GOSS)

LightGBM introduces an innovative sampling technique that downsamples instances based on their gradients. During training, instances with small gradients are usually already well-trained, while those with large gradients are under-trained. A simplistic approach to downsampling would be to discard instances with small gradients and focus solely on instances with large gradients. However, this approach would distort the original data distribution. Instead, the GOSS algorithm keeps all instances with large gradients and randomly samples among those with small gradients, giving the sampled instances a larger weight to compensate for the ones that were dropped.
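Here is a minimal sketch of this sampling step, assuming the gradients of all instances are available as a list. The function name `goss_sample`, the default rates and the seed are chosen for this illustration. The compensation factor `(1 - top_rate) / other_rate` follows the description in the LightGBM paper.

```python
import random


def goss_sample(grads, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) selected by a GOSS-style sampling step."""
    n = len(grads)
    # Rank instances by the absolute value of their gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(grads[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]  # always keep the large-gradient instances
    rng = random.Random(seed)
    sampled = rng.sample(order[n_top:], n_other)  # random subset of the rest
    # Up-weight the sampled small-gradient instances so that they stand in
    # for the whole small-gradient population.
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * n_top + [amplify] * n_other
    return indices, weights


# Toy gradients whose magnitude grows with the instance index.
grads = [((-1) ** i) * i / 100 for i in range(100)]
indices, weights = goss_sample(grads)
print(len(indices), sorted(indices[:20]))
```

In this toy run only 30 of the 100 instances are used to build the next tree: the 20 with the largest gradients, plus 10 drawn at random from the remaining 80 and weighted by a factor of 8.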

### Exclusive Feature Bundling (EFB)

Building a histogram is time-consuming: its cost is proportional to the number of data points multiplied by the number of features. LightGBM reduces this cost by grouping features together, which speeds up tree learning. In high-dimensional data, which is usually sparse, many features are mutually exclusive, meaning that they never take nonzero values at the same time (one-hot encoded columns are a typical example). LightGBM identifies such features and bundles them into a single feature, so fewer histograms have to be built.
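To see how two exclusive features can share one column without losing information, here is a small sketch. It assumes both features are already binned into non-negative integers where 0 means "no value", and it only covers the merging step. Deciding which features to bundle together is a separate problem that this example does not address. The helper names are hypothetical.

```python
def are_exclusive(a, b):
    """True if the two feature columns are never nonzero on the same row."""
    return all(x == 0 or y == 0 for x, y in zip(a, b))


def bundle(a, b):
    """Merge two exclusive, non-negative integer columns into one."""
    offset = max(a)  # shift b's values so they land after a's range
    merged = []
    for x, y in zip(a, b):
        if x != 0:
            merged.append(x)           # value comes from feature a
        elif y != 0:
            merged.append(y + offset)  # value comes from feature b
        else:
            merged.append(0)           # both features are zero
    return merged


feature_a = [1, 0, 2, 0, 0]
feature_b = [0, 3, 0, 0, 1]
if are_exclusive(feature_a, feature_b):
    bundled = bundle(feature_a, feature_b)
    print(bundled)  # [1, 5, 2, 0, 3]
```

In the bundled column, values 1 and 2 can only come from `feature_a` and values 3 to 5 can only come from `feature_b`, so a split on the bundle can still separate the original features.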

To conclude, these two algorithms can significantly accelerate LightGBM's training compared to XGBoost. This speed boost is particularly noticeable when dealing with large datasets: for instance, with 119 million data points and 54 million features, as explained in this article, the training time falls from 192s to 13s per iteration!

## Conclusion

LightGBM stands out in the field of machine learning thanks to its lightning-fast speed and its leaf-wise tree-growth strategy. The GOSS and EFB algorithms improve efficiency while preserving accuracy: GOSS focuses training on the most informative instances, and EFB reduces the number of features to process.

Now that you have a better understanding of this algorithm, stay tuned for the second article on how to use it effectively!

Are you looking for machine learning experts? Don’t hesitate to contact us!
