4 min read

Machine learning metrics are as essential as your model

Written by Guillaume DELEPOULLE


Building an ML model is an amazing experience, but evaluating its business value can be difficult depending on the machine learning metrics you follow. Is the AUC (Area Under the ROC Curve) meaningful for your product owner or stakeholder?

Why are your machine learning metrics important?

Developing machine learning software in a business context comes with new priorities. Most of the time, your model should respond to a user’s need. Therefore, you need to make sure you are bringing value to them.
How do I make sure I am building the right tool? Metrics!

Machine learning metrics are your north star. They symbolize your user’s needs:

  • My user needs alarms for predictive maintenance: I have a metric to know how early I can make the prediction;
  • I am working on active learning: I have a metric to know how much the performance of my model increases (check out this article if active learning is not a familiar concept);
  • My user wants to go faster on a task: I have a metric for the speed improvement.

Machine learning metrics are a direct implementation of the understanding you have of your user.

There are some contexts where standard statistical metrics (accuracy, precision, MAE...) are the right tools to communicate with the Data Science team. But what about business product managers and stakeholders? Are they familiar with these metrics? Not that much, in my experience. Custom metrics are key to leading a machine learning project to success.

Study of a data science use-case

I want to highlight how important it can be to develop your own metrics and pay a lot of attention to them! So let me present a use-case inspired by a project I worked on this year, where the implementation of custom machine learning metrics was a key factor in the success of the project.

My last project was to give my users optimized values for some parameters. The idea was for users to follow these recommendations to perform their task more effectively. During the proof of concept, we wanted to know the performance of our optimization even though we had no user feedback. Let’s build this POC!

A time-series dataset

The dataset used for this article comes from the Kaggle dataset Appliances Energy Consumption. It is a time-series dataset with an hourly frequency. It records the energy consumption of a house's appliances, along with several measurements such as room temperatures and pressure, over 4.5 months.

To split the data into training and testing sets, I simply took 75% of my data for training and 25% for testing.

In this use case, the goal is to reduce appliance consumption by tuning some parameters to their optimal values. Selecting which parameters can be changed is a key question in such a use case.
For this article, I have just selected T1, T2, and T3 (the temperatures of the first 3 rooms). It was quite an arbitrary decision, and the efficiency of my algorithm will depend on it.
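A minimal sketch of such a split, assuming the rows are sorted by timestamp and that the split is chronological (the oldest 75% for training, the most recent 25% for testing). The article does not say whether the split was chronological or random, and the function name `chronological_split` is hypothetical.

```python
def chronological_split(rows, train_fraction=0.75):
    """Split time-ordered rows into a training part and a testing part.

    Assumption: `rows` is already sorted by timestamp, so the testing
    set is the most recent slice of the data.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy example: 8 hourly observations identified by their index.
hours = list(range(8))
train, test = chronological_split(hours)
print(train, test)
```

With time series, a chronological split is usually safer than a random one, because it prevents the model from being trained on observations that come after the ones it is evaluated on.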

Optimization algorithm


The first step of the algorithm is to train a regression model able to infer the appliances' consumption in the house. Improving the results requires a lot of work on the features, but for this article I restricted myself to the raw columns given by the Kaggle dataset.

Based on this work, SVR (support vector regression) turned out to be the best algorithm. Here is a summary of my machine learning metrics:

  • r2: 0.15
  • MAE: 0.38
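For reference, both metrics can be computed by hand. The sketch below is a pure-Python version of the standard definitions; the helper names and the toy values are illustrative, and scikit-learn's `r2_score` and `mean_absolute_error` compute the same quantities.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares.

    Assumes y_true is not constant (otherwise the total sum of squares is 0).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values: a model that always predicts the mean of the target.
y_true = [1.0, 2.0, 3.0, 4.0]
y_mean = [2.5, 2.5, 2.5, 2.5]
print(r2_score(y_true, y_mean), mean_absolute_error(y_true, y_mean))
```

An r2 of 0 corresponds to always predicting the mean of the target, so a value of 0.15 means the model is only slightly better than that baseline.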

With that model, here is a plot of my predictions vs the actual appliance consumption of the house on the testing set:

Figure: Ground truth vs model inference on the testing set

We can see it’s far from perfect, but let’s carry on!

Optimization of parameters

As mentioned before, the goal is to give the optimal parameters (here T1, T2, and T3) so the user can minimize or maximize a target value (here, minimize the appliance consumption by changing the temperatures through the heating system).

To do that, we start from the feature values at the current timestamp. For each of the 3 parameters to optimize, we create a set of candidate values in a neighborhood of its current value, as illustrated in the diagram below.

Construct a grid of features for the optimization
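A minimal sketch of this step, assuming a regular grid of candidate values centered on the current value of each tunable parameter. The function name `build_parameter_grid`, the step `delta`, the number of `steps`, and the sample values are all assumptions made for illustration.

```python
from itertools import product


def build_parameter_grid(current, tunable, delta=0.5, steps=2):
    """Return every combination of candidate values around the current ones.

    current: dict mapping feature name -> value at the current timestamp
    tunable: names of the parameters the user is allowed to change
    delta:   distance between two neighboring candidates (assumed value)
    steps:   number of candidates on each side of the current value
    """
    axes = []
    for name in tunable:
        center = current[name]
        axes.append([center + k * delta for k in range(-steps, steps + 1)])
    # Cartesian product of the per-parameter candidate lists.
    return [dict(zip(tunable, combo)) for combo in product(*axes)]


# Hypothetical feature values at one timestamp.
current = {"T1": 20.0, "T2": 19.0, "T3": 21.0, "RH_1": 45.0}
grid = build_parameter_grid(current, ["T1", "T2", "T3"], delta=0.5, steps=2)
print(len(grid))  # 5 candidates per parameter, so 5 ** 3 combinations
```

The size of the grid grows exponentially with the number of tunable parameters, which is one more reason to choose them carefully.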

To complete the features, we reuse the current values of the remaining features, which stay fixed during the optimization.

Add the rest of the parameters in the grid of features
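A minimal sketch of this step, assuming each candidate is a dictionary of tunable values and that all other features keep their value from the current timestamp. The function name `complete_with_static_features` and the sample values are hypothetical.

```python
def complete_with_static_features(candidates, current, tunable):
    """Turn each candidate into a full feature row.

    The features that are not tunable keep their current value; the
    tunable ones are overridden by the candidate's values.
    """
    static = {name: value for name, value in current.items() if name not in tunable}
    return [{**static, **candidate} for candidate in candidates]


# Hypothetical feature values at one timestamp and two candidates.
current = {"T1": 20.0, "T2": 19.0, "RH_1": 45.0, "Press_mm_hg": 755.0}
candidates = [{"T1": 19.5, "T2": 19.0}, {"T1": 20.5, "T2": 18.5}]
rows = complete_with_static_features(candidates, current, ["T1", "T2"])
print(rows[0])
```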

Then, for each combination of parameters, we infer the appliance consumption with the model trained during the previous step, and we keep the combination that returns the minimum predicted consumption.

Figure: Optimization as a drawing
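The selection step can be sketched as follows, assuming the trained model is exposed as a `predict` callable that maps one feature row to a predicted consumption. `best_combination` and `toy_predict` are hypothetical names, and `toy_predict` is only a stand-in for the real regression model.

```python
def best_combination(rows, predict):
    """Return the feature row with the lowest predicted consumption, and that prediction."""
    predictions = [predict(row) for row in rows]
    best_index = min(range(len(rows)), key=predictions.__getitem__)
    return rows[best_index], predictions[best_index]


def toy_predict(row):
    # Stand-in model (assumption): consumption grows as T1 moves away from 18.
    return 0.1 * abs(row["T1"] - 18.0) + 0.01 * row["RH_1"]


rows = [
    {"T1": 19.5, "RH_1": 45.0},
    {"T1": 20.5, "RH_1": 45.0},
    {"T1": 18.5, "RH_1": 45.0},
]
best_row, best_value = best_combination(rows, toy_predict)
print(best_row, best_value)
```

An exhaustive search like this one is only practical for a small grid; with more parameters, a dedicated optimizer would be a natural replacement.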


Custom machine learning metrics

The problem with this optimization is the lack of metrics. We have a metric to know whether the regression part performs well (r2: coefficient of determination).

But we have no way of knowing whether following the recommendations of the algorithm would actually decrease the appliance consumption of the house.

To get that information, we create a metric that gives the average decrease in appliance consumption resulting from the optimization algorithm on the testing set.

Implementation of my custom machine learning metrics
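A minimal sketch of such a metric, assuming the decrease is measured between two model predictions: the predicted consumption with the current parameters and the predicted consumption with the recommended ones. The names `average_decrease`, `optimize`, and `predict`, as well as the toy model, are assumptions made for illustration.

```python
def average_decrease(rows, optimize, predict):
    """Mean relative decrease (in %) of the predicted consumption on a test set.

    rows:     feature rows of the testing set
    optimize: callable returning the recommended feature row for a given row
    predict:  callable returning the predicted consumption for a feature row
    Assumes every baseline prediction is strictly positive.
    """
    decreases = []
    for row in rows:
        baseline = predict(row)
        optimized = predict(optimize(row))
        decreases.append(100.0 * (baseline - optimized) / baseline)
    return sum(decreases) / len(decreases)


# Toy stand-ins (assumptions): a linear model and a rule that lowers T1 by 1.
def toy_predict(row):
    return 10.0 + row["T1"]


def toy_optimize(row):
    return {**row, "T1": row["T1"] - 1.0}


test_rows = [{"T1": 10.0}, {"T1": 30.0}]
print(average_decrease(test_rows, toy_optimize, toy_predict))
```

Because both terms come from the model, this number is only as trustworthy as the model itself.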

The output is an average decrease in appliance consumption of 4%. It doesn’t seem very high, but it is a tangible value that SMEs (subject matter experts) can use to decide whether the algorithm should go to production.

Limits of the presented machine learning metrics

But before presenting, and even before implementing, a machine learning metric, it is important to think about the biases and assumptions behind its computation.

In our context, the metric is based on a few assumptions:

  • Tuning the parameters is possible;
  • The change of parameters is applied instantly (which might be complicated in a lot of contexts, including this one);
  • The model is accurate enough that a difference between two predicted consumptions is meaningful and leads to a tangible optimization.

The best way to get impartial machine learning metrics would be to run A/B tests once the algorithm goes to production.

But if we assume we are still in a development context, we should focus on the fact that the metric depends on the correctness of the model. We should therefore report the average error made by the model alongside the average gain. The implementation of the metric would now look like this:

Implementation of my custom machine learning metrics with related error
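One possible way to attach an error to the metric is sketched below. It assumes the error is the mean absolute percentage error of the model on the testing set, which is only one of several reasonable choices and may differ from the computation behind the figures reported in this article. All names and toy values are hypothetical.

```python
def decrease_with_error(rows, actuals, optimize, predict):
    """Return (average decrease in %, average model error in %).

    rows:    feature rows of the testing set
    actuals: measured consumption for each row (assumed strictly positive)
    """
    decreases, errors = [], []
    for row, actual in zip(rows, actuals):
        baseline = predict(row)
        optimized = predict(optimize(row))
        decreases.append(100.0 * (baseline - optimized) / baseline)
        # How far the model is from reality at the current parameters.
        errors.append(100.0 * abs(actual - baseline) / actual)
    n = len(decreases)
    return sum(decreases) / n, sum(errors) / n


# Toy stand-ins (assumptions): a linear model and a rule that lowers T1 by 1.
def toy_predict(row):
    return 10.0 + row["T1"]


def toy_optimize(row):
    return {**row, "T1": row["T1"] - 1.0}


test_rows = [{"T1": 10.0}, {"T1": 30.0}]
measured = [25.0, 40.0]
gain, error = decrease_with_error(test_rows, measured, toy_optimize, toy_predict)
print(f"{gain:.2f} ± {error:.2f} %")
is_conclusive = gain > error  # the gain only means something if it exceeds the error
```

In this toy example the error exceeds the gain, so the gain is not conclusive.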

The output is now 4 ± 9%. The uncertainty is larger than the gain itself, which shows our model is not ready for production. Without checking our assumptions, we would have gone in the wrong direction. We can be fairly confident that the regression model should be improved through feature engineering, hyperparameter optimization, and so on.

This use-case shows the necessity of:

  • Implementing your own machine learning metrics to capture business value;
  • Checking the limitations of your machine learning metrics, because an unchecked metric could lead you and the rest of your team in the wrong direction.

Thanks for reading! If you are wondering how to measure the carbon impact of your cloud project, you can read my previous article on cloud carbon footprint monitoring.

Are you looking for machine learning experts? Don’t hesitate to contact us!
