
The Carbon Footprint of an AI project

Written by Mathieu Soul


What is a carbon footprint, and why is it important?

As you’ve probably heard, we are in the midst of an unprecedented ecological crisis. The impact of human activities on the planet’s ecosystem is pushing it off-balance, creating a destabilization that may threaten humanity’s continued survival in the coming decades.

The impact of our activities is multi-dimensional and complex. One of the most important and most studied axes of this impact, however, is the emission of greenhouse gases (GHG), which contribute to global warming.

 

Figure: CO2 reductions needed to keep global temperature rise below 1.5°C

The carbon footprint of an activity is defined as the amount of GHG released into the atmosphere as a result of this activity. As shown in the graph above, to keep the global temperature rise below 1.5 degrees Celsius, we need to drastically reduce our CO2 emissions per year...

Trends in AI 

Building AI products, like any other human activity, has an environmental impact. As AI applications become increasingly widespread, both the number of chips produced and the energy consumption of the Information and Communications Technology (ICT) sector are expected to keep increasing.

Figure: Expected ICT energy projections

In addition, state-of-the-art (SOTA) models are increasing in size and complexity, leading to higher training and inference costs.

Figure: SOTA model size, increasing 10x per year!

Given these trends, there is all the more reason to dive into the carbon footprint of an AI project!

What are the carbon expenses of an AI project?

An oversimplified AI project looks something like this:
Figure: An oversimplified AI project

Basically, each core step of building an AI product requires hardware to compute on, and consumes energy to run that compute.

Estimating hardware’s carbon footprint

It turns out that evaluating the environmental impact of building computing hardware (CPU/GPU, RAM, SSD, racks...) is difficult today, mostly due to a lack of data.

Usually, this data can be found in Life Cycle Assessments (LCA), a standard methodological framework for estimating the environmental impacts of a product from the cradle to the grave. However, few LCAs are available for tech products! A project called Boavizta is currently working on consolidating all of the existing analyses, but they remain insufficient to precisely estimate this footprint...

Nevertheless, to obtain an order of magnitude of this impact, we can use data from the GHG Protocol reports of Facebook and Google (as suggested by Gupta et al. in Chasing Carbon: The Elusive Environmental Footprint of Computing).

 

The GHG Protocol is another standard framework used to measure GHG emissions of companies. In particular, the Protocol defines 3 scopes, which, in the case of data center companies like Google and Facebook, contain:

  • Scope 1 - Direct GHG emissions (Natural gas, diesel)
  • Scope 2 - Electricity indirect GHG emissions (Electrical energy for data centers)
  • Scope 3 - Other indirect GHG emissions (Hardware manufacturing, construction)

Figure: GHG Protocol reporting for Facebook and Google

 

Considering an average electrical grid, the ratio between Scope 3 emissions and Scope 2 emissions for these companies is about 4. It is also worth noting that manufacturing hardware has other environmental impacts (depletion of natural resources, pollution...).

 

Put differently, a low estimate of the carbon footprint of manufacturing hardware is about 4 times the carbon footprint of its usage.
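As a rough illustration of this rule of thumb, the sketch below scales a usage footprint by the Scope 3 / Scope 2 ratio. Only the ratio of about 4 comes from the reports discussed above; the function name and the example figure of 2.5 tons are assumptions made for illustration.

```python
# Ratio between Scope 3 and Scope 2 emissions quoted in the text (about 4).
SCOPE3_TO_SCOPE2_RATIO = 4.0


def estimate_manufacturing_footprint(usage_tco2eq: float,
                                     ratio: float = SCOPE3_TO_SCOPE2_RATIO) -> float:
    """Order-of-magnitude estimate of hardware manufacturing emissions (tCO2eq)."""
    if usage_tco2eq < 0:
        raise ValueError("usage emissions must be non-negative")
    return usage_tco2eq * ratio


# Hypothetical project whose electricity use emitted 2.5 tCO2eq.
usage = 2.5
manufacturing = estimate_manufacturing_footprint(usage)
total = usage + manufacturing
print(f"usage: {usage} t, manufacturing: {manufacturing} t, total: {total} t")
```

This only gives an order of magnitude: the ratio is derived from company-wide reports, not from the hardware of any particular project.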

How can we estimate the carbon footprint of the hardware’s usage?

Estimating energy consumption and its footprint

The carbon intensity of electrical production varies geographically and temporally.

In countries like France, where electricity is mostly nuclear, the GHG emissions per kWh produced are rather low on average (68 gCO2eq/kWh in 2021). The carbon intensity of electricity production in other countries, like the USA (379 gCO2eq/kWh in 2021) or China (549 gCO2eq/kWh in 2021), can be much higher.

This carbon intensity also changes over time, depending on fluctuations in energy demand and on the varying production of renewables. These temporal changes, however, are an order of magnitude smaller than the geographical differences.

With this carbon intensity of electrical production available, all that remains in order to estimate the GHG emissions linked to electrical consumption is to measure the electrical consumption itself!
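To make the calculation concrete, here is a minimal sketch that multiplies an electrical consumption by the 2021 average carbon intensities quoted above. The function name and the 1,000 kWh consumption are hypothetical, and the sketch ignores the temporal variations just mentioned.

```python
# Average carbon intensity of electricity in 2021, in gCO2eq/kWh (figures from the text).
CARBON_INTENSITY_G_PER_KWH = {
    "France": 68,
    "USA": 379,
    "China": 549,
}


def emissions_kg(energy_kwh: float, country: str) -> float:
    """GHG emissions in kgCO2eq for a given electricity consumption."""
    grams = energy_kwh * CARBON_INTENSITY_G_PER_KWH[country]
    return grams / 1000.0


# Hypothetical training run consuming 1,000 kWh.
for country in CARBON_INTENSITY_G_PER_KWH:
    print(f"{country}: {emissions_kg(1000, country):.0f} kgCO2eq")
```

The same workload emits roughly 8 times more in China than in France, which is why the location of the data center matters so much.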

A bottom-up approach to this problem is to measure the electrical consumption of all relevant processes on all of the machines that run your code. This becomes even more difficult when using cloud resources... To my knowledge, there isn’t a tried-and-tested way to do this, although a French project, Scaphandre, is working on it!
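Assuming you do manage to collect power readings for a machine, turning them into an energy figure is a matter of integrating power over time. The sketch below does this with the trapezoidal rule. It does not use Scaphandre or any other measurement tool: the function name and the readings are made up for illustration.

```python
from typing import Sequence


def energy_kwh(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples over time (trapezoidal rule) and return kWh."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    joules = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


# Made-up readings: a machine drawing a constant 300 W for one hour.
timestamps = [0, 1800, 3600]
power = [300, 300, 300]
print(f"{energy_kwh(timestamps, power):.3f} kWh")
```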

A top-down approach is to start from billing information (typically, your cloud provider bills), and deduce electrical consumption using estimated coefficients for each type of resource (compute, memory, storage, GPU...). This is the methodology used by Cloud Carbon Footprint, an open-source tool.
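The general idea of a top-down estimate can be sketched as follows. This is not Cloud Carbon Footprint's code or API, and the coefficients are placeholder values invented for illustration: real coefficients would have to come from a documented source. The resource names and the monthly usage are hypothetical as well.

```python
from typing import Mapping

# Placeholder coefficients, in watt-hours per billed unit. These numbers are
# invented for illustration and must be replaced with sourced values.
ENERGY_COEFFICIENTS_WH = {
    "vcpu_hours": 3.0,
    "memory_gb_hours": 0.4,
    "storage_tb_hours": 1.2,
    "gpu_hours": 250.0,
}


def estimate_energy_kwh(billed_usage: Mapping[str, float]) -> float:
    """Convert billed usage per resource type into an energy estimate (kWh)."""
    total_wh = 0.0
    for resource, quantity in billed_usage.items():
        total_wh += quantity * ENERGY_COEFFICIENTS_WH[resource]
    return total_wh / 1000.0


# Hypothetical monthly usage read from a cloud bill.
monthly_usage = {"vcpu_hours": 2000, "memory_gb_hours": 8000, "gpu_hours": 100}
energy = estimate_energy_kwh(monthly_usage)
# 379 gCO2eq/kWh is the 2021 USA carbon intensity quoted earlier in the text.
emissions_kgco2eq = energy * 379 / 1000
print(f"{energy:.1f} kWh, about {emissions_kgco2eq:.0f} kgCO2eq")
```

The quality of such an estimate depends entirely on the coefficients, so it is best read as an order of magnitude rather than a measurement.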

Estimating the business goal’s carbon footprint

The most impactful item in the carbon footprint of an AI project is likely to be the objective itself. It is also what puts the cost of building and serving the model into perspective: if you emit 10 tons of CO2eq creating your model, but estimate that it will reduce your business’s emissions by 11 tons, then you’ve reduced your overall emissions by 1 ton!
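The arithmetic behind this comparison fits in a few lines. The function name below is hypothetical; the figures are the ones from the example above.

```python
def net_emissions(build_and_serve_t: float, avoided_t: float) -> float:
    """Net change in emissions (tCO2eq); a negative value is an overall reduction."""
    return build_and_serve_t - avoided_t


# Example from the text: 10 tons emitted to build the model, 11 tons avoided.
change = net_emissions(build_and_serve_t=10, avoided_t=11)
print(f"net change: {change} tCO2eq")
```

A positive result would mean that the project emits more than it saves, which is exactly the case worth questioning before building anything.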

It is also our duty, as Data Scientists/Engineers and members of the tech community as a whole, to ask ourselves if what we’re building is really worth it.

Conclusion

 

Estimating the carbon footprint of our activities is necessary in order to make rational choices to reduce our GHG emissions.

For AI, it is difficult to give precise estimates: the carbon footprint of hardware manufacturing is often unknown (or not publicly available), and the true energy consumption of our code isn’t systematically measured. However, with certain assumptions, and based on the existing data, we should be able to ascertain the order of magnitude of our footprint.

The next step for us, at Sicara, is to implement the measurement tools described in this article on our real projects, and share our experience. As Lord Kelvin famously said:

“If you cannot measure it, you cannot improve it.”

 

There’s more to come. Do not hesitate to contact us for more information.

 
