What is a carbon footprint, and why is it important?
As you’ve probably heard, we are in the midst of an unprecedented ecological crisis. The impact of mankind’s activities on the planet’s ecosystem is pushing this system off-balance, creating a destabilization that may threaten humanity’s continued survival in the coming decades.
The impact of our activities is multi-dimensional and complex. One of the most important and most studied axes of this impact, however, is the emission of greenhouse gases (GHG), which contribute to global warming.
CO2 reductions needed to keep global temperature rise below 1.5C
The carbon footprint of an activity is defined as the amount of GHG released into the atmosphere as a result of this activity. As shown in the graph above, to keep the global temperature rise below 1.5 degrees Celsius, we need to drastically reduce our CO2 emissions per year...
Trends in AI
Building AI products, like any other human activity, has an environmental impact. As AI applications become increasingly widespread, both the number of chips produced and the energy consumption of the Information and Communications Technology (ICT) sector are expected to keep increasing.
Expected ICT energy projections
In addition, SOTA models are increasing in size and complexity, leading to higher training and inference cost.
SOTA model size, increasing 10x per year!
Given these trends, there is all the more reason to dive into what the carbon footprint of an AI project actually is!
What are the carbon expenses of an AI project?
An oversimplified AI project looks something like this:
An oversimplified AI project
Basically, each core step of building an AI product consumes energy to run its compute, and requires hardware to compute on.
Estimating hardware’s carbon footprint
It turns out that evaluating the environmental impact of building computing hardware (CPU/GPU, RAM, SSD, racks...) is difficult today, mostly due to a lack of data.
Usually, this data can be found in Life Cycle Assessments (LCA), a standard methodological framework for estimating the environmental impacts of a product from the cradle to the grave. However, few LCAs are available for tech products! A project called Boavizta is currently working on consolidating all of the existing analyses, but they remain insufficient to precisely estimate this footprint...
Nevertheless, to obtain an order of magnitude of this impact, we can use data from GHG Protocol reports from Facebook and Google (as suggested by Gupta et al. in Chasing Carbon: The Elusive Environmental Footprint of Computing).
The GHG Protocol is another standard framework used to measure GHG emissions of companies. In particular, the Protocol defines 3 scopes, which, in the case of data center companies like Google and Facebook, contain:
Scope 1 - Direct GHG emissions (Natural gas, diesel)
Scope 2 - Electricity indirect GHG emissions (Electrical energy for data centers)
Scope 3 - Other indirect GHG emissions (Hardware manufacturing, construction)
GHG Protocol reporting for Facebook and Google
Considering an average electrical grid, the ratio between Scope 3 emissions and Scope 2 emissions for these companies is about 4. It is also worth noting that manufacturing hardware has other environmental impacts (depletion of natural resources, pollution...).
Said differently, a low estimate of the carbon footprint of manufacturing hardware is about 4 times the carbon footprint of its usage.
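As a rough illustration of this rule of thumb, here is a minimal Python sketch. The function name and the 100 kgCO2eq usage figure are assumptions made for the example; only the ratio of about 4 comes from the reports discussed above, and it should be read as an order of magnitude rather than a precise coefficient.

```python
# Ratio between Scope 3 and Scope 2 emissions quoted above (order of magnitude only).
SCOPE3_TO_SCOPE2_RATIO = 4.0


def estimate_footprint_with_hardware(usage_kgco2eq: float,
                                     ratio: float = SCOPE3_TO_SCOPE2_RATIO) -> dict:
    """Derive a rough manufacturing footprint from a usage footprint.

    `usage_kgco2eq` is the footprint of the electricity consumed (kgCO2eq).
    The manufacturing part is simply `ratio` times the usage part.
    """
    manufacturing = usage_kgco2eq * ratio
    return {
        "usage_kgco2eq": usage_kgco2eq,
        "manufacturing_kgco2eq": manufacturing,
        "total_kgco2eq": usage_kgco2eq + manufacturing,
    }


# Hypothetical project whose electricity usage emitted 100 kgCO2eq.
print(estimate_footprint_with_hardware(100.0))
```

With this ratio, usage accounts for only about a fifth of the estimated total, which is why the hardware side cannot be ignored.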
How can we estimate the carbon footprint of the hardware’s usage?
Estimating energy consumption and its footprint
The carbon intensity of electrical production varies geographically and temporally.
In countries like France, where electrical power is mostly nuclear, the GHG emissions per kWh produced are rather low on average (68 gCO2eq/kWh in 2021). The carbon intensity of electricity production in other countries like the USA (379 gCO2eq/kWh in 2021) or China (549 gCO2eq/kWh in 2021) can be much higher.
Depending on the fluctuations of energy demand, and the varying production of renewables, this carbon intensity also changes over time. These temporal changes, however, are an order of magnitude smaller than the geographical differences.
With this carbon intensity of electrical production available, all that remains in order to estimate the GHG emissions linked to electrical consumption is to measure the electrical consumption itself!
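The calculation itself is a single multiplication: energy consumed times the carbon intensity of the grid. The sketch below uses the 2021 average intensities quoted above; the function name and the 1,000 kWh workload are assumptions chosen for illustration.

```python
# Average carbon intensity of electricity production in 2021 (gCO2eq/kWh),
# as quoted in the text above.
CARBON_INTENSITY_G_PER_KWH = {
    "France": 68.0,
    "USA": 379.0,
    "China": 549.0,
}


def electricity_emissions_kgco2eq(energy_kwh: float, country: str) -> float:
    """Emissions (kgCO2eq) = energy (kWh) * intensity (gCO2eq/kWh) / 1000."""
    return energy_kwh * CARBON_INTENSITY_G_PER_KWH[country] / 1000.0


# Hypothetical workload consuming 1,000 kWh, run in each country.
for country in CARBON_INTENSITY_G_PER_KWH:
    kg = electricity_emissions_kgco2eq(1000.0, country)
    print(f"{country}: {kg:.0f} kgCO2eq")
```

The same workload emits roughly eight times more in China than in France under these averages, which illustrates why the location of the compute matters so much.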
A bottom-up approach to this problem is to measure the electrical consumption of all relevant processes on all of the machines that run your code. This is difficult, and even more so when using cloud resources... To my knowledge, there isn’t a tried-and-tested way to do this, although a French project, Scaphandre, is working on it!
A top-down approach is to start from billing information (typically, your cloud provider bills), and deduce electrical consumption thanks to estimated coefficients for each type of task (compute, memory, storage, GPU...). This is the methodology used by Cloud Carbon Footprint, an open-source tool.
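To make the top-down idea concrete, here is a minimal sketch of the principle: multiply each billed quantity by an energy coefficient and sum the results. Every coefficient and usage figure below is a placeholder invented for the example, not a value taken from Cloud Carbon Footprint or any provider, and the function is not that tool's API.

```python
from typing import Mapping

# HYPOTHETICAL energy coefficients (kWh per billed unit), for illustration only.
HYPOTHETICAL_KWH_PER_UNIT = {
    "cpu_hours": 0.004,
    "gpu_hours": 0.3,
    "memory_gb_hours": 0.0004,
    "storage_tb_hours": 0.001,
}


def estimate_energy_kwh(billed_usage: Mapping[str, float],
                        coefficients: Mapping[str, float] = HYPOTHETICAL_KWH_PER_UNIT) -> float:
    """Sum usage * coefficient over every billed item to estimate energy (kWh)."""
    return sum(quantity * coefficients[item]
               for item, quantity in billed_usage.items())


# Hypothetical monthly bill, expressed in usage units rather than currency.
monthly_usage = {"cpu_hours": 1000.0, "gpu_hours": 10.0}
print(f"Estimated energy: {estimate_energy_kwh(monthly_usage):.1f} kWh")
```

The resulting energy estimate can then be multiplied by the carbon intensity of the relevant grid to obtain emissions. The quality of the result depends entirely on how good the coefficients are.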
Estimating the business goal’s carbon footprint
What is likely to be the most impactful item on the carbon footprint of an AI project is the objective itself. It is also what will put the cost of building and serving the model into perspective: if you emit 10 tons creating your model, but estimate that it will reduce your business's emissions by 11 tons, then you’ve reduced your overall emissions by 1 ton!
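This accounting can be written as a one-line balance. The sketch below uses the 10 and 11 ton figures from the example above; the function name is an assumption made for illustration.

```python
def net_emissions_tons(project_emissions_tons: float,
                       avoided_emissions_tons: float) -> float:
    """Net change in emissions: what the project emits minus what it avoids.

    A negative result means the project reduces overall emissions.
    """
    return project_emissions_tons - avoided_emissions_tons


# Figures from the example above: 10 tons emitted, 11 tons avoided.
print(net_emissions_tons(10, 11))
```

A positive result would mean the project emits more than it saves, which is exactly the case where its worth should be questioned.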
It is also our duty, as Data Scientists/Engineers and members of the tech community as a whole, to ask ourselves if what we’re building is really worth it.
Estimating the carbon footprint of our activities is necessary in order to make rational choices to reduce our GHG emissions.
For AI, it is difficult to give precise estimates: the carbon footprint of hardware manufacturing is often unknown (or not publicly available), and the true energy consumption of our code isn’t systematically measured. However, with certain hypotheses, and based on the existing data, we should be able to ascertain the order of magnitude of our footprint.
The next step for us, at Sicara, is to implement the measurement tools described in this article on our real projects, and share our experience. As Lord Kelvin famously said:
“If you cannot measure it, you cannot improve it.”