6 min read

Meta High-Resolution Population Density Map

Rédigé par Antoine Miquel

Antoine Miquel
How to compute the population 50km around the city of Rennes?

You work in real estate and you need to evaluate the population in a custom area around your apartment?
You analyze the transport network of a city and want to estimate the population living below a specific distance?
You are a growth hacker and need to spot the more crowded spot to implement your new store?

Generally speaking, you want the population of any custom areas which are not administrative boundaries with an available census?

How to compute the population inside this custom polygon?

I have good news for you: this can be easily and quickly done for free using BigQuery GIS and Facebook-High-Resolution-Population-Density-Map data. By the end of this article, you will be able to compute the population on any custom shapes, in a scalable way.

For this tutorial, I will expose this method with two use cases :

  1. Counting the population in a 50km radius around the beautiful city of “Rennes”.
  2. Compute the population and the density of the districts of Paris.

Let’s compute some population!

Deep dive into Facebook High-Resolution Population Density Maps

Presentation of High-Resolution Population Density Map

Meta (previously Facebook) proposes High-Resolution Population Density Maps for a lot of countries in the world. These maps give a precise population estimation of a 30m x 30m resolution.
If you are interested, here is an explanation of how Facebook creates these High-Resolution Population Density Maps from census data and satellite images.

These Facebook High-Resolution Population Density Maps can be found on the Humanitarian Data Exchange website which proposes a bunch of open data.

For this tutorial, we will work with the French High-Resolution Population Density Map available here. Facebook proposes several maps for different population densities: women, men, children, etc. Let’s keep it simple and take the overall population density map.

The CSV file of the overall High-Resolution Population Density Map of France.

The data is stored in a CSV format, in which each line corresponds to a 30m x 30 m square and its population. You will find three columns :

  • latitude: The latitude of the center of the square
  • longitude: The longitude of the center of the square
  • square_population: The population estimation inside the square

In the rest of the article, for simplicity, we will call these 30m x 30m squares ‘points’ since we use their center points.

Integrate High-Resolution Population Density Map inside the BigQuery environment

To perform operations on this High-Resolution Population Density Map of France, we will work in the BigQuery environment which the advantages of :

  • Being super efficient
  • Being super scalable
  • Having an integrated GIS (geographic information system)
  • Being part of the GCP ecosystem

Another good news: if you don’t have a GCP account you can use the free version to follow this tutorial.

Let’s see how we can upload this High-Resolution Population Density Map in BigQuery.

Upload the CSV to Google Cloud Storage

Since the uncompressed CSV is pretty big (~2 Go) we have to store it first on Google Cloud Storage, which allows us to store files and use them in the other services of GCP.

Inside Cloud Store, create a new bucket by choosing a name and the region of your choice. You can let all the others parameters with default values.

Google Cloud Storage view of our bucket

In your bucket, upload the France High-Resolution Population Density Map CSV.

The France High-Resolution Population Density Map CSV in a Cloud Storage bucket

Create the table in BigQuery

Now go to the Big Query service UI and open your project. Create a dataset by choosing a name and the same region as your bucket:

The panel to create your new dataset

Within your dataset, you can now create a table to upload the France High-Resolution Population Density Map CSV. To do so, choose Google Cloud Storage as a source: you should be able to find your bucket and your CSV inside. Let’s call it ‘FR_population’.

Choose the France High-Resolution Population Density Map CSV

Then set the schema like this :

Advised schema for your FR_population table

And finally, in the Advanced options, set the ‘Header rows to skip’ parameters to 1.

Advanced options for the table creation

Create your table and check its preview, you should have something like that :

Preview of the FR_population table

Congratulation, you are done now with the upload of the High-Resolution Population Density Map in BigQuery!
We can move now to the interesting part: the uses cases.

Use case 1: Compute the population of the 50km circle around Rennes.

The objective of this first use case is to estimate the number of people who live less than 50km away from the city center of Rennes.

BigQuery GIS quick introduction

To manipulate our FR_population table we need to see some of the BigQuery GIS functions. If you are interested in exploring the possibilities of BigQuery you can read its documentation. The GIS allows you to work with geographical objects such as points, lines, and polygons. They are stored in a type called GEOGRAPHY by BigQuery.

The functions that we will use are :

  • ST_GEOGPOINT: Creates a point with the GEOGRAPHY type from longitude and latitude.
  • ST_DISTANCE: Computes the distance in meters between two GEOGRAPHY objects.

Computation strategy and query

To compute the population in a 50km radius around Rennes we have to :

  1. Create a point for the center of Rennes which is (latitude: 48.117266, longitude: -1.6777926)
  2. Transform our latitude, longitude columns to GEOGRAPHY points
  3. Keep the points from our FR_population table which are distant less than 50km from the center of Rennes
  4. Sum the population of the kept points to get our total population.

The corresponding query is :

SELECT SUM(point_population) as population_50km
FROM `speedy-octane-332614.population_tutorial.FR_population`
WHERE
ST_DISTANCE(ST_GEOGPOINT(longitude, latitude),ST_GEOGPOINT(-1.6777926, 48.117266)) < 50000

And give us an output consistent with the 1 million population of the ‘Ille-et-Vilaine’ administrative region.

The output of the above query

Bravo 👏 , you know now how to get the population within any range from a specific point! Now let’s step up and work with custom polygons...

Use case 2: count the population and the density of Paris districts

This second use case will illustrate how you can compute population on custom polygons. We will take the example of the Paris districts.

Retrieve Paris districts and upload it on BigQuery

The first step is to create a table of all Paris districts polygons. The source file is available on the opendata.paris website, in a CSV format.

You can then create your table in a similar way to the Facebook High-Resolution Population Density map upload but :

  • You can directly upload the CSV from your local computer because the CSV is pretty light
  • You can use the Auto detect option for the Schema
  • ⚠️ The CSV Field delimiter is “;”

In the end, you should be able to have a table, that we will call paris_districts, with this schema :

The schema of the paris_districts table

Whose Geometry column has this preview :

First-row example of the Geometry field that stores our polygons.

The polygons are stored as strings in a geojson format. Fortunately, BigQuery GIS can support this format and we are going to see how.

The GIS functions we will need

To be able to count the population on our custom shapes we will need to use:

  • ST_GEOGFROMGEOJSON: This function will transform our polygon in GeoJson text to a GEOGRAPHY object
  • ST_CONTAINS: This function returns true if the first GEOGRAPHY object contains the second one.
  • ST_AREA: This function returns the area of a GEOGRAPHY object

The strategy and the query

To be able to compute the population and the population density for each of the Paris districts we will :

  1. Transform our GeoJson text polygons to GEOGRAPHY object with ST_GEOGFROMGEOJSON
  2. Transform our latitude, longitude columns from the FR_population table into GEOGRAPHY points
  3. Join FR_population with paris_districts to keep population points inside the correct Paris district using the ST_CONTAINS function
  4. Grouping our data by districts computing the population by summing the population_point column, and the density by dividing by the area of the corresponding district

The corresponding query is

SELECT
paris_districts.L_QU as district_name,
SUM(FR_population.point_population) as district_pop,
SUM(FR_population.point_population) / ST_AREA(ST_GEOGFROMGEOJSON(paris_districts.Geometry)) * 1000000 as district_density,
FROM `speedy-octane-332614.population_tutorial.FR_population` as FR_population
JOIN
`speedy-octane-332614.population_tutorial.paris_districts` as paris_districts ON
ST_CONTAINS(
ST_GEOGFROMGEOJSON(paris_districts.Geometry),
ST_GEOGPOINT(FR_population.longitude, FR_population.latitude)
)
GROUP BY paris_districts.L_QU, paris_districts.Geometry
ORDER BY district_pop DESC

That gives you the districts with their population and their density, ordered from the most populated to the less.

The top5 most populated districts from the above query result

As an example, Saint-Lambert's result is compatible with the official figure.

Visualize it with BigQueryGeoViz

If you want to share your geospatial results beautifully, BigQueryGeoViz is an amazing tool.

You only have to connect with your Google account, select the project ID you work with, and then you will be able to run some queries directly in the interface.

The Paris districts colored according to their population density

For example, we are going to display the above colored map using the following query :

SELECT
paris_districts.L_QU as district_name,
ST_GEOGFROMGEOJSON(paris_districts.Geometry) as district_polygon,
SUM(FR_population.point_population) / ST_AREA(ST_GEOGFROMGEOJSON(paris_districts.Geometry)) as district_density,
FROM `speedy-octane-332614.population_tutorial.FR_population` as FR_populationJOIN
`speedy-octane-332614.population_tutorial.paris_districts` as paris_districts ON
ST_CONTAINS(
ST_GEOGFROMGEOJSON(paris_districts.Geometry),
ST_GEOGPOINT(FR_population.longitude, FR_population.latitude)
)
GROUP BY paris_districts.L_QU, paris_districts.Geometry
The area to enter your query in BigQueryGeoViz

Once the query ran, you can customize your display style with the following panel :

The area to customize your visualization in BigQueryGeoViz

Congratulation! You are now able to :

  1. Compute population on any custom shape
  2. Compute density on any custom shape
  3. Make quick visualization to check and share your work

Conclusion

I hope this tutorial has been useful for you and that you will find amazing usage of this fabulous data from Facebook High-Resolution Population Density Maps. Note that BigQuery GIS has many other functionalities that can help you to leverage your geospatial data in a very scalable way.

Moreover, BigQuery has several public datasets, directly available from the UI: so check this out!

Are you looking for Data Engineering Experts? Don't hesitate to contact us!

If you liked this article, you might be interested in computing the impact of a cloud project or deploying models in serverless backend

Cet article a été écrit par

Antoine Miquel

Antoine Miquel

Suivre toutes nos actualités

Use Airbyte to set up an ETL pipeline in minutes without code

6 min read

Databricks x DataHub: How to set up a Data Catalog in 5 minutes

5 min read

How to improve python unit tests thanks to Hypothesis!

4 min read