As your project grows, the process of installing Python packages with the correct versions can become quite slow. When this happens in your Continuous Integration (CI) pipeline, the feedback loop can seem endless. This guide aims to help you reduce your feedback loop time with GitlabCI and Docker. Poetry is now a widely adopted packaging and dependency management tool for Python. With its robust dependency resolver, it helps developers install and maintain their dependencies effortlessly.
Here are the key steps and lessons I discovered to streamline GitlabCI during a recent project. It all started with my struggle in managing dependencies and then transitioning all my jobs to Docker to expedite the entire pipeline.
Dependencies, GitlabCI Executors, and Docker
First, let's take a moment to consider what dependencies are. Dependencies are essentially relationships between different software components; one piece of software relies on the functionalities provided by another to function correctly. In Python, it's as simple as the airflow module needing the Flask module. However, some Python packages may require system-specific packages. For instance, the PostgreSQL client package, psycopg2, needs the system package libpq-dev to function. Since it's system-specific, it can vary from your local machine to your CI machine and even to your production machine.
In Gitlab, you have several types of executors to run your CI scripts. It's important to note that the default executor in GitlabCI is the Shell Executor, where your scripts run directly on the machine, specifically in a dedicated per-job shell. So, if you come across a dependency package with a system requirement, you can install it manually. However, this may not be the best approach.
Firstly, it can lead to conflicts with other software. Secondly, your primary concern should be isolation and reproducibility, which is precisely what Docker is designed for. GitlabCI offers other executors, including the Docker Executor. Each script runs in a dedicated Docker container, allowing you to control precisely which dependencies (both system-wide and Python-wide) you want to install in your image.
The isolation and reproducibility come at a minimal cost. Here's what you need:
- A Dockerfile that defines your image and installs, for example,
- A Docker registry accessible to your CI runner for both pushing and pulling images
Dockerizing Your Python & Poetry Environment: The Simple Approach
With these steps, you can safely replicate your environment from your local machines to the production environment. Now, let's focus on the Python and Poetry specific aspects.
At first glance, Dockerizing Python and Poetry may appear straightforward. Even though there's no official Poetry image on the Docker Hub, it only takes a few lines to build an image.
- You start from a base Python image. The Python version you set in your tag must match the version you specify in your
- Then you install Poetry with the official installer.
- As a good measure, you can set the POETRY_HOME environment variable to control where Poetry will be installed.
And that’s it! You can now build, tag, and push your image to your registry and use it from GitlabCI. Just don’t forget to install your dependencies with Poetry.
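The build, tag, and push step can be sketched as a small shell script. This is only an illustration: the registry path is a hypothetical placeholder, and the DRY_RUN switch is a convention introduced here (not something Docker or GitlabCI provides) so the script prints the commands by default instead of running them.

```shell
#!/bin/sh
set -eu

# Hypothetical image reference; replace it with your own registry and project path.
IMAGE="${IMAGE:-registry.example.com/my-group/my-project/python-poetry}"
TAG="${TAG:-latest}"

# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_and_push() {
  # Build the image from the Dockerfile in the current directory and tag it.
  run docker build --tag "$IMAGE:$TAG" .
  # Push the tagged image so the CI runner can pull it later.
  run docker push "$IMAGE:$TAG"
}

build_and_push
```

Running it with DRY_RUN=0 executes the real commands, which assumes you are already logged in to the registry.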
Yes, it works. But if, like me, you use this image to run your tests in GitlabCI, you may notice it is terribly slow, much slower than with our good old Shell Executor. In my case, we went from 2 minutes to more than 5 minutes to run our 4 test jobs. If you dig a bit, you’ll find out that poetry install is the one at fault. Let’s speed that up.
Optimizing Python Dependency Management with Poetry
We won’t actually make poetry install itself faster; trying to would be pointless. The key point is that you don’t have to install your dependencies on every run.
By nature, you don’t add or modify your dependencies on every commit. Your pyproject.toml only changes occasionally. Thus, installing your dependencies directly inside your Docker image is a good idea.
The best practice is to use the poetry.lock file. It lists the exact versions of your dependencies, as explained in the Poetry documentation. It can also speed up the installation, since the version resolution is already done.
Either way, you can now use a pre-built image that contains all your dependencies in GitlabCI, and you can remove the poetry install line from your .gitlab-ci.yml. Yet you now have to solve one last issue.
What happens if I want to add another dependency?
You can’t use the current image you have in your registry. You have to build it again. It can be painful if you have to do it every time you change your
The best way is to build it only when you need it, thanks to an additional job in your CI.
Let’s call this job Deps. This job should be triggered when there is a change in any of the following:
You can do this using the changes keyword in GitlabCI.
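GitlabCI evaluates changes for you, so you do not need to script it. To make the decision concrete, here is a rough local approximation in shell using git diff, which is a substitute technique and not how GitlabCI implements the keyword. The file list in DEPS_FILES is an assumption for illustration, as is the deps_changed helper name.

```shell
#!/bin/sh

# Assumed list of files that define the dependency image (hypothetical names).
DEPS_FILES="Dockerfile pyproject.toml poetry.lock"

# Reads changed file paths on stdin, one per line.
# Succeeds (exit 0) if at least one of them is a dependency file.
deps_changed() {
  while IFS= read -r changed; do
    for f in $DEPS_FILES; do
      if [ "$changed" = "$f" ]; then
        return 0
      fi
    done
  done
  return 1
}

# Example usage, comparing the current commit with its parent:
#   git diff --name-only HEAD~1 HEAD | deps_changed && echo "rebuild the image"
```

The point is simply that the Deps job only needs to run when one of these files appears in the list of changed paths.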
In simple cases, you can set the image tag to something fixed, like latest. If your team is large, you may run into issues such as teammates overwriting your image. In that case, you can compute an image tag by hashing these 3 files with the md5sum Linux command. Here is an example. Note that you can potentially exclude
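A minimal sketch of the tag computation follows, assuming the files to hash are Dockerfile, pyproject.toml, and poetry.lock (these names are a guess for illustration). The compute_image_tag helper is hypothetical; md5sum and cut are standard Linux utilities.

```shell
#!/bin/sh

# Prints an MD5 hex digest of the concatenated contents of the given files.
# The same file contents, passed in the same order, always give the same tag.
compute_image_tag() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done
  # md5sum prints "<digest>  -" for stdin; keep only the digest.
  cat "$@" | md5sum | cut -d ' ' -f 1
}

# Example usage in a CI job (file names are an assumption):
#   IMAGE_TAG="$(compute_image_tag Dockerfile pyproject.toml poetry.lock)"
#   docker build --tag "$CI_REGISTRY_IMAGE/deps:$IMAGE_TAG" .
```

Because the tag depends only on file contents, two branches with identical dependency files share one image, while a branch that changes any of them gets its own tag and cannot overwrite anyone else's image.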
To sum up, you would end up with a workflow looking like this.
- Dependencies can be both system-wide and Python-wide
- Docker ensures isolation and reproducibility - dependencies are the same from one image to another
- Install your dependencies - with Poetry - in an image you can reuse in GitlabCI to save time
- Optimize your CI by installing dependencies only when you detect a change