Mypy: get rid of Python bugs with static type-checking!

Written by Achille Huet

Introduction

Python is known as a very easy-to-learn programming language for beginners. This is in part due to the fact that it is a dynamically typed language: the value and type of variables can change at any time.

However, this feature is also the source of a lot of bugs. When writing and using functions that accept only certain data types as arguments, you risk running into errors when the wrong data types are used, without any prior warning by Python.

In this article, I’ll introduce you to mypy, a Python tool that relies on Python’s type hints to check your code and detect potential issues with variable types before the code runs. This is called “static type-checking”.

This subject is quite vast, so to keep this article short, I’ve split it into four parts:

  1. Why we need static type-checking
  2. Getting started with mypy
  3. Mypy basics
  4. Mypy in practice

Why we need static type-checking

Static type-checking is a process through which a piece of code is parsed and analyzed, without being executed, to make sure that variable types stay coherent between their creation and their use.

Data types

Fundamentally, data types give information on how to store and interpret data. A variable in a script is just a reference to a space in memory, whose size is fixed after allocation.

Trying to store new data in an existing variable can lead to issues in certain cases: if the new data requires more memory, we might end up overwriting other data; and even if the required memory is the same, the new data might need to be interpreted differently.

Statically typed languages

In statically typed languages such as C++ or Java, developers have to indicate the data type of variables, which must stay the same throughout the code.

This is done by declaring the type of each variable directly in the code. For example, when declaring a new numeric variable, we need to specify whether it is an integer, a long, a float, etc.

a simple C script that changes the type of the variable “x”

If these type declarations are inconsistent (as in the script above), the compiler reports an error and compilation fails.

Python particularities

Python is a dynamically, implicitly, and strongly typed language.

  • Implicitly typed: Unlike in statically typed languages, there is no need to declare the type of your variables in Python. The interpreter determines the type from the value assigned.
x = 3 # x is an int
y = 5.2 # y is a float
  • Strongly typed: Languages with weak typing automatically cast values to other types in basic operations. While the following piece of code raises an error in Python, it yields “a1” in JavaScript!
"a" + 1 # raises a TypeError
  • Dynamically typed: In dynamically typed languages, variables can change types at any time.
x = 2
x = "hello" # the name x is simply rebound to a new str object

Consequences

In Python and other dynamically-typed languages, variable types are not enforced. Instead, variables can be reassigned to any data type, while the resulting interpretation and memory allocation issues are handled at runtime.

This is a feature that greatly reduces constraints on the developer, but also introduces some new challenges. For example, when using functions, it isn’t always clear which data type to use as an input if the signature is not explicit (as in statically-typed languages). This can lead to bugs due to the function being used incorrectly.

Example of a simple function that can be misused

In this code snippet, the developer has given us a good idea of how this function should be used. However, if we are not careful, we might try to use this function with a float … and end up with a script that crashes.
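The original listing is not reproduced here, so the following is only a minimal stand-in for that kind of function. The name `repeat_word` and its docstring are hypothetical. The docstring tells the reader what is expected, but nothing stops a caller from passing a float, and the problem only shows up once the code runs.

```python
def repeat_word(word, n_times):
    """Return `word` repeated `n_times` times (n_times should be an int)."""
    return " ".join(word for _ in range(n_times))


print(repeat_word("hello", 3))  # works: n_times is an int

try:
    repeat_word("hello", 2.5)  # a float slips through unnoticed...
except TypeError as error:
    # ...until range() rejects it at runtime
    print("Crashed at runtime:", error)
```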

As a project gets bigger, this kind of situation gets more and more frequent as developers become less and less familiar with the entire codebase. This is why it is important to have ways of indicating data types.

Type hinting

Until Python 3.4, Python developers all had their own way of adding type annotations. This made it hard to juggle between different syntaxes, and impossible to develop any tools that could analyze these type-hints automatically.

Python 3.5 introduced typing, a module for type-hinting whose aim was to standardize type annotations in Python. This module became the base for third-party libraries that provide the user with static type-checking, code completion, and refactoring features.

The typing module is essential to mypy, as it provides the syntax that is used to add type hints in our Python code. Mypy then analyzes these type hints to determine whether the code contains typing mistakes.

Getting started with mypy

Installing mypy

To install mypy, use pip:

pip install mypy

You can now check that mypy is correctly installed by running the following command in your terminal:

mypy --version

Now that mypy is installed, we can run it on any file to check if our types are correct. This requires that our code be typed, which means we need to familiarize ourselves with the typing module’s syntax.

The typing module: syntax and examples

The typing module enables us to provide type hints for any Python code, from short simple scripts to very large and complex projects. The possibilities for using type hints are therefore extremely diverse, and mastering them can take a lot of experience.

In this part, I will briefly summarize what you need to know about type-hinting, but you can (and should) check out the official documentation for more details.

Variables and functions

To type-hint variables and functions, we use the following syntax :

Examples of type-hinting for variables and functions
  • basic python types such as int, str, etc. can be directly used as types
    (Note: built-in collection types such as list, dict, and tuple can only be subscripted directly, as in list[int], from Python 3.9 onwards; for older versions, import List, Dict, and Tuple from typing)
  • more complex types such as Iterable, or Tuple should be imported from typing
  • typing-specific notations such as Optional need to be imported as well
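As an illustration of the three points above, here is a minimal sketch. The variable names and the `mean` function are made up for the example; only the annotation syntax itself comes from the typing module.

```python
from typing import Iterable, Optional, Tuple

# Variables: name, colon, type, then the value
age: int = 42
name: str = "Ada"
position: Tuple[float, float] = (0.5, 1.5)


def mean(values: Iterable[float]) -> Optional[float]:
    """Return the average of `values`, or None if there are none."""
    items = list(values)
    if not items:
        return None
    return sum(items) / len(items)


print(mean([1.0, 2.0, 3.0]))
print(mean([]))
```

Arguments are annotated like variables, and the return type follows the `->` arrow. `Optional[float]` signals that the function may return either a float or None.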

Note that user-defined classes can also be used as types :

type-hinting using a custom class Number
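The original listing is not available, so here is a small sketch of what annotating with a custom class can look like. The body of `Number` and the `add` function are assumptions made for the example.

```python
class Number:
    """Hypothetical wrapper around a numeric value."""

    def __init__(self, value: float) -> None:
        self.value = value


def add(a: Number, b: Number) -> Number:
    # Both arguments and the return value are annotated with our own class
    return Number(a.value + b.value)


total: Number = add(Number(1.5), Number(2.0))
print(total.value)
```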

Aliases

Aliases are alternative names given to existing types, whether built-in or composed with typing. They can be used for documentation purposes (to give more information to the reader) or to make annotations more concise.

example of type-hinting using aliases
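A minimal sketch of aliases in use follows. The names `UserId`, `Scores`, `best_score`, and `scale` are invented for illustration.

```python
from typing import Dict, List, Tuple

# Aliases: new names for existing types
Vector3 = Tuple[float, float, float]
UserId = int
Scores = Dict[UserId, List[float]]


def best_score(scores: Scores, user: UserId) -> float:
    """Return the highest score recorded for `user`."""
    return max(scores[user])


def scale(vector: Vector3, factor: float) -> Vector3:
    """Multiply every component of `vector` by `factor`."""
    x, y, z = vector
    return (x * factor, y * factor, z * factor)


print(best_score({1: [0.5, 0.9]}, 1))
print(scale((1.0, 2.0, 3.0), 2.0))
```

`UserId` documents intent (this int identifies a user), while `Scores` and `Vector3` mostly save typing and keep signatures readable.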

Aliases can be very powerful tools when writing high-quality code, as they help communicate the developer’s intention as to how each function or variable should be used. This explicitness is part of the Zen of Python, a set of guiding principles for writing better Python code.

Callables

Callables allow us to type-hint functions, whether they are variables, arguments, or returned values.

The syntax is as follows :

my_func: Callable[[ArgType1, ArgType2], ReturnType]

In the example below, we consider a function that takes an image and a transformation function as input and applies the transformation to the image before saving and returning it.

code of a transformation function with a Callable argument
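Since the original listing is missing, the sketch below shows one way such a function could be typed. To stay self-contained it makes several assumptions: an image is represented as nested lists of grayscale pixels, the result is saved as JSON, and the names `Image`, `transform_and_save`, and `invert` are hypothetical. A real project would more likely rely on an imaging library.

```python
import json
import os
import tempfile
from typing import Callable, List

# Hypothetical stand-in for an image type: rows of grayscale pixel values
Image = List[List[int]]


def transform_and_save(
    image: Image,
    transform: Callable[[Image], Image],
    path: str,
) -> Image:
    """Apply `transform` to `image`, save the result as JSON, and return it."""
    result = transform(image)
    with open(path, "w") as file:
        json.dump(result, file)
    return result


def invert(image: Image) -> Image:
    """Invert 8-bit grayscale pixels."""
    return [[255 - pixel for pixel in row] for row in image]


output_path = os.path.join(tempfile.gettempdir(), "inverted.json")
inverted = transform_and_save([[0, 128], [255, 64]], invert, output_path)
print(inverted)
```

Because `transform` is declared as `Callable[[Image], Image]`, passing a function with a different signature (for example one that takes two arguments) is something a type checker can flag.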

For now, we have enough typing knowledge to start using mypy. However, to use mypy more efficiently, I strongly recommend getting more familiar with type-hinting. In particular, you should familiarize yourself with generic types, as these can be frequently used when coding medium to large-scale projects.

Mypy basics

1. Type-checking variables

Let’s create a simple Python file, vectors.py. We use an alias to create a type Vector3, which represents a 3-dimensional vector.

simple script to test mypy
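The original script is not reproduced here. The following reconstruction is an assumption based on the description and on the error discussed next, with the faulty assignment deliberately placed on line 8.

```python
# vectors.py (reconstructed sketch, not the author's exact file)
from typing import Tuple

# Alias: a 3-dimensional vector is a tuple of exactly three floats
Vector3 = Tuple[float, float, float]

v1: Vector3 = (1.0, 2.0, 3.0)
v2: Vector3 = (1.0, 2.0, 3.0, 4.0)  # four elements: not a Vector3
```

Python itself runs this file without complaint, since annotations are not enforced at runtime. Only a static checker notices the mismatch.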

We can then run mypy to type-check this file. In our terminal, we type the following command :

$ mypy vectors.py

This gives us the following error message :

vectors.py:8: error: Incompatible types in assignment (expression has type "Tuple[float, float, float, float]", variable has type "Tuple[float, float, float]")
Found 1 error in 1 file (checked 1 source file)

Mypy detects that we have an error on line 8: a tuple with 4 elements is not compatible with the Vector3 type.

If we remove line 8 and run mypy again, we should get the following message:

$ mypy vectors.py
Success: no issues found in 1 source file

2. Type-checking functions

In a new file vector_functions.py, let’s consider the following code:

mypy test for function definitions
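The original file is not available. The sketch below is a guess at its general shape, written so that it contains the same three kinds of mistakes as the errors quoted afterwards. Apart from `add_vectors`, the function names are hypothetical, and the line numbers will not match the original output.

```python
from typing import Tuple

Vector3 = Tuple[float, float, float]


def add_vectors(a: Vector3, b: Vector3) -> Vector3:
    # Correct: three components in, three components out
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def add_vectors_short(a: Vector3, b: Vector3) -> Vector3:
    # Mistake: returns only two components
    return (a[0] + b[0], a[1] + b[1])


def add_vectors_bad_index(a: Vector3, b: Vector3) -> Vector3:
    # Mistake: index 3 does not exist in a 3-tuple
    return (a[0] + b[0], a[1] + b[1], a[3] + b[3])


def divide_vector(a: Vector3, divisor: float) -> Vector3:
    # Mistake: the zero-divisor branch returns None, not a Vector3
    if divisor != 0:
        return (a[0] / divisor, a[1] / divisor, a[2] / divisor)
    return None
```

None of these definitions fails when the module is imported. Without a type checker, each mistake would only surface when the faulty function is actually called with real data.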

In this file, we’ve coded several ways to implement the addition and division of vectors, some of which are wrong. If you run mypy on this example, you should see the following error message:

$ mypy vector_functions.py
vector_functions.py:10: error: Incompatible return value type (got "Tuple[float, float]", expected "Tuple[float, float, float]")
vector_functions.py:14: error: Tuple index out of range
vector_functions.py:24: error: Incompatible return value type (got "None", expected "Tuple[float, float, float]")

When running mypy, all the errors are caught, and for each function we are prompted to fix either the type annotation or the code.

3. Type-checking function calls

To have complete coverage of the code, mypy also checks that functions (as well as methods, class instantiations, etc.) are called with valid arguments. If a variable is created through a function, it also determines that variable’s type.

mypy test for function calls
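The original script is missing, so this is a sketch of the situation under stated assumptions: `add_vectors` is redefined locally to keep the example self-contained, and the faulty call is wrapped in a try/except (not something the original necessarily did) so that the file still runs to completion.

```python
from typing import Tuple

Vector3 = Tuple[float, float, float]


def add_vectors(a: Vector3, b: Vector3) -> Vector3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


v1 = (1.0, 2.0, 3.0)  # inferred as a 3-tuple of floats
v2 = (4.0, 5.0, 6.0)
v3 = (7.0, 8.0)  # inferred as a 2-tuple of floats

total = add_vectors(v1, v2)  # fine; `total` is inferred as a Vector3

try:
    add_vectors(total, v3)  # argument 2 has the wrong type
    call_failed = False
except IndexError:
    # Without a type checker, the mistake only surfaces here, at runtime
    call_failed = True

print(total, call_failed)
```

No variable here carries an explicit annotation: the types of `v1`, `v2`, `v3`, and `total` are all inferred, which is what lets the checker follow values from one call to the next.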

Running mypy on this file gives us the following error :

vector_script.py:9: error: Argument 2 to "add_vectors" has incompatible type "Tuple[float, float]"; expected "Tuple[float, float, float]"

We can then quickly fix our mistake before running our script.

We see that for each function call and variable declaration, mypy is able to determine the newly created type, and check if this is coherent with further uses of the variable. This is how mypy is able to check interdependent bits of code, and detect bugs even in large codebases.

Mypy in practice

Refactoring with mypy

Mypy is a huge help when refactoring code because it lets you know immediately if a function call has incorrect arguments.

Consider the following script :

script implementing and using a class Person
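The original script is not shown. The sketch below is a plausible starting point, inferred from the class definition, names, ages, and error messages that appear in the rest of this section. It assumes Python 3.8+ (for `typing.TypedDict`), and the loop at the end is an assumption about how `print_information` was called.

```python
from typing import List, TypedDict


class Person(TypedDict):
    name: str
    age: int


def print_information(name: str, age: int) -> None:
    print(name, age)


people: List[Person] = [
    Person(name="Andy", age=45),
    Person(name="Suzy", age=30),
]

for person in people:
    # Each key of the TypedDict is passed as a keyword argument
    print_information(**person)
```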

Imagine that for each person, we now want to add the city they live in, to use this information in another part of the code. In that case, we’ll write :

class Person(TypedDict):
    name: str
    age: int
    city: str

If we think that we’re done, we can run mypy on our script. We get the error:

person.py:12: error: Missing key "city" for TypedDict "Person"

This indicates that the “city” key is missing in our instantiation of Andy and Suzy. Because TypedDict performs no validation at runtime, the script itself would not have complained: the failure would have surfaced later, as a KeyError in whichever part of the code reads the city. Instead, we detected the mistake before anything bad could happen.

To fix this, we can add the city information for Suzy and Andy :

people = [
    Person(name="Andy", age=45, city="Paris"),
    Person(name="Suzy", age=30, city="London"),
]

Running mypy again, we get another error :

person.py:17: error: Extra argument "city" from **args for "print_information"

This tells us that we tried calling print_information with too many arguments. Again, we can fix this mistake :

def print_information(name: str, age: int, city: str) -> None:
    print(name, age, city)

This time, when running mypy, no error shows up - we’ve completed our refactoring!

While this refactoring process was quite simple and could have been done the right way straight from the start, it is common for the refactoring of larger projects to become quite messy. Mypy becomes a very useful tool for overcoming these difficulties, by locating typing mistakes and helping you fix them step by step.

Pros and cons of mypy

After working with mypy on several personal and professional projects, I’ve compiled some basic tips for starting out with this tool: what mypy is best suited for, which situations you should avoid, etc.

Pros

  • greatly limits the number of bugs
  • huge help for refactoring
  • promotes having a clean code base
  • facilitates the use of abstract classes
  • great to use with native python code and/or small-sized projects

Cons

  • doesn’t work well with certain libraries (e.g. pandas or pyspark dataframes)
  • hard to detect problems at interfaces with files or other languages
  • can be quite difficult and time-consuming to find the right typing syntax in complex projects
  • developers may rely too much on mypy, even though it doesn’t detect all bugs (e.g. division by zero, NaNs, incorrect script inputs, etc.)

Tips

  • configure mypy to reject untyped function definitions (for example with the --disallow-untyped-defs flag, or with --strict): this will force you to type-hint everything but will ensure that mypy works as well as it can
  • set up a pre-commit hook to run mypy every time you commit, to keep a clean git history with as few typing bugs as possible
  • use custom or native Python classes to store information from objects that mypy doesn’t handle (e.g. use dataclasses to store the contents of a pandas dataframe)
  • if you really need to use objects that mypy doesn't handle very well, keep them isolated from the rest of your code as much as possible
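The last two tips can be sketched as follows. This is only an illustration under assumptions: the `Sale` dataclass and both functions are invented, and plain dictionaries stand in for rows extracted from a dataframe so that the example runs without pandas.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Sale:
    product: str
    quantity: int
    unit_price: float


def parse_sales(records: List[Dict[str, Any]]) -> List[Sale]:
    """Convert loosely typed records into typed objects at the boundary."""
    return [
        Sale(
            product=str(record["product"]),
            quantity=int(record["quantity"]),
            unit_price=float(record["unit_price"]),
        )
        for record in records
    ]


def total_revenue(sales: List[Sale]) -> float:
    # From here on, the type of every field is known to the checker
    return sum(sale.quantity * sale.unit_price for sale in sales)


# Stand-in for rows extracted from a dataframe (assumed shape)
records = [
    {"product": "pen", "quantity": 3, "unit_price": 1.5},
    {"product": "book", "quantity": 1, "unit_price": 12.0},
]
print(total_revenue(parse_sales(records)))
```

The loosely typed data is confined to `parse_sales`. Everything downstream works with `Sale` objects, so a typo in a field name or a wrong argument type becomes something mypy can report.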

Synergy with other tools

Mypy shares some type-checking functionalities with most IDEs, which have their own type-checking algorithms.

pylance detecting typing issues in VSCode

However, unlike IDEs, mypy can enforce the use of typing everywhere, and require the developer to fix typing issues. I also like the fact that by running mypy, you get an extensive list of all incorrect types, while with IDEs you need to look through your entire workspace to see if you impacted other files. Also, the ability to use mypy in a pre-commit hook is a huge plus when trying to keep a clean git history.

Conclusion

Mypy is an amazing tool for static type-checking in Python, which greatly helps to reduce bugs in your code by detecting them before they can cause any damage. By using it on fully type-hinted projects, you can speed up refactoring and improve your general productivity. Finally, although type-checking can already be done directly by most IDEs, mypy offers many complementary features for keeping a codebase well typed.

As a general rule, you should always use mypy in your CI (with a pre-commit or pre-push hook) on projects that are well-suited for type-hinting. For development purposes, I believe that it is good to rely on both mypy and your IDE, as the two will complement each other. Sometimes one might detect a mistake that the other doesn’t, and your IDE will give you real-time information as you code, while mypy runs a final, exhaustive check when you commit or push.

Thanks for reading, and stay tuned for updates on typing practices!
