
October 18, 2022 • 4 min read

How to improve Python unit tests thanks to Hypothesis!

Written by Emilio De Sousa

Sometimes I feel that my unit tests are not enough and do not cover my problem well. Or, on the contrary, that they are too tightly coupled to my production code, so that I have to modify them at the slightest change in implementation details. This is why, today, I will introduce Property-Based Testing (PBT) and show how it can help us write more robust code.

We like tests, right?

More and more people understand the importance of writing tests, not only to find bugs but also to improve the design of our code.

We believe that the major benefits of testing happen when you think about and write tests, not when you run them.
The Pragmatic Programmer: Your Journey To Mastery, 20th Anniversary Edition - Andy Hunt, Dave Thomas

However, our usual unit tests, which I will call "example-based tests" for the rest of the article, are:

Costly to write
Costly to maintain
Opaque: it is sometimes difficult to understand their meaning because of all the data they define, when only part of it is needed
Error-prone: tests are code and can contain bugs. The more complex the tested code is, the more complex the tests are, and the higher the chance of bugs
Possibly insufficient: we bring our own biased understanding of the business needs when we write the examples, so our tests may not cover all the cases needed to make our code robust

"Testing can reveal the presence of errors but never their absence"
Notes on Structured Programming - Edsger W. Dijkstra

Property-based testing helps us write better tests

PBT is not intended to replace example-based testing but to complement it. A classical unit test asserts the output of a function for given inputs.
With PBT, the idea is to be less specific about the result: what matters is that the business requirements hold. This leaves more freedom in the implementation, since it is less tightly coupled to the test.

What is Property-Based Testing?

PBT is about identifying and testing invariants. An invariant is something that is always true, no matter what data you provide to your algorithm.
To do this, you can use a framework that generates random data and checks that the invariant still holds. Each time you run your test suite, it tries different combinations of inputs (usually about 100 per test). Note that a passing property-based test does not mean that the implementation is correct; it only means that the framework has not found an input that breaks it. It may well find an edge case after several hours, days, weeks or months. Unlike example-based tests, each run uses different inputs, so our tests can reveal a bug long after we wrote them.
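
To make the mechanics concrete, here is a minimal hand-rolled sketch of the loop that such a framework automates, using only the standard library. The names add and check_commutativity are illustrative assumptions, not part of any library, and a real framework adds smarter data generation, shrinking and replay on top of this idea.

```python
import random


def add(x: int, y: int) -> int:
    """Function under test (hypothetical, kept trivial on purpose)."""
    return x + y


def check_commutativity(runs: int = 100) -> None:
    """Check the invariant add(x, y) == add(y, x) on randomly drawn inputs."""
    for _ in range(runs):
        # Fresh random inputs on every iteration, and on every run of the suite
        x = random.randint(-10**6, 10**6)
        y = random.randint(-10**6, 10**6)
        assert add(x, y) == add(y, x), f"invariant broken for x={x}, y={y}"


check_commutativity()
```

If the loop finishes without an AssertionError, no counterexample was found in this run, which is weaker than a proof that none exists.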

How to implement PBT in Python? With Hypothesis!

To discover what PBT looks like, let's walk through a quick example with an addition function in Python.

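	from hypothesis import given
	from hypothesis import strategies as st

	def add(x: int, y: int) -> int:
	    # Function under test (a trivial implementation, assumed here for illustration)
	    return x + y

	@given(st.integers(), st.integers())
	def test_commutativity(x: int, y: int):
	    # Swapping the arguments must not change the result
	    assert add(x, y) == add(y, x)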


Testing commutativity of the add function with Hypothesis

Here we want to test our add function with Hypothesis, using the commutativity property:

@given: this decorator lets us define the valid inputs for our test.
strategies: they describe the range of valid values each argument can take. Hypothesis ships with many strategies to generate many kinds of inputs, such as DataFrames, datetimes, strings matching a regex, etc.
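
As a small illustration of how strategies constrain the generated inputs, the sketch below combines three built-in strategies in a single test. The bounds, the regex and the test name are arbitrary choices made for this example.

```python
from datetime import datetime
from typing import List

from hypothesis import given
from hypothesis import strategies as st


@given(
    # Non-empty lists of integers between 0 and 120
    st.lists(st.integers(min_value=0, max_value=120), min_size=1),
    # Strings that fully match the pattern, e.g. two capitals, a dash, four digits
    st.from_regex(r"[A-Z]{2}-[0-9]{4}", fullmatch=True),
    # Naive datetimes from the year 2000 onwards
    st.datetimes(min_value=datetime(2000, 1, 1)),
)
def test_generated_inputs_respect_their_strategies(
    ages: List[int], code: str, when: datetime
):
    assert len(ages) >= 1 and all(0 <= age <= 120 for age in ages)
    assert len(code) == 7 and code[2] == "-"
    assert when.year >= 2000
```

Each argument of the test receives values drawn from the strategy at the same position in @given.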

In the above example, we are using Python type hints, but if you want to have Python type checking, read this article!

Hypothesis also has some very interesting features to make our life easier:

To debug quickly, a good PBT framework is able to do shrinking. Once a failing case is identified, it works on the input data to simplify it as much as possible while still reproducing the error. This makes the analysis easier: imagine a function that takes a list as an argument. Is it my whole list of 250 elements that crashes my code, or just one element? If it is a single element, shrinking can isolate it.
When a test fails, Hypothesis saves the failing inputs in its local example database and replays them first on the following runs. This helps us avoid a regression once we have fixed the bug.
Data generation is "random", so Hypothesis relies on a seed to give us "deterministic" behavior when we need it. Imagine that a test fails in the CI: Hypothesis can report the seed used to generate the inputs, and we can then run the tests again locally with the same data.
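
The sketch below shows two of these features in isolation. It uses hypothesis.find to ask for a minimal input satisfying a condition, which exercises the same shrinking machinery, and the seed and example decorators to pin the randomness and to always retry a known input. The seed value 12345 and the test name are arbitrary assumptions.

```python
from typing import List

from hypothesis import example, find, given, seed
from hypothesis import strategies as st

# Shrinking: ask for the simplest list of integers whose sum is at least 10.
# Hypothesis first finds any matching list, then simplifies it as far as it can.
smallest = find(st.lists(st.integers()), lambda xs: sum(xs) >= 10)
print(smallest)


@seed(12345)  # fixes the random seed so that runs generate the same inputs
@example([])  # this input is always tried, in addition to the generated ones
@given(st.lists(st.integers()))
def test_sum_is_order_independent(xs: List[int]):
    assert sum(xs) == sum(reversed(xs))
```

Pinning a seed permanently removes the benefit of seeing new inputs on each run, so it is probably best kept for reproducing a specific failure.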

Examples of properties for our tests

List of common mathematical properties:

Associative: a + (b + c) = (a + b) + c
Commutative: a + b = b + a
Distributive: a(b + c) = ab + ac
Idempotent: f(f(x)) = f(x)

As a Data Engineer, the concept of idempotency is very important: it ensures that running a pipeline several times will not create duplicates or fail by deleting data.
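
Idempotency translates directly into a property. The sketch below assumes a hypothetical pipeline step called deduplicate and checks that applying it twice gives the same result as applying it once.

```python
from typing import List

from hypothesis import given
from hypothesis import strategies as st


def deduplicate(rows: List[int]) -> List[int]:
    """Hypothetical pipeline step: drop duplicates, keep first-seen order."""
    return list(dict.fromkeys(rows))


@given(st.lists(st.integers()))
def test_deduplicate_is_idempotent(rows: List[int]):
    once = deduplicate(rows)
    # f(f(x)) == f(x): a second run must not change anything
    assert deduplicate(once) == once
```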

Other test patterns

Reversible Operation pattern

Applying an operation and then its inverse should bring us back to the initial state. This is sometimes called the Encoder/Decoder pattern, or There And Back Again.

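	from typing import List

	@given(st.lists(st.integers()))
	def test_reversible_operation(my_list: List[int]):
	    # list.reverse() works in place and returns None, so we use reversed() instead
	    assert list(reversed(list(reversed(my_list)))) == my_list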

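
The encoder/decoder flavor of this pattern can be sketched with the standard library's base64 module: whatever bytes we encode, decoding should give them back. The test name is an assumption made for this example.

```python
import base64

from hypothesis import given
from hypothesis import strategies as st


@given(st.binary())
def test_base64_round_trip(payload: bytes):
    encoded = base64.b64encode(payload)
    # Decoding the encoded payload must return the original bytes
    assert base64.b64decode(encoded) == payload
```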

“Should not explode” pattern

System robustness testing
In this example we take an API, but this pattern works for any system that must be robust to data coming from other systems and must not return unexpected errors (here the expected status codes are 200, 400, 401 and 404; any other code would be unexpected).

	import requests
	from hypothesis import given, strategies as st

	@given(st.integers(), st.text(), st.integers())
	def test_should_not_explode(id: int, sort: str, max: int):
	    # API_URL is assumed to be a format string with three placeholders
	    response = requests.get(API_URL.format(id, sort, max))

	    # A requests Response is falsy for 4xx/5xx codes, so check the code explicitly
	    assert response.status_code in [200, 401, 400, 404]

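
The same pattern can be tried without a running API. The sketch below assumes a hypothetical parse_quantity function that receives untrusted text and must either return a non-negative integer or None, without ever raising.

```python
from typing import Optional

from hypothesis import given
from hypothesis import strategies as st


def parse_quantity(raw: str) -> Optional[int]:
    """Hypothetical parser: a non-negative int, or None for invalid input."""
    cleaned = raw.strip()
    # Only accept plain ASCII digits; everything else is rejected, not raised
    if not (cleaned.isascii() and cleaned.isdigit()):
        return None
    return int(cleaned)


@given(st.text())
def test_parse_quantity_should_not_explode(raw: str):
    # Any exception raised here fails the test: that is the whole property
    result = parse_quantity(raw)
    assert result is None or result >= 0
```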

Old-New or Naive-Complex (or Testing Oracle)

Legacy vs. new implementation, or new implementation vs. brute force
Here we already have an implementation that produces the right outputs, but we want to improve how we produce them (for example by refactoring for cleaner code or for better performance). Thanks to Hypothesis, we can generate tons of inputs to challenge our new implementation and gain confidence in it.

	@given(st.lists(st.integers()))
	def test_old_vs_new(my_list: List[int]):
	    # The legacy implementation acts as the oracle for the new one
	    assert legacy_implementation(my_list) == new_implementation(my_list)

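
For the naive vs. complex variant, a brute-force version can serve as the oracle. In this sketch, both functions are hypothetical examples: the quadratic one is obviously correct, and the set-based one is the faster candidate we want to trust.

```python
from typing import List

from hypothesis import given
from hypothesis import strategies as st


def has_duplicates_naive(items: List[int]) -> bool:
    """Brute force: compare every pair of positions."""
    return any(
        items[i] == items[j]
        for i in range(len(items))
        for j in range(i + 1, len(items))
    )


def has_duplicates_fast(items: List[int]) -> bool:
    """Candidate implementation: a set drops repeated values."""
    return len(set(items)) != len(items)


@given(st.lists(st.integers()))
def test_fast_agrees_with_naive(items: List[int]):
    assert has_duplicates_fast(items) == has_duplicates_naive(items)
```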

No unexpected changes

Some operations should not modify certain attributes of our elements. For example, a sorting algorithm should not modify the length of our list. Testing this property gives us confidence that part of the behavior is correct, but not all of it (we do not directly check that the result is sorted).

	@given(st.lists(st.integers()))
	def test_no_changes(my_list: List[int]):
	    # Sorting may reorder the elements but must keep the same number of them
	    assert len(my_list) == len(sorting_function(my_list))

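
The length check can be strengthened a little. The sketch below assumes a hypothetical insertion_sort built on the standard bisect module and checks that the output holds exactly the same elements as the input, and that the input list itself is left untouched.

```python
import bisect
from collections import Counter
from typing import List

from hypothesis import given
from hypothesis import strategies as st


def insertion_sort(items: List[int]) -> List[int]:
    """Hypothetical sorting function: returns a new sorted list."""
    result: List[int] = []
    for item in items:
        bisect.insort(result, item)  # insert while keeping result sorted
    return result


@given(st.lists(st.integers()))
def test_sorting_keeps_the_same_elements(my_list: List[int]):
    original = list(my_list)
    result = insertion_sort(my_list)
    assert len(result) == len(original)
    # Same elements with the same multiplicities, whatever their order
    assert Counter(result) == Counter(original)
    # The input must not have been modified in place
    assert my_list == original
```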

Hard to prove, easy to verify

For some problems, finding the answer is hard but checking it is easy. Take the maximum product of two elements of a list: we only have to check that the output is greater than or equal to the product of any two different elements of the list.

	@given(st.lists(st.integers(), min_size=2))
	def test_hard_to_prove_easy_to_verify(my_list: List[int]):
	    # The result can never be smaller than the product of the first two elements
	    assert max_product(my_list) >= my_list[0] * my_list[1]

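
Another common instance of this pattern is prime factorization, used here in place of the max product example: computing the factors takes some work, but verifying them only requires multiplying them back. The functions prime_factors and is_prime are hypothetical helpers written for this sketch, and the upper bound of 10,000 is an arbitrary choice to keep the test fast.

```python
import math
from typing import List

from hypothesis import given
from hypothesis import strategies as st


def prime_factors(n: int) -> List[int]:
    """Hypothetical function under test: prime factors of n, in ascending order."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


def is_prime(n: int) -> bool:
    """Simple trial-division check, only used to verify the result."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


@given(st.integers(min_value=2, max_value=10_000))
def test_factors_multiply_back(n: int):
    factors = prime_factors(n)
    assert math.prod(factors) == n
    assert all(is_prime(f) for f in factors)
```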

Last words

PBT is an additional tool in the software craftsman's toolbox. It allows one to gain confidence and reliability when developing new features. It remains complementary to classical unit tests because properties only cover a limited area of the behavior of our functions. Properties are sometimes difficult to find and force us to think about them before developing.

The tests can take a long time to run. That is why I suggest keeping the number of iterations per test fairly low locally (<100) and increasing it in the CI (>1000), thanks to the decorator hypothesis.settings(max_examples=<nb_examples>). This reduces the execution time locally and therefore improves the feedback loop.
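
One possible way to set this up is with Hypothesis settings profiles, selected through an environment variable. In the sketch below, the profile names "dev" and "ci", the variable name HYPOTHESIS_PROFILE and the example counts are assumptions to adapt to your project; this code would typically live in a conftest.py.

```python
import os

from hypothesis import given, settings
from hypothesis import strategies as st

# Few examples locally for a fast feedback loop, many more in the CI
settings.register_profile("dev", max_examples=20)
settings.register_profile("ci", max_examples=1000)

# The CI would export HYPOTHESIS_PROFILE=ci; locally we fall back to "dev"
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))


@settings(max_examples=50)  # a per-test override is also possible
@given(st.integers())
def test_abs_is_never_negative(x: int):
    assert abs(x) >= 0
```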

Unit tests alone are not enough in all cases, and that is why we have to do end-to-end tests. At Theodo Data & AI we are aware of this and we have a great article that demonstrates how to create data pipeline tests.

Thank you very much and see you next time for a concrete example of property testing!

