Maybe you've gone your entire career without writing a single test, or you're in a place that writes them but you can't quite figure out what value they bring. Maybe you simply don’t like them.
Perhaps you've endured incessant battles about how much code coverage is enough or whether you really need to strive for 100%.
Maybe you’ve taken a 7-hour road trip to the Great White North sitting next to a TDD zealot who refuses to be brought into the light of day like some test vampire hiding underneath a desk sucking the life of unsuspecting victims.
If you're here and you aren't clear on the reasons (and yes, there's more than one) for test-driving your code, or for not doing so, I hope to offer you a path to enlightenment.
What even is a test?
This question doesn't have a single straightforward answer, leading to some of the confusion people often experience around testing. Instead of talking about what a test is, it might be helpful to talk about what a test does (when done properly):
Validates correct behavior
Documents your code
Protects against regressions
Justifies your code
We will call these our four pillars of proper testing.
And, to be clear, "properly" means understanding TDD ("test-driven development") and being disciplined about it, not simply creating a few aimless tests for the sake of it.
If you're not disciplined, the value of tests plummets rapidly, but it's hard to be disciplined if you don't appreciate the value of testing in the first place.
Whether you decide to embrace classic TDD as propounded by the aged pundits stuck in the past, or simply aim for better testing than you have today … well, that's up to you. Choose wisely and carefully.
Validating behavior
I think this point is the most self-evident on the list. Basically, the SUT ("system under test") is a function, and you want to demonstrate its correct working behavior by throwing inputs at it to see what kind of outputs you get.
Also, if you're not writing pure functions, you have some hidden complexities with side effects that also need to be addressed in tests. Thus, when thinking of side effects (e.g. I/O, the system clock, global variables), treat them as if they were also inputs to the system.
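As an aside, here is a minimal sketch (not from the original example) of what treating a side effect as an input can look like: a small greeting function that accepts the current time as an optional parameter, so a test can control the clock like any other input.

import unittest
from datetime import datetime, timezone

def greeting(now=None):
    """Return a time-of-day greeting; the current time is an injectable input."""
    now = now or datetime.now(timezone.utc)
    return "Good morning" if now.hour < 12 else "Good afternoon"

class GreetingTests(unittest.TestCase):
    def test_morning_greeting(self):
        # The "side effect" (reading the clock) is passed in explicitly,
        # so the test controls it like any other input:
        nine_am = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
        self.assertEqual(greeting(now=nine_am), "Good morning")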
As a simple, side-effect-free example, we have this function:
def add(x, y):
    """Add two numbers together."""
    return x + y
In this toy example, we should make sure we're testing the function from multiple angles, if you will.
import unittest

class TestAddFunction(unittest.TestCase):
    def test_add_positive_numbers(self):
        self.assertEqual(add(3, 4), 7, "Should be 7")

    def test_add_negative_numbers(self):
        self.assertEqual(add(-1, -1), -2, "Should be -2")

    def test_add_zero(self):
        self.assertEqual(add(0, 0), 0, "Should be 0")

    def test_add_positive_and_negative(self):
        self.assertEqual(add(-5, 5), 0, "Should be 0")

    def test_add_with_floats(self):
        self.assertAlmostEqual(add(0.1, 0.2), 0.3, places=1, msg="Should be approximately 0.3")

if __name__ == '__main__':
    unittest.main()
I think the point is you need to test a function fully, not just as a token sacrifice to the Angry TDD Gods. Look here … I wrote a test … who knows if it’s any good … it’s there!
In case you're not already thinking of it in these terms, exceptions are outputs. Tests must not only account for the obvious outputs, but also provide for all the ways your code could break.
At the end of the day, you should be able to write a test for every possible way you can use your function.
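For instance, here is a sketch of treating an exception as an expected output of the add function above, assuming a caller passes in something that isn't a number:

import unittest

class TestAddErrors(unittest.TestCase):
    def test_add_rejects_non_numeric_input(self):
        # Mixing a string with an int makes the underlying + raise TypeError;
        # the test treats that exception as the expected "output":
        with self.assertRaises(TypeError):
            add("one", 2)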
Documenting your code
If done well (and in conjunction with clean code principles and other best practices as appropriate), your tests will serve as living documentation for your code.
For starters, if you're observing the clean code principles (e.g.: naming things well, single responsibility), your code may already be answering the question "what is it doing?" simply by existing.
As necessary, you might also have comments to explain "why do we do this?" where names alone don't make things clear. Where tests come in is in answering "how do we do this?" or "what can we do?"
Every test is living proof that "if you give my function XYZ, it gives back ABC." You could spend the time to write detailed documentation for your code to serve this purpose, but if you're already writing tests, you get this documentation for free.
Protecting against regressions
If we're being honest, writing the program the first time is the easy part. Maintenance down the road, such as bug fixes or adding features, is the hard part; if you pull a thread you want to know with confidence you didn't just unravel the entire ball of yarn by mistake.
If you've been properly test-driving your code, you're also protecting yourself from code tampering down the road.
If you've been pursuing comprehensive test coverage, you should be able to fearlessly refactor your code and know whether you've made things worse in the process.
Justifying your code
To be clear, if you decide to approach TDD as a discipline, you are not simply bolting tests onto already-written code. You must write the test first, and any subsequent code is an effort to appease that test.
When letting tests dictate the needs of your codebase, you are giving your code an immediate reason to exist, and ensuring that you're only writing the code necessary to fulfill requirements.
If done right, you will have no code that exists without a purpose, and those purposes will be explained by the tests themselves. It's an extension of the "living documentation" point above, and it is also the key to 100% code coverage.
When done this way, you're ensuring that nothing is "slipping through the cracks" in your codebase and you should have few surprises.
What kinds of tests do I need?
Yes, that's right, there's more than one kind of test.
When discussing TDD, the most famous example is the "unit test," one that minimally validates a particular behavior with a simple "if given this, XYZ happens" workflow. These unit tests are developer-centric - meaning, they exist for the purposes of facilitating development and documenting the codebase for other developers.
If you have a lead developer, you might be getting your specifications in the form of low-level requirements that translate directly to unit tests. In all likelihood, though, you'll also (or maybe even exclusively) be getting your requirements in high-level "I want a doohickey that does XYZ" language.
In this paradigm, you might start to see BDD ("behavior-driven development"). Rather than TDD's approach of eliciting tiny incremental code changes to drive toward a working product, BDD presents scenarios that your code must fulfill.
Tests of this nature tend to produce more human-readable specifications than TDD, and serve as an excellent bridge between developers and other stakeholders such as QA or product owners.
Not everything you write will be an isolated function. It's also not always practical to use TDD to test complex functions that themselves are composed of other SUTs that have been tested elsewhere. For this, you'll find yourself reaching for mocking tools.
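As an illustration (a sketch, not from the article), unittest.mock lets you stand in for those already-tested collaborators so the SUT can be exercised in isolation; the total_price function and its repository below are hypothetical:

import unittest
from unittest.mock import Mock

# Hypothetical SUT: totals prices fetched from some repository (database, API, etc.).
def total_price(repository, item_ids):
    return sum(repository.get_price(item_id) for item_id in item_ids)

class TestTotalPrice(unittest.TestCase):
    def test_totals_prices_from_repository(self):
        # The repository is mocked, so the test isolates total_price
        # from whatever the real lookup does:
        repository = Mock()
        repository.get_price.side_effect = [10, 20, 30]
        self.assertEqual(total_price(repository, ["a", "b", "c"]), 60)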
Also, you'll inevitably write code that pulls in external dependencies, which also leads to boundary tests or contract tests to ensure that you're using dependencies correctly and that they are what you think they are (e.g. ensuring that an API you depend on hasn't changed on you since you first started working with it).
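A boundary test can be as simple as pinning down the behavior you rely on from a dependency. Here is a minimal sketch using the standard library as the stand-in dependency; the same idea applies to third-party clients and remote APIs:

import unittest
from datetime import datetime

class DatetimeContractTests(unittest.TestCase):
    def test_fromisoformat_parses_the_format_we_store(self):
        # If a future upgrade changes how this format is parsed,
        # this test fails before our own code silently misbehaves:
        parsed = datetime.fromisoformat("2024-01-01T09:30:00")
        self.assertEqual((parsed.year, parsed.hour, parsed.minute), (2024, 9, 30))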
Choosing TDD
I won't get too detailed here, but there are some points I want to call out …
For TDD to work, you need to have a goal in mind, lest you fall into an underpants gnomes trap.
Work in incremental steps. You're not writing tests that elicit entire programs at once, but instead are trying to tease out a working program from simple tests that gradually start to build a big picture.
As you're writing tests, they will initially be failing, and the test should then force you to go back to your code and write the least amount possible to make the test pass without breaking the other tests.
Choosing BDD
One of the biggest critiques of TDD is that it assumes that you can reason out a large system by first working through its smallest units of work.
For that to work, though, you need a handle on what those units are, and sometimes the problem you're tackling isn't so well-defined.
Also, with that level of granularity, it can be hard to "understand" what the program is supposed to be doing simply by virtue of the sheer volume of test noise. In many ways, it's a death-by-papercut approach to writing software.
BDD, on the other hand, looks at a function as a black box and is less dogmatic about how the function is written. With TDD, the test author is often trying to get a developer to write a specific line of code, which can lead to code that exists only because the developer "knows" it will be needed.
Note, it doesn't have to be either/or - BDD can complement TDD within a project, and can be a great way to give TDD a sense of direction or purpose.
For BDD, a DSL exists in the form of Gherkin - a human-readable specification for how a feature should be implemented. A feature comprises scenarios, each of which is responsible for explaining the different kinds of inputs/preconditions and how they translate to an expected outcome.
The Python library behave is able to parse these files and relies on developers to then flesh out how the different steps translate to actual test code. While the Gherkin files make for excellent documentation (far better than what comes out of TDD), the process of implementing the steps can create more cruft and work than the simpler TDD tests, and requires the additional testing framework to pull off.
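For a flavor of what that looks like, here is a sketch of behave step definitions for the add function from earlier, assuming a feature file with steps like "Given the numbers 3 and 4", "When I add them together", and "Then the result should be 7" (the file path, step wording, and calculator module are hypothetical):

# features/steps/add_steps.py (hypothetical layout)
from behave import given, when, then

from calculator import add  # assumes add() lives in a module named calculator

@given("the numbers {x:d} and {y:d}")
def step_given_numbers(context, x, y):
    context.x, context.y = x, y

@when("I add them together")
def step_when_add(context):
    context.result = add(context.x, context.y)

@then("the result should be {expected:d}")
def step_then_result(context, expected):
    assert context.result == expected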
Alternatively, Gherkin could serve as a rough structural template (given/when/then) for how to write tests, giving the tests a BDD-like pattern without the additional overhead of a framework.
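In plain unittest terms, that structural template might look something like this (again a sketch, using the add function from earlier):

import unittest

class AddBehaviourTests(unittest.TestCase):
    def test_adding_two_numbers_gives_their_sum(self):
        # given two numbers
        x, y = 3, 4

        # when we add them together
        result = add(x, y)

        # then the result is their sum
        self.assertEqual(result, 7, "3 + 4 should be 7")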
Unit tests for beginners
So, you've decided you want tests but aren't sure where to start. First, let's put some lines in the sand, if only to make your life easier:
Don't write tests to fit the implementation - this is backwards, and you'll inevitably get something wrong
Don't try to force a specific implementation through janky tests; too many opinionated developers try to force a pairing partner to write specific code, and the resulting tests are hideous
Work incrementally; resist the urge to "know" in advance what you need to do (YAGNI). If it's needed, you'll eventually end up writing the test, so don't get ahead of yourself.
Ultimately, all this means you're not going back and retrofitting existing behavior with tests.
Example
Let's do something simple - a calculator. It's the perfect foundation for an incremental development workflow, but today we're just going to get some basics knocked out.
Strap in, we're in for a bit of a ride...
Our first test
Let's be as naive as possible. Start by asking what the most fundamental aspect of a calculator is - my own answer to that is that it is a thing that gives us results:
calculator.py:
import unittest

class CalculatorTests(unittest.TestCase):
    def setUp(self):
        self.calculator = Calculator()

    def test_initial_result(self):
        # SUT (system under test):
        result = self.calculator.getResult()

        # Validate expected result, giving a meaningful explanation:
        self.assertEqual(result, 0, "initial result should be 0")
Above, we have a "happy path" test that asserts that our not-yet-written Calculator gives us an initial result of 0. The structural elements of this test are:
setUp wires up dependencies for us to make an assertion; it's invoked anew for every test (we wouldn't want one test's behavior to mess with the results of another, after all)
test_initial_result is the test function representing our scenario, in which we call the getResult method and assert that it initially spits out a 0 result
self.assertEqual(<actual>, <expected>, <explanation>) asserts that the SUT result is as desired, and when it inevitably fails on our first go, we'll get a nice message explaining why it failed.
We run this using the pytest test runner from the command line:
$ pytest calculator.py
It should explode gloriously because we haven't given it a Calculator to work with yet. This is expected, and the resulting failure from the test runner is the impetus needed to get us to write some code:
================================================== FAILURES ===================================================
_____________________________________ CalculatorTests.test_initial_result _____________________________________

self = <calculator.CalculatorTests testMethod=test_initial_result>

    def setUp(self):
>       self.calculator = Calculator()
E       NameError: name 'Calculator' is not defined

calculator.py:5: NameError
=========================================== short test summary info ===========================================
FAILED calculator.py::CalculatorTests::test_initial_result - NameError: name 'Calculator' is not defined
============================================== 1 failed in 0.03s ==============================================
Now that we have a failing test, let's address the immediate failure we're given - in this case, we need to create a Calculator class to satisfy the test runner; write the least amount of code possible to clear that error and move onward:
class Calculator:
    pass
I place this directly above the CalculatorTests class, and now get a different error:
> result = self.calculator.getResult()
E AttributeError: 'Calculator' object has no attribute 'getResult'
Okay, now we write just enough code to satisfy the need for getResult:
class Calculator:
    def getResult(self):
        pass
...
> self.assertEqual(result, 0, "initial result should be 0")
E AssertionError: None != 0 : initial result should be 0
... and our resulting error now forces us to give a result:
class Calculator:
    def getResult(self):
        return 0
Finally, we get a success message from the test runner. It may seem like a lot of work, but it does get easier from here; that first test led to a lot of infrastructure, but subsequent tests will require less effort.
More tests
Our second test will set us up for additional behavior (the add function). We're ultimately looking to create a function that lets us add to our accumulator with every function call. To start, let's add a new method to the CalculatorTests suite:
class CalculatorTests(unittest.TestCase):
    # ... skipping over the existing implementation

    def test_add_zero(self):
        # SUT (system under test):
        self.calculator.add(0)
        result = self.calculator.getResult()

        # Validate expected result, giving a meaningful explanation:
        self.assertEqual(result, 0, "adding 0 to initial value should give 0")
So, we're adding 0 to our initial 0 and expecting 0 back. Your first step is to run the tests and see the error that forces us to then define the function:
    def test_add_zero(self):
        # SUT (system under test):
>       self.calculator.add(0)
E       AttributeError: 'Calculator' object has no attribute 'add'
So, just as before, you add the function:
class Calculator:
    # ... skipping over the existing implementation

    def add(self, value):
        pass
This should be enough to pass our test. Hopefully, you resisted the urge to do more, since that would be wrong. Instead, if we want to make our add function do more meaningful things, we'll need a new test to give us that functionality:
# We want random numbers for this next test:
import random

class CalculatorTests(unittest.TestCase):
    # ... skipping over the existing implementation

    def test_add_random_value(self):
        # given a random number to add:
        toAdd = random.random()

        # when adding that number to the calculator:
        self.calculator.add(toAdd)

        # then our accumulator should equal our added value:
        result = self.calculator.getResult()
        self.assertEqual(toAdd, result, f"adding {toAdd} to initial value should give {toAdd}")
To make our test as robust as possible, we use random inputs. We could just write some tests with fixed values for input, but if we're being TDD purists, we'd end up with an if/elif ladder covering just those fixed inputs. Here, we're forcing our implementation to account for any numeric value.
        result = self.calculator.getResult()
>       self.assertEqual(toAdd, result, f"adding {toAdd} to initial value should give {toAdd}")
E       AssertionError: 0.770880984964011 != 0 : adding 0.770880984964011 to initial value should give 0.770880984964011
Now, you're in for a treat: in order to make this test pass, we'll need to add some state to our Calculator to let it hang onto results, and we'll also need to go back and retrofit our getResult implementation to supply that value.
Remember, we're making the smallest changes here that don't ultimately break previously-passing tests:
class Calculator:
    def __init__(self):
        # Give us some state, initialized to 0
        self.accum = 0

    def getResult(self):
        # Return the results we've accumulated thus far
        return self.accum

    def add(self, value):
        # Since we only add one time at most, we can get away with blowing out previous state:
        self.accum = value
As noted in the above comments, our tests aren't yet robust enough to force us to actually add; thus far, we've only asserted that we can get the same value back out that we passed in with add. As tedious as it sounds, this justifies another test:
class CalculatorTests(unittest.TestCase):
    # ... skipping over the existing implementation

    def test_add_multiple_values(self):
        # given two random numbers to add:
        toAdd1 = random.random()
        toAdd2 = random.random()

        # when adding both numbers to the calculator:
        self.calculator.add(toAdd1)
        self.calculator.add(toAdd2)

        # then our accumulator should equal our added values' combined total:
        result = self.calculator.getResult()
        expected = toAdd1 + toAdd2
        self.assertEqual(expected, result, f"adding {toAdd1} and {toAdd2} to initial value should give {expected}")
Now, we're forced to go back to our add implementation and actually accumulate our results. In its entirety, the code should now look like the following:
import random
import unittest

class Calculator:
    def __init__(self):
        self.accum = 0

    def getResult(self):
        return self.accum

    def add(self, value):
        self.accum += value

class CalculatorTests(unittest.TestCase):
    def setUp(self):
        self.calculator = Calculator()

    def test_initial_result(self):
        # SUT (system under test):
        result = self.calculator.getResult()

        # Validate expected result, giving a meaningful explanation:
        self.assertEqual(result, 0, "initial result should be 0")

    def test_add_zero(self):
        # SUT (system under test):
        self.calculator.add(0)
        result = self.calculator.getResult()

        # Validate expected result, giving a meaningful explanation:
        self.assertEqual(result, 0, "adding 0 to initial value should give 0")

    def test_add_random_value(self):
        # given a random number to add:
        toAdd = random.random()

        # when adding that number to the calculator:
        self.calculator.add(toAdd)

        # then our accumulator should equal our added value:
        result = self.calculator.getResult()
        self.assertEqual(toAdd, result, f"adding {toAdd} to initial value should give {toAdd}")

    def test_add_multiple_values(self):
        # given two random numbers to add:
        toAdd1 = random.random()
        toAdd2 = random.random()

        # when adding both numbers to the calculator:
        self.calculator.add(toAdd1)
        self.calculator.add(toAdd2)

        # then our accumulator should equal our added values' combined total:
        result = self.calculator.getResult()
        expected = toAdd1 + toAdd2
        self.assertEqual(expected, result, f"adding {toAdd1} and {toAdd2} to initial value should give {expected}")
Are you willing to drink the kool-aid?
At this point … you probably get the idea. And, once that's clear, you are either going to decide to head down the road of TDD or not.
For some people, the above work seems meaningless, and they will simply never put in the time or effort required to embrace a pure TDD approach. Humans will be humans.
You will generally find the following types of folks …
No tests ever … they slow me down and I won’t do it.
I will write token tests here or there to make someone happy.
I will die on the TDD hill of greatness above all else.
I will write good unit tests for all my code after I have written my code.
Conclusion
Hopefully what we covered today gives you an appreciation for what testing looks like (and what types of testing are out there), why you want it, and some of the things to keep in mind as you put it into practice in your own code.
If you look into other online resources, you'll quickly discover a myriad of takes on testing. Some question its fundamental value; others offer opinionated positions that challenge everything above.
There is no one right way to test. It's a tool, like any other, and if done well should be an asset to your code.
Also, etiquette for many projects demands that you accompany your code with tests. Try to contribute to some projects, and you'll in many cases be expected to provide backing tests for your changes - consider this example CONTRIBUTING.md file, whose guidelines include the following point:
Include test coverage. Add unit tests or UI tests when possible. Follow existing patterns for implementing tests.
In short, whether or not you believe in adding tests to your own code, you may find yourself having to do so regardless, if your career in software development is to flourish.