Unleashing Chaos: The Power of Random Testing in Scientific Software Development

As a domain scientist working in high-performance computing, you may find yourself needing to write code, whether that means custom scripts for efficient file handling, extensions to existing software, or a larger software project built from scratch. In any case, writing the code is not enough: software development also involves testing and documenting it.

In this article, we introduce a technique that expands your arsenal of tried and tested methods for software testing. We will not go into the details of software testing in general.

Property-based testing

Contrary to what one might assume based on the name, random testing does not mean manually entering random values and hoping for the best. Instead, with a random testing strategy you test against a specification, i.e., against properties you believe to hold true. This specification serves both as a test oracle and as the basis for test case generation. Based on the properties of your function, you define invariants that should hold. You then check whether they actually hold by randomly generating a large number of test cases. This variety of property-based testing is implemented in the "QuickCheck" software library. Originally written in Haskell, it has since been ported to many programming languages, including Python.
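To make the generate-and-check idea concrete without any testing library, the following minimal sketch hand-rolls the loop using only the Python standard library. The chosen property (reverse-complementing a DNA sequence twice returns the original) and the helper names random_dna, reverse_complement, and check_property are illustrative assumptions, not part of QuickCheck or any of its ports.

```python
import random

# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def random_dna(rng, max_length=50):
    """Generate a random DNA string with 0 to max_length nucleotides."""
    length = rng.randint(0, max_length)
    return "".join(rng.choice("ACGT") for _ in range(length))


def check_property(prop, ncalls=100, seed=0):
    """Run prop on ncalls random inputs.

    Returns the first input for which prop is false, or None if the
    property held for every generated input.
    """
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(ncalls):
        seq = random_dna(rng)
        if not prop(seq):
            return seq
    return None


def involution_holds(seq):
    """Property: applying reverse_complement twice gives back the input."""
    return reverse_complement(reverse_complement(seq)) == seq


print("counterexample:", check_property(involution_holds))
```

The property itself acts as the oracle: no expected outputs are written down by hand, only a relation that must hold for every input. Real property-based testing libraries add more on top of this loop, such as reducing a failing input to a smaller one.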

Concrete example: DNA sequence alterations

Make sure the required packages are installed:

pip install pytest pytest-quickcheck biopython

Let us assume we are writing a Python program that takes a DNA sequence and alters it in such a way that the resulting protein sequence remains unchanged.

import pytest
from Bio.Seq import Seq


def alter_dna(seq):
    """Synonymously alter a DNA sequence"""
    # Placeholder: returns the input unchanged; insert the custom logic here
    altered_seq = seq
    return altered_seq


def translate(dna):
    """Translate DNA into protein"""
    return str(Seq(dna).translate())


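# Caveat (an assumption worth checking against the pytest-quickcheck docs):
# `choices` may pick each value whole from the list, which would yield
# single nucleotides rather than longer sequences.
@pytest.mark.randomize(seq=str, choices=["A", "C", "G", "T"], ncalls=100)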
def test_altered_seq(seq):
    altered_seq = alter_dna(seq)
    original_protein = translate(seq)
    altered_protein = translate(altered_seq)
    assert original_protein == altered_protein

Save the test script to a file named seqtest.py and invoke it as follows:

pytest seqtest.py

This snippet asks pytest-quickcheck for 100 random inputs drawn from the DNA alphabet, passes each one to a custom function that is expected to introduce synonymous mutations (i.e., to alter the DNA sequence in a way that leaves the resulting amino acid sequence unchanged), and finally checks the result against the specification: are the protein translations of the original and the altered sequence identical? If the property is violated for even one of the generated test cases, the test fails and pytest prints information about the failing case.

This is an overly simplified example, lacking many of the subtleties a real QuickCheck-style test would have. Still, this should give you the idea that testing assumed properties against random input will help in exploring unforeseen edge cases and ultimately lead to better code quality.
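One such subtlety is input generation: a translation property is only meaningful for sequences made of whole codons. The following sketch, using only the Python standard library, shows what a property test could look like with a purpose-built generator. The codon table is the standard genetic code written in TCAG order. The implementation of alter_dna (swapping each codon for a randomly chosen synonymous one) and the helpers random_coding_dna and find_counterexample are assumptions made for illustration, not the method behind the example above.

```python
import itertools
import random

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def alter_dna(seq, rng):
    """Hypothetical implementation: swap each codon for a synonymous one."""
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return "".join(rng.choice(SYNONYMS[CODON_TABLE[c]]) for c in codons)


def random_coding_dna(rng, max_codons=30):
    """Generate a random DNA string made of 0 to max_codons whole codons."""
    n_codons = rng.randint(0, max_codons)
    return "".join(rng.choice(BASES) for _ in range(3 * n_codons))


def find_counterexample(alter, ncalls=100, seed=0):
    """Return the first sequence whose translation changes, or None."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(ncalls):
        seq = random_coding_dna(rng)
        if translate(alter(seq, rng)) != translate(seq):
            return seq
    return None


print("counterexample:", find_counterexample(alter_dna))
```

Passing a deliberately wrong alteration to the same loop, for instance one that simply reverses the sequence, is a quick way to convince yourself that the property can actually fail.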

Summary

QuickCheck is a software library that assists in software testing by generating test cases automatically. It implements property-based testing: you state logical properties a function should fulfill, and the library checks them against randomly generated input. This can help identify bugs and defects that might not be found through other testing methods.

Acknowledgements

I first learned about QuickCheck at the CCC 2019, in the talk “Getting software right with properties, generated tests, and proofs” by Mike Sperber. The concept dates back to a Haskell tool written by Koen Claessen and John Hughes in 1999.

Author

Stefanie Mühlhausen