Notebook code quality

by Ty Myrddin

Published on April 11, 2022

A read–eval–print loop (REPL), such as the one Jupyter Notebook offers, is a simple interactive programming environment that takes single user inputs, executes them, and returns the result. This gives quick feedback during development and makes it faster to develop code than if it had to be run as one monolithic piece.

The way Jupyter Notebook works with code in blocks, which can be run independently, makes it easier to experiment and trial certain operations without having to re-run the full workflow. Ideal for analysis tasks that require both numerical and visual outputs.

Not so ideal when trying to leverage notebooks for production use rather than an analytical use case. And in our case, we wish to produce a quality code base of snippets that can be reused in future production projects.

So we want Quality.

Black-Jupyter

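We do code formatting with black-jupyter, which is Black installed with its jupyter extra (pip install "black[jupyter]") so that it also formats .ipynb files. Running it on all notebooks from the root of a repo is as simple as: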

$ black .
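As a small illustration of what formatting changes and what it leaves alone, the sketch below compares a hypothetical notebook cell with the layout Black would be expected to produce for it. Both cell strings and the helper same_code are made up for this example. The comparison uses Python's ast module and does not call Black itself.

```python
import ast

# Hypothetical notebook cell as it might be typed during exploration.
before = "def scale(values,factor = 2):\n    return [v*factor for v in values]\n"

# The same cell in the layout Black is expected to produce
# (an assumption for illustration, not captured Black output).
after = "def scale(values, factor=2):\n    return [v * factor for v in values]\n"


def same_code(a: str, b: str) -> bool:
    """Return True if both sources parse to the same abstract syntax tree."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


# The two cells differ only in whitespace, so their syntax trees match.
layout_only = same_code(before, after)
```

Only the layout differs between the two versions. The code that runs is the same, which is why formatting can safely be applied to a whole repo in one go.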

Flake8_nb

We check PEP8 compliance with flake8_nb. Running it on the notebooks directory from the root of a repo:

$ flake8_nb notebooks

Automation with pre-commit

We automated the code formatting and PEP8 compliance processes by using the pre-commit framework and its hooks.

It runs a short script before committing. If the script passes, the commit is made. Otherwise the commit is denied (and we fix whatever is up).
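The pass-or-deny mechanism can be sketched in a few lines of Python. This is not how the pre-commit framework is implemented. It only illustrates the underlying idea that git aborts a commit when its pre-commit hook exits with a non-zero status. The function name run_checks and the CHECKS list are hypothetical.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each command in order; return 0 if all pass, else the first failing exit code."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Stop at the first failure so the commit can be denied.
            return result.returncode
    return 0


# Hypothetical check list mirroring the tools above; in practice the hooks
# are declared in the pre-commit configuration rather than in a script like this.
CHECKS = [
    ["black", "--check", "."],
    ["flake8_nb", "notebooks"],
]

# A hook script would end with: sys.exit(run_checks(CHECKS))
```

Returning the first non-zero exit code keeps the feedback short: fix that one problem, then commit again.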

Mypy

For static type checking we installed mypy, along with data-science-types, which provides type stubs for libraries like matplotlib, numpy and pandas that do not have type information, and nbqa, which lets mypy run on notebooks. Run it on all notebooks from the root directory of a repo with:

$ nbqa mypy .
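To give an idea of what such a check looks at, here is a minimal sketch of an annotated notebook cell. The function mean and its inputs are hypothetical. The commented-out call is the kind of mistake mypy would be expected to report, because a str is passed where a list of floats is declared.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# This call matches the annotations: a list of floats goes in, a float comes out.
average = mean([1.0, 2.0, 3.0])

# mypy would be expected to flag the call below as an incompatible argument type:
# mean("1, 2, 3")
```

Without annotations, the mistake would only surface when the cell is run.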

To be continued ...
