Code recipes

setup-time dependencies in setup.py

(last updated: March 2018)

Setup-time dependencies in setup.py are a well-known problem, especially in the scientific Python world. Consider the classic example of distributing a Cython extension that needs numpy: one typically writes a setup.py that looks like

from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    name="some-name",  # ... other kwargs ...
    ext_modules=cythonize("**/*.pyx"),
    include_dirs=[np.get_include()],
)

which means that the dependencies (Cython and numpy) need to be manually installed first. Putting them in setup_requires doesn’t work, because the imports at the top of setup.py run before setup() ever gets to read setup_requires.
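
To make the ordering problem concrete, here is a minimal sketch that simulates such a script. The module name some_missing_build_dependency and the fake_setup function are hypothetical stand-ins (the latter for setuptools.setup), chosen only so the snippet runs anywhere:

```python
# Hypothetical stand-in for a build dependency that is not installed yet.
MISSING = "some_missing_build_dependency"

calls = []


def fake_setup(**kwargs):
    # Stand-in for setuptools.setup; it only records that it was reached.
    calls.append(kwargs)


# A miniature "setup.py": the import comes first, setup() second.
script = f"""
import {MISSING}
setup(name="some-name", setup_requires=["{MISSING}"])
"""

try:
    exec(script, {"setup": fake_setup})
except ImportError as exc:
    error = exc
else:
    error = None

print(type(error).__name__, "raised;", len(calls), "call(s) to setup()")
```

The import fails before fake_setup is ever called, so nothing gets the chance to look at setup_requires.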

While various projects are under way to improve the situation (e.g. PEP 518’s pyproject.toml), it is in fact relatively easy to work around the issue without resorting to simply documenting the need for manual dependency installation, as long as one is willing to only use pip install (which is recommended these days anyway) rather than directly invoking setup.py. One can declare the dependencies normally, declare only a dummy extension (which is needed for setuptools to even attempt extension building), and then, once setuptools is about to perform extension building, swap in the correct extensions. At that point, the dependencies will have been installed: this is a guaranteed behavior of pip.

An actual implementation looks like

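from setuptools import Extension, find_packages, setup
# build_ext lives in setuptools.command.build_ext, not in the top-level package.
from setuptools.command.build_ext import build_ext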

class build_ext(build_ext):
    def build_extensions(self):
        # The key point: here, Cython and numpy will have been installed by
        # pip.
        from Cython.Build import cythonize
        import numpy as np

        self.distribution.ext_modules[:] = cythonize("**/*.pyx")
        # Sadly, this part needs to be done manually.
        for ext in self.distribution.ext_modules:
            ext.include_dirs = [np.get_include()]

        # Call `finalize_options` a second time (it sets some required
        # private attributes on the extensions).
        self.swig_opts = None  # Because distutils is not idempotent :-(
        super().finalize_options()

        super().build_extensions()

setup(
    name="some-name",  # ... other kwargs ...
    cmdclass={"build_ext": build_ext},
    ext_modules=[Extension("", [])],  # dummy module
    install_requires=[
        "Cython",
        "numpy",
    ],
)

Of course you may use either install_requires or setup_requires there. Semantically, it is correct to use setup_requires (as the end user doesn’t actually need Cython installed), but setup_requires relies on the old easy_install, which means that it’ll e.g. try to build numpy from source – rarely something worth your time. So I much prefer using install_requires there (for numpy, generally, it is not a problem as it is indeed also a runtime dependency).
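
One detail of the implementation above that is easy to miss is the slice assignment to ext_modules[:]. It replaces the contents of the existing list rather than binding a new list, so anything that already holds a reference to that list sees the real extensions. As far as I can tell, distutils’ build_ext keeps such a reference as self.extensions, although the second finalize_options call may make the distinction moot in practice. A minimal sketch of the difference, using plain lists and two hypothetical stand-in classes (FakeDistribution and FakeCommand are not setuptools classes):

```python
class FakeDistribution:
    """Hypothetical stand-in for the object holding ext_modules."""

    def __init__(self, ext_modules):
        self.ext_modules = ext_modules


class FakeCommand:
    """Hypothetical stand-in for a command that caches the list early."""

    def __init__(self, distribution):
        self.distribution = distribution
        self.extensions = distribution.ext_modules  # same list object


# In-place swap: the cached reference sees the new contents.
cmd = FakeCommand(FakeDistribution(["dummy"]))
cmd.distribution.ext_modules[:] = ["real_a", "real_b"]
in_place = list(cmd.extensions)

# Rebinding: the cached reference still points at the old list.
cmd = FakeCommand(FakeDistribution(["dummy"]))
cmd.distribution.ext_modules = ["real_a", "real_b"]
rebound = list(cmd.extensions)

print(in_place, rebound)
```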

Variants of this solution have been invented many times, of course. Matplotlib’s build script used to rely on an Extension subclass whose include_dirs is a descriptor that computes itself upon access (the idea being that access will occur in build_extensions, after the dependencies are installed):

class DelayedExtension(Extension):
    # Some code elided here.

    class DelayedMember(property):
        def __init__(self, name):
            self._name = name

        def __get__(self, obj, objtype=None):
            result = getattr(obj, '_' + self._name, [])
            if obj._finalized:
                if self._name in obj._hooks:
                    # The key point: compute the include path at this time.
                    result = obj._hooks[self._name]() + result
            return result

        def __set__(self, obj, value):
            setattr(obj, '_' + self._name, value)

    include_dirs = DelayedMember('include_dirs')
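
Since part of that class is elided, here is a self-contained sketch of the same idea that can be run on its own. It is not Matplotlib’s code: DelayedList, FakeExtension, find_include_dirs and the path it returns are all hypothetical names made up for illustration, and the _finalized and _hooks attributes simply mirror the ones used in the excerpt.

```python
class DelayedList:
    """Descriptor that prepends hook results once the owner is finalized."""

    def __set_name__(self, owner, name):
        self._name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        result = getattr(obj, "_" + self._name, [])
        if obj._finalized and self._name in obj._hooks:
            # The key point: the hook only runs at access time.
            result = obj._hooks[self._name]() + result
        return result

    def __set__(self, obj, value):
        setattr(obj, "_" + self._name, value)


class FakeExtension:
    """Hypothetical stand-in for an Extension subclass."""

    include_dirs = DelayedList()

    def __init__(self, include_dirs, hooks):
        self._finalized = False
        self._hooks = hooks
        self.include_dirs = include_dirs


calls = []


def find_include_dirs():
    # Stand-in for a lookup that needs the dependency to be importable.
    calls.append("called")
    return ["/hypothetical/include"]


ext = FakeExtension(["src"], {"include_dirs": find_include_dirs})
before = ext.include_dirs  # not finalized: the hook is not called
ext._finalized = True
after = ext.include_dirs   # finalized: the hook runs now

print(before, after)
```

Note that in this sketch the hook runs again on every access after finalization; caching the result would be a reasonable refinement.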

pybind11 suggests populating include_dirs not directly with paths represented as strings, but with custom objects that compute their string representation by importing the dependency (again, the idea being that this occurs after the dependency has been installed):

# Modified for concision.
class get_pybind_include:
    def __str__(self):
        # The key point: compute the include path at this time.
        import pybind11
        return pybind11.get_include()
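
The trick can be tried out without pybind11 installed. The sketch below is an illustration under assumptions, not pybind11’s recipe: LazyIncludeDir and find_include_dir are hypothetical names, the standard library’s sysconfig stands in for the real dependency, and the "-I%s" formatting merely imitates how a build tool might turn the entries into compiler flags.

```python
resolved = []


class LazyIncludeDir:
    """Placeholder whose string value is computed only when requested."""

    def __init__(self, finder):
        self._finder = finder  # callable returning the include directory

    def __str__(self):
        resolved.append(True)
        return self._finder()


def find_include_dir():
    # Hypothetical stand-in for something like `pybind11.get_include()`.
    import sysconfig
    return sysconfig.get_paths()["include"]


include_dirs = [LazyIncludeDir(find_include_dir)]
untouched = len(resolved)  # building the list resolves nothing

# Formatting the entries as strings triggers the deferred lookup.
flags = ["-I%s" % d for d in include_dirs]

print(untouched, flags)
```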

However, neither of these solutions works in combination with Cython, a case in which there isn’t even a list of extensions available to begin with, and thus where a dummy extension is needed.