
Mostly painless packaging for Python applications

20 June 2019

This article was originally published on my Medium page.

I think Python is a weird language. It is dynamic. It is interpreted. It is wildly popular. It is… multi-multi-paradigm. You can write Python a thousand and one different ways.

It supports anything and everything from straight-line scripting to functional programming to object-orientation, with a loose module system, dynamic imports, and arbitrary entry points. Python projects do not have a universally defined structure!

With no definitive One Way, this wild-west can make it a pain to reason about structuring Python applications, and then eventually packaging them for distribution and clean installation/uninstallation.

For anyone not at all familiar with Python development, there are a couple of things about it that I find surprising and mildly uncomfortable. Namely:

  1. The weird ceremony of if __name__ == "__main__": that says, "Hey interpreter! Please don’t execute this code block unless I am the main program!".

    If you see this often, it's because Python makes no distinction between library code and application scripts. You are free to import any module that contains application code as a library. This condition just guards application entry points from running if someone decides to use your code as a library. As someone who's used to writing one form of int main() or another, this is cool, but weird.

  2. The presence of this platform call: sys.path.insert(idx, directory)

    This function mutates sys.path, the module search path that Python builds at startup (partly from the PYTHONPATH environment variable), affecting the resolution order for every module imported by the current application. This feels gross and dangerous, and at first glance it feels like it shouldn’t be touched by anyone, ever.
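To make the first point concrete, here is a small sketch of the guard in action. The module source and the name guard_demo are made up for this illustration, and runpy.run_path is used as a stand-in for "import it" versus "run it", since it executes a file with whatever __name__ you hand it.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical module source, used only for this demonstration.
MODULE_SOURCE = '''
events = ["module body ran"]

if __name__ == "__main__":
    events.append("entry point ran")
'''


def run_with_name(run_name):
    """Execute the demo file with __name__ set to run_name and return its events."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp, "guard_demo.py")
        path.write_text(MODULE_SOURCE)
        # run_path returns the globals of the executed file as a dict.
        namespace = runpy.run_path(str(path), run_name=run_name)
    return namespace["events"]


as_library = run_with_name("guard_demo")  # roughly what an import would see
as_program = run_with_name("__main__")    # roughly what running the file would see
print(as_library)
print(as_program)
```

The module body runs in both cases; only the guarded block depends on how the file was started.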

Calling sys.path.insert has some major consequences. The entries of sys.path are special: they define where Python looks for its standard library and installed packages. Yet you can override them, change their ordering and subsequently break stuff.

A Python 3.5 installation on Ubuntu might have the following default path.

>>> import sys
>>> print(sys.path)
['', '/usr/lib/', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages']

Programmer carelessness aside, the fact that such an important path is so easily mutable makes me feel nasty, because I can imagine how easy it would be to insert a malicious version of a class into the path ahead of a nice version of the same class. I basically think that mutating sys.path should be avoided like the plague because, for any legitimate work, we have nice, safe tools like virtualenv for path management.
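The worry is easy to reproduce. The sketch below is only an illustration: the module name demo_greeter_mod and both temporary directories are invented for the example, and the original sys.path is restored afterwards. Two directories each hold a module with the same name, and whichever directory comes first in sys.path wins.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_greeter_mod"  # hypothetical name, chosen to avoid real modules


def write_module(directory, message):
    """Create <directory>/demo_greeter_mod.py with a greet() returning message."""
    source = f"def greet():\n    return {message!r}\n"
    Path(directory, MODULE_NAME + ".py").write_text(source)


def import_fresh(name):
    """Import name from scratch, ignoring any previously cached copy."""
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    return importlib.import_module(name)


def shadowing_demo():
    saved_path = list(sys.path)
    with tempfile.TemporaryDirectory() as nice, tempfile.TemporaryDirectory() as rogue:
        write_module(nice, "nice")
        write_module(rogue, "rogue")
        try:
            # The nice version sits at the end of the search path.
            sys.path.append(nice)
            before = import_fresh(MODULE_NAME).greet()
            # One insert at the front and the rogue version is found first.
            sys.path.insert(0, rogue)
            after = import_fresh(MODULE_NAME).greet()
        finally:
            sys.path[:] = saved_path
            sys.modules.pop(MODULE_NAME, None)
    return before, after


before, after = shadowing_demo()
print(before, after)
```

Nothing about the import statement changes between the two calls; only the order of the path entries does.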

So it surprised me when I initialized a new project with GNOME Builder and I saw something like this in a Meson template file!

import sys

# Meson replaces this with a value from its configuration data.
pkgdatadir = '@pkgdatadir@'
# Search the installed application directory right after sys.path[0].
sys.path.insert(1, pkgdatadir)
if __name__ == '__main__':
    from app import Application

I was confused and appalled! ... Until I understood what the generated script was doing. It’s actually a neat hack that hides away arbitrary project structures and makes packaging a dream.
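If the @pkgdatadir@ placeholder looks mysterious, the substitution amounts to something like the sketch below. This is not Meson's implementation, just a rough Python imitation: fill_template is a hypothetical helper, and leaving unknown tokens untouched is a choice made for this sketch.

```python
import re


def fill_template(template, values):
    """Replace each @name@ token with values[name]; leave unknown tokens as they are."""
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))

    return re.sub(r"@(\w+)@", substitute, template)


template = "pkgdatadir = '@pkgdatadir@'\nsys.path.insert(1, pkgdatadir)\n"
rendered = fill_template(template, {"pkgdatadir": "/usr/share/myapp"})
print(rendered)
```

The build fills in the real install location, so the script that lands on disk contains a plain, hard-coded path.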

The pkgdatadir

When you’re building and installing a Meson project, you often do something like this:

meson _build --prefix=/usr/share
ninja -C _build install

In my case, my Meson build file sets the configuration variable pkgdatadir to the prefix specified above. So, the above incantation goes ahead and installs all the application code there (no surprise!), but it also does something a bit clever that I didn’t expect!

On Unix-like systems, the /usr/bin/ family of directories ideally just contains a bunch of executables. These directories are listed in your system's PATH, which defines where commands are resolved from, and in what order. Python projects don't typically compile down to a single binary file though, and you really don’t want to clutter your bin directories up with a bunch of gunk because it makes it harder to maintain your system.

For applications that need other resources (like configuration or data files), on Linux systems it's not uncommon to claim a namespace under /usr/share as the installation path (or as the Meson template refers to it: the pkgdatadir). For a Python project shipped independently from PyPI, we might want to group all our application code into that directory namespace for easy filesystem management. So that’s what we do. My build script should move all my application code into a directory called myapp/ rooted under the /usr/share prefix. This ensures that package managers and sysadmins know exactly where all my app code and resources are. No surprises. No polluted directories. Tidy system!

So how do we easily run the code from the installation path? We certainly don’t want to be typing /usr/share/myapp/ to run it — that’s practically medieval. Instead we put a shim for the application in the /usr/bin directory which correctly starts up your app.

So that sketchy templated code that I came across ends up looking like this:

#!/usr/bin/env python3
import sys
pkgdatadir = '/usr/share/myapp'
sys.path.insert(1, pkgdatadir)
if __name__ == '__main__':
    from myapp import Application

Essentially, all it has done is move the main function. To make sure we're importing the correct Application, we change sys.path so that our application code is resolved first from the place it's been installed to, and the system libraries are resolved immediately after. When we invoke the install command from ninja, this shim gets copied to /usr/share/bin/myapp so that one can easily run myapp from the command line.
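To see what inserting at index 1 does to the resolution order, here is a tiny sketch on a plain list. The entries are a shortened copy of the Ubuntu path shown earlier, and nothing here touches the real sys.path.

```python
# Shortened copy of the default path from earlier in the article.
default_path = [
    '',
    '/usr/lib/python3.5',
    '/usr/lib/python3.5/lib-dynload',
    '/usr/local/lib/python3.5/dist-packages',
]

search_path = list(default_path)
# Same call the shim makes, applied to the copy.
search_path.insert(1, '/usr/share/myapp')

for position, entry in enumerate(search_path):
    print(position, repr(entry))
```

The application directory lands directly ahead of the standard library entries. Index 0 is left alone; when Python runs a script, that slot holds the directory containing the script.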

The takeaway

If you're trying to package a random Python app for a Linux distro, here's a strategy: find its main entry point, write a shim like this, and have your packaging script just copy the whole project to an application directory and install the shim to /usr/bin/$appname.
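As a rough sketch of that strategy, assuming a project whose entry point is a main() function in a single module: install_app, the shim template, and the toy project below are all hypothetical, and a throwaway staging directory stands in for the real install prefix so nothing on the system is touched.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical shim, modelled on the generated script shown above.
SHIM_TEMPLATE = """\
#!/usr/bin/env python3
import sys
pkgdatadir = {pkgdatadir!r}
sys.path.insert(1, pkgdatadir)
if __name__ == '__main__':
    from {module} import main
    main()
"""


def install_app(project_dir, prefix, appname, module):
    """Copy project_dir to <prefix>/share/<appname> and write a shim to <prefix>/bin/<appname>."""
    pkgdatadir = Path(prefix, "share", appname)
    bindir = Path(prefix, "bin")
    shutil.copytree(project_dir, pkgdatadir)  # creates missing parent directories
    bindir.mkdir(parents=True, exist_ok=True)
    shim = bindir / appname
    shim.write_text(SHIM_TEMPLATE.format(pkgdatadir=str(pkgdatadir), module=module))
    shim.chmod(0o755)  # make the shim executable
    return shim


def packaging_demo():
    with tempfile.TemporaryDirectory() as tmp:
        # A toy project with a single module exposing main().
        project = Path(tmp, "project")
        project.mkdir()
        Path(project, "myapp.py").write_text("def main():\n    print('hello from myapp')\n")
        shim = install_app(project, Path(tmp, "staging"), "myapp", "myapp")
        # Run the shim with the current interpreter rather than relying on the shebang.
        result = subprocess.run(
            [sys.executable, str(shim)], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()


output = packaging_demo()
print(output)
```

A real packaging script would point the prefix at the distro's install root and use the project's actual entry point instead of the toy main().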

And of course, GNOME Builder will do this for you if you start a new GTK3 Python project.