This article was originally published on my Medium page.
I think Python is a weird language. It is dynamic. It is interpreted. It is wildly popular. It is… multi-multi-paradigm. You can write Python a thousand and one different ways.
It supports anything and everything from straight-line scripting to functional programming to object-orientation, with a loose module system, dynamic imports, and arbitrary entry points. Python projects do not have a universally defined structure!
With no definitive One Way, this wild west can make it a pain to reason about structuring Python applications, and eventually to package them for distribution and clean installation/uninstallation.
For anyone not at all familiar with Python development, there are a couple of things about it that I find surprising and mildly uncomfortable. Namely:
- The weird ceremony of
  if __name__ == "__main__":
  that says, "Hey interpreter! Please don't execute this code block unless I am the main program!". If you see this often, it's because Python makes no distinction between library code and application scripts. You are free to import any module that contains application code as a library. This condition just guards application entry points from running if someone decides to use your code as a library. As someone who's used to writing one form of int main() or another, this is cool, but weird.
- The presence of this platform call:
  sys.path.insert(idx, directory)
  This function mutates sys.path, the module search path that the interpreter seeds from PYTHONPATH and the installation defaults, so it affects where every import in the current process resolves from, and in what order. This feels gross and dangerous, and at first glance it feels like something that shouldn't be touched by anyone, ever.
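Both points are easy to see in one small experiment. The sketch below is only an illustration: the module name guard_demo_app and its RAN_AS_SCRIPT flag are made up for the demo. It writes a tiny module to a temporary directory, makes it importable with sys.path.insert, and checks that the guarded block is skipped on import but runs when the same file is executed as the main program.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical module source, used only for this demonstration.
MODULE_SOURCE = '''
RAN_AS_SCRIPT = False

def main():
    global RAN_AS_SCRIPT
    RAN_AS_SCRIPT = True

if __name__ == "__main__":
    main()
'''


def demo():
    """Return (flag after import, flag after running the file as a script)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "guard_demo_app.py"
        path.write_text(MODULE_SOURCE)

        # Make the temporary directory importable, then undo the change.
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("guard_demo_app")
            imported_ran = module.RAN_AS_SCRIPT
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("guard_demo_app", None)

        # Execute the same file with __name__ set to "__main__".
        script_globals = runpy.run_path(str(path), run_name="__main__")
        script_ran = script_globals["RAN_AS_SCRIPT"]
    return imported_ran, script_ran


if __name__ == "__main__":
    print(demo())
```

Importing leaves the flag untouched because __name__ is the module's name in that case; only the run_path call, which sets __name__ to "__main__", triggers main().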
Using the latter platform call has some major consequences. The entries of sys.path are special: they define where Python looks for the standard library and every installed package, yet you can override them, change their ordering, and subsequently break stuff.
A Python 3.5 installation on Ubuntu might have the following default path.
>>> import sys
>>> print(sys.path)
['', '/usr/lib/python35.zip', '/usr/lib/python3.5', '/usr/lib/python3.5/plat-x86_64-linux-gnu', '/usr/lib/python3.5/lib-dynload', '/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages']
Programmer carelessness aside, the fact that such an important path is so easily mutable makes me feel nasty, because I can imagine how easy this makes it to insert a malicious version of a class into the path before a nice version of the same class. I basically think that things like sys.path should be avoided like the plague, because for any legitimate work we have nice, safe tools like virtualenv for path management.
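To make that worry concrete, here is a minimal sketch of the shadowing I have in mind. Everything in it is hypothetical: the module name shadow_demo_mod, the ORIGIN constant, and the "trusted" and "rogue" directories exist only for the demo. Two directories hold a module with the same name, and whichever directory comes first in sys.path wins.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "shadow_demo_mod"  # hypothetical module name for the demo


def write_module(directory, origin):
    """Write a one-line module that records which directory it came from."""
    source = f"ORIGIN = {origin!r}\n"
    (Path(directory) / f"{MODULE_NAME}.py").write_text(source)


def fresh_import():
    """Import the module from scratch, ignoring any cached copy."""
    sys.modules.pop(MODULE_NAME, None)
    importlib.invalidate_caches()
    return importlib.import_module(MODULE_NAME).ORIGIN


def demo():
    saved_path = list(sys.path)
    with tempfile.TemporaryDirectory() as trusted, \
            tempfile.TemporaryDirectory() as rogue:
        write_module(trusted, "trusted")
        write_module(rogue, "rogue")
        try:
            # The "nice" copy sits at the end of the search path.
            sys.path.append(trusted)
            before = fresh_import()
            # Anything inserted ahead of it shadows it on the next import.
            sys.path.insert(0, rogue)
            after = fresh_import()
        finally:
            sys.path[:] = saved_path
            sys.modules.pop(MODULE_NAME, None)
    return before, after


if __name__ == "__main__":
    print(demo())
```

The first import resolves to the "trusted" copy and the second to the "rogue" one, even though the importing code never changed.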
So it surprised me when I initialized a new project with GNOME Builder and I saw something like this in a Meson template file!
import sys

# Meson replaces this with a value from its configuration data.
pkgdatadir = '@pkgdatadir@'
sys.path.insert(1, pkgdatadir)

if __name__ == '__main__':
    from app import Application
    sys.exit(Application.main())
I was confused and appalled! ... Until I understood what the generated meson.build script was doing.
It's actually a neat hack that hides away arbitrary project structures and makes packaging a dream.
The pkgdatadir
When you’re building and installing a Meson project, you often do something like this:
meson _build --prefix=/usr/share
ninja -C _build install
In my case, my meson.build file sets the configuration variable pkgdatadir to a myapp/ directory under the prefix specified above. So, the above incantation goes ahead and installs all the application code there (no surprise!), but it also does something a bit clever that I didn't expect!
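The placeholder step is simple to picture. The sketch below only imitates the general idea of @name@ substitution; it is not Meson's implementation, and render_template is a name I made up for illustration.

```python
import re


def render_template(template, config):
    """Replace each @name@ placeholder with config[name].

    Placeholders without a matching key are left untouched.
    """
    def substitute(match):
        name = match.group(1)
        return str(config.get(name, match.group(0)))

    return re.sub(r"@(\w+)@", substitute, template)


template = "pkgdatadir = '@pkgdatadir@'\n"
rendered = render_template(template, {"pkgdatadir": "/usr/share/myapp"})
print(rendered, end="")  # pkgdatadir = '/usr/share/myapp'
```

In a real project the configuration data comes from meson.build and the rendered file is written out at configure time; here it is just a dictionary and a string.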
On Unix-like systems, the /usr/bin/ family of directories ideally just contains a bunch of binaries. These directories are listed in your system's PATH, which defines where system commands are resolved from, and in what order. Python projects don't typically compile down to a single binary file though, and you really don't want to clutter your bin directories up with a bunch of gunk, because it makes it harder to maintain your system.
For applications that need other resources (like configuration or data files), on Linux systems it's not uncommon to claim a namespace under /usr/share as the installation path (or as the Meson template refers to it: the pkgdatadir). For a Python project shipped independently from PyPI, we might want to group all our application code into that directory namespace for easy filesystem management. So that's what we do. My build script should move all my application code into a directory called myapp/ rooted under the /usr/share prefix. This ensures that package managers and sysadmins know exactly where all my app code and resources are. No surprises. No polluted directories. Tidy system!
So how do we easily run the code from the installation path? We certainly don't want to be typing /usr/share/myapp/Application.py to run it; that's practically medieval. Instead we put a shim for the application in the /usr/bin directory which correctly starts up your app.
So that sketchy templated code that I came across ends up looking like this:
#!/usr/bin/env python3
import sys

pkgdatadir = '/usr/share/myapp'
sys.path.insert(1, pkgdatadir)

if __name__ == '__main__':
    from myapp import Application
    sys.exit(Application.main())
Essentially what it's done is just move the main function. To make sure we're importing the correct Application, we change sys.path so that imports resolve from the place our application code has been installed to before they fall through to the system libraries. Inserting at index 1 keeps the script's own directory, which Python puts at index 0, in front. If we invoke the install command from ninja, this shim gets copied to /usr/bin/myapp so that one can easily run myapp from the command line.
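Here is a small sketch of what that insert does to the search order. It works on a plain list rather than the real sys.path, and the startup entries are an assumption based on the Ubuntu path shown earlier, with /usr/bin standing in for the directory that contains the shim.

```python
def insert_pkgdatadir(search_path, pkgdatadir):
    """Return a copy of search_path with pkgdatadir inserted at index 1."""
    updated = list(search_path)
    updated.insert(1, pkgdatadir)
    return updated


# Roughly what sys.path might look like when /usr/bin/myapp starts:
# index 0 is the directory containing the script, the rest come from
# the Python installation.
startup_path = [
    "/usr/bin",
    "/usr/lib/python35.zip",
    "/usr/lib/python3.5",
    "/usr/local/lib/python3.5/dist-packages",
]

for entry in insert_pkgdatadir(startup_path, "/usr/share/myapp"):
    print(entry)
```

The application directory lands directly after the script's directory and ahead of every standard library and dist-packages entry, which is why the import of Application finds the installed copy first.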
The takeaway
If you're trying to package a random Python app for a Linux distro, here's a strategy: find its main entry point, write a shim like this, and have your packaging script just copy the whole project to an application directory and install the shim to /usr/bin/$appname.
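As a rough sketch of that strategy, assuming a project whose code lives in a single package directory: the install_app helper, the SHIM_TEMPLATE string, and the throwaway myapp project below are all made up for illustration, and the prefix is a temporary directory rather than /usr, so nothing here needs root.

```python
import shutil
import stat
import subprocess
import sys
import tempfile
from pathlib import Path

# Same shape as the shim shown earlier; the placeholders are filled in below.
SHIM_TEMPLATE = """#!/usr/bin/env python3
import sys

pkgdatadir = {pkgdatadir!r}
sys.path.insert(1, pkgdatadir)

if __name__ == '__main__':
    from {package} import Application
    sys.exit(Application.main())
"""


def install_app(project_dir, package, prefix):
    """Copy project_dir/package under prefix/share and write a shim to prefix/bin."""
    prefix = Path(prefix)
    pkgdatadir = prefix / "share" / package
    # copytree creates any missing parent directories of the destination.
    shutil.copytree(Path(project_dir) / package, pkgdatadir / package)

    shim = prefix / "bin" / package
    shim.parent.mkdir(parents=True, exist_ok=True)
    shim.write_text(
        SHIM_TEMPLATE.format(pkgdatadir=str(pkgdatadir), package=package)
    )
    # Mark the shim executable for user, group, and others.
    mode = shim.stat().st_mode
    shim.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return shim


def demo():
    """Build a throwaway project, install it into a temp prefix, run the shim."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "project" / "myapp"
        package_dir.mkdir(parents=True)
        (package_dir / "__init__.py").write_text("")
        (package_dir / "Application.py").write_text(
            "def main():\n    print('hello from myapp')\n    return 0\n"
        )

        shim = install_app(Path(tmp) / "project", "myapp", Path(tmp) / "prefix")
        result = subprocess.run(
            [sys.executable, str(shim)], capture_output=True, text=True
        )
        return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    print(demo())
```

A distro package would do the same two steps with its own tooling; the point is only that the shim is the single file that has to know where the application code ended up.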
And of course, GNOME Builder will do this for you if you start a new GTK3 Python project.