Python Packaging Like a Master
Packaging and code distribution are the weak point of the Python language. I’m not the first to feel this way, but I’ll add my voice to the argument here. Defining an interface for others to use is tough; many well-established and popular Python packages have unnecessarily complicated import syntax. This is partly a consequence of the language design: Python makes it hard to separate a project’s source file structure from its module structure.
A Java learner can’t execute a simple “Hello World” without dealing with packaging, distribution, and usually a few environment configuration errors. As a result, languages like Java have a stronger ecosystem of programmers sharing and distributing code and, more importantly, accessible packages for language learners.
Python has a higher barrier to entry than these alternatives (for comparison, check out this Node packaging tutorial). When it’s time to adapt to new technologies and recruit coders, JavaScript will have the edge over Python. Packaging and distribution shouldn’t be an advanced skill reserved for Python “Masters”; they should be as approachable as they are in other languages.
I’m going to show you how to create good Python packages for PyPI, the Python Package Index. Here’s a quick reference for distributing a Python package, with a detailed breakdown after.
Why Another Tutorial?
While there is official help and an official tutorial, they are slightly out of date: most refer to older packaging systems and distribution utilities. I’m going to show you current best practices, featuring setuptools, PyPI, and version control. Specifically, we’ll:
- Integrate a single README file for all documentation needs:
  1. package docstring
  2. PyPI project description
  3. other typical README usage, such as on GitHub
- Back up distribution on GitHub to improve download speeds
- Use packaging and linking to create a simpler interface
Quick Reference Template
Here’s an example workflow I’ve used to package RTSP:
PREPARATION
python3 -m pip install --user --upgrade setuptools wheel
python3 -m pip install --user --upgrade twine
SETUP.PY
from os import path
from shutil import copyfile
from setuptools import setup

_workdir = path.abspath(path.dirname(__file__))

### include README as PyPI description
with open(path.join(_workdir, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()

name = 'rtsp'
version = '1.0.11'

### include README as main package docfile
copyfile(path.join(_workdir, 'README.md'), path.join(_workdir, name, '__doc__'))

setup(
    name=name,
    version=version,
    description='RTSP client',
    long_description=long_description,
    long_description_content_type='text/markdown',
    author='Michael Stewart',
    author_email='michael@domain.net',
    url='https://github.com/statueofmike/rtsp',
    download_url='https://github.com/statueofmike/rtsp/archive/{0}.tar.gz'.format(version),
    license='MIT',
    packages=['rtsp'],
    include_package_data=True,  # ship files listed in MANIFEST.in
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 3',
        'Topic :: Multimedia :: Video',
        'Topic :: Multimedia :: Video :: Capture',
        'Topic :: System :: Networking',
    ],
    keywords='rtsp image stream',
    install_requires=['pillow', 'opencv-python'],
    python_requires='>=3.5',
    zip_safe=False,
)
PRIMARY MODULE __INIT__.PY
import os as _os

_path = _os.path.join(_os.path.abspath(_os.path.dirname(__file__)), '__doc__')
with open(_path, 'r', encoding='utf-8') as _f:
    __doc__ = _f.read()
MANIFEST.IN
include README.md
include rtsp/__doc__
DISTRIBUTION
- Version: update the version to 1.2.3 in setup.py
- Build:
  rm dist/*; python3 setup.py sdist bdist_wheel
- Distribute to PyPI (preview the upload on TestPyPI first):
  1. twine upload --repository-url https://test.pypi.org/legacy/ dist/*
  2. twine upload dist/*
- Distribute on GitHub:
  1. git tag 1.2.3 -m "1.2.3 tag for PyPI"
  2. git push --tags remotename branchname
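The rm dist/* step matters because twine upload dist/* tries to upload every file in dist/, including leftovers from earlier builds. As a gentler alternative to deleting blindly, here is a minimal sketch that lists files in dist/ that don’t belong to the version you are releasing. It assumes artifacts follow the usual {name}-{version} naming and that the project name contains no dashes (wheel filenames replace them with underscores); the helper name stale_artifacts is hypothetical.

```python
import os


def stale_artifacts(dist_dir, name, version):
    """Return sorted filenames in dist_dir that are not artifacts of name==version.

    Assumes the names produced by `setup.py sdist bdist_wheel` for a simple
    project name without dashes: `{name}-{version}.tar.gz` for the sdist and
    `{name}-{version}-<tags>.whl` for wheels.
    """
    sdist = f"{name}-{version}.tar.gz"
    wheel_prefix = f"{name}-{version}-"  # trailing dash so 1.0.1 doesn't match 1.0.11
    stale = []
    for filename in os.listdir(dist_dir):
        is_sdist = filename == sdist
        is_wheel = filename.startswith(wheel_prefix) and filename.endswith(".whl")
        if not (is_sdist or is_wheel):
            stale.append(filename)
    return sorted(stale)
```

Running stale_artifacts("dist", "rtsp", "1.0.11") right before twine upload and refusing to continue if the list is non-empty catches a forgotten rm dist/*.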
Packaging and Distribution Overview
Now let’s slow down and explain what’s going on.
PACKAGING
“Packaging” refers to archiving your code so that it can be moved somewhere else. Technically you could throw all your Python files into an archive directory and call it a package, and that’s almost exactly what all packaging systems do. Good packaging tools give us methods to set up the package when it’s later deployed, without requiring us to write messy or error-prone scripts ourselves. For example, you might want to make sure that a dependent file is available on the file system for your code to access after it’s deployed. I’ll use a README example, but the same methods can handle things like binary dependencies or serialized machine learning models.
The recommended way to package Python code is using setuptools. In the reference above, we run python3 setup.py sdist bdist_wheel to create a wheel (.whl) file and a source archive (.tar.gz) in the dist directory.
Wheel archives are the current standard for Python packaging, replacing the older egg format. They carry metadata for the package manager, such as the version and dependencies, and because they are prebuilt, installing one doesn’t require running setup.py. Pip understands them well, so pip install newshiny-1.0-py3-none-any.whl (pip expects the standard name-version-tags wheel filename) will install our shiny new package into the local environment’s library.
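Since a wheel is just a zip archive, you can inspect the metadata it carries using only the standard library. The sketch below reads the METADATA file from the wheel’s .dist-info directory, where install_requires and python_requires end up as Requires-Dist and Requires-Python. The helper name read_wheel_metadata is hypothetical, and it assumes a well-formed wheel with exactly one .dist-info directory.

```python
import zipfile
from email.parser import Parser


def read_wheel_metadata(wheel_path):
    """Return selected core metadata fields from a wheel's .dist-info/METADATA.

    Wheels are zip files, and METADATA uses an email-style header format,
    so the stdlib email parser can read it.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        metadata_files = [
            n for n in wheel.namelist()
            if n.endswith(".dist-info/METADATA") and n.count("/") == 1
        ]
        if len(metadata_files) != 1:
            raise ValueError(f"expected one .dist-info/METADATA, found {metadata_files}")
        text = wheel.read(metadata_files[0]).decode("utf-8")
    message = Parser().parsestr(text)
    return {
        "Name": message["Name"],
        "Version": message["Version"],
        "Requires-Python": message["Requires-Python"],
        # Requires-Dist repeats once per install_requires entry.
        "Requires-Dist": message.get_all("Requires-Dist") or [],
    }
```

For example, read_wheel_metadata on the wheel built into dist/ should report the version and dependencies declared in setup.py.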
DISTRIBUTING
We want installing our package to be as quick and seamless as possible. Since we are using pip, we’re distributing through PyPI as well as GitHub. PyPI distribution is handled through twine, and GitHub distribution is handled through a git call.
The Details
PACKAGING DATA
If we want to include any files that aren’t Python or C source files, we package them as data.
Update: As pointed out by this helpful Redditor, setup.py and MANIFEST.in are also an outdated method. For a newer and more robust method, see importlib.resources.
I use setup.py and MANIFEST.in to do so. It’s a two-stage process, in part to support C dependencies that have to be compiled at the destination. The first step, MANIFEST.in, tells setuptools which extra files to include in the source distribution. The second step, in setup.py, controls which of those files are installed along with the package during deployment. We aren’t doing anything complicated here, so each step takes only a line or two.
In setup.py:
include_package_data=True # includes files from MANIFEST.in
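To confirm that MANIFEST.in actually pulled README.md and rtsp/__doc__ into the source distribution, you can list the tarball before uploading it. A minimal sketch, assuming the sdist has the usual single top-level {name}-{version}/ directory; the helper name missing_from_sdist is hypothetical.

```python
import tarfile


def missing_from_sdist(sdist_path, required):
    """Return the entries of `required` that are absent from the sdist.

    Paths in `required` are relative to the project root (e.g. "README.md"),
    so the leading "{name}-{version}/" directory of each archive member is
    stripped before comparing.
    """
    with tarfile.open(sdist_path, "r:gz") as archive:
        present = set()
        for member in archive.getnames():
            _top, _sep, relative = member.partition("/")
            if relative:
                present.add(relative)
    return [path for path in required if path not in present]
```

If missing_from_sdist("dist/rtsp-1.0.11.tar.gz", ["README.md", "rtsp/__doc__"]) returns anything other than an empty list, the MANIFEST.in entries didn’t take effect.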
README
Let’s use the rtsp package as an example here. I’ve packaged it so that no opportunity is wasted to show off the sweet ASCII art I put in the README. A single README.md file will be used for:
- Its main job as the source code Read-Me
- PyPI Description
- Python module docstring
First we include it as the official PyPI “description” so it appears on the official PyPI.org project page.
In setup.py:
### include README as PyPI description
with open('README.md', encoding='utf-8') as f:
    long_description = f.read()
Next, we make that same README the top-level Python docstring. Python docstrings are vintage Python documentation, and supporting them is good karma: help() and most editors display them. To accomplish this, we include the README as a text file in a location where it can be referenced from inside our package:
In setup.py:
### include README as main package docfile
from os import path
from shutil import copyfile
_workdir = path.abspath(path.dirname(__file__))
# name = 'rtsp' is defined earlier in setup.py
copyfile(path.join(_workdir, 'README.md'), path.join(_workdir, name, '__doc__'))
Then in the __init__.py for our main package:
import os as _os

_path = _os.path.join(_os.path.abspath(_os.path.dirname(__file__)), '__doc__')
with open(_path, 'r', encoding='utf-8') as _f:
    __doc__ = _f.read()
GITHUB DISTRIBUTION
The next thing we want to do is use a GitHub repository to support faster distribution. We need to do two things:
- Add a git tag defining the release version:
  git tag 1.2.3 -m "tag for PyPI"
  git push --tags remotename branchname
- Tell PyPI where the GitHub archive is located, via download_url in setup.py:
  download_url='https://github.com/statueofmike/rtsp/archive/{0}.tar.gz'.format(version)
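Because one version string drives setup.py, the git tag, and download_url, it helps to derive all three from a single place so they can’t drift apart. A minimal sketch under that assumption: release_plan is a hypothetical helper, and remotename and branchname are the same placeholders used above. It only builds the commands; you would run them yourself, for example with subprocess.run(cmd, check=True).

```python
def release_plan(version, remote="remotename", branch="branchname",
                 repo_url="https://github.com/statueofmike/rtsp"):
    """Build the git commands and download_url for one release version."""
    return {
        # Annotated tag named exactly after the version, as in the steps above.
        "tag": ["git", "tag", version, "-m", f"{version} tag for PyPI"],
        # --tags pushes all tags in addition to the named branch.
        "push": ["git", "push", "--tags", remote, branch],
        # GitHub serves a tarball for each tag at /archive/<tag>.tar.gz.
        "download_url": f"{repo_url}/archive/{version}.tar.gz",
    }
```

Keeping the version in one variable means a typo can no longer produce a tag that download_url doesn’t point to.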
OTHER CONFIG DETAILS
- The classifiers list refers to trove classifiers, predefined tags for organizing PyPI packages.
- install_requires lists packages that will be installed as dependencies during deployment.
- python_requires specifies the Python versions the package supports; pip won’t install it on other versions.
- zip_safe flags whether the package can be stored as an archive file or must be unpacked when deployed.
Simplifying the Interface
I recommend the Unix standard: “could someone figure out how to use this from the command line if their internet wasn’t working?” Following a few Python conventions makes things less confusing for people:
- name internal variables, functions, and classes with a leading underscore
- rename imports to use a leading underscore
- write docstrings for items you expect users to interact with
I also recommend one un-Pythonic convention to make things less confusing: use nested code to define interfaces more easily. (While a guiding principle of Python is “Flat is better than nested”, flatness breaks just about every other Python credo in this case.) You can streamline an interface with underscores and by deleting child modules after the useful parts have been imported:
# in the package's __init__.py
from .messymodule import UsefulStuff as _MyStuff
del messymodule  # the import bound the submodule as a package attribute; hide it
In this case, I’ve hidden my messy code in messymodule.py and then written a wrapper interface around it based on UsefulStuff.
Users should never have to write
from usefulthing import UsefulThing
when they could write
import UsefulThing
or
from module.usefulthing import UsefulThing
when they could write
from module import UsefulThing
There are times when submodules are useful for readability and organization, but there are frequent times when they only exist because the source code file structure made it awkward to remove them.
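To see the underscore-and-del pattern end to end, the sketch below writes a throwaway package to a temporary directory and imports it. The package name tidypkg, the module messymodule, and the classes UsefulStuff and UsefulThing are hypothetical stand-ins; here the “wrapper” is a thin subclass, which is only one of several ways to build it.

```python
import importlib
import sys
from pathlib import Path

MESSY = '''
class UsefulStuff:
    """The part of the messy module that users actually need."""
    def greet(self):
        return "hello from UsefulStuff"
'''

INIT = '''
from .messymodule import UsefulStuff as _UsefulStuff
# Importing a submodule binds it as an attribute of the package,
# so deleting the name removes it from the package namespace.
del messymodule

class UsefulThing(_UsefulStuff):
    """Public wrapper: users write `from tidypkg import UsefulThing`."""
'''


def build_demo_package(root):
    """Write tidypkg/ under root, put root on sys.path, and import the package."""
    pkg = Path(root) / "tidypkg"
    pkg.mkdir()
    (pkg / "messymodule.py").write_text(MESSY)
    (pkg / "__init__.py").write_text(INIT)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("tidypkg")
```

After import, dir(tidypkg) shows UsefulThing but not messymodule, so users reach for the flat from tidypkg import UsefulThing.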
Conclusion
There are other packaging and distribution systems. For example, you don’t have to use Twine at all; you can distribute to PyPI with distutils, but Twine has nice support for storing your username and password securely. I like to keep the quick reference material in my projects’ source code as a Packaging.md note for easy reference.