Python Packaging Like a Master

Cory O. Root
6 min read · Aug 1, 2018


Packaging and code distribution are the weak point of the Python language. I’m not the first to feel this way, but I will try to add fuel to the argument here. Defining an interface for others to use is tough; many well-established and popular Python packages have unnecessarily complicated import syntaxes. Part of this comes from the language design: Python doesn’t make it easy to separate the source code file structure from the module structure.

A Java learner can’t execute a simple “Hello World” without dealing with packaging, distribution, and usually a few environment configuration errors. As a result, languages like Java have a stronger ecosystem of programmers sharing and distributing code and, more importantly, accessible packages available to language learners.

Python has a higher barrier to entry than these alternatives (for comparison, check out this Node packaging tutorial). When it’s time to adapt to new technologies and recruit coders, JavaScript will excel over Python. Packaging and distribution shouldn’t be an advanced skill left to the Python “Masters”. It should be as approachable as in other languages.

I’m going to show you how to create good Python packages for PyPI, the Python Package Index. Here’s a quick-reference for distributing a Python package with a detailed breakdown after.

Why Another Tutorial?

While there is official help and an official tutorial, they are slightly out of date, and most other tutorials refer to older packaging systems and distribution utilities. I’m going to show you current best practices, featuring setuptools, twine, PyPI, and version control. Specifically, we’ll:

  • Integrate a single README file for all documentation needs
    1. package docstring
    2. PyPI project description
    3. other typical README usage such as on GitHub
  • Back up the distribution on GitHub to improve download speeds
  • Package and link the code to create a simpler interface

Quick Reference Template

Here’s an example workflow I’ve used to package RTSP:

PREPARATION

python3 -m pip install --user --upgrade setuptools wheel
python3 -m pip install --user --upgrade twine

SETUP.PY

from setuptools import setup
from os import path
### include README as PyPI description
with open('README.md', encoding='utf-8') as f:
    long_description = f.read()
name = 'rtsp'
version = '1.0.11'
### include README as main package docfile
from shutil import copyfile
_workdir = path.abspath(path.dirname(__file__))
copyfile(path.join(_workdir, 'README.md'), path.join(_workdir, name, '__doc__'))
setup(
    name=name,
    version=version,
    description='RTSP client',
    long_description=long_description,
    long_description_content_type='text/markdown',
    author='Michael Stewart',
    author_email='michael@domain.net',
    url='https://github.com/statueofmike/rtsp',
    download_url='https://github.com/statueofmike/rtsp/archive/{0}.tar.gz'.format(version),
    license='MIT',
    packages=['rtsp'],
    include_package_data=True,
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 3',
        'Topic :: Multimedia :: Video',
        'Topic :: Multimedia :: Video :: Capture',
        'Topic :: System :: Networking',
    ],
    keywords='rtsp image stream',
    install_requires=['pillow', 'opencv-python'],
    python_requires='>=3.5',
    zip_safe=False,
)

PRIMARY MODULE __INIT__.PY

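import os as _os
### load the packaged copy of the README as the package docstring
_path = _os.path.join(_os.path.abspath(_os.path.dirname(__file__)), '__doc__')
with open(_path, 'r', encoding='utf-8') as _f:
    __doc__ = _f.read()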

MANIFEST.IN

include README.md
include rtsp/__doc__

DISTRIBUTION

  1. Version: update version to 1.2.3 in setup.py
  2. Build: rm dist/*; python3 setup.py sdist bdist_wheel
  3. Distribute PyPI (upload to TestPyPI first, then to PyPI):
    twine upload --repository-url https://test.pypi.org/legacy/ dist/*
    twine upload dist/*
  4. Distribute GitHub:
    1. git tag 1.2.3 -m "1.2.3 tag for PyPI"
    2. git push --tags remotename branchname
Packaging and Distribution Overview

Now let’s slow down and explain what’s going on.

PACKAGING

“Packaging” refers to archiving your code so that it can be moved somewhere else. Technically you could throw all your Python files into an archive and call it a package, and that’s almost exactly what all packaging systems do. Good packaging tools also give us ways to set up the package when it’s later deployed, without requiring us to write messy or error-prone scripting code ourselves. For example, you might want to make sure that a file your code depends on is available on the file system after it’s deployed. I’ll use a README example, but the same methods can be used to deal with things like binary dependencies or serialized machine learning models.

The recommended way to package Python code is with setuptools. In the reference above, we use python3 setup.py sdist bdist_wheel to create a wheel (.whl) file as well as a source (.tar.gz) archive in the dist directory.

Wheel archives are the new standard for Python packaging, replacing the old egg standard. They include version and package information for the package manager, as well as a place for scripts and data files that get installed during deployment. Pip understands them well, so running pip install on the .whl file in the dist directory will install our shiny new package into the local environment’s library.
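To make the “version and package information” part concrete, here is a minimal sketch that splits a wheel filename into its fields. The filename rtsp-1.0.11-py3-none-any.whl is my assumption of what a pure-Python build of the example package would be called, and parse_wheel_filename is a hypothetical helper written for this illustration, not part of pip or setuptools. It also ignores the optional build tag that the wheel format allows.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated fields.

    Simplified sketch: assumes the optional build tag is absent.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected 5 dash-separated fields, got %d" % len(parts))
    return WheelName(*parts)


# Assumed filename for a pure-Python build of the example package.
info = parse_wheel_filename("rtsp-1.0.11-py3-none-any.whl")
print(info.distribution, info.version, info.platform_tag)
```

The tags at the end are how pip decides whether a given wheel fits the interpreter and platform it is installing into.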

DISTRIBUTING

We want the installation of our package to be as quick and seamless as possible. Since we are using pip, we’re distributing through PyPI as well as GitHub. PyPI distribution is handled through twine. GitHub distribution is handled through a git call.
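If you’d rather not retype the release steps each time, a small helper can print them as a checklist. This is only a sketch: release_commands is a hypothetical function of my own, the remote and branch defaults are the same placeholders used in the quick reference, and nothing is executed.

```python
def release_commands(version, remote="remotename", branch="branchname"):
    """Return the shell commands for one release, in order. Nothing is run."""
    return [
        "rm dist/*",
        "python3 setup.py sdist bdist_wheel",
        # Upload to TestPyPI first, then to the real index.
        "twine upload --repository-url https://test.pypi.org/legacy/ dist/*",
        "twine upload dist/*",
        'git tag {0} -m "{0} tag for PyPI"'.format(version),
        "git push --tags {0} {1}".format(remote, branch),
    ]


for command in release_commands("1.2.3"):
    print(command)
```

Printing instead of running keeps the version number in one place while leaving every upload as a deliberate manual step.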

The Details

PACKAGING DATA

If we want to include any files that aren’t Python or C source files, we package them as data.

Update: As pointed out by this helpful Redditor, setup.py and MANIFEST.in are also an outdated method. For a newer and more robust method, see importlib.resources.

I use setup.py and MANIFEST.in to do so. It’s a two-stage process to support C dependencies that have to be compiled at the destination. The first step is informing setuptools what to include in our package. The second step controls where those files go during deployment of the package. We aren’t doing anything so complicated here, so we only have one step.

In setup.py:

include_package_data=True # includes files from MANIFEST.in
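For the importlib.resources route mentioned in the update above, here is a rough sketch of reading a packaged data file. It assumes Python 3.9 or newer (where importlib.resources.files exists), which is stricter than the python_requires used in this article. The demo_pkg package and the read_package_text helper are hypothetical and exist only so the example can run on its own.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_package_text(package: str, resource: str) -> str:
    """Read a text data file that ships inside an importable package."""
    return (
        importlib.resources.files(package)
        .joinpath(resource)
        .read_text(encoding="utf-8")
    )


# Demo: build a throwaway package on disk so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"  # hypothetical package name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
    (pkg_dir / "__doc__").write_text("Hello from the README", encoding="utf-8")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo_text = read_package_text("demo_pkg", "__doc__")
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg", None)

print(demo_text)
```

The appeal of this approach is that the code never builds a file path from __file__, so it does not care how the package ended up on disk.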

README

Let’s use the rtsp package as an example here. I’ve packaged it so no opportunity is wasted to show off the sweet ASCII art I put in the README. A single README.md file will be used for:

  1. Its main job as source code Read-Me
  2. PyPI Description
  3. Python module docstring

First we include it as the official PyPI “description” so it appears on the official PyPI.org project page.

In setup.py:

### include README as PyPI description
with open('README.md', encoding='utf-8') as f:
    long_description = f.read()

Next we are going to make that same README the top-level Python docstring. Docstrings are vintage Python documentation; supporting them is good karma and will result in fewer bugs. To accomplish this, we include the README as a text file in a location where it can be referenced from inside our package:

In setup.py:

### include README as main package docfile
from os import path
from shutil import copyfile
_workdir = path.abspath(path.dirname(__file__))
copyfile(path.join(_workdir, 'README.md'), path.join(_workdir, name, '__doc__'))

Then in the __init__.py for our main package:

import os as _os
_path = _os.path.join(_os.path.abspath(_os.path.dirname(__file__)), '__doc__')
with open(_path, 'r', encoding='utf-8') as _f:
    __doc__ = _f.read()
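One thing to watch with this pattern: if the __doc__ file somehow doesn’t make it into the installed package, opening it raises an error and the whole import fails. Here is a sketch of a more forgiving variant. The load_package_doc helper and its fallback text are my own assumptions, not something the rtsp package ships.

```python
import os
import tempfile


def load_package_doc(package_dir, fallback="(documentation file not found)"):
    """Return the text of package_dir/__doc__, or fallback if it cannot be read."""
    doc_path = os.path.join(package_dir, "__doc__")
    try:
        with open(doc_path, "r", encoding="utf-8") as doc_file:
            return doc_file.read()
    except OSError:
        # A missing or unreadable doc file should not break the import.
        return fallback


# Demo in a temporary directory: first without the file, then with it.
with tempfile.TemporaryDirectory() as demo_dir:
    missing_doc = load_package_doc(demo_dir)
    with open(os.path.join(demo_dir, "__doc__"), "w", encoding="utf-8") as handle:
        handle.write("# rtsp\nRTSP client")
    found_doc = load_package_doc(demo_dir)

print(missing_doc)
print(found_doc)
```

Inside an __init__.py you would call it with the package’s own directory, for example os.path.dirname(os.path.abspath(__file__)), and assign the result to __doc__.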

GITHUB DISTRIBUTION

The next thing we want to do is use a GitHub repository to support faster distribution. We need to do two things:

  1. Add a git tag defining the release version
    * git tag 1.2.3 -m "tag for PyPI"
    * git push --tags remotename branchname
  2. Tell PyPI & pip where the git distribution is located, via setup.py:
    download_url='https://github.com/statueofmike/rtsp/archive/{0}.tar.gz'.format(version)
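The download_url only works if the git tag and the version in setup.py are the same string, so it can help to build the URL in one place. A small sketch follows; github_archive_url is a hypothetical helper name, and the URL pattern is simply the one used in the setup.py above.

```python
def github_archive_url(owner, repo, tag):
    """Build the archive URL for a tagged release, following the pattern above."""
    return "https://github.com/{0}/{1}/archive/{2}.tar.gz".format(owner, repo, tag)


version = "1.0.11"
download_url = github_archive_url("statueofmike", "rtsp", version)
print(download_url)
```

If the tag you push is 1.0.11 but setup.py says 1.0.12, the URL points at an archive that does not exist, so I treat the version variable as the single source of truth.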

OTHER CONFIG DETAILS

  • The classifiers list refers to trove classifiers, predefined tags for organizing PyPI packages.
  • The install_requires list names the packages that will be installed as dependencies during deployment.
  • python_requires specifies which Python versions the package supports.
  • zip_safe flags whether the package can be stored as an archive file or must be unpacked when deployed.
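To give a feel for what python_requires='>=3.5' asks of the installer, here is a deliberately simplified sketch of the comparison. pip’s real check parses full version specifiers, so treat meets_minimum as a hypothetical stand-in that only handles a plain minimum version.

```python
import sys


def meets_minimum(minimum, version_info=None):
    """Return True if the interpreter version is at least `minimum`.

    `minimum` is a (major, minor) tuple, e.g. (3, 5) for python_requires='>=3.5'.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


print(meets_minimum((3, 5)))              # checks the running interpreter
print(meets_minimum((3, 5), (3, 4, 10)))  # an interpreter that is too old
```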

Simplifying the Interface

I recommend the Unix standard: “could someone figure out how to use this from the command line if their internet wasn’t working?” I also recommend following some Python conventions to make things less confusing to people:

  • name internal variables, functions, and classes with a leading underscore
  • rename imports to use a leading underscore
  • write docstrings for items you expect users to interact with

I also recommend this unPythonic convention to make things less confusing to people: Use nested code to more easily define interfaces. (While a guiding principle of Python is “Flat is better than nested”, flatness breaks just about every other Python credo in this case.) You can streamline an interface with underscores and by deleting child packages after the useful parts have been imported:

from .messymodule import UsefulStuff as _MyStuff
del messymodule

In this case, I’ve hidden my messy code in messymodule.py and then written a wrapper interface around it based on UsefulStuff.
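Here is a runnable sketch of that pattern. It writes a throwaway package to a temporary directory, imports it, and lists the names a user would see. The package name tidy_demo, the make_thing wrapper, and the contents of messymodule are all made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

INIT_SOURCE = '''\
"""Tiny demo package with a deliberately small public surface."""
from .messymodule import UsefulStuff as _UsefulStuff


def make_thing():
    """Public entry point wrapping the hidden implementation."""
    return _UsefulStuff()


del messymodule  # drop the submodule name from the package namespace
'''

MESSY_SOURCE = '''\
class UsefulStuff:
    def greet(self):
        return "hello"
'''

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "tidy_demo"  # hypothetical package name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE, encoding="utf-8")
    (pkg_dir / "messymodule.py").write_text(MESSY_SOURCE, encoding="utf-8")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        tidy_demo = importlib.import_module("tidy_demo")
        # Names a user sees with tab completion or dir(), minus private ones.
        public_names = [n for n in dir(tidy_demo) if not n.startswith("_")]
        greeting = tidy_demo.make_thing().greet()
    finally:
        sys.path.remove(tmp)

print(public_names)
print(greeting)
```

The del works because importing a submodule binds its name in the parent package’s namespace, which is the same namespace __init__.py runs in.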

Users should never have to write

from usefulthing import UsefulThing

when they could write

import UsefulThing

or

from module.usefulthing import UsefulThing

when they could write

from module import UsefulThing

There are times when submodules are useful for readability and organization, but there are frequent times when they only exist because the source code file structure made it awkward to remove them.

Conclusion

There are other packaging and distribution systems. For example, you don’t have to use Twine at all; you can distribute to PyPI with distutils, but Twine has nice support for storing your username and password securely. I like to keep the quick reference material in my projects’ source code as a Packaging.md note for easy reference.
