Building and publishing documentation for Ansible Collections

I had a draft of this article for about two months, but never really managed to polish and finalize it, partially due to some nasty hacks needed down the road. Thankfully, one of my wishes was heard and I had now the chance to revisit the post and try a few things out. Sadly, my wish was granted only partially and the result is still not beautiful, but read yourself ;-)

UPDATE: I've published a follow up post on building documentation for Ansible Collections using antsibull, as my wish was now fully granted.

As part of my day job, I am maintaining the Foreman Ansible Modules - a collection of modules to interact with Foreman and its plugins (most notably Katello). We've been maintaining this collection (as in set of modules) since 2017, so much longer than collections (as in Ansible Collections) existed, but the introduction of Ansible Collections allowed us to provide a much easier and supported way to distribute the modules to our users.

Now users usually want two things: features and documentation. Features are easy, we already have plenty of them. But documentation was a bit cumbersome: we had documentation inside the modules, so you could read it via ansible-doc on the command line if you had the collection installed, but we wanted to provide online readable and versioned documentation too - something the users are used to from the official Ansible documentation.

Building HTML from Ansible modules

Ansible modules contain documentation in form of YAML blocks documenting the parameters, examples and return values of the module. The Ansible documentation site is built using Sphinx from reStructuredText. As the modules don't contain reStructuredText, Ansible hashad a tool to generate it from the documentation YAML: build-ansible.py document-plugins. The tool and the accompanying libraries are not part of the Ansible distribution - they just live in the hacking directory. To run them we need a git checkout of Ansible and source hacking/env-setup to set PYTHONPATH and a few other variables correctly for Ansible to run directly from that checkout.

It would be nice if that'd be a feature of ansible-doc, but while it isn't, we need to have a full Ansible git checkout to be able to continue.The tool has been recently split out into an own repository/distribution: antsibull. However it was also a bit redesigned to be easier to use (good!), and my hack to abuse it to build documentation for out-of-tree modules doesn't work anymore (bad!). There is an issue open for collections support, so I hope to be able to switch to antsibull soon.

Anyways, back to the original hack.

As we're using documentation fragments, we need to tell the tool to look for these, because otherwise we'd get errors about not found fragments. We're passing ANSIBLE_COLLECTIONS_PATHS so that the tool can find the correct, namespaced documentation fragments there. We also need to provide --module-dir pointing at the actual modules we want to build documentation for.

ANSIBLEGIT=/path/to/ansible.git
source ${ANSIBLEGIT}/hacking/env-setup
ANSIBLE_COLLECTIONS_PATHS=../build/collections python3 ${ANSIBLEGIT}/hacking/build-ansible.py document-plugins --module-dir ../plugins/modules --template-dir ./_templates --template-dir ${ANSIBLEGIT}/docs/templates --type rst --output-dir ./modules/

Ideally, when antsibull supports collections, this will become antsibull-docs collection … without any need to have an Ansible checkout, sourcing env-setup or pass tons of paths.

Until then we have a Makefile that clones Ansible, runs the above command and then calls Sphinx (which provides a nice Makefile for building) to generate HTML from the reStructuredText.

You can find our slightly modified templates and themes in our git repository in the docs directory.

Publishing HTML documentation for Ansible Modules

Now that we have a way to build the documentation, let's also automate publishing, because nothing is worse than out-of-date documentation!

We're using GitHub and GitHub Actions for that, but you can achieve the same with GitLab, TravisCI or Jenkins.

First, we need a trigger. As we want always up-to-date documentation for the main branch where all the development happens and also documentation for all stable releases that are tagged (we use vX.Y.Z for the tags), we can do something like this:

on:
  push:
    tags:
      - v[0-9]+.[0-9]+.[0-9]+
    branches:
      - master

Now that we have a trigger, we define the job steps that get executed:

    steps:
      - name: Check out the code
        uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.7"
      - name: Install dependencies
        run: make doc-setup
      - name: Build docs
        run: make doc

At this point we will have the docs built by make doc in the docs/_build/html directory, but not published anywhere yet.

As we're using GitHub anyways, we can also use GitHub Pages to host the result.

      - uses: actions/checkout@v2
      - name: configure git
        run: |
          git config user.name "${GITHUB_ACTOR}"
          git config user.email "${GITHUB_ACTOR}@bots.github.com"
          git fetch --no-tags --prune --depth=1 origin +refs/heads/*:refs/remotes/origin/*
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.7"
      - name: Install dependencies
        run: make doc-setup
      - name: Build docs
        run: make doc
      - name: commit docs
        run: |
          git checkout gh-pages
          rm -rf $(basename ${GITHUB_REF})
          mv docs/_build/html $(basename ${GITHUB_REF})
          dirname */index.html | sort --version-sort | xargs -I@@ -n1 echo '<div><a href="@@/"><p>@@</p></a></div>' >> index.html
          git add $(basename ${GITHUB_REF}) index.html
          git commit -m "update docs for $(basename ${GITHUB_REF})" || true
      - name: push docs
        run: git push origin gh-pages

As this is not exactly self explanatory:

  1. Configure git to have a proper author name and email, as otherwise you get ugly history and maybe even failing commits
  2. Fetch all branch names, as the checkout action by default doesn't do this.
  3. Setup Python, Sphinx, Ansible etc.
  4. Build the documentation as described above.
  5. Switch to the gh-pages branch from the commit that triggered the workflow.
  6. Remove any existing documentation for this tag/branch ($GITHUB_REF contains the name which triggered the workflow) if it exists already.
  7. Move the previously built documentation from the Sphinx output directory to a directory named after the current target.
  8. Generate a simple index of all available documentation versions.
  9. Commit all changes, but don't fail if there is nothing to commit.
  10. Push to the gh-pages branch which will trigger a GitHub Pages deployment.

Pretty sure this won't win any beauty contest for scripting and automation, but it gets the job done and nobody on the team has to remember to update the documentation anymore.

You can see the results on theforeman.org or directly on GitHub.

Scanning with a Brother MFC-L2720DW on Linux without any binary blobs

Back in 2015, I've got a Brother MFC-L2720DW for the casual "I need to print those two pages" and "I need to scan these receipts" at home (and home-office ;)). It's a rather cheap (I paid less than 200€ in 2015) monochrome laser printer, scanner and fax with a (well, two, wired and wireless) network interface. In those five years I've never used the fax or WiFi functions, but printed a scanned a few pages.

Brother offers Linux drivers, but those are binary blobs which I never really liked to run.

The printer part works just fine with a "Generic PCL 6/PCL XL" driver in CUPS or even "driverless" via AirPrint on Linux. You can also feed it plain PostScript, but I found it rather slow compared to PCL. On recent Androids it works using the built in printer service or Mopria Printer Service for older ones - I used to joke "why would you need a printer on your phone?!", but found it quite useful after a few tries.

However, for the scanner part I had to use Brother's brscan4 driver on Linux and their iPrint&Scan app on Android - Mopria Scan wouldn't support it.

Until, last Friday, I've seen a NEW package being uploaded to Debian: sane-airscan. And yes, monitoring the Debian NEW queue via Twitter is totally legit!

sane-airscan is an implementation of Apple's AirScan (eSCL) and Microsoft's WSD/WS-Scan protocols for SANE. I've never heard of those before - only about AirPrint, but thankfully this does not mean nobody has reverse-engineered them and created something that works beautifully on Linux. As of today there are no packages in the official Fedora repositories and the Debian ones are still in NEW, however the upstream documentation refers to an openSUSE OBS repository that works like a charm in the meantime (on Fedora 32).

The only drawback I've seen so far: the scanner only works in "Color" mode and there is no way to scan in "Grayscale", making scanning a tad slower. This has been reported upstream and might or might not be fixable, as it seems the device does not announce any mode besides "Color".

Interestingly, SANE has an eSCL backend on its own since 1.0.29, but it's disabled in Fedora in favor of sane-airscan even though the later isn't available in Fedora yet. However, it might not even need separate packaging, as SANE upstream is planning to integrate it into sane-backends directly.

Using Ansible Molecule to test roles in monorepos

Ansible Molecule is a toolkit for testing Ansible roles. It allows for easy execution and verification of your roles and also manages the environment (container, VM, etc) in which those are executed.

In the Foreman project we have a collection of Ansible roles to setup Foreman instances called forklift. The roles vary from configuring Libvirt and Vagrant for our CI to deploying full fledged Foreman and Katello setups with Proxies and everything. The repository also contains a dynamic Vagrant file that can generate Foreman and Katello installations on all supported Debian, Ubuntu and CentOS platforms using the previously mentioned roles. This feature is super helpful when you need to debug something specific to an OS/version combination.

Up until recently, all those roles didn't have any tests. We would run ansible-lint on them, but that was it.

As I am planning to do some heavier work on some of the roles to enhance our upgrade testing, I decided to add some tests first. Using Molecule, of course.

Adding Molecule to an existing role is easy: molecule init scenario -r my-role-name will add all the necessary files/examples for you. It's left as an exercise to the reader how to actually test the role properly as this is not what this post is about.

Executing the tests with Molecule is also easy: molecule test. And there are also examples how to integrate the test execution with the common CI systems.

But what happens if you have more than one role in the repository? Molecule has support for monorepos, however that is rather limited: it will detect the role path correctly, so roles can depend on other roles from the same repository, but it won't find and execute tests for roles if you run it from the repository root. There is an undocumented way to set MOLECULE_GLOB so that Molecule would detect test scenarios in different paths, but I couldn't get it to work nicely for executing tests of multiple roles and upstream currently does not plan to implement this. Well, bash to the rescue!

for roledir in roles/*/molecule; do
    pushd $(dirname $roledir)
    molecule test
    popd
done

Add that to your CI and be happy! The CI will execute all available tests and you can still execute those for the role you're hacking on by just calling molecule test as you're used to.

However, we can do even better.

When you initialize a role with Molecule or add Molecule to an existing role, there are quite a lot of files added in the molecule directory plus an yamllint configuration in the role root. If you have many roles, you will notice that especially the molecule.yml and .yamllint files look very similar for each role.

It would be much nicer if we could keep those in a shared place.

Molecule supports a "base config": a configuration file that gets merged with the molecule.yml of your project. By default, that's ~/.config/molecule/config.yml, but Molecule will actually look for a .config/molecule/config.yml in two places: the root of the VCS repository and your HOME. And guess what? The one in the repository wins (that's not yet well documented). So by adding a .config/molecule/config.yml to the repository, we can place all shared configuration there and don't have to duplicate it in every role.

And that .yamllint file? We can also move that to the repository root and add the following to Molecule's (now shared) configuration:

lint: yamllint --config-file ${MOLECULE_PROJECT_DIRECTORY}/../../.yamllint --format parsable .

This will define the lint action as calling yamllint with the configuration stored in the repository root instead of the project directory, assuming you store your roles as roles/<rolename>/ in the repository.

And that's it. We now have a central place for our Molecule and yamllint configurations and only need to place role-specific data into the role directory.

Automatically renaming the default git branch to "devel"

It seems GitHub is planning to rename the default brach for newly created repositories from "master" to "main". It's incredible how much positive PR you can get with a one line configuration change, while still working together with the ICE.

However, this post is not about bashing GitHub.

Changing the default branch for newly created repositories is good. And you also should do that for the ones you create with git init locally. But what about all the repositories out there? GitHub surely won't force-rename those branches, but we can!

Ian will do this as he touches the individual repositories, but I tend to forget things unless I do them immediately…

Oh, so this is another "automate everything with an API" post? Yes, yes it is!

And yes, I am going to use GitHub here, but something similar should be implementable on any git hosting platform that has an API.

Of course, if you have SSH access to the repositories, you can also just edit HEAD in an for loop in bash, but that would be boring ;-)

I'm going with devel btw, as I'm already used to develop in the Foreman project and devel in Ansible.

acquire credentials

My GitHub account is 2FA enabled, so I can't just use my username and password in a basic HTTP API client. So the first step is to acquire a personal access token, that can be used instead. Of course I could also have implemented OAuth2 in my lousy script, but ain't nobody have time for that.

The token will require the "repo" permission to be able to change repositories.

And we'll need some boilerplate code (I'm using Python3 and requests, but anything else will work too):

#!/usr/bin/env python3

import requests

BASE='https://api.github.com'
USER='evgeni'
TOKEN='abcdef'

headers = {'User-Agent': '@{}'.format(USER)}
auth = (USER, TOKEN)

session = requests.Session()
session.auth = auth
session.headers.update(headers)
session.verify = True

This will store our username, token, and create a requests.Session so that we don't have to pass the same data all the time.

get a list of repositories to change

I want to change all my own repos that are not archived, not forks, and actually have the default branch set to master, YMMV.

As we're authenticated, we can just list the repositories of the currently authenticated user, and limit them to "owner" only.

GitHub uses pagination for their API, so we'll have to loop until we get to the end of the repository list.

repos_to_change = []

url = '{}/user/repos?type=owner'.format(BASE)
while url:
    r = session.get(url)
    if r.ok:
        repos = r.json()
        for repo in repos:
            if not repo['archived'] and not repo['fork'] and repo['default_branch'] == 'master':
                repos_to_change.append(repo['name'])
        if 'next' in r.links:
            url = r.links['next']['url']
        else:
            url = None
    else:
        url = None

create a new devel branch and mark it as default

Now that we know which repos to change, we need to fetch the SHA of the current master, create a new devel branch pointing at the same commit and then set that new branch as the default branch.

for repo in repos_to_change:
    master_data = session.get('{}/repos/evgeni/{}/git/ref/heads/master'.format(BASE, repo)).json()
    data = {'ref': 'refs/heads/devel', 'sha': master_data['object']['sha']}
    session.post('{}/repos/{}/{}/git/refs'.format(BASE, USER, repo), json=data)
    default_branch_data = {'default_branch': 'devel'}
    session.patch('{}/repos/{}/{}'.format(BASE, USER, repo), json=default_branch_data)
    session.delete('{}/repos/{}/{}/git/refs/heads/{}'.format(BASE, USER, repo, 'master'))

I've also opted in to actually delete the old master, as I think that's the safest way to let the users know that it's gone. Letting it rot in the repository would mean people can still pull and won't notice that there are no changes anymore as the default branch moved to devel.

So…

announcement

I've updated all my (those in the evgeni namespace) non-archived repositories to have devel instead of master as the default branch.

Have fun updating!

code

#!/usr/bin/env python3

import requests

BASE='https://api.github.com'
USER='evgeni'
TOKEN='abcd'

headers = {'User-Agent': '@{}'.format(USER)}
auth = (USER, TOKEN)

session = requests.Session()
session.auth = auth
session.headers.update(headers)
session.verify = True

repos_to_change = []

url = '{}/user/repos?type=owner'.format(BASE)
while url:
    r = session.get(url)
    if r.ok:
        repos = r.json()
        for repo in repos:
            if not repo['archived'] and not repo['fork'] and repo['default_branch'] == 'master':
                repos_to_change.append(repo['name'])
        if 'next' in r.links:
            url = r.links['next']['url']
        else:
            url = None
    else:
        url = None

for repo in repos_to_change:
    master_data = session.get('{}/repos/evgeni/{}/git/ref/heads/master'.format(BASE, repo)).json()
    data = {'ref': 'refs/heads/devel', 'sha': master_data['object']['sha']}
    session.post('{}/repos/{}/{}/git/refs'.format(BASE, USER, repo), json=data)
    default_branch_data = {'default_branch': 'devel'}
    session.patch('{}/repos/{}/{}'.format(BASE, USER, repo), json=default_branch_data)
    session.delete('{}/repos/{}/{}/git/refs/heads/{}'.format(BASE, USER, repo, 'master'))