mamba saved my CI

We moved from Azure to GitHub Actions to run the continuous integration tests in PopPUNK about a year ago. It’s been working pretty well, wasn’t too bad to set up, and integrates nicely into pull requests.

However, in the past month two things happened:

  • joblib v1.2 introduced a breaking security change, which meant that hdbscan errored. Solving a conda environment with joblib pinned to 1.1 takes about 12 hours (😱), longer than the 4 hour GitHub limit.
  • Even without the pin, environment resolution increased from about an hour to 3-4 hours, so the CI would only sometimes run.
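For illustration, the pin described above would look something like this in an environment.yml. This is only a minimal sketch: the environment name, channels and the short dependency list are assumptions, and the real PopPUNK environment has many more packages.

```yaml
# environment.yml (hypothetical sketch, not the actual PopPUNK file)
name: poppunk_env        # assumed name
channels:                # assumed channels
  - conda-forge
  - bioconda
dependencies:
  - python=3.8
  - hdbscan
  - joblib=1.1           # stay below v1.2, which broke hdbscan
```

It is this extra constraint on joblib that pushed the conda solve time up to about 12 hours.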

Fixing conda on the CI

About four or five years ago I’d tried using mamba as a replacement for conda. It worked really well and was much faster, but I’d since read that some of its techniques were being merged into conda, and in general I’d stopped seeing hugely long environment solve times. But solve times seemed to be getting longer again (especially on the CI, not sure why), and I think mamba’s solver actually remains fundamentally different to conda’s.

Anyway, replacing conda with mamba in GitHub Actions turned out to be trivial:

name: Run tests

on: [push]

jobs:
  test:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.8]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install Conda environment from environment.yml
      uses: mamba-org/provision-with-micromamba@main
      with:
        cache-env: true
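    - name: Install package and run run_test.py
      # A login shell is needed so that the micromamba environment is activated
      shell: bash -l {0}
      run: |
        # Install the package itself; dependencies come from the conda environment
        python -m pip install --no-deps --ignore-installed .
        cd test && python run_test.py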

This was in fact far simpler than the conda version I had previously (thanks mostly to better documentation of the shell option).

Resolving, downloading and installing the environment now takes about two minutes in total, and the environment is cached, so on subsequent pushes this step takes about 30s. An incredible speed-up, and a reminder to use mamba for everything in the future!
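As a rough sketch of where that caching comes from, here is the micromamba step again with its main inputs written out. The values are illustrative assumptions rather than PopPUNK's actual settings, and it is worth checking the action's README for the current list of inputs.

```yaml
# Sketch of the micromamba step with its inputs spelled out.
# Values are illustrative assumptions, not PopPUNK's actual settings.
- name: Install Conda environment from environment.yml
  uses: mamba-org/provision-with-micromamba@main
  with:
    environment-file: environment.yml  # the file the step reads by default
    cache-env: true        # reuse the installed environment on later runs
    cache-downloads: true  # also cache the downloaded package archives
```

With cache-env enabled, a later push can restore the installed environment instead of solving and installing it again, which is why those runs are so much quicker.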

Fixing HDBSCAN

The author of HDBSCAN pretty quickly put a patch out, which was great. However, the author was then away for about six weeks, during which the versions on conda and PyPI remained unpatched (breaking progressively more downstream packages).

Open source contributors can hardly be expected to be on hand to patch these breaking changes as soon as they happen, but it did (for me) expose some vulnerabilities in this package architecture.

It turns out it’s possible to ask to be added to a conda-forge feedstock, and if the author hasn’t responded after a week you can then make a patch to update the source. (Although I’m not sure how acceptable this would be for larger changes.) I wish I’d asked for this immediately! But ultimately, in this case, the maintainer did fix the upstream source after not too much of a delay.