
John Lees' blog

Pathogens, informatics and modelling at EMBL-EBI

You know, I'm something of a modeller myself

I recently read a blog post written by Joel Hellewell, who worked on the COVID-19 response team at LSHTM. The post is here: https://jhellewell14.github.io/2021/11/16/forecasting-projecting.html. I found it particularly interesting to hear the perspective of someone who has worked on the (mathematical) modelling of infectious diseases for a longer time, and how the response to the pandemic compared to these activities.

I wanted to write a reply, which turned out to be a bit too long for a tweet, so here it is. (Background: I spent Apr-Dec 2020 working part-time with one of the COVID-19 modelling groups at Imperial's department of infectious disease epidemiology; my contribution was pretty much entirely on the software/programming side, not the modelling.)

When to use log scales

Log scales seem to have caused a lot of debate recently. When should we use them?

I think there are two reasons:

  • Your data is exponentially distributed.
  • You want to transform your data to make certain regions easier to see.

The first is a bit of a tautology. For independent variables (usually the x-axis) it's easy. Perhaps you have collected data at points 1, 2, 4, 8, 16 etc.: it makes sense to plot this on a log2 scale (0, 1, 2, 3, 4). Collected at 0.1, 1, 10, 100 etc.? Plot on log10 (-1, 0, 1, 2). Examples are minimum inhibitory concentrations (MICs), which are directly measured in log increases (doubling dilutions), and parameter searches, which are often over large ranges.
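
As a minimal sketch in R (with made-up x and y values), this is what plotting doubling-dilution data on a log scale looks like:

    # Made-up doubling-dilution data, e.g. concentrations from an MIC assay
    x <- c(1, 2, 4, 8, 16)
    y <- c(0.10, 0.35, 0.60, 0.85, 0.95)

    # Transform explicitly: the x positions become 0, 1, 2, 3, 4
    plot(log2(x), y, xlab = "log2(concentration)", ylab = "response")

    # Or keep the original units and let R draw a log-scaled x-axis
    plot(x, y, log = "x")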

Easy debugging of C/C++/CUDA python extensions

Writing an extension called by python (in C, C++ or CUDA)? Not working? Typical.

When doing the same from R it's pretty easy to debug: just run with R -d <debugger name>, e.g. R -d valgrind or R -d gdb. You get into the debugger, continue, then run interactively as usual. (For a more complex example using both at once, see this blog post.)
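
As a sketch, a typical gdb session looks something like this (mypkg and buggy_function are hypothetical stand-ins for whatever you are debugging):

    $ R -d gdb
    (gdb) run
    # R starts as normal; trigger the crash from the R prompt
    > library(mypkg)
    > buggy_function()
    # on a segfault, gdb takes over and you can inspect the C-level state
    (gdb) backtrace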

Doing this from python seems trickier to me. I started off following this guide: https://johnfoster.pge.utexas.edu/blog/posts/debugging-cc%2B%2B-libraries-called-by-python/ but I'm not really an IPython user, and prefer gdb over lldb (due to familiarity). I think this is a good way to do it if you need an interactive python session, but it's overcomplicated for my typical use case.
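
For that simpler, non-interactive case, something like the following works (again a sketch; my_extension and failing_function stand in for the code under test):

    # Run the failing call under gdb directly, no interactive session needed
    $ gdb --args python -c "import my_extension; my_extension.failing_function()"
    (gdb) run
    # on a crash, gdb stops inside the C/C++/CUDA code
    (gdb) backtrace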

p-value < 2.2e-16

A claim: 2.2e-16 is the most popular p-value in research papers, even more popular than 0.05 (or, if you're being cynical, 0.049).

Why?

2.2e-16 happens to be the machine epsilon of a double-precision float (i.e. a real number stored using 64 bits). Roughly, this means that if you subtract anything smaller than epsilon from 1, the answer will still be 1.

In R, you can calculate this by running something like the following code (the doubling at the end is due to the convention that epsilon is the smallest value which, when added to 1, gives an answer other than 1):
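
    # Halve eps until adding it to 1 no longer changes the result
    eps <- 1
    while (1 + eps != 1) {
      eps <- eps / 2
    }
    eps * 2              # 2.220446e-16
    .Machine$double.eps  # R's built-in value, for comparison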

Honey Roast Parsnips (frozen), Iceland

£1.75, 750g

This weekend I wanted to buy some parsnips to roast, but they were absent from the produce section (except in a pack coming with four unwanted carrots and one considerably more unwanted turnip). However, as I was shopping at Iceland, there was a handy pre-prepared frozen alternative:

[Photo: the pack of frozen honey roast parsnips]

Screamadelica, Primal Scream

Why is it only in 2021 that I am listening to Primal Scream’s Screamadelica for the first time?

[Image: Screamadelica album cover]

A lot of critically acclaimed music from the 1980s maintains a pop appeal that means it still gets radio play, is featured in club nights, and is heavily promoted in my YouTube home feed. However, perhaps the post-rock, trip-hop and grunge of the early 1990s doesn't have the same enduring commercial appeal. Whatever the reason, I've been missing out.