John Lees' blog

Pathogens, informatics and modelling at EMBL-EBI

Learning rust as a C++/python programmer (advent of code 2021)

John Lees published on 2021-12-27 included in programming

This year I used rust to do advent of code (AoC). I got up to day 17 before Christmas caught up with me, but I hope to come back to the last week at some point soon. Here’s my code so far: https://github.com/johnlees/advent-of-code-2021. I’ve never done any rust programming before, so I started off by reading these two pages, which got me going fairly quickly: https://fasterthanli.me/articles/a-half-hour-to-learn-rust and https://github.com/nrc/r4cppp. At the moment I usually code in C++ and python (and increasingly CUDA), though I do know a few other languages to various levels.

Review: Video games 2020-2021

John Lees published on 2021-12-02 included in reviews

The past couple of years have been a great time to play video games. [Image caption: Me in 2020 (and 2021).] Of course, it’s always been a great time to play video games, but I’ve never written about them before. Here are some of my favourite games from the past couple of years (in rough ranked order in each of the two categories).

Puzzles etc

Disco Elysium (mystery): No gameplay to speak of, only a few pre-rendered backgrounds, and lots of clicking and reading text.

You know, I'm something of a modeller myself

John Lees published on 2021-11-23 included in science

I recently read a blog post written by Joel Hellewell, who worked on the COVID-19 response team at LSHTM. The post is here: https://jhellewell14.github.io/2021/11/16/forecasting-projecting.html. I found it particularly interesting to hear the perspective of someone who had worked on (mathematical) modelling of infectious diseases for a longer time, and how the response to the pandemic compared to these activities. I wanted to write a reply which turned out to be a bit too long for a tweet, so here it is.

When to use log scales

John Lees published on 2021-07-19 included in statistics

Log scales seem to have caused a lot of debate recently. When should we use them? I think there are two reasons:

1. Your data is exponentially distributed.
2. You want to transform your data to make certain regions easier to see.

Exponentially distributed data

The first is a bit of a tautology. For independent variables (usually the x-axis) it’s easy. Perhaps you have collected data at points 1, 2, 4, 8, 16 etc.
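To make the sampling example concrete, here is a minimal Python sketch (the variable names are illustrative, not from the post) showing that points collected at 1, 2, 4, 8, 16 are unevenly spaced on a linear axis but evenly spaced once log-transformed. Base 2 is assumed here because the points double each time; any base would give even spacing.

```python
import math

# Sampling points that double each time, as in the example above.
points = [1, 2, 4, 8, 16]

# Gaps between neighbouring points on a linear axis: they keep growing.
linear_gaps = [b - a for a, b in zip(points, points[1:])]

# The same points on a log2 axis, and the gaps between them: all equal.
log_points = [math.log2(p) for p in points]
log_gaps = [b - a for a, b in zip(log_points, log_points[1:])]

print("linear gaps:", linear_gaps)
print("log2 gaps:  ", log_gaps)
```

On a linear x-axis the first few points would be squashed together at the left; on the log axis each point gets the same amount of room.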

Easy debugging of C/C++/CUDA python extensions

John Lees published on 2021-07-15 included in cuda programming python

Writing an extension called by python (in C, C++ or CUDA)? Not working? Typical. When doing the same from R it’s pretty easy to debug: just run with R -d <debugger name>, e.g. R -d valgrind or R -d gdb. You get into the debugger, continue, then run interactively as usual. (For a more complex example using both at once see this blog post). Doing this from python seems trickier to me.

p-value < 2.2e-16

John Lees published on 2021-05-06 included in statistics

A claim: 2.2e-16 is the most popular p-value in research papers, even more popular than 0.05 (or if you’re being cynical 0.049). Why? 2.2e-16 happens to be the epsilon of a double-precision float (i.e. a decimal number stored using 64 bits). Roughly, this means that if you try to calculate 1 - x, with x anything much smaller than epsilon, the answer will be 1. In R, you can calculate this by running the following code (+2 due to convention):
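For readers following along in Python rather than R, the same idea can be sketched as below. This is not the R snippet the post refers to: it reads the value from `sys.float_info.epsilon`, and the probe values `eps / 4` and `eps / 8` are choices made here purely for illustration.

```python
import sys

# Machine epsilon for a 64-bit float: the gap between 1.0 and the next
# larger representable double (2**-52, roughly 2.2e-16).
eps = sys.float_info.epsilon

# Adding epsilon itself to 1.0 gives a number distinguishable from 1.0...
one_plus_eps = 1.0 + eps

# ...but values well below epsilon are rounded away entirely.
one_plus_small = 1.0 + eps / 4
one_minus_small = 1.0 - eps / 8

print(eps)
print(one_plus_eps > 1.0, one_plus_small == 1.0, one_minus_small == 1.0)
```

This is why a p-value computed as one minus something very close to one cannot be reported more precisely than "below epsilon".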