John Lees' blog

Pathogens, informatics and modelling at EMBL-EBI

Honey Roast Parsnips (frozen), Iceland

A review of ‘Honey Roast Parsnips’, available from Iceland (£1.75, 750g). This weekend I wanted to buy some parsnips to roast, but they were absent from the produce section (except in a pack coming with four unwanted carrots, and one considerably more unwanted turnip). However, as I was shopping at Iceland, there was a handy pre-prepared frozen alternative. These cost roughly double the amount of buying raw parsnips. I’d estimate there are around four large portions in this bag; you can probably get double that if you’re serving smaller amounts.

Screamadelica, Primal Scream

Why is it only in 2021 that I am listening to Primal Scream’s Screamadelica for the first time? A lot of critically acclaimed music from the 1980s maintains a pop appeal that means it still gets radio play, is featured in club nights, and is heavily promoted on my YouTube home page. However, perhaps the post-rock, trip-hop and grunge of the early 1990s doesn’t have the same enduring commercial appeal. Whatever the reason, I’ve been missing out.

Porting a bioinformatics tool to the web using WebAssembly, React and JavaScript

We recently released a beta version of PopPUNK-web (https://web.poppunk.net). It uses a WebAssembly (WASM) version of pp-sketchlib to sketch a user-input genome assembly in the browser, then transmits this sketch as JSON to a server running PopPUNK using gunicorn and Flask. The server runs query assignment against a large database of genomes from the GPS project and returns JSON containing the strain assignment, a tree and a network, which are then displayed using a React app.
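As a rough illustration of the JSON round trip described above, here is a minimal Python sketch using only the standard library. The field names ("sketch", "strain", "tree", "network"), the function names and the placeholder assignment step are all assumptions made for this example; they are not taken from PopPUNK-web, and the real service sits behind Flask rather than plain function calls.

```python
import json


def build_request(sketch):
    """Client side: serialise the sketch as a JSON string.

    The {"sketch": ...} layout is a hypothetical payload shape.
    """
    return json.dumps({"sketch": sketch})


def handle_request(body, assign):
    """Server side: parse the JSON, run assignment, return a JSON reply.

    `assign` is a stand-in callable for the real query assignment step.
    """
    sketch = json.loads(body)["sketch"]
    strain, tree, network = assign(sketch)
    return json.dumps({"strain": strain, "tree": tree, "network": network})


def fake_assign(sketch):
    # Placeholder only: the real step compares the sketch against a
    # database of genomes. These return values are made up.
    network = {"nodes": ["query", "reference"], "edges": [[0, 1]]}
    return "strain_1", "(query,reference);", network


# One round trip: the client builds a request, the server replies,
# and the client parses the reply ready for display.
request_body = build_request({"14": [1, 2, 3]})
reply = json.loads(handle_request(request_body, fake_assign))
print(sorted(reply))
```

Passing the assignment step in as a callable keeps the transport code separate from the analysis, so the same handler shape would work whatever does the actual strain assignment.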

Thoughts on 'Whole genome phylogenies reflect the distributions of recombination rates for many bacterial species'

I was happy to see that this paper, which originally appeared as a preprint back in April 2019 (!), was published earlier this month. I thought it was one of the most thought-provoking papers I’ve read recently, so I suggested a journal club on the final version (it’s a long paper, at over 80 pages). There were some parts that I liked a lot, and some parts I didn’t like, which I wanted to summarise here.

Things I have learnt about porting algorithms to GPUs (using CUDA)

I’ve recently ported one of my algorithms onto a GPU using CUDA. Here are some things I’ve learnt about the process (geared towards an algorithm dealing with genomic data).

Firstly, the documentation that helped me most:

Getting started:
https://devblogs.nvidia.com/even-easier-introduction-cuda/
https://devblogs.nvidia.com/easy-introduction-cuda-c-and-c/

Understanding device memory:
https://devblogs.nvidia.com/unified-memory-cuda-beginners/
https://devblogs.nvidia.com/how-access-global-memory-efficiently-cuda-c-kernels/
https://devblogs.nvidia.com/using-shared-memory-cuda-cc/

Putting it all together:
https://devblogs.nvidia.com/efficient-matrix-transpose-cuda-cc/

Optimising your own code:
https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/

Start small, add complexity in slowly

I started off following the ‘even easier introduction to CUDA’ guide to get a basic version of my algorithm working.

Things I have learnt about using R (as a Python/C++ programmer)

For loops are ok. I thought these were impossibly slow in R, and you always had to vectorise code, but not so. A common error is to try to append to arrays. It is always best to preallocate an array/list with result <- rep(NA, length) or similar, and then write to it by index. Strings aren’t the easiest thing to deal with, but stringr might help, and regular expression support is built in.