
John Lees' blog

Pathogens, informatics and modelling at EMBL-EBI

Parallel MCMC

On github: https://github.com/johnlees/pMCMC

Parallel implementation of MCMC using MPI - coded by Hákon Jónsson, John Lees and Tobias Madsen

Code is available as C++. In testing, the implementations in R and Perl do not provide speedups due to execution overheads, but they are included as easier-to-read ‘pseudocode’ if required.

Details can be found in this draft paper: pMCMC

Acknowledgements

This work was completed for the Oxford Summer School in Computational Biology 2012 (http://www.stats.ox.ac.uk/research/genome/summer_school_2013/osscb12), and was supervised by Geoff Nicholls, Joe Herman and Jotun Hein.

Display env variable, tmux and zsh over ssh

I have been using zsh within tmux, and found that X forwarding wasn’t working after reattaching tmux. For example, when trying to launch gvim I’d get the error:

E233: cannot open display

The problem, a quick google determined, is that each time I ssh into my server a new $DISPLAY environment variable is set. When I run ‘tmux attach’ the new $DISPLAY variable is passed through (see http://stackoverflow.com/questions/8645053/how-do-i-start-tmux-with-my-current-environment), so any new windows within tmux will have the correct environment. However, the environment of any existing windows can’t be changed, which causes the problem.
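One possible workaround, sketched here under a few assumptions, is to ask tmux for the value it received on the last attach and export it by hand in the stale window. The function names `display_from_tmux` and `refresh_display` are made up for illustration. The sketch assumes that `tmux show-environment DISPLAY` prints `DISPLAY=value` when the variable is set in the session and `-DISPLAY` when it has been removed.

```shell
# Hypothetical helper: turn the output of `tmux show-environment DISPLAY`
# into a bare value. Assumes tmux prints "DISPLAY=value" when the variable
# is set in the session environment and "-DISPLAY" when it has been removed.
display_from_tmux() {
    case "$1" in
        DISPLAY=*) printf '%s\n' "${1#DISPLAY=}" ;;
        *) return 1 ;;   # unset, removed, or unexpected output
    esac
}

# Hypothetical wrapper, meant to be run inside an existing tmux window.
# Leaves $DISPLAY untouched if tmux has no usable value.
refresh_display() {
    new_display=$(display_from_tmux "$(tmux show-environment DISPLAY 2>/dev/null)") || return 1
    export DISPLAY="$new_display"
}
```

With these defined in a shell startup file, running `refresh_display` in an old window before launching gvim should pick up the value from the most recent ssh login. This has to be repeated in each existing window, since each shell keeps its own environment.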

Compiling Stampy v1.0.23 for use with cortex - error: unrecognized command line option ‘-Wl’

To assemble Illumina sequence data I am currently trialling cortex. Using their Perl script to automate the pipeline from reads in to variant calls requires vcftools and stampy to be installed, and you provide the installation paths as input to the script.

However, when running make with the default makefile from the stampy download, I got the following error from g++ (v4.8.1):

g++ `python2.7-config --ldflags` -pthread -shared -Wl build/linux-x86_64-2.7-ucs4/pyx/maptools.o build/linux-x86_64-2.7-ucs4/c/maputils.o build/linux-x86_64-2.7-ucs4/c/alignutils.o build/linux-x86_64-2.7-ucs4/readalign.o build/linux-x86_64-2.7-ucs4/algebras.o build/linux-x86_64-2.7-ucs4/frontend.o -o maptools.so
g++: error: unrecognized command line option ‘-Wl’

The solution was straightforward to find, as ever thanks to stackoverflow: http://stackoverflow.com/questions/21305309/g-doesnt-recognize-the-option-wl. All you need to do is edit lines 44 and 46 in the makefile, replacing the space after -Wl with a comma.
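The edit can also be made non-interactively. The following is a minimal sketch rather than the exact change from the original makefile. The function name `fix_wl_flag` is hypothetical, and the sketch assumes that lines 44 and 46 of your copy of the makefile are the ones containing `-Wl` followed by a space, as they were for v1.0.23.

```shell
# Hypothetical helper: on lines 44 and 46 of the given makefile, replace the
# space after "-Wl" with a comma. The result is printed to stdout and the
# input file is left unchanged, so the change can be inspected first.
fix_wl_flag() {
    file=$1
    sed -e '44s/-Wl /-Wl,/' -e '46s/-Wl /-Wl,/' "$file"
}
```

For example, `fix_wl_flag makefile > makefile.fixed` followed by `diff makefile makefile.fixed` shows exactly which lines changed before you replace the original. If the line numbers differ in another stampy release, the addresses in the sed expressions need adjusting.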

Impute your whole genome from 23andme data

23andme is a service which types 602352 sites on your chromosomal DNA and your mtDNA. By comparing these to a reference panel in which all sites have been typed, it is possible to impute (fill in statistically) the missing sites and thus get an ‘estimate’ of your whole genome.

The software impute2, written by B. N. Howie, P. Donnelly, and J. Marchini, gives good accuracy when using the 1000 Genomes Project as a reference. However, it can be difficult to provide the data in the right input format, use all the correct options and interpret the output.
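As a small illustration of the input side, here is a sketch of a first preprocessing step: pulling the typed sites for a single chromosome out of the raw data download. It does not produce impute2 input by itself. The function name `extract_chromosome` is hypothetical, and the sketch assumes the raw export is tab-separated with columns rsid, chromosome, position and genotype, uses `#` for comment lines, and marks no-calls as `--`.

```shell
# Hypothetical helper: print "rsid position genotype" for every called site
# on one chromosome of a 23andme raw data export.
# Assumed format: tab-separated rsid, chromosome, position, genotype;
# comment lines start with "#"; no-calls are written as "--".
extract_chromosome() {
    chrom=$1
    file=$2
    awk -F '\t' -v chrom="$chrom" '
        { sub(/\r$/, "") }                          # tolerate Windows line endings
        !/^#/ && $2 == chrom && $4 != "--" { print $1, $3, $4 }
    ' "$file"
}
```

For example, `extract_chromosome 1 genome.txt > chr1_sites.txt` (file names are placeholders) gives one chromosome's sites, which still need converting into the genotype format that impute2 expects.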

A new direction for leesjohn

Since October 2013 I have stopped using Fedora, and instead use machines running Ubuntu 12.04/13.10, Windows 8 and OS X 10.8.5. As these OSs have a larger user base than Fedora, many of the issues I encounter are well documented and easy to fix (i.e. there is a stackexchange post as one of the top three google results), hence there haven’t been many things for me to post under the original remit of this blog.