Creating a conda package with compilation and dependencies

I’ve just finished what was, for me, a difficult compiling/packaging attempt: creating a working bioconda package for seer. You can look at the pull request to see how many times I failed: //

(Note that I made this package for legacy users. I would direct anyone interested in the software itself to the reimplementation, pyseer.)

This was difficult because seer depends on a number of packages that I chose to include, all of which also need compiling. While most of them were already available in some form in anaconda, it turned out there were some issues with using the defaults. I want to note some of the things I had to modify here.

gcc and glibc version

seer requires gcc >4.8 to compile, and glibc > 4.9 to run. The default compiler version in conda is 4.8. To get a newer compiler, add a ‘conda_build_config.yaml’ file containing:

  - gcc # [linux]
  - gxx # [linux]


I had dlib and gzstream as submodules. If you use a git_url as the source in meta.yaml these are cloned recursively, but with a standard url they are not. I needed to run ‘git clone --recursive’ on the repository and tarball it myself to include these directories in the GitHub release.


One of these submodules was not available on the bioconda channels, so I had to compile it myself. I included it as a submodule, but rather than using its default Makefile I needed to add the conda-defined compiler flags to ensure these were consistent with later steps (particularly -fPIC in CPPFLAGS).



I was attempting to link boost_program_options using either the boost or boost-cpp anaconda packages. Unlike most boost libraries, which are header-only, program_options requires compiling. This led to undefined symbols at the linking stage, which I think were due to incompatible compilers (or compiler options) having been used to make the dynamic libraries in the versions currently available on anaconda. This turned out to be the most difficult thing to fix, requiring me to compile boost as part of the recipe.

Rather than downloading and compiling everything, I followed the boost github examples and made a shallow clone, with a full copy of only the boost library I’m using:

git clone --depth 1 // boost
cd boost
rmdir libs/program_options
git clone --depth 50 // libs/program_options
git submodule update -q --init tools/boostdep
git submodule update -q --init tools/build
git submodule update -q --init tools/inspect

I then included this in the release tarball. A better way may be to use submodules, so that this is done automatically by --recursive.

This library needed to be built, but I did so in a work directory to avoid installing unexpected packages with the recipe. Following the conda-forge recipe for boost-cpp:

pushd boost
python2 tools/boostdep/depinst/ program_options --include example
cat <<EOF > tools/build/src/site-config.jam
using gcc : custom : ${CXX} ;
EOF
./ --prefix="${BOOST_BUILT}" --with-libraries=program_options --with-toolset=gcc
./b2 -q \
variant=release \
address-model="${ARCH}" \
architecture=x86 \
debug-symbols=off \
threading=multi \
runtime-link=shared \
link=static,shared \
toolset=gcc-custom \
include="${INCLUDE_PATH}" \
cxxflags="${CXXFLAGS}" \
linkflags="${LINKFLAGS}" \
--layout=system \
-j"${CPU_COUNT}" \

The python2 line fetches the header-only libraries that program_options needs to compile, which are not included in the shallow clone. The rest is the standard way to build boost, using the conda compiler and the same compiler flags as the other compiled code.

I then needed to link this boost library statically (leaving the rest dynamic), so modified the make line as follows:

  SEER_LDLIBS="-L../gzstream -L${BOOST_BUILT}/lib -L/usr/local/hdf5/lib \
  -lhdf5 -lgzstream -lz -larmadillo -Wl,-Bstatic -lboost_program_options \
  -Wl,-Bdynamic -lopenblas -llapack -lpthread"


The final trick was linking armadillo correctly. Confusingly, it built and linked ok and tested ok locally, but on the bioconda CI I got undefined lapack symbols at runtime:

seer: symbol lookup error: seer: undefined symbol: wrapper_dgbsv_

This was due to armadillo’s wrapper, which links in the versions of blas/openblas and lapack defined at the time it was compiled. I think these must be slightly different from what is now included with the armadillo package dependencies on conda. It is easy enough to fix: use a compiler flag to turn the wrapper off and link the libraries manually:

  LDFLAGS="${LDFLAGS} -larmadillo -lopenblas -llapack"
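The compiler flag itself is not shown above. Armadillo documents a macro called ARMA_DONT_USE_WRAPPER for disabling its wrapper, so a minimal sketch of the two settings together, assuming that macro is the flag meant here, might look like this:

```shell
# Assumption: ARMA_DONT_USE_WRAPPER is the flag referred to in the text.
# With it defined, armadillo's headers call BLAS/LAPACK directly instead of
# going through the wrapper, so those libraries must be named at link time.
CXXFLAGS="${CXXFLAGS:-} -DARMA_DONT_USE_WRAPPER"

# Link line as given in the post: the libraries are listed explicitly.
LDFLAGS="${LDFLAGS:-} -larmadillo -lopenblas -llapack"

export CXXFLAGS LDFLAGS
```

Both variables keep whatever conda has already put in them and only append to it.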

After all of that, it finally worked!

conda build: libarchive: cannot open shared object file: No such file or directory

I was getting the following error when trying to run conda-build on a package from within a conda env:

Traceback (most recent call last):
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/bin/conda-build", line 7, in <module>
from conda_build.cli.main_build import main
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/site-packages/conda_build/cli/", line 18, in <module>
import conda_build.api as api
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/site-packages/conda_build/", line 22, in <module>
from conda_build.config import Config, get_or_merge_config, DEFAULT_PREFIX_LENGTH as _prefix_length
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/site-packages/conda_build/", line 17, in <module>
from .variants import get_default_variant
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/site-packages/conda_build/", line 15, in <module>
from conda_build.utils import ensure_list, trim_empty_keys, get_logger
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/site-packages/conda_build/", line 10, in <module>
import libarchive
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/site-packages/libarchive/", line 1, in <module>
from .entry import ArchiveEntry
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/site-packages/libarchive/", line 6, in <module>
from . import ffi
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/site-packages/libarchive/", line 27, in <module>
libarchive = ctypes.cdll.LoadLibrary(libarchive_path)
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/ctypes/", line 426, in LoadLibrary
return self._dlltype(name)
File "/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/python3.6/ctypes/", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/libarchive: cannot open shared object file: No such file or directory

The env’s lib directory does contain libarchive, both as a dynamic (.so) and a static (.a) library. There turned out to be two relevant environment variables:


A workaround is then to run

export LIBARCHIVE=/nfs/users/nfs_j/jl11/pathogen_nfs/large_installations/miniconda3/envs/conda_py36/lib/

There is probably a proper reason why this has happened and a permanent solution to the issue, but this works for now.
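The traceback shows the library being opened with ctypes from a single path (libarchive_path). Below is a sketch that sets the variable to the shared object file itself, assuming LIBARCHIVE is meant to hold the full path to the .so rather than a directory. That is my reading of the loader call, not something confirmed above. find_libarchive is a hypothetical helper name.

```shell
# Print the path of the first libarchive shared object found in a directory.
# Returns non-zero if none is found (a static libarchive.a does not count).
find_libarchive() {
    libdir="$1"
    for candidate in "$libdir"/libarchive.so "$libdir"/libarchive.so.*; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage, with the env activated (conda sets CONDA_PREFIX on activation):
#   LIBARCHIVE="$(find_libarchive "${CONDA_PREFIX}/lib")" && export LIBARCHIVE
```

Looking the file up rather than hard-coding it avoids depending on the exact version suffix of the library.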

Installing phyx without sudo

I saw this phylogenetics package today, phyx: //

To install without admin rights/sudo I needed to do the following (my software is installed in my home ~/software, rather than e.g. /usr, /usr/local):

Compile armadillo as follows

cmake -DCMAKE_INSTALL_PREFIX="${HOME}/software" .
make install

Compile nlopt as follows

./configure --with-cxx --without-octave --without-matlab --prefix="${HOME}/software"
make install

Compile phyx as follows (slightly hacky, maybe there’s a ‘proper’ way)

./configure --prefix="${HOME}/software"

change line 11 of the Makefile (CPP_LIBS) to add the library path:

CPP_LIBS = -L$(HOME)/software/lib -llapack -lblas -lpthread -lm

change line 23 of the Makefile (OPT_FLAGS) to add the include path/CPPFLAGS:

OPT_FLAGS := -O3 -std=c++11 -fopenmp -I$(HOME)/software/include

then you can run

make install




Sorting a massive file

I want to count the number of unique patterns in a vcf file. First I convert it to text with bcftools query:

bcftools query -f '[%GT]\n' vcf_in.vcf.gz > patterns.txt

The resulting patterns.txt is about 100 GB. The best way I found to count the unique patterns in this was with the following command:

LC_ALL=C sort -u --parallel=4 -S 990M -T ~/tmp_sort_files patterns.txt | wc -l

This used 1063 MB of RAM, took 1521s and used a maximum of around 75 GB of temporary space in my home directory (as the /tmp drive on the cluster ran out of space).
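Before committing to a run of that size, the same command can be tried on a toy file. Here is a sketch that wraps it in a hypothetical function, count_unique_lines, assuming GNU sort (needed for --parallel and -S):

```shell
# Count unique lines in a file, spilling temporary files to a chosen directory.
# LC_ALL=C makes sort compare raw bytes, which is faster than locale-aware
# collation and is all that is needed to detect identical lines.
count_unique_lines() {
    infile="$1"
    tmpdir="$2"
    mkdir -p "$tmpdir"
    LC_ALL=C sort -u --parallel=4 -S 990M -T "$tmpdir" "$infile" | wc -l | tr -d ' '
}

# Usage:
#   count_unique_lines patterns.txt ~/tmp_sort_files
```

The -S value caps the in-memory buffer, so anything beyond it is written to the -T directory. This is why the temporary space matters more than the RAM for a file of this size.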

With thanks to //

R packages break after OS X upgrade

I recently upgraded from OS X 10.10 to 10.11. This has upgraded the version of the gfortran dynamic library from 2 to 3 (in /Library/Frameworks/R.framework/Resources/lib), which in turn causes problems in various R packages (msm, ape).

For those which give an error along the lines of

unable to load shared object

the solution seems to be to use install.packages recursively. Use it on the package that failed. If a dependency fails, use it on that too. Then restart R.

Some packages requiring compilation which link libgfortran (-lgfortran) fail, as the linker line does not give the correct directory through -L. I also have gfortran installed as part of gcc through homebrew, at /Users/john/homebrew/lib/gcc/4.9 (to do this, use ‘brew install gcc’).

Using this, add the line


to the file ~/.R/Makevars. This should work, as long as when you load the library you have this directory either indexed through OS X’s equivalent of ldconfig (if there is one?), or it is in DYLD_LIBRARY_PATH (OS X’s counterpart to LD_LIBRARY_PATH).
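The exact line is not reproduced above. As a sketch of what adding such a line could look like, the helper below appends a setting to a Makevars file only if it is not already there. Both the helper name add_makevars_line and the choice of FLIBS as the variable are assumptions on my part. FLIBS is the Makevars variable R uses for Fortran libraries at link time, but the original post may have set something else.

```shell
# Append a line to a Makevars file unless an identical line already exists.
add_makevars_line() {
    makevars="$1"
    line="$2"
    mkdir -p "$(dirname "$makevars")"
    touch "$makevars"
    # -x: match whole lines only, -F: treat the line as a fixed string
    grep -qxF -- "$line" "$makevars" || printf '%s\n' "$line" >> "$makevars"
}

# Usage (FLIBS is an assumed variable name; the path is the homebrew gcc
# directory mentioned in the post):
#   add_makevars_line "$HOME/.R/Makevars" "FLIBS=-L/Users/john/homebrew/lib/gcc/4.9"
```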


Installing PEER executable peertool

PEER (probabilistic estimation of expression residuals) is a tool to determine hidden factors from expression data, for use in genetic association studies such as eQTL mapping.

The method was first described in a 2010 PLOS Comp Bio paper,
and a guide to its applications and use appeared in a 2012 Nature Protocols paper.

To install a command line version of the tool, you can clone it from the github page

git clone //

When installing, it won’t install the executable binary peertool by default, nor will it use a user directory as the install location (though the latter is addressed at the end of documentation). To install these, use the following commands:

cd peer && mkdir build && cd build
make install

This will install peertool to ~/software/bin, which you can put on your path.


Display env variable, tmux and zsh over ssh

I have been using zsh within tmux, and found that after reattaching tmux, X forwarding wasn’t working. For example, when trying to launch gvim I’d get the error:

E233: cannot open display

The problem, a quick google determined, is that each time I ssh into my server a new $DISPLAY environment variable is set. When I run ‘tmux attach’ the new $DISPLAY variable is passed through (see //), so any new windows within tmux will have the correct environment. However, the environment of any existing windows can’t be changed, causing the problem.

The best solution I found was proposed by Alex Teichman here: //
However I had two problems:

  1. It doesn’t seem to work with zsh rather than bash. I guess this is due to the behaviour of preexec() being different, but I couldn’t quickly work this out from the zsh manual.
  2. It felt slightly inelegant to update $DISPLAY every single time a command is run.

My solution is pretty similar. I add the following to ~/.zshrc:

echo "$DISPLAY" > ~/.display.txt
alias up_disp='export DISPLAY="$(cat ~/.display.txt)"'

This writes the correct $DISPLAY variable to a hidden file when a session is started (i.e. when I connect to the server). When I find forwarding isn’t working, I just run up_disp in that window.
Not the perfect solution, but it works ok for me.
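One caveat with writing the file from ~/.zshrc is that shells started inside a stale tmux window also source it, and would overwrite the file with the old value. Here is a sketch of a variant that skips the write inside tmux, relying on the TMUX variable that tmux sets for its child shells. The function name save_display and the DISPLAY_FILE variable are hypothetical.

```shell
# File holding the DISPLAY value of the most recent ssh login.
DISPLAY_FILE="${DISPLAY_FILE:-$HOME/.display.txt}"

# Record DISPLAY, but only from shells started outside tmux, so that a shell
# in a stale tmux window cannot overwrite the fresh value.
save_display() {
    if [ -z "${TMUX:-}" ] && [ -n "${DISPLAY:-}" ]; then
        printf '%s\n' "$DISPLAY" > "$DISPLAY_FILE"
    fi
}

# Restore DISPLAY from the file; run this in a window where forwarding broke.
up_disp() {
    if [ -r "$DISPLAY_FILE" ]; then
        DISPLAY="$(cat "$DISPLAY_FILE")"
        export DISPLAY
    fi
}

# In ~/.zshrc, call save_display once after these definitions.
```

Using a function rather than an alias for up_disp is only a matter of taste. It behaves the same way when typed at the prompt.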

Compiling Stampy v1.0.23 for use with cortex – error: unrecognized command line option ‘-Wl’

To assemble Illumina sequence data I am currently trialling cortex. Using their Perl script to automate the pipeline from reads in to variant calls requires vcftools and stampy to be installed, and you provide the installation paths as input to the script.

However when running make using the default downloaded stampy makefile I got the following error from g++ (v4.8.1):

g++ `python2.7-config --ldflags` -pthread -shared -Wl build/linux-x86_64-2.7-ucs4/pyx/maptools.o build/linux-x86_64-2.7-ucs4/c/maputils.o build/linux-x86_64-2.7-ucs4/c/alignutils.o build/linux-x86_64-2.7-ucs4/readalign.o build/linux-x86_64-2.7-ucs4/algebras.o build/linux-x86_64-2.7-ucs4/frontend.o -o
g++: error: unrecognized command line option ‘-Wl’

The solution was straightforward to find, as ever thanks to stackoverflow: //
All you need to do is edit lines 44 and 46 in the makefile, replacing the space after -Wl with a comma:

 43 ifeq ($(platform),linux-x86_64)
 44    g++ `$(python)-config --ldflags` -pthread -shared -Wl,$(objs) -o
 45 else
 46    g++ `$(python)-config --ldflags` -pthread -dynamiclib -Wl,$(objs) -o
 47 endif

As you can see from the surrounding if statement, the first of these lines is the one used on 64-bit linux platforms, which is where I hit the error.

I also tried compiling cortex with icc, but the compilation failed with a lot of errors. Rather than pursuing this further, I used gcc, which gave only warnings about unused variables.