Things I have learnt about using R (as a python/C++ programmer)

  • For loops are ok. I thought these were impossibly slow in R, and you always had to vectorise code – but not so. A common error is to try and append to arrays. It always best to preallocate an array/list with result <- rep(NA, length) or similar, and then write to it by index.
  • strings aren’t the easiest thing to deal with, but stringr might help, and there is a regular expression package built-in. Use paste0 to concatenate.
  • That being said, if you can easily vectorise code you should do so.
  • It’s really easy to set a function as a variable and pass these around, bind its arguments (a partial in python) etc. Use this feature!
  • A matrix type has to contain the same data type, so if you have different data types and convert to a matrix, they will all be converted to a compatible type silently.
  • Use ‘%*%’ for matrix multiplication.
  • Use ‘% in %’ for sets.
  • Use array[array$column == value, ] or similar, for selecting values from a data.frame.
  • Use a single ‘&’ or ‘|’ when combining conditions in the above. Double ‘&&’ will short circuit.
  • data.frame columns must have the same type. If you want to mix types you’ll likely need a list.
  • Don’t use apply, as the first step is to convert to a matrix, making all your types the same. Use vapply, which defines the expected return type
  • R will use functions which are partial matches to the name you called without commenting on this behaviour (!?!). Add ‘options(warnPartialMatchAttr=TRUE,
    warnPartialMatchDollar=TRUE,
    warnPartialMatchArgs=TRUE)’ to ~/.Rprofile to turn this strange default off.
  • A numeric (float) and an integer are different types. Use 1L to make an integer ‘1’.
  • furrr is a nice library for parallelisation.
  • In general, tidyverse packages offer good alternatives to many data science functions in base R.
  • Extremely basic OOP is available using ‘S3’ objects (pretty much inheritance, and overriding of some typical functions such as print, summary based on type). ‘S4’ is to be avoided, apparently. ‘R6’ gives you more typical features.
  • Use the devtools package for your development, it automates most building and testing of the code.
  • RStudio has a nice built-in profiler.
  • R has a good FFI with C and can automatically sort out compiling for you. C++ is also possible with RCpp, but I believe a little more involved.

R packages break after OS X upgrade

I recently upgraded from OS X 10.10 to 10.11. This has upgraded the version of the gfortran dynamic library from 2 to 3 (in /Library/Frameworks/R.framework/Resources/lib), which in turn causes problems in various R packages (msm, ape).

For those which give an error along the lines of

unable to load shared object

the solution seems to be to use install.packages recursively. Use it on the package that failed. If a dependency fails, use it on that too. Then restart R.

Some packages requiring compilation which link libgfortran (-lgfortran) fail, as the linker line does not give the correct directory through -L. I also have gfortran installed as part of gcc through homebrew, at /Users/john/homebrew/lib/gcc/4.9 (to do this, use ‘brew install gcc’).

Using this, add the line

LDFLAGS=-L/Users/jl11/homebrew/lib/gcc/4.9

to the file ~/.R/Makevars. This should work, as long as when you load the library you have this directory either indexed through OS X’s equivalent of ldconfig (if there is one?), or it is in LD_LIBRARY_PATH.