Things I have learnt about using R (as a Python/C++ programmer)

  • For loops are OK. I thought these were impossibly slow in R, and that you always had to vectorise code, but not so. A common error is to append to an array inside the loop, which can force R to reallocate and copy it repeatedly. It is always best to preallocate an array/list with result <- rep(NA, length) or similar, and then write to it by index.
  • That being said, if you can easily vectorise code you should do so.
  • Strings aren’t the easiest thing to deal with, but stringr might help, and base R has regular expression functions built in (grepl, sub, gsub and friends). Use paste0 to concatenate.
  • It’s really easy to assign a function to a variable and pass it around, bind some of its arguments (a partial in Python), etc. Use this feature!
  • A matrix has to contain a single data type, so if you have different data types and convert to a matrix, they will all be silently converted to a compatible type (often character).
  • Use %*% for matrix multiplication.
  • Use %in% to test set membership.
  • Use df[df$column == value, ] or similar, for selecting rows from a data.frame.
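  • Use a single & or | when combining conditions in the above, as these work element by element. Double && and || short-circuit and expect single values, so they belong in if() conditions rather than in indexing.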
  • Every value within a data.frame column must have the same type (different columns can have different types). If you want to mix types within a column you’ll likely need a list.
  • Don’t use apply on a data.frame, as the first step is to convert to a matrix, making all your types the same. Use vapply, which works column by column and makes you declare the expected return type.
  • R will accept partial matches for argument names, for list elements accessed with $, and for attribute names, without commenting on this behaviour (!?!). Add options(warnPartialMatchAttr=TRUE, warnPartialMatchDollar=TRUE, warnPartialMatchArgs=TRUE) to ~/.Rprofile to at least get a warning when this strange default kicks in.
  • A numeric (float) and an integer are different types. Use 1L to make an integer ‘1’.
  • furrr is a nice library for parallelisation.
  • In general, tidyverse packages offer good alternatives to many data science functions in base R.
  • Extremely basic OOP is available using ‘S3’ objects (pretty much inheritance, and overriding of some typical functions such as print, summary based on type). ‘S4’ is to be avoided, apparently. ‘R6’ gives you more typical features.
  • Use the devtools package for your development; it automates most building and testing of the code.
  • RStudio has a nice built-in profiler.
  • R has a good FFI with C and can automatically sort out compiling for you. C++ is also possible with Rcpp, but I believe a little more involved.
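
Several of the points above can be sketched in one short base R script. This is a minimal illustration rather than a definitive reference: all the variable and function names (squares, power, cube, df, reading and so on) are made up for the example, and stringsAsFactors = FALSE is passed explicitly so the column types do not depend on the R version.

```r
# Preallocate, then fill by index (rather than appending inside the loop).
n <- 5
squares <- rep(NA_real_, n)
for (i in seq_len(n)) {
  squares[i] <- i^2
}

# The vectorised equivalent, where one is easy to write.
squares_vec <- seq_len(n)^2

# paste0 concatenates strings without a separator.
label <- paste0("run_", 3L)

# Functions are ordinary values; a closure can bind some of the arguments.
# `power` and `cube` are hypothetical names used only for this example.
power <- function(x, exponent) x^exponent
cube <- function(x) power(x, exponent = 3)

# A matrix holds a single type, so mixed values are coerced (here to character).
m <- cbind(c(1, 2), c("a", "b"))

# %*% is matrix multiplication; a plain * would multiply element by element.
a <- matrix(1:4, nrow = 2)
prod_mat <- a %*% diag(2)

# %in% tests membership for each element of the left-hand side.
is_member <- c(1, 5, 9) %in% c(1, 2, 3)

# Row selection from a data.frame, combining conditions with a single &.
df <- data.frame(
  id = 1:4,
  group = c("a", "b", "a", "b"),
  score = c(0.1, 0.7, 0.9, 0.4),
  stringsAsFactors = FALSE
)
picked <- df[df$group == "a" & df$score > 0.5, ]

# vapply walks over the columns and checks each result is a single string.
col_classes <- vapply(df, function(col) class(col)[1], character(1))

# 1 is a numeric (double); 1L is an integer.
types <- c(typeof(1), typeof(1L))

# A minimal S3 class: print() dispatches on the class attribute.
reading <- structure(list(value = 21.5), class = "reading")
print.reading <- function(x, ...) {
  cat(paste0("<reading ", x$value, ">\n"))
  invisible(x)
}
```

Printing col_classes shows that each column of the data.frame kept its own type, which is the information apply would lose by converting everything to a matrix first.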