Rust take two -- I guess I'm a rust guy now?

2023-02-08 920 words 5 minutes

About a year ago I had my first foray into rust, which I used to do advent of code. I wrote up my thoughts at the time, but in summary although there were some immediately nice features such as the build and dependency system, I wasn’t immediately convinced and decided to stick with C++. I also said I’d give it a go for my next WASM project.

That time arrived at the end of last year, and what I intended to be a web implementation has turned into a full CLI tool where I used rust over C++. If you’re interested, the code is here and documentation at https://docs.rs/ska.

Briefly, the method is for constructing sequence alignments rapidly between closely related pathogen genomes, and uses a k-mer + wobble base mapping strategy. More details are in the original preprint and in the README.

But here I wanted to talk more specifically about our use of rust over C++, and why I’ve changed my mind. Advent of code has its problems for learning a language. You ultimately want to get to the answer as quickly as possible, and a new language can feel like it’s just getting in your way. I enjoyed the experience much more writing a larger project that I wanted to take a longer-term interest in.

Maintenance

A more sustainable future for the software is really the main reason I moved from C++ to rust for ska.

Will a change in the API of a dependency, automatically pulled in by conda/mamba, break my code? Not with cargo.

We just produce a single binary with no dynamic libraries to link to, which is much easier for users to install.

A smaller point, but the safety system and use of iterators also made me more confident that no nasty undescribed segfaults are going to happen.

I’ve been finding it increasingly difficult to maintain all the software packages we have developed, especially as people leave and projects move on, so this has become a major consideration.

I’m also quite pleased with the GitHub Actions I set up. Testing, releases, release notes, updating crates.io, and compiling binaries for the release are all automated on pushing a tag. Definite credit to Rich FitzJohn for the setup he’s got in dust, from which I borrowed heavily.

Smoother development

The squiggles ‘just working’ in VSCode, and giving informative errors at the right point made development a lot quicker. I have a bad habit of writing large amounts of C++ before trying to compile it – doing it dynamically was a big upgrade.

The debugger also worked in the IDE, which made fixing some of my errors much quicker. This had become quite clunky for the python plus C++ packages I’ve been writing recently: recompile with debug flags, find the right arguments, run with gdb, set breakpoints, repeat.

Rust from here on out?

I’ve still not managed to try the WASM part which was my original goal, but I enjoyed writing ska.rust.

I’ll see how it goes in practice, but if it really does reduce the maintenance burden and avoid installation issues I can see myself switching over to rust for most projects.

Other changes in opinion since last year

I found parallelisation via rayon very good, and in fact it allowed me to parallelise recursion, which would have been challenging with OpenMP.
Lots of packages and libraries seem to be available. Surprisingly, I didn’t find myself missing C++ for external support.
I got used to the documentation (some libraries of course do a better job than others).
ndarray is perfectly fine, maybe I even like it more than Eigen.
I got the hang of String and str when I worked out the relationship with [u8], char and UTF-8.
I thought cargo fmt and cargo clippy were very useful. The latter in particular made some very impressive suggestions e.g. using built-in functions such as .skip() to replace unusual for loops with iterators; using a built-in .saturating_add() rather than a more manual type limit check.
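
As a rough illustration of the last two points, here is a minimal sketch using only the standard library. The function names (sum_after, bump, byte_and_char_len) are made up for this example and are not taken from ska.rust; the commented-out "manual" versions are the kind of code I mean, not actual clippy output.

```rust
/// Sum every element after the first `start` items.
/// An index-based loop such as
///     for i in start..values.len() { total += values[i]; }
/// can be written with an iterator adaptor instead.
fn sum_after(values: &[u32], start: usize) -> u32 {
    // If `start` is past the end, `skip` simply yields nothing.
    values.iter().skip(start).sum()
}

/// Increment a counter, staying at the maximum instead of overflowing.
/// Manual version: if count < u8::MAX { count + 1 } else { count }
fn bump(count: u8) -> u8 {
    count.saturating_add(1)
}

/// A `&str` is a view onto UTF-8 encoded bytes, so its length in
/// bytes and its number of `char`s can differ.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let values = [1, 2, 3, 4, 5];
    println!("sum after 2: {}", sum_after(&values, 2));
    println!("254 + 1 = {}, 255 + 1 = {}", bump(254), bump(255));

    // `String` owns its buffer; `&str` borrows it.
    let owned: String = String::from("ACGT");
    let borrowed: &str = &owned;
    println!("{:?}", byte_and_char_len(borrowed));
    // 'é' takes two bytes in UTF-8 but is a single char.
    println!("{:?}", byte_and_char_len("é"));
    // For ASCII data such as DNA sequence, working on bytes is simplest.
    let bytes: &[u8] = borrowed.as_bytes();
    println!("first base: {}", bytes[0] as char);
}
```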

Still frustrating me

Generics don’t seem as powerful as templates in C++, and I’ve struggled to use them effectively. I found this a helpful guide. Specifically, type specialisations are not directly supported.
Setting up testing using input/output files was a pain, but now I’ve got it going it would be easier next time.
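
On the first point, the closest thing I know of to a C++ template specialisation is implementing a trait separately for each type you care about. A minimal sketch, where the Describe trait and describe_all function are hypothetical names for this example only:

```rust
/// Hypothetical trait: each type provides its own behaviour.
trait Describe {
    fn describe(&self) -> String;
}

// One implementation per concrete type stands in for what would be
// a template specialisation in C++.
impl Describe for u8 {
    fn describe(&self) -> String {
        format!("byte {}", self)
    }
}

impl Describe for char {
    fn describe(&self) -> String {
        format!("char '{}'", self)
    }
}

/// Generic over any type that implements `Describe`.
fn describe_all<T: Describe>(items: &[T]) -> Vec<String> {
    items.iter().map(|item| item.describe()).collect()
}

fn main() {
    for line in describe_all(&[65u8, 67u8]) {
        println!("{}", line);
    }
    for line in describe_all(&['A', 'C']) {
        println!("{}", line);
    }
}
```

What you can’t do on stable rust is add a catch-all implementation for every type with some trait and then override it for one particular type. The compiler rejects the two implementations as overlapping.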

Some advice I wish I’d had earlier when learning

Use iterators rather than index-based for loops (the bounds are handled for you, so there are no off-by-one errors).
Take slices/borrows as function arguments rather than their owned versions.
rust has an explicit empty value None, which is actually really useful. It’s not necessarily the most straightforward thing to use, but solves a lot of problems. Its counterpart is Some(value), and together these are the two variants of the Option enum.
rust also has an explicit error type, Result, which is similar to Option and supports a lot of the same methods. However you have Ok rather than Some, and the Err can contain a value (usually an error description) unlike None. You can also end a call with ? to propagate errors to the caller rather than dealing with them on every call, which is useful for example when writing to a file.
Both the Option and Result types are used frequently by library code, so while it might be a while before you implement them yourself, you can’t really avoid interacting with them. A reasonable first strategy is just to use .unwrap() to extract the value or panic on None/Err, while you get used to this.
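
To make the advice above a bit more concrete, here is a small sketch using only the standard library. The functions first_ambiguous and parse_and_add are invented for this example and have nothing to do with the ska.rust code.

```rust
use std::num::ParseIntError;

/// Takes a borrowed slice (`&[u8]`) rather than an owned `Vec<u8>`,
/// so callers keep ownership of their data.
/// Returns the position of the first base that isn't A, C, G or T,
/// or `None` if there isn't one.
fn first_ambiguous(seq: &[u8]) -> Option<usize> {
    // An iterator method replaces an index-based loop.
    seq.iter()
        .position(|&b| !matches!(b, b'A' | b'C' | b'G' | b'T'))
}

/// Parses two numbers and adds them. Each `?` returns early with the
/// `Err` if parsing fails, so there is no match on every call.
fn parse_and_add(a: &str, b: &str) -> Result<u32, ParseIntError> {
    let x: u32 = a.parse()?;
    let y: u32 = b.parse()?;
    Ok(x + y)
}

fn main() {
    // Handling both Option variants explicitly.
    match first_ambiguous(b"ACGNT") {
        Some(pos) => println!("ambiguous base at position {}", pos),
        None => println!("no ambiguous bases"),
    }

    // Handling both Result variants explicitly; Err carries a description.
    match parse_and_add("3", "x") {
        Ok(total) => println!("total: {}", total),
        Err(e) => println!("could not parse: {}", e),
    }

    // The quick-and-dirty first strategy: unwrap, which panics on None/Err.
    let total = parse_and_add("3", "4").unwrap();
    println!("total: {}", total);
}
```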