Learning Rust for bioinformatics, part I

I’m taking a micro-sabbatical (ha!) after dragging two major projects to publication/submission, and as a way to relax, I decided to start learning Rust.

Rust makes a compelling case for use in bioinformatics. Its tooling and language features emphasize good practices for scientific applications:

  • Tooling: Cargo, Rust’s package manager and build system, adheres strictly to semantic versioning (unlike Go, which, as far as I can tell, doesn’t care at all). This means that you know exactly which versions of your dependencies were required to build a package, and all versions of a Rust package (called a crate) are stored permanently in the crates.io online repository. This is essential for reproducibility and makes it easy to distribute Rust source code.
  • Memory safety: The language is built around memory safety and robust error handling, such that a whole class of errors is prevented by the compiler. Encouraging more robust code seems like a good idea for the wild west of scientific programming.
  • Data manipulation: Certain language features in Rust also make writing it feel similar to writing in a higher-level language. In particular, iterators and method chaining let you implement lazy data processing by chaining together map() and filter() calls, which reminds me of working with dplyr in R.
  • Documentation: It’s trivial to add documentation to a library in Rust, as easy as writing docstrings in Python, and rustdoc automatically renders it to browsable HTML. This to me feels like the best of Python and R+Roxygen2. All crates published on crates.io have their docs hosted on docs.rs.
  • Promising libraries: rust-bio is a growing set of well-implemented bioinformatics algorithms and format parsers, and nom is a library for building extremely fast parsers, which could easily be leveraged to create efficient parsers for lots of other formats. What is still missing right now are good equivalents to Python’s scipy and numpy libraries.
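
To make a few of these points concrete, here is a minimal sketch that uses only the standard library. The function names (gc_content, gc_rich) and the toy records are my own inventions for illustration, not part of rust-bio or any other crate. It shows three things from the list above: doc comments (the lines starting with ///), error handling through a Result return value, and a lazy iterator chain that does no work until collect() is called.

```rust
/// Fraction of G and C bases in a DNA sequence.
///
/// Returns an error message if the sequence is empty or contains a
/// character other than A, C, G, T or N (case-insensitive).
pub fn gc_content(seq: &str) -> Result<f64, String> {
    if seq.is_empty() {
        return Err("empty sequence".to_string());
    }
    let mut gc = 0usize;
    for base in seq.chars() {
        match base.to_ascii_uppercase() {
            'G' | 'C' => gc += 1,
            'A' | 'T' | 'N' => {}
            other => return Err(format!("unexpected base: {}", other)),
        }
    }
    // Only ASCII bases reach this point, so the byte length equals the base count.
    Ok(gc as f64 / seq.len() as f64)
}

/// Names of the records whose GC content is at least `min_gc`.
///
/// Each record is a hypothetical (name, sequence) pair. Records that
/// fail validation are skipped rather than aborting the whole run.
pub fn gc_rich<'a>(records: &[(&'a str, &'a str)], min_gc: f64) -> Vec<&'a str> {
    records
        .iter()
        // Keep only records with a valid sequence, pairing each name with its GC content.
        .filter_map(|&(name, seq)| gc_content(seq).ok().map(|gc| (name, gc)))
        // Keep only the GC-rich records.
        .filter(|&(_, gc)| gc >= min_gc)
        // Drop the GC value and keep the name.
        .map(|(name, _)| name)
        // Nothing above runs until the chain is consumed here.
        .collect()
}

fn main() {
    let records = [
        ("seq1", "GGCC"),
        ("seq2", "ATAT"),
        ("seq3", "GCXA"), // invalid base, skipped
        ("seq4", "ACGT"),
    ];
    let hits = gc_rich(&records, 0.5);
    println!("{:?}", hits); // prints ["seq1", "seq4"]
}
```

Because gc_content returns a Result, the caller has to decide explicitly what happens to a bad record. Here I chose to skip it silently, which may or may not be what you want in a real pipeline. Running cargo doc on a crate containing these functions would render the /// comments as HTML pages.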

Rust isn’t a replacement for data analysis in R and Python, but from what I’ve seen, I think it will be a viable replacement for applications that would otherwise require C/C++ in high-performance scientific computing.

The Rust logo is licensed by Mozilla under the CC-BY license.