Learning Rust for bioinformatics, part I

I’m taking a micro-sabbatical (ha!) after dragging two major projects to publication/submission, and as a way to relax, I decided to start learning Rust.

Rust makes a compelling case for use in bioinformatics. Its tooling and language features emphasize good practices for scientific applications:

  • Tooling: Cargo, Rust’s package manager and build system, adheres strictly to semantic versioning (unlike Go, which, as far as I can tell, doesn’t care at all). This means that you know exactly which versions of your dependencies were required to build a package, and all versions of a Rust package (called a crate) are stored permanently in the crates.io online repository. This is essential for reproducibility and makes it easy to distribute Rust source code.
  • Memory safety: The language is built around memory safety and robust error handling, so the compiler prevents a whole class of errors. Encouraging more robust code seems like a good idea for the wild west of scientific programming.
  • Data manipulation: Certain language features in Rust also make writing it feel similar to writing in higher-level languages. In particular, its iterators and method chaining let you implement lazy data processing by chaining together map() and filter() calls, which reminds me of working with dplyr in R.
  • Documentation: It’s trivial to add documentation to a library in Rust. Doc comments are similar to, and as easy as, docstrings in Python, and rustdoc (run through cargo doc) renders them automatically to browsable HTML. To me this feels like the best of Python and R+Roxygen2. Crates published to crates.io have their docs hosted on docs.rs.
  • Promising libraries: rust-bio is a growing set of well-implemented bioinformatics algorithms and format parsers, and nom is a library for building extremely fast parsers, which could easily be leveraged to create efficient parsers for lots of other formats. What’s missing right now are good equivalents to Python’s scipy and numpy libraries.
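
To make the data manipulation and documentation points a bit more concrete, here is a minimal sketch using only the standard library. The function names gc_content and gc_rich and the toy records are hypothetical, made up for illustration, and nothing here comes from rust-bio. The /// lines are doc comments that rustdoc would render to HTML, and the filter_map()/filter()/map() chain is lazy: no work happens until collect() is called.

```rust
/// Computes the GC content (fraction of G or C bases) of a DNA sequence.
///
/// Returns `None` for an empty sequence, so callers must handle that case.
pub fn gc_content(seq: &str) -> Option<f64> {
    if seq.is_empty() {
        return None;
    }
    // Count bases that are G or C, ignoring case.
    let gc = seq
        .bytes()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    Some(gc as f64 / seq.len() as f64)
}

/// Lazily yields the names of records whose GC content exceeds `min_gc`.
///
/// `records` is a slice of hypothetical `(name, sequence)` pairs.
pub fn gc_rich<'a>(
    records: &'a [(&'a str, &'a str)],
    min_gc: f64,
) -> impl Iterator<Item = &'a str> + 'a {
    records
        .iter()
        // Skip empty sequences, pairing each remaining name with its GC content.
        .filter_map(|&(name, seq)| gc_content(seq).map(|gc| (name, gc)))
        // Keep only the GC-rich records.
        .filter(move |&(_, gc)| gc > min_gc)
        // Drop the GC value and keep just the name.
        .map(|(name, _)| name)
}

fn main() {
    // Toy data for illustration only.
    let records = [("seq1", "ATGCGC"), ("seq2", "ATATAT"), ("seq3", "")];
    // Nothing is computed until collect() drives the iterator.
    let names: Vec<&str> = gc_rich(&records, 0.5).collect();
    println!("{:?}", names);
}
```

Returning Option from gc_content is one way to lean on the compiler for error handling: the empty-sequence case cannot be silently ignored by the caller.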

Rust isn’t a replacement for data analysis in R and Python, but from what I’ve seen, I think it will be a viable replacement for applications that would otherwise require C/C++ in high-performance scientific computing.

The Rust logo is licensed by Mozilla under the CC-BY license.


