Keeping a lab notebook with org-mode, git, Papers, and Pandoc (Part I)

Lab notebooks for computational biologists can be tricky. In my experience, I often don’t work in a linear, experimental fashion that’s conducive to writing down methods and results the way we’ve been taught to do as biology graduate students. My usual workflow involves a lot of reading, a lot of data exploration in R, and by-hand data munging.

With the exception of R and its suite of reproducible research tools such as Sweave and Knitr, it’s very difficult to lay down a cohesive, immutable “paper trail” of work that serves as a record the same way a formal lab notebook does. Moreover, it’s hard to keep track of ideas and thoughts over time. There’s nothing quite as… disheartening? as coming across what you thought was a new approach or idea and realizing you had been there a year ago.

My ideal requirements for a lab notebook would then be as follows:

  1. Permanence. This has two aspects, the first being immutability- I can’t go back and alter what I’ve written without some sort of record and explanation of the change. The second is durability- it should not live only on my local machine, but should ideally be instantly and immediately backed up to a server.
  2. Citations. I organize all the papers I’m reading with Papers on the Mac. It’s not a perfect piece of software, but it works well in the areas I need it to. I would need a lab notebook to integrate with Papers so that I can easily insert references to papers and know months later a) what I was talking about and b) how to easily find them in my library.
  3. Ease-of-use. I live in my terminal, like many computational people. I want my lab notebook to be accessible in the terminal, just like a wet-lab scientist would have their notebook by them on the bench. Moreover, I want it to be searchable using the same tools I use to work with data, not some ugly XML or binary format that I cannot parse on the command line.
  4. Discoverability. While at this point it’s unlikely anybody would find something particularly illuminating in my lab notebook, anybody who inherits my projects or wants to continue my work should be able to read some version of my notebook easily and find relevant information quickly. (This person could easily be me after working on some other project for a while.) In this sense, the notebook should have a coherent structure and internal format so that you could produce a table of contents or list of entries and their associated topics. For thesis meetings, it should be able to be converted to a printable, readable document.

The tools that fulfill these requirements all exist already. SVN or Git on a remote server solves the issue of permanence. Papers includes a citekey insertion service to add Latex or Pandoc citekeys to any document that accepts keyboard input, and its internal library format is exportable as a Bibtex library. Emacs is my preferred text editor and is primarily a command-line application (and an awesome one at that), and includes hooks to version control systems. Finally, Emacs includes org-mode, a to-do list and note-taking mode that provides document structure, tags, and easy export to Latex or HTML. What’s even better is that Pandoc, a command-line document converter, can work with org-mode .org files and convert them to a huge number of alternate formats, as well as handle citations (in case you don’t want to export to Latex).

I’ve just finished writing a set of glue scripts and Emacs lisp that makes a manageable lab notebook. As of now, my notebook:

  • Automatically exports my Papers library to a Bibtex file in the same directory as my notebook every time I save (using Papers’ Applescript hooks)
  • Automatically commits and uploads both the notebook and the Bibtex file to SVN on save (but is agnostic to what VC system is being used)
  • Exports to a PDF with citations and a table-of-contents with a single keypress. (using ox-pandoc and pandoc-citeproc)
  • Is entirely contained within plain-text .org files, which are entirely readable outside Emacs (the org format looks a little like mutant Markdown)
  • Can parse and execute inline code blocks in bash script, R, and Python, so that I can document what I do outside of the reproducible reports generated with Sweave, using the org-mode Babel extension.
  • Works with tables and inline spreadsheets so I can quickly and easily summarize data in the document itself (another amazing org-mode function).

In Part II I’ll publish the code I used to accomplish all of this, a lot of which will be pretty inelegant and hackish. However, I think that it’ll be a useful starting point for anybody else who wants to adopt this workflow.