Keeping a lab notebook with org-mode, git, Papers, and Pandoc (Part II)

I promised a while ago that I’d post the code I used to make org-mode a powerful lab notebook, and here it is. In Step 1, I present the emacs lisp for the lab-notebook minor mode. In Step 2, I show you how to leverage Papers to work with Emacs and Pandoc. In Step 3, I show you how to hook Pandoc up with Emacs.

As I mentioned previously, these are all hacked-together solutions. My requirements and setup will likely be different from yours, and I have probably forgotten a few key steps. For instance, while I use a Mac, the only parts that are specific to Macs are the hooks with Papers, a reference library application. Emacs, Pandoc, and bibtex are all fairly cross-platform solutions, I think.

Step 1: lab-notebook mode in emacs

The code in the gist below provides a minor-mode for emacs that enables checkin-on-save and auto-export from Papers (using the Applescript shown in Step 2). Load this in your .emacs file and initiate by M-x lab-notebook-mode, or add -*- eval: (lab-notebook-mode) -*- to the top of your notebook file.

(defun ensure-in-vc-or-checkin ()
(interactive)
(if (file-exists-p (format "%s" (buffer-file-name)))
(progn (vc-next-action nil) (message "Committed"))
(ding) (message "File not checked in.")))
(defun export-bibtex ()
"Exports Papers library using a custom applescript."
(interactive)
(message "Exporting papers library...")
(shell-command "osascript ~/notebook/papers_bibtex_export.scpt"))
;;;###autoload
(define-minor-mode lab-notebook-mode
"Toggle lab notebook mode.
In this mode, org files that are saved are automatically committed by the VC
system in Emacs. Additionally, Papers library export to bibtex is hooked to
the <f5> key.
Future additions may hook a confluence upload or Org export to this mode as well."
;; initial value
nil
;; indicator
:lighter " LN"
;; keybindings
:keymap (let ((map (make-sparse-keymap)))
(define-key map (kbd "<f5>") 'export-bibtex)
map)
;; body
(add-hook 'after-save-hook 'ensure-in-vc-or-checkin nil 'make-it-local))
(provide 'lab-notebook-mode)
view raw lab-notebook.el hosted with ❤ by GitHub

Step 2: Papers auto-export and citekeys

Papers comes with a Citations tool that allows you to insert a citekey in any text field, including Emacs. A citekey is a short, unique reference to a publication or entry in your reference library, used by tools to automatically format your citations upon printing or export. To use, summon Citations using whatever keybinding you chose in Papers’ preferences and add a citekey like you would in a Word document. In our setup, we’ll be using Pandoc, so be sure to change the citekey type to ‘Pandoc’ (Papers -> Preferences -> Citations -> Format: Pandoc Citation).

Pandoc doesn’t talk to Papers, but Papers can export its library in a format that Pandoc will understand, called bibtex. We will want this to happen automatically before Pandoc attempts to render our notebook. The applescript below exports my Papers library as a bibtex file in the location specified by “path_to_notebook_folder” (replace with whatever you want). Save this file in the same location (or wherever) and update the part of the Gist above that starts (shell-command "osascript ... with the right path and name that matches whatever you saved it as.

tell application "Papers"
set outFile to "path_to_notebook_folder/library.bib"
export ((every publication item) as list) to outFile
end tell

The benefit of adding this into Emacs is that you can either press a key (by default, F5) to automatically keep your bibtex file up-to-date, or add a hook so that when you render your notebook using Pandoc, it automatically updates your bibtex file first before trying to build your citations.

Step 3: Use ox-pandoc in emacs to export a pretty version of your notebook.

While org-mode does include its own LaTeX and HTML export commands, I found it easier to use Pandoc and the ox-pandoc Emacs package. The benefits of Pandoc is that it exports to a much wider variety of formats and has a simple and easy citations manager, pandoc-citeproc. Citeproc can use the bibtex library we exported in Step 2- all we have to do is add the following somewhere in our notebook org file:
#+PANDOC_OPTIONS: bibliography:library.bib
Make sure there is a library.bib file in the same folder as your notebook file, of course. Now, to export to Latex with formatted references, simply use M-x org-pandoc-export-to-latex-pdf and it will produce a beautiful PDF copy of your notebook. Note: in case it wasn’t clear, you will need to install Pandoc and ox-pandoc since they are not built in. Only recent versions of Pandoc handle org-mode files, so make sure you’re up-to-date. ox-pandoc is available on the Emacs package repo Marmelade.

Example notebook.org file

An example notebook.org file. Since Github automatically renders org-mode files when they’re displayed, this is a link to the raw source so you can see the markup used.

Pandoc can export to Github-flavored markdown, so here is the same file rendered:

<2014-09-25 Thu>

Reading this (Wang et al. 2010), in particular Supp. report 4. Has to do with bootstrapping population estimates using the different enzymes used to find integration sites.

<2014-09-26 Fri>

Notes on (Wang et al. 2010) :popest:petersen:intsites:

The review (here A. Chao 2001) indicates sample coverage approaches may be most appropriate for models where the capture probabilities can vary both over time and between species (which is certainly the case for estimating populations from longitudinal T cell sequencing data). I'll need to read more (esp. (Tsay and Chao 2001) and (Chao and Tsay 1998)).

Also found the package Rcapture which appears to fit a Poisson regression to incident counts. I've gotten it to work (kind of) but R crashed trying to plot the model so I'll need to come back to this. If memory serves, it was substantially more conservative in its estimates than the Chao2 estimate.

References

Chao, A, and P K Tsay. 1998. “A sample coverage approach to multiple-system estimation with application to census undercount.” Journal of the American Statistical Association 93 (441): 283–93. doi:10.2307/2669624.

Chao, Anne. 2001. “An overview of closed capture-recapture models.” Journal of Agricultural, Biological, and Environmental Statistics 6 (2). Springer-Verlag: 158–75. doi:10.1198/108571101750524670.

Tsay, P K, and A Chao. 2001. “Population size estimation for capture - recapture models with applications to epidemiological data.” Journal of Applied Statistics 28 (1): 25–36. http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=mekentosj&SrcApp=Papers&DestLinkType=FullRecord&DestApp=WOS&KeyUT=000165844500002.

Wang, Gary P, Charles C Berry, Nirav Malani, Philippe Leboulch, Alain Fischer, Salima Hacein-Bey-Abina, Marina Cavazzana-Calvo, and Frederic D Bushman. 2010. “Dynamics of gene-modified progenitor cells analyzed by tracking retroviral integration sites in a human SCID-X1 gene therapy trial.” Blood 115 (22): 4356–66. doi:10.1182/blood-2009-12-257352.

Further reading

Keeping a lab notebook with org-mode, git, Papers, and Pandoc (Part I)

Lab notebooks for computational biologists can be tricky. In my experience, I often don’t work in a linear, experimental fashion that’s conducive to writing down methods and results the way we’ve been taught to do as biology graduate students. My usual workflow involves a lot of reading, a lot of data exploration in R, and by-hand data munging.

With the exception of R and its suite of reproducible research tools such as Sweave and Knitr, it’s very difficult to lay down a cohesive, immutable “paper trail” of work that serves as a record the same way a formal lab notebook does. Moreover, it’s hard to keep track of ideas and thoughts over time. There’s nothing quite as… disheartening? as coming across what you thought was a new approach or idea and realizing you had been there a year ago.

My ideal requirements for a lab notebook would then be as follows:

  1. Permanence. This has two aspects, the first being immutability- I can’t go back and alter what I’ve written without some sort of record and explanation of the change. The second is durability- it should not live only on my local machine, but should ideally be instantly and immediately backed up to a server.
  2. Citations. I organize all the papers I’m reading with Papers on the Mac. It’s not a perfect piece of software, but it works well in the areas I need it to. I would need a lab notebook to integrate with Papers so that I can easily insert references to papers and know months later a) what I was talking about and b) how to easily find them in my library.
  3. Ease-of-use. I live in my terminal, like many computational people. I want my lab notebook to be accessible in the terminal, just like a wet-lab scientist would have their notebook by them on the bench. Moreover, I want it to be searchable using the same tools I use to work with data, not some ugly XML or binary format that I cannot parse on the command line.
  4. Discoverability. While at this point it’s unlikely anybody would find something particularly illuminating in my lab notebook, anybody who inherits my projects or wants to continue my work should be able to read some version of my notebook easily and find relevant information quickly. (This person could easily be me after working on some other project for a while.) In this sense, the notebook should have a coherent structure and internal format so that you could produce a table of contents or list of entries and their associated topics. For thesis meetings, it should be able to be converted to a printable, readable document.

The tools that fulfill these requirements all exist already. SVN or Git on a remote server solves the issue of permanence. Papers includes a citekey insertion service to add Latex or Pandoc citekeys to any document that accepts keyboard input, and its internal library format is exportable as a Bibtex library. Emacs is my preferred text editor and is primarily a command-line application (and an awesome one at that), and includes hooks to version control systems. Finally, Emacs includes org-mode, a to-do list and note-taking mode that provides document structure, tags, and easy export to Latex or HTML. What’s even better is that Pandoc, a command-line document converter, can work with org-mode .org files and convert them to a huge number of alternate formats, as well as handle citations (in case you don’t want to export to Latex).

I’ve just finished writing a set of glue scripts and Emacs lisp that makes a manageable lab notebook. As of now, my notebook:

  • Automatically exports my Papers library to a Bibtex file in the same directory as my notebook every time I save (using Papers’ Applescript hooks)
  • Automatically commits and uploads both the notebook and the Bibtex file to SVN on save (but is agnostic to what VC system is being used)
  • Exports to a PDF with citations and a table-of-contents with a single keypress. (using ox-pandoc and pandoc-citeproc)
  • Is entirely contained within plain-text .org files, which are entirely readable outside Emacs (the org format looks a little like mutant Markdown)
  • Can parse and execute inline code blocks in bash script, R, and Python, so that I can document what I do outside of the reproducible reports generated with Sweave, using the org-mode Babel extension.
  • Works with tables and inline spreadsheets so I can quickly and easily summarize data in the document itself (another amazing org-mode function).

In Part II I’ll publish the code I used to accomplish all of this, a lot of which will be pretty inelegant and hackish. However, I think that it’ll be a useful starting point for anybody else who wants to adopt this workflow.