Time traveling through the Gene Ontology Annotations

Thanks to the wonders of version control, Gene Ontology human gene annotations can be found stretching all the way back to 2004:


That’s quite cool if you want to do a historical analysis of how these annotations are changing with time (which I do). For instance, suppose you want to see how many terms have been marked as obsolete since 2004, and you’ve downloaded the current gene_ontology_ext.obo file and the goa file from ’04.

To see how many distinct GO terms are annotated in total (the GO ID sits in column 5):

$ cat gene_association.goa_human_18 | gawkt '{print $5}' | sort | uniq | wc -l

(by the way, gawkt is just an alias: alias gawkt='awk -F"\t" -v OFS="\t"')
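If you’d rather not set up the alias, the same count can be sketched in Python. This is only an illustration: the function name count_distinct_terms is my own, and it assumes the file is tab-separated with the GO ID in the fifth column. Unlike the awk version, it skips lines with fewer than five columns (such as `!` comment lines), so the result can differ by one.

```python
import sys


def count_distinct_terms(lines, column=4):
    """Count the distinct values found in one tab-separated column.

    column is zero-based, so 4 means the fifth column (assumed to be the GO ID).
    """
    terms = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip lines that are too short to have the column at all.
        if len(fields) > column:
            terms.add(fields[column])
    return len(terms)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python count_terms.py gene_association.goa_human_18
    with open(sys.argv[1]) as handle:
        print(count_distinct_terms(handle))
```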

To see how many we’ve got sans obsolescence:

$ cat gene_association.goa_human_18 | python filter_obs.py | gawkt '{print $5}' | sort | uniq | wc -l

That Python script (filter_obs.py) is available below.
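For readers who just want the idea, here is a minimal sketch of what such a filter could look like. It is not necessarily the script used for the numbers in this post. It assumes obsolete terms are the `[Term]` stanzas in the .obo file carrying an `is_obsolete: true` tag, that the GO ID is in the fifth tab-separated column of the goa file, and (unlike the pipeline above) that the .obo path is passed as a command-line argument. The function names are made up for illustration.

```python
import sys


def load_obsolete_ids(obo_lines):
    """Return the ids of [Term] stanzas tagged 'is_obsolete: true'."""
    obsolete = set()
    current_id = None
    in_term = False
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas describe GO terms.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line.startswith("is_obsolete: true") and current_id:
            obsolete.add(current_id)
    return obsolete


def filter_annotations(goa_lines, obsolete, go_column=4):
    """Yield every line except those annotated with an obsolete GO id.

    Lines without a GO column (e.g. '!' comments) pass through unchanged,
    so the downstream count stays comparable with the unfiltered one.
    """
    for line in goa_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > go_column and fields[go_column] in obsolete:
            continue
        yield line


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.stderr.write("usage: python filter_obs.py ONTOLOGY.obo < GOA_FILE\n")
    else:
        with open(sys.argv[1]) as handle:
            obsolete_ids = load_obsolete_ids(handle)
        for kept in filter_annotations(sys.stdin, obsolete_ids):
            sys.stdout.write(kept)
```

With this variant the pipeline would start with `python filter_obs.py gene_ontology_ext.obo < gene_association.goa_human_18` rather than reading the ontology from a hard-coded location.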

So 381 of the 3989 terms annotated in ’04 (roughly 9.5%) have been retired since then. That’s not too shabby, although I imagine the GO hierarchy and overall structure have changed more significantly in that time. Still, it makes it plausible to track the gene annotations of the majority of terms over the last eight years.

