Statistical Mechanics of Complex Networks (Albert and Barabási, 2002)
A modern primer on complex networks across numerous disciplines.
On Random Graphs (Erdos, Renyi 1959)
One of the fundamental papers on random graphs is available here. It lays the framework for analyzing the emergent properties of random graphs.
So I’ve been working a lot with NCBI GEO recently for a paper on the Gene Ontology. During the course of this work I wound up implementing about 70% of the famous R package GEOquery in Python (as I’m much more fluent in Python than R) and decided that it might be worthwhile to submit it to the BioPython project. Their existing GEO parser is woefully inadequate and slightly buggy: I don’t believe it can handle the curated GEO Dataset format, it has no programmatic access to NCBI GEO, and it offers no way to do any statistical analysis on the resulting microarray data.
My fork, which is available here, revamps the Geo package to provide the following features:
I haven’t written unit tests for it yet (a persistent failing, one of many, I’ll admit), mostly because it was developed a bit on-the-fly during my work. However, I also know that it works for at least a subset of uses, and it’s well-documented.
The two modified files are here for the morbidly curious:
You have no idea the pain I feel when I sit down to program. I’m walking on razor blades and broken glass. You have no idea the contempt I feel for C++, for J2EE, for your favorite XML parser, for the pathetic junk we’re using to perform computations today. There are a few diamonds in the rough, a few glimmers of beauty here and there, but most of what I feel is simply indescribable nausea.
— Steve Yegge, Moore’s Law is Crap
For command line tools: Clint (https://github.com/kennethreitz/clint)
For HTTP requests: Requests (https://github.com/kennethreitz/requests)
For tabular data: Tablib (https://github.com/kennethreitz/tablib)
Apparently Kenneth Reitz is the man.
Thanks to the wonders of version control, Gene Ontology human gene annotations can be found stretching all the way back to 2004:
http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/gene_association.goa_human.gz
That’s quite cool if you want to do a historical analysis of how these annotations are changing with time (which I do). For instance, say you want to see how many terms have been marked as obsolete since 2004, and you’ve downloaded the current gene_ontology_ext.obo file and the goa file from ’04:
To see how many distinct GO terms are annotated in total:
$ cat gene_association.goa_human_18 | gawkt '{print $5}' | sort | uniq | wc -l
3989
(by the way, gawkt='awk -F"\t" -v OFS="\t"')
To see how many we’ve got sans obsolescence:
$ cat gene_association.goa_human_18 | python filter_obs.py | gawkt '{print $5}' | sort | uniq | wc -l
3608
That python script (filter_obs.py) is available below.
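The original script isn't reproduced in this copy of the post, so here is a minimal sketch of what a filter like filter_obs.py might look like, not the exact code I ran. It assumes the OBO file marks retired terms with an `is_obsolete: true` line inside a `[Term]` stanza, that the GO ID sits in the fifth tab-separated column of the annotation file (the same column the awk one-liners print), and that the OBO path is passed as an optional first argument. The function names are made up for illustration.

```python
#!/usr/bin/env python
"""filter_obs.py (sketch): drop annotation lines that point at obsolete GO terms."""
import sys


def load_obsolete_ids(obo_lines):
    """Return the set of term ids whose [Term] stanza says 'is_obsolete: true'."""
    obsolete = set()
    current_id = None
    in_term = False
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas matter here.
            in_term = (line == "[Term]")
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and line.startswith("is_obsolete: true") and current_id:
            obsolete.add(current_id)
    return obsolete


def filter_annotations(gaf_lines, obsolete, go_column=4):
    """Yield annotation lines whose GO id (5th column, index 4) is not obsolete."""
    for line in gaf_lines:
        if line.startswith("!"):
            # Comment/header lines pass through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > go_column and fields[go_column] in obsolete:
            continue
        yield line


if __name__ == "__main__":
    # Assumed default: the current ontology file sits in the working directory.
    obo_path = sys.argv[1] if len(sys.argv) > 1 else "gene_ontology_ext.obo"
    with open(obo_path) as handle:
        obsolete_ids = load_obsolete_ids(handle)
    sys.stdout.writelines(filter_annotations(sys.stdin, obsolete_ids))
```

Because it reads stdin and writes stdout, it drops straight into the pipeline above. Note that this version only removes terms explicitly flagged as obsolete; a term that is simply missing from the current OBO file would still pass through.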
So we’ve got 381 of 3989 terms, roughly 9.6%, that have been retired since ’04. That’s not too shabby, although I imagine the GO hierarchy and overall structure have changed more significantly since then. Still, it makes it plausible to track the gene annotations of the majority of terms over the last 8 years.