Genome Informatics 2016

This annual conference alternates between the Wellcome Genome Campus in Hinxton, UK, and Cold Spring Harbor Laboratory (CSHL), USA. While the CSHL campus is quite lovely and prestigious, I was thrilled to finally have a chance to visit the vaunted Sanger Institute; they were pioneers of genomics from the very beginning, and are still a leading source of innovation in the field. The campus was a great mix of older original buildings and modern construction, and the conference drew some of the leaders in genomics.

One of the keynote speakers was Richard Durbin, a leader of such projects as the 1000 Genomes Project, Pfam, WormBase, and Ensembl. His book, Biological Sequence Analysis, is the standard in the field, and his sequence similarity matching software is probably the most widely used after BLAST. Here, he was speaking about a new way to store genome information, one that the field will likely be using in the not-so-distant future. Today, a genome is typically stored as a linear sequence of characters, which works very well for a single genome. But if you sequence thousands of individuals from a population, the vast majority of these characters will be identical from one individual to the next. Storing each genome separately ends up wasting terabytes of storage, and further, the collection of files tells you nothing about the population itself without extensive processing. A graph-based representation of the population of genomes, in which shared sequence is stored once and individuals differ only in the paths they take through it, is far more efficient, allows common errors to be easily identified and corrected, inherently contains information about the population, and lends itself to efficient computation and manipulation. This may seem like a minor and overly technical detail, but it is likely to change the field in the near future.
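To make the storage argument concrete, here is a minimal toy sketch in Python. It is not the data structure presented in the keynote or used by any real genome graph tool; the class name `SequenceGraph`, its methods, and the sample sequences are all hypothetical, and it assumes the genomes have already been cut into shared and variant pieces (real tools derive those pieces by alignment).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SequenceGraph:
    """Toy genome graph: each distinct segment is stored once,
    and every genome is just a path of segment ids."""
    segments: List[str] = field(default_factory=list)          # node id -> sequence
    index: Dict[str, int] = field(default_factory=dict)        # sequence -> node id
    paths: Dict[str, List[int]] = field(default_factory=dict)  # sample -> node ids

    def add_genome(self, name: str, pieces: List[str]) -> None:
        """Add a genome given as pre-segmented pieces (an assumption of this sketch)."""
        path = []
        for piece in pieces:
            if piece not in self.index:
                # First time this segment is seen: store it as a new node.
                self.index[piece] = len(self.segments)
                self.segments.append(piece)
            path.append(self.index[piece])
        self.paths[name] = path

    def genome(self, name: str) -> str:
        """Rebuild the linear sequence of one sample by walking its path."""
        return "".join(self.segments[i] for i in self.paths[name])

    def stored_bases(self) -> int:
        """Number of sequence characters the graph actually stores."""
        return sum(len(s) for s in self.segments)

    def node_frequency(self, node_id: int) -> float:
        """Fraction of samples whose path visits this node."""
        carriers = sum(node_id in path for path in self.paths.values())
        return carriers / len(self.paths)


graph = SequenceGraph()
# Made-up sequences: three samples that differ at a single position.
graph.add_genome("sample_a", ["ACGTAC", "G", "TTAGC"])
graph.add_genome("sample_b", ["ACGTAC", "T", "TTAGC"])
graph.add_genome("sample_c", ["ACGTAC", "G", "TTAGC"])

linear_bases = sum(len(graph.genome(name)) for name in graph.paths)
print(linear_bases, graph.stored_bases())  # 36 13
print(graph.node_frequency(graph.index["T"]))  # frequency of the "T" variant
```

Even in this tiny example, three separate files would hold 36 characters while the graph holds 13, and the frequency of a variant in the population falls straight out of the paths rather than requiring a separate comparison of every file.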

There were, of course, many other things to be learned. As an example, RNAseq is a widespread technique used to study gene expression in a cell or organism. Unfortunately, analysis of these data can be deceptively difficult, and mishandling them can easily result in incorrect conclusions. A Venn diagram is a common way to display complex membership data, and research groups have been using these with RNAseq data for years. However, binning the data in this way completely discards all relative information, such as how strongly or in which direction a gene's expression changed, and often lower-level membership information as well. A group from Melbourne has developed an excellent tool to make these analyses much easier, even for non-specialists, and adoption of this or similar software could make RNAseq studies more reliable and predictive in the future.
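A small sketch shows how much the Venn-style binning throws away. Everything here is made up for illustration: the gene names, the log2 fold-change values, the cutoff of 1.0, and the helper `venn_bins` are my own assumptions, not anything taken from the Melbourne group's tool.

```python
# Hypothetical log2 fold changes for the same genes in two comparisons (A, B).
log2_fc = {
    "gene1": (1.05, 0.95),   # nearly identical in A and B
    "gene2": (4.00, 3.80),   # strongly up in both
    "gene3": (1.10, -2.50),  # up in A, down in B
}
THRESHOLD = 1.0  # assumed cutoff for calling a gene "changed"


def venn_bins(table, threshold):
    """Bin genes by whether |log2 fold change| passes the cutoff in A and/or B."""
    in_a = {gene for gene, (a, _) in table.items() if abs(a) >= threshold}
    in_b = {gene for gene, (_, b) in table.items() if abs(b) >= threshold}
    return {
        "A only": in_a - in_b,
        "both": in_a & in_b,
        "B only": in_b - in_a,
    }


bins = venn_bins(log2_fc, THRESHOLD)
for label, genes in bins.items():
    print(label, sorted(genes))
```

With these values, gene1 lands in "A only" even though its two measurements are almost the same, while gene2 and gene3 share the "both" bin despite one being strongly up in both comparisons and the other changing in opposite directions. The diagram reports membership and nothing else.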

The conference was excellent, and I would absolutely encourage others to attend future iterations if they are interested in genomics. I sincerely appreciate the support of the Office of Graduate Education and the HPI for allowing me to go.

