Bioinformatics in Toronto

In May of this year the University of Toronto hosted the first Canadian Computational Biology Conference, and bioinformatics geeks like myself flocked to its beautiful campus from all over the continent and across the pond. There was a lot of cool new work being presented, and it’s always interesting to get an update on the current state of our art. The setting made it especially striking: the conference took place at Victoria College, which is essentially a castle sitting on the U of T campus. I had never before heard about new applications of set cover machines, or the pitfalls of failing to treat metagenomics data as compositional, while the speaker was framed by beautiful 180-year-old stained glass windows.

I was fortunate enough to be able to present my work in a talk, which was somewhat nerve-wracking as my session was chaired by Dannie Durand, the very researcher whose work I have adapted and modified for my own purposes. She spoke about her group's recent work analyzing gene families by considering the evolution of each domain separately. We also heard about the staggering complexity involved in trying to predict transcription factor binding behavior, about how representing bacterial genomes as profiles of k-mers yields better phenotypic information than standard phylogenetic analyses, and about how disordered regions of eukaryotic proteins are vital to their functions (especially in transcription factors).

In addition, we heard about several projects out of Greg Gloor’s group, including his talk titled “We’ve been analyzing high throughput sequence data in the wrong geometric space”. It raised a fundamental problem with how sequencing abundance data is commonly handled, essentially suggesting that most of the metagenomics field has been analyzing its data incorrectly. It boils down to the fact that 10% of one sample can’t be assumed to be the same as 10% of an independent sample when you don’t know the total size of each sample. If your data are ratios, you can’t treat them as if they were actual counts. Spurious correlations can appear simply due to the structure of the data, and even simple operations like addition and subtraction no longer behave as expected. This issue has often been overlooked, but there is now a tool called CoDaSeq available to help researchers handle it in the future.
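To make the “10% is not 10%” point concrete, here is a small sketch of my own in plain Python. It is not the CoDaSeq interface, and the function names `closure` and `clr` and all of the numbers are made up for illustration. It assumes strictly positive counts (real data with zeros would need extra handling) and uses the centered log-ratio transform, a standard tool in compositional data analysis.

```python
import math


def closure(counts):
    """Convert counts to proportions that sum to 1 (hypothetical helper)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts):
    """Centered log-ratio: log of each part minus the mean of the logs.

    Assumes every count is strictly positive.
    """
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Invented absolute abundances for three taxa in two samples.
# Taxa 0 and 1 are unchanged; only taxon 2 has bloomed in sample_b.
sample_a = [100, 100, 100]
sample_b = [100, 100, 800]

prop_a = closure(sample_a)
prop_b = closure(sample_b)

# Taxon 0 looks like it dropped from about 33% to 10%, although its
# absolute abundance is identical in both samples.
print("taxon 0 proportion:", prop_a[0], "vs", prop_b[0])

# The ratio between taxa 0 and 1 is unaffected by the bloom.
ratio_a = math.log(prop_a[0] / prop_a[1])
ratio_b = math.log(prop_b[0] / prop_b[1])
print("log-ratio taxon 0 / taxon 1:", ratio_a, "vs", ratio_b)

# Sequencing the same community ten times deeper changes every count,
# but the CLR values stay the same (up to floating point error).
shallow = [10, 20, 70]
deep = [100, 200, 700]
print("CLR shallow:", clr(shallow))
print("CLR deep:   ", clr(deep))
```

The proportions of the unchanged taxa move purely because another part of the sample changed, which is where the spurious correlations come from. Ratios between parts, and transforms built on them, do not depend on the unknown sample total.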

Overall I found the conference very valuable, and I am very grateful to the HPI for the funding and support that sent me there. Hopefully this conference will continue in the future!

 
