A biologist playing the numbers game

Biologists, as a rule, dislike numbers, probably because numbers require you to do all the work before anything interesting happens. Numbers don’t metabolise, or synthesise, or secrete, or replicate. They don’t behave in different ways under the same conditions. They are, in a word, reliable. We like what they represent, but we don’t like that they are an abstraction of what actually interests us about biology.

But this is 2017. Gone are the days of Leidy, Manson and Darwin, when a biologist could spend their life avoiding the numbers game and still rise to the top of their field. Biology means data, and data means statistics! At some point, every young biologist comes to the realisation that they have to bite the bullet and actually learn a bit of stats in R (or MATLAB, if you would rather use something with a price tag), rather than subsist on the vague idea of statistical analysis that all those modules you took left you with. After all, wouldn’t it be nice to be an author on one of those nice shiny papers with all that important-looking multivariate analysis in it?

So what to do? Well, go on a course, of course. I did exactly that. I found myself an interesting and relevant-looking course run by PR Statistics on the analysis of population genomics data in R. The course took me through how to use various packages available in R, particularly Adegenet [1], to reveal structure in your data, and was taught by the developers behind Adegenet, Thibaut Jombart and Zhian N Kamvar, who were as knowledgeable and skilled as instructors come. If you perform statistical analysis on allelic frequencies in population data sets, then I would highly recommend this package: it contains everything that you could need to elucidate even the most subtle structure in your data.

Set in the almost idyllic location of Margam Country Park, east of Swansea, it was a week that left neither me nor any of my course mates (many of whom had travelled from as far as the USA) wanting. I feel compelled to also mention the cake that the cooks set out for us every day, which resulted in me leaving Margam a few pounds heavier, as well as a week wiser. If this course is representative of all courses run by PR Statistics, then I can highly recommend them.

So, armed with my new, more informed view of statistical analysis in R, I can go forth and see what I can make of my own data sets, and see if I can’t produce some of those oh-so-aesthetic graphs myself. As it happens, I quite like numbers now.

Many thanks to Oliver Hooker for organising the course, and to Thibaut and Zhian for their expert instruction.

By Arthur Morris

[1] T. Jombart, “Adegenet: A R package for the multivariate analysis of genetic markers,” Bioinformatics, vol. 24, no. 11, pp. 1403–1405, 2008.