A biologist playing the numbers game

Biologists, as a rule, dislike numbers. Probably because numbers require you to do all the work before anything interesting happens. Numbers don’t metabolise, synthesise, secrete, or replicate. They don’t behave in different ways under the same conditions. They are, in a word, reliable. We like what they represent, but we don’t like that they are an abstraction of what actually interests us about biology.

But this is 2017. Gone are the days of Leidy, Manson and Darwin, when a biologist could spend their life avoiding the numbers game and still rise to the top of their field. Biology means data, and data means statistics! At some point, every young biologist realises that they have to bite the bullet and actually learn a bit of stats in R (or MATLAB, if you would rather use something with a price tag), rather than subsist on the vague idea of statistical analysis that all those modules you took furnished you with. After all, wouldn’t it be nice to be an author on one of those shiny papers with all that important-looking multivariate analysis in it?

So what to do? Well, go on a course, of course. I did exactly that. I found myself an interesting and relevant-looking course run by PR Statistics on the analysis of population genomics data in R. The course took me through how to use various packages available in R, particularly Adegenet [1], to reveal structure in your data, and was instructed by the developers behind Adegenet, Thibaut Jombart and Zhian N. Kamvar, who were as knowledgeable and skilled a pair of instructors as you could encounter. If you perform statistical analysis on allelic frequencies in population data sets, then I would highly recommend this package; it contains everything you could need to elucidate even the most subtle structure in data. Set in the almost idyllic location of Margam Country Park, east of Swansea, it was a week which did not leave me, or any of my course mates (many of whom had travelled from as far as the USA), wanting. I feel compelled to also mention the cake that the cooks set out for us every day, which resulted in me leaving Margam a few pounds heavier, as well as a week wiser. If this course is representative of all courses run by PR Statistics, then I can highly recommend them.
So, armed with my new, more informed view of statistical analysis in R, I can go forth, see what I can make of my own data sets, and see if I can’t produce some of those oh-so-aesthetic graphs myself. As it happens, I quite like numbers now.

Many thanks to Oliver Hooker for organising the course, and to Thibaut and Zhian for their expert instruction.

By Arthur Morris

[1] T. Jombart, “Adegenet: A R package for the multivariate analysis of genetic markers,” Bioinformatics, vol. 24, no. 11, pp. 1403–1405, 2008.

Social Media Scientific Collaborations?

My only foray into social media during my scientific career has been limited to one platform involving the use of 140 characters at a time. I use no other social media platforms. Since I joined in 2014 I have gradually engaged to a greater extent, and during this time I’ve found the small snippets of information displayed before my eyes to be a highly suitable source on which to binge on science news. Of course, this has required me to be selective about the people and groups that I follow, so as not to be swamped by general news from the masses. To this end, my news feed features all of those things I find exciting or interesting around science, with the occasional fun thing thrown in. On a daily basis I read new findings from the fields of biochemistry, protein science and, of course, parasitology. Admittedly, the majority of my feeds are related to the latter topic, with my screen regularly filled with amazing images and video snippets of parasitic worms and the cool things that they do.
As I have posted new research coming out of my research group on the biochemistry and molecular biology of parasitic worms, it has been highly addictive to see other social media users interacting with these instant updates. Whether it is an update on the role of a glutathione transferase (GST) or an insight into the fatty acid binding protein family, I am continually excited to promote our research and simultaneously interact with scientists across the world. One of the best aspects of this is that the scientists are at all stages of their careers, from undergraduates through to established and highly respected professors. A second confession of this blog is that a lot of the scientists I follow are established collaborators, so there may be the suggestion that we are simply promoting our work to one another. This aside, it is great bringing new science to the world (or at least to the 400 followers I have) and seeing the progress of others – it is especially satisfying to interact with undergraduates succeeding with their first protein gels, or postgraduates uploading SDS-PAGE ‘gel-fies’.
However, I had never thought of social media as a platform for establishing new connections and collaborations… until today. After being delighted with image after image of wriggling Parascaris parasites (worms of horses that are approximately 15–20 cm in length), I was compelled to contact the owner of these worms – a fellow wormer based across the pond in the USA. I have a particular desire to collect some Parascaris to provide one of my PhD students with samples from which to extract their GSTs and compare them with those of related worms. This fellow worm guru was one of those people you follow in case something interesting pops up on their feed… and sure enough, it did! I have since established formal communications and collaborations, and will be waiting on a shipment of worms to arrive in the post. What a nice gift for my PhD student! The wonders of establishing social media collaborations…

Post by Dr. Russ Morphew

Do We Use Science at Home?

Before Easter we had a lab meeting, where everyone presents a paper they found interesting. There were some interesting bioinformatics ones, some yeast genomics – the usual high-impact or highly relevant papers. I, on the other hand, chose a different type of paper. It was published in the British Food Journal, and it analysed the quality of food safety messages in cookbooks. With Easter just around the corner, and the knowledge that the likely meals would be turkey- or lamb-based, I tried to strike up a discussion about what kind of science people use at home.

Are you as careful with your knife that has just cut up raw chicken breast as you are with that pipette tip that was just in a tube of *insert sample type here*? Do you wipe down your surfaces before and after with *insert choice of antimicrobial surface cleaner here* like you do in the lab? Is there the need to be just as careful?

I suppose the answer to that depends on what you are doing. If the meat is about to be cooked through in the oven or a frying pan, then no, probably not. If you are going to be using the same knife and chopping board that just cut up meat to then cut off a hunk of cheese to nibble at whilst food is cooking, then yes, probably (although you do have an immune system, so raw meat does not always equal infection).

But this is all from the eyes of a microbiologist. So how do non-scientists get this information? The article highlighted education, cookbooks and recipes, friends or family, TV shows and media, and government organisations as sources of safe food preparation advice. A survey in America found that up to a third of participants used cookbooks and recipes for this. You’d think that this would be good, as surely the authors of the books would know safe ways of handling food prone to contamination. However, this study went on to analyse recipes that contained risky foods (meat, raw eggs, fish, etc.), searching for safety messages that placed each recipe into a ‘correct’ or ‘incorrect’ category.

They found that many recipes used subjective doneness indicators, such as colour of meat, flakiness or falling off of bone. Some also used ‘Unusual language to explain doneness included “meltingly,” “soft curds,” and “totally done.”’
This made me think of all the recipes I had ever used. I don’t own a thermometer for measuring food and I don’t remember ever seeing the need for one when using a recipe. Next time you read a recipe, look out for the kinds of safety information and advice it gives you. Is it subjective? Does it use odd language? Does it advise you on preventing cross-contamination? Hopefully it does, or if not, your inner-scientist will kick in and you will avoid tummy problems!

You can find the paper here: http://dx.doi.org/10.1108/BFJ-02-2017-0066

Levine, K., Chaifetz, A. and Chapman, B. (2017) “Evaluating food safety risk messages in popular cookbooks”, British Food Journal, Vol. 119, Issue 5. doi: 10.1108/BFJ-02-2017-0066

by Jess Friedersdorff

Distractions and procrastination.

There are lots of things to be embraced about being a Principal Investigator in Higher Education. You are free to direct your own research, you can be creative when devising your teaching sessions, and you can indulge your curiosity and passions, for example through public engagement or by immersing yourself in the literature.
But everyone knows that there is also the less enjoyable side to academic life – marking, ticking off marking criteria, providing student feedback, filling in marks moderation forms, attending exam boards – in general, the auditing and administration of mark-awarding.
These are things that need to be done to keep the external examiners happy, but they seem to have little obvious direct impact on the education of students.
And although ‘important’, they are tedious. So tedious that many, including myself, would succumb to any temptation to procrastinate during marking season.

At the best of times I love a good dataset to pore over – they usually jump right to the top of my ‘to do’ pile. But it’s heart-breaking when they arrive during marking season, when I’m most prone to distraction and procrastination, and yet subject to tight deadlines to get the mark-awarding paperwork completed.
So why do all the best datasets arrive during that marking season?
This marking season I’ve received ten genomes of novel bacterial isolates, the results of antimicrobial activity assays for 25 novel compounds, and a large set of transcriptome analyses, all of which need urgent analysis.
It’s like being a modern Tantalus, desperate to reach up to open those spreadsheets of insight and start analysing, while the chains of administration keep you grounded with moderation forms and marksheets.
So instead, I do neither and write a blog post.

Post by Dave Whitworth

Invasive Weed Species as a Source of Antimicrobials – Making the Best of a Bad Situation

Humans have always been dependent on nature to cater for their basic needs, such as food and shelter, but also for medicines. Initially, medicines took the form of crude treatments such as tinctures, teas, poultices, powders and other herbal formulations. The specific plants and methods of application were originally passed down through oral history until the information was recorded in herbals. In more recent history, the use of natural products as medicines has involved the isolation of active compounds [1]. The first active compound to be isolated in this way was morphine, from opium, by Friedrich Sertürner in 1804 [2]. Drug discovery from plants also led to the isolation of many early drugs such as cocaine, codeine, digitoxin and quinine, some of which are still in use today. Given the vast diversity of natural products – ranging from terrestrial plants to marine organisms, and including microorganisms – and their many possible applications, their isolation and characterisation for medicinal purposes continues today.
Plants have been the single most productive source of leads for the development of drugs, particularly as anti-cancer agents and anti-infectives [3]. Even though natural products have been a plentiful and continuous stream of useful drugs, their use has diminished in the past two decades as the major pharmaceutical companies have lost interest in natural products, owing to the slow nature of natural product discovery and its incompatibility with high-throughput screening (HTS) directed at molecular targets [4]. Yet many large screening collections, containing compounds from many different sources, have proved disappointing in practice: natural products are the most diverse class of compounds, with a significantly higher hit rate than fully synthetic and combinatorial libraries [5]. Furthermore, it has been shown that 83% of the core ring scaffolds present in natural products are absent from commercially available screening libraries, leading to fewer drug leads [6]. It is unsurprising that, even with the introduction of new methods and technologies, natural products have contributed massively to the drugs approved in recent years (see Fig. 1).

Figure 1: Contribution of natural products to approved drugs, 1981–2010; n = 1355. (Adapted from Newman and Cragg 2012 [7])

My PhD is funded by the Life Sciences Research Network Wales (http://www.lsrnw.ac.uk/). The project is based on the discovery of antimicrobial compounds from invasive weed species. Invasive non-native weed species are a significant global concern: they are responsible for a loss of biodiversity, alter ecological processes and impact ecosystem services, at a cost of $35 billion annually in the USA alone [8-10]. If antimicrobial, or indeed any bioactive, compounds could be sourced from these problematic plants, then we could at least draw one positive from their unwanted presence in our environment. The project includes the traditional extraction, isolation and characterisation of active compounds from plants, followed by biological assays to test a range of biological activities of the compounds extracted. These techniques are combined with genomic and bioinformatic approaches to aid and improve drug discovery. A wide range of plants were selected for this study, and compounds with a range of interesting biological activities – especially antimicrobial activity – have been extracted from each. The most active plants tested were Japanese knotweed and Himalayan balsam.

Resveratrol was found to be the most active antimicrobial compound present in Japanese knotweed. This compound is also found in other spermatophytes, such as grapevines, and has been linked to a wide variety of biological activities. It has been reported to have antioxidant, anticancer and anti-inflammatory properties, to prevent post-menopausal bone loss, and to exert a range of positive metabolic effects. Resveratrol has also been suggested as the causal link between increased red wine consumption and decreased risk of heart disease [11].
A key compound found in Himalayan balsam is by far the most potent antimicrobial compound in all the plants studied, with a minimum inhibitory concentration of 3–15 µg/mL against a range of Staphylococcus species. This compound has also been found to be non-toxic to mammalian cells, and similar compounds have been shown to have anti-cancer and anti-fungal activity.
The mode of action of these compounds is currently being elucidated using genomic, metabolomic and proteomic approaches combined with novel assays and cytometric techniques. In addition, I aim to improve the activity of these compounds using computer-aided drug design (CADD) through the Life Sciences Research Network Wales CADD Platform (http://www.lsrnw.ac.uk/platform-technologies/welsh-computer-aided-drug-design-cadd-platform/).
Natural products have been a source of drugs which have revolutionised the treatment of disease. It is clear that natural sources will continue to play a significant role in the fight against disease, and they should be combined with the innovative methods currently being developed to form a multidisciplinary approach to treating disease.

1. Balunas, M.J. and A.D. Kinghorn, Drug discovery from medicinal plants. Life sciences, 2005. 78(5): p. 431-441.
2. Schmitz, R., Friedrich Wilhelm Sertürner and the discovery of morphine. Pharmacy in history, 1985. 27(2): p. 61-74.
3. Harvey, A.L., Natural products in drug discovery. Drug discovery today, 2008. 13(19): p. 894-901.
4. Harvey, A.L., R. Edrada-Ebel, and R.J. Quinn, The re-emergence of natural products for drug discovery in the genomics era. Nature Reviews Drug Discovery, 2015. 14(2): p. 111-129.
5. Sukuru, S.C.K., et al., Plate-based diversity selection based on empirical HTS data to enhance the number of hits and their chemical diversity. Journal of biomolecular screening, 2009. 14(6): p. 690-699.
6. Hert, J., et al., Quantifying biogenic bias in screening libraries. Nature chemical biology, 2009. 5(7): p. 479-483.
7. Newman, D.J. and G.M. Cragg, Natural products as sources of new drugs over the 30 years from 1981 to 2010. Journal of natural products, 2012. 75(3): p. 311-335.
8. Simberloff, D., et al., Impacts of biological invasions: what’s what and the way forward. Trends in ecology & evolution, 2013. 28(1): p. 58-66.
9. Hulme, P.E., et al., Bias and error in understanding plant invasion impacts. Trends in ecology & evolution, 2013. 28(4): p. 212-218.
10. Pimentel, D., R. Zuniga, and D. Morrison, Update on the environmental and economic costs associated with alien-invasive species in the United States. Ecol. Econ., 2005. 52(3): p. 273-288.
11. King, R.E., J.A. Bomser, and D.B. Min, Bioactivity of resveratrol. Comprehensive Reviews in Food Science and Food Safety, 2006. 5(3): p. 65-70.

Post by Dai Fazakerley.
Dai is a PhD student with Prof. Luis Mur and is one of our Biochemistry BSc graduates.

Workarounds, KISS and the dangers of overcomplicating things

Around six months ago I was asked to look into the program PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) for a data set produced by an amplicon sequencing run on the Ion Torrent. Basically, PICRUSt takes a 16S OTU table that has been classified (taxonomically, not some top-secret government thing, unfortunately) and uses information from sequenced genomes of known or closely related organisms to predict the potential genomic and metabolic functionality of the community identified in the 16S dataset. As complex as the process sounds, in reality it consists of only three steps. First, normalisation of the dataset by 16S copy number, which corrects for any potential under- or over-representation of functionality due to variation in the number of copies of the 16S gene in different bacterial genomes. Second, prediction of the metagenome: basically, multiplying the occurrences of functional genes – in this case KOs (KEGG orthologs) – within the known genomes by the corrected OTU table from the previous step. Finally, categorising by function: collapsing the table of thousands of functional KOs into hierarchical KEGG pathways.
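To make the three steps concrete, here is a toy sketch of the arithmetic in Python. Every number, OTU name, KO identifier and the pathway mapping below is invented purely for illustration – the real PICRUSt pipeline draws these values from large precomputed reference-genome tables, not from hand-typed dictionaries:

```python
# Step 0: a classified OTU table -- counts of each OTU per sample (made up).
otu_counts = {               # [sample A, sample B]
    "OTU_1": [120, 30],
    "OTU_2": [40, 200],
}

# 16S rRNA gene copies per OTU, taken from the reference genomes (made up).
copy_number = {"OTU_1": 4, "OTU_2": 1}

# Step 1: normalise by 16S copy number, correcting over/under-representation
# caused by genomes carrying different numbers of 16S gene copies.
normalised = {
    otu: [c / copy_number[otu] for c in counts]
    for otu, counts in otu_counts.items()
}

# KO (KEGG ortholog) counts per reference genome (again, made up).
ko_per_genome = {
    "OTU_1": {"K00001": 2, "K00002": 0},
    "OTU_2": {"K00001": 1, "K00002": 3},
}

# Step 2: predict the metagenome -- multiply the normalised abundances by the
# KO counts of each genome and sum over OTUs, per sample.
n_samples = 2
metagenome = {}
for otu, abundances in normalised.items():
    for ko, n in ko_per_genome[otu].items():
        totals = metagenome.setdefault(ko, [0.0] * n_samples)
        for i, a in enumerate(abundances):
            totals[i] += a * n

# Step 3: categorise by function -- collapse KOs into (hypothetical) pathways.
ko_to_pathway = {"K00001": "Metabolism", "K00002": "Metabolism"}
pathways = {}
for ko, totals in metagenome.items():
    p = pathways.setdefault(ko_to_pathway[ko], [0.0] * n_samples)
    for i, t in enumerate(totals):
        p[i] += t

print(metagenome["K00001"])  # -> [100.0, 215.0]
print(pathways["Metabolism"])  # -> [220.0, 815.0]
```

The whole pipeline is just a copy-number division followed by a matrix multiplication and a group-by – which is exactly why the surrounding plumbing, rather than the maths, is where the pain lives.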


Despite my background in sequencing, population genetics and phylogenetics, despite having learned (or taught myself) many different analysis packages and programs over the course of my career, and despite having a solid, reliable method of producing and analysing OTU tables from the Ion Torrent and other sequencing platforms, I’ve never considered myself a bioinformatician. But a few steps should be easy enough… right?

The little yellow arrow in the workflow now represents around 2-3 months of probably the steepest learning curve I have ever ventured on to.

OTU tables are simply tables of counts of OTUs (operational taxonomic units, i.e. species/observations etc.) for each sample in your dataset. Despite their simplicity, the method used by myself and others in the research group to construct the OTU table was different from that in the online PICRUSt guide, and the information contained therein was different too. I could already sense the learning curve steepening, but I carried on regardless, not quite realising the dangers that lay ahead!

Operating system (UNIX/Windows) cross-compatibility, version control, system resources, version control, new programming languages, version control, more system resources, overloading system resources, complete system failure, starting from scratch again, installation-and-compilation-of-new-algebraic-libraries-for-your-system’s-mathematical-calculations, version control, new programming languages, manual editing of enormous databases, scripts, packages and version control. These points hint at some of the processes I went through, and the problems I had to deal with, in the creation of what I now call ‘The Workaround’. I still don’t consider myself a bioinformatician.

‘The Workaround’
The workaround consists of a small number of R scripts and processes for the reformatting and production of new files that are so simple they belie the amount of work that went into producing them. The steps are straightforward enough that anyone with any sort of experience of working with OTU tables or sequencing data should be able to complete them. The entire workflow is robust and repeatable and I have since worked with a few different ways of visualising and representing the data for publication.

Using STAMP to identify SEED subsystems which are differentially abundant between Candidatus Accumulibacter phosphatis sequences obtained from a pair of enhanced biological phosphorus removal (EBPR) sludge metagenomes (data originally described in Parks and Beiko, 2010).

PICRUSt appears to be becoming an ever more popular tool in the analysis of microbiomes, and one that complements many of the studies and analyses already performed by members of our research group. I am currently writing up ‘The Workaround’ into a step-by-step guide to be placed on the bioinformatics wiki for anyone to access, but in the meantime, if anyone would like to speak to me about applying this type of analysis to any existing or future experiments, I’m more than happy to help!

Post by Toby Wilkinson.
About Toby:
I am a postdoc and perpetual resident of Aberystwyth, having come here as an undergrad in 2002 to study Zoology, worked through a PhD in parasitology starting in 2005, and held various positions as technician/research assistant/PDRA since – I’ve never quite been able to bring myself to leave Aberystwyth. Over the last few years I’ve worked in various roles in the Herbivore Gut Environment group on the microbiome of ruminants, building up my experience in NGS and bioinformatics, and more recently with Sharon Huws on the further characterisation of novel antimicrobial peptides, while continuing work in NGS and the study of the dynamics of bacterial communities in a number of environments.

Why Are Some Virus Capsids So Geometric?

Keywords – viral capsid, assembly, symmetric, geometric, icosahedron, subunits, pentamers, hexamers

While doing some literature searching on phages, I came across a paper written in 1967 on the ultrastructure of phages. Scrolling through, there were some (subjectively) pretty microscopy images of infecting phages, along with other diagrams. The diagram of phage head shapes in particular caught my eye. I began to think about how satisfying the symmetry and geometry of bacteriophages is, and then wondered why phage heads have this characteristic and what advantages it confers.

The general structure of a phage head:
A phage head is formed of either two or three parts. All freshly formed virions have a core of genomic material, which may be double- or single-stranded RNA or DNA. This is surrounded by the capsid: a proteinaceous coat formed of a number of identical subunits, which may themselves be formed of even smaller molecular subunits. These subunits are called capsomeres [1].

Why the patterns?
It seems I am not the only one to ask, and in fact the question was almost fully answered by Crick and Watson back in the 1950s. They were studying small viruses, and reasoned that a virus requires its protein coat to protect its genomic material, and that the most efficient way to build that coat is from many small, identical molecular subunits, which are easier to produce inside the host cell than, say, one or two large molecules. These subunits also have the added advantage that they can only arrange themselves around the core in a limited number of ways to create a shell [2].

So, this explains why capsids tend to have such a regular shape. But what other advantages does this confer?
Perhaps this can be put down to evolutionary adaptation. The easier the subunit is to reproduce, the more virion offspring are produced. It also makes sense that identical subunits can only attach in so many ways, and that this results in a pattern seen in all those offspring. Moreover, the subunits fit together like puzzle pieces, in that assembly does not require any energy [3]. This is ideal for spontaneous formation of virion capsids in the host cell, as the virus does not have to concern itself with sequestering energy from elsewhere.

Phage heads tend to take the shape of a Platonic solid – one of five regular shapes, of which only octahedra and icosahedra have been observed by microscopy. Icosahedra are commonly seen in phages, and are 20-sided shapes with 12 vertices. This geometry offers stability and strength [4].
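As a quick aside, those numbers can be sanity-checked. Euler’s polyhedron formula (V − E + F = 2) holds for the icosahedron’s 12 vertices, 30 edges and 20 faces, and the standard Caspar–Klug rules of capsid geometry (not discussed in the 1967 paper, but textbook capsid theory) say an icosahedral capsid built from identical subunits contains 60T of them, always arranged as 12 pentamers plus 10(T − 1) hexamers, where T is the triangulation number. A small Python sketch:

```python
def euler_characteristic(v, e, f):
    """V - E + F, which equals 2 for any convex polyhedron."""
    return v - e + f

def capsid_counts(t):
    """Caspar-Klug counts for triangulation number T:
    60*T subunits, arranged as 12 pentamers and 10*(T - 1) hexamers."""
    return {"subunits": 60 * t, "pentamers": 12, "hexamers": 10 * (t - 1)}

# Icosahedron: 12 vertices, 30 edges, 20 triangular faces.
print(euler_characteristic(12, 30, 20))  # -> 2

# The smallest (T=1) capsids are 60 subunits: 12 pentamers and no hexamers.
print(capsid_counts(1))  # -> {'subunits': 60, 'pentamers': 12, 'hexamers': 0}
print(capsid_counts(7))  # a larger head: 420 subunits
```

The fixed count of 12 pentamers is exactly the 12 vertices of the icosahedron – which is why such different-sized phage heads still share the same underlying symmetry.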

Another way is to look at this from the genetic material’s point of view. It needs protection from stray or attacking enzymes, and it needs a way to organise itself neatly. These structures not only allow neat organisation of the genome, but in doing so can create secondary characteristics. For example, it is thought that some regions can take on roles in translation and replication, brought about by the way the nucleic acids are stored within the capsid, allowing them to form double-stranded loops, as in the Leviviridae viruses [5].

It seems that these phage heads are well adapted to their purpose. I’m sure we can all agree that they are very clever and the stuff of nightmares!!!


Post by Jessica Friedersdorff.

[1] Bradley DE. Ultrastructure of bacteriophage and bacteriocins. Bacteriol Rev 1967;31:230–314.
[2] Crick FH, Watson JD. P1-Structure of small viruses. Nature 1956;177:473–5.
[3] Bruinsma RF, Gelbart WM, Reguera D, Rudnick J, Zandi R. Viral self-assembly as a thermodynamic process. Phys Rev Lett 2003;90:248101. doi:10.1103/PhysRevLett.90.248101.
[4] Mannige R V., Brooks CL. Periodic table of virus capsids: Implications for natural selection and design. PLoS One 2010;5:1–7. doi:10.1371/journal.pone.0009423.
[5] Morais MC. Breaking the symmetry of a viral capsid. Proc Natl Acad Sci 2016;113:201613612. doi:10.1073/pnas.1613612113.

But what does it mean?!?


During practical-based modules, I often ask undergraduates to start their practical reports with a statement of their hypothesis. This usually throws them into a mild panic, as class practicals are primarily about generating data rather than proving/disproving a hypothesis and they cannot easily negotiate that apparent disparity.

The relationship between data-generating and hypothesis-driven research is a troubled one. Twenty years ago, a loud and often-heard cry of the experimentalist after a ‘big data’ or ’-omics’ talk was ‘but what IS the hypothesis?’. Testing a hypothesis was the mantra of every bench scientist, and even today some funding agencies and scientific publishers still insist on placing hypotheses front and centre of all submissions.

But what was the hypothesis being tested when the E. coli genome was sequenced? Should we look down our noses disapprovingly at the humble genome, denigrated as a mere ‘fishing trip’, or ‘stamp collection’, because of its lack of a noble hypothesis? Do we emulate my poor undergraduates and struggle valiantly to find a hidden rationale behind the data-collecting exercise and justify its existence? Or should we celebrate the diversity, abundance and scale of the datasets that we can now generate, with or without accompanying hypothesis?

We can’t all be the ones to discover the next cure for cancer, or the novel antibiotic to which there is no possibility of resistance. However, we can all contribute resources to aid those explorers in their search. Those resources can be new knowledge, acquired through the steadfast testing of hypotheses, or they can be collections of datasets, alongside the tools and knowhow to interrogate those data.
The genome is the ultimate blueprint of an organism’s biology; however, we have barely begun to learn how to look inside a genome and, from its sequence, deduce salient features of the host’s biology. Hypothesis-led experimentation is one way to improve our understanding of the sequence/function relationship, and increasingly we now find ourselves testing hypotheses that have themselves come directly from big datasets.

In essence, big datasets are trying to tell us everything we want to know, but to get there we need to find out what questions to ask, for which they are the answer.

Post by Dr. Dave Whitworth.