But what does it mean?!?


During practical-based modules, I often ask undergraduates to start their practical reports with a statement of their hypothesis. This usually throws them into a mild panic: class practicals are primarily about generating data rather than proving or disproving a hypothesis, and they cannot easily reconcile that apparent disparity.

The relationship between data-generating and hypothesis-driven research is a troubled one. Twenty years ago, a loud and often-heard cry of the experimentalist after a ‘big data’ or ‘-omics’ talk was ‘but what IS the hypothesis?’. Testing a hypothesis was the mantra of every bench scientist, and even today some funding agencies and scientific publishers still insist on placing hypotheses front and centre of all submissions.

But what was the hypothesis being tested when the E. coli genome was sequenced? Should we look down our noses disapprovingly at the humble genome, denigrated as a mere ‘fishing trip’ or ‘stamp collection’ because of its lack of a noble hypothesis? Do we emulate my poor undergraduates and struggle valiantly to find a hidden rationale behind the data-collecting exercise to justify its existence? Or should we celebrate the diversity, abundance and scale of the datasets that we can now generate, with or without an accompanying hypothesis?

We can’t all be the ones to discover the next cure for cancer, or the novel antibiotic to which there is no possibility of resistance. However, we can all contribute resources to aid those explorers in their search. Those resources can be new knowledge, acquired through the steadfast testing of hypotheses, or they can be collections of datasets, alongside the tools and know-how to interrogate those data.

The genome is the ultimate blueprint of an organism’s biology; however, we have barely begun learning how to look inside a genome and deduce from its sequence the salient features of the host’s biology. Hypothesis-led experimentation is one way to improve our understanding of the sequence/function relationship, and increasingly we find ourselves testing hypotheses that have themselves come directly from big datasets.

In essence, big datasets are trying to tell us everything we want to know, but to get there we need to find out which questions they are the answer to.

Post by Dr. Dave Whitworth.