## The Use and Misuse of p-values in Biology

By John McLaughlin

After completing an experiment, most of us dutifully perform statistical tests to determine whether our results are “significant.” These tests heavily determine whether experimental findings are considered robust, interesting, and publishable. P-values are commonly used to report statistical significance in the biology literature, but biologists have been chastised in recent years for misunderstanding and misusing this statistic. Underscoring this problem, a recent paper in PLOS Biology surveyed the scientific literature and found widespread evidence of “p-hacking”: the manipulation of experimental parameters, such as sample size and the removal of outlier data points, for the sole purpose of obtaining statistically significant p-values.

What is the precise definition of the p-value, as it is most commonly used in biological research? It is important to note that there are several different interpretations of the concept of “probability”, perhaps the two most notable belonging to the Bayesian and Frequentist schools of statistics. According to the Bayesian approach (named for the 18th-century mathematician Thomas Bayes), probability is best thought of as the likelihood of a particular outcome, given our prior knowledge of the situation in addition to newly acquired data. To give a commonplace example: when searching for a lost set of keys in your home, you will want to estimate the probability that they are in a given location, most likely by remembering previous occasions on which the keys were lost and where they were recovered. This “prior” knowledge will factor heavily into your probability estimate. You can then use new data to update this estimate, for example if it becomes known with certainty that the keys are not in one of these locations. The Bayesian interpretation of probability accords more closely with our common, everyday usage of the term.
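The lost-keys example can be made concrete with a small sketch. The locations and prior probabilities below are invented for illustration, and the helper name `update_after_empty_search` is hypothetical. It applies Bayes' rule to the simplest case described above, where one location is searched and the keys are known with certainty not to be there.

```python
def update_after_empty_search(prior, searched):
    """Return the updated probabilities after `searched` was checked and the
    keys were definitely not there.

    This is Bayes' rule with a likelihood of 0 for the searched location and
    1 for every other location, followed by renormalization.
    """
    remaining = {place: p for place, p in prior.items() if place != searched}
    total = sum(remaining.values())
    posterior = {place: p / total for place, p in remaining.items()}
    posterior[searched] = 0.0
    return posterior


# Hypothetical prior, based on where the keys turned up on past occasions.
prior = {"coat pocket": 0.5, "kitchen counter": 0.3, "desk": 0.2}

posterior = update_after_empty_search(prior, "coat pocket")
for place, p in posterior.items():
    print(f"{place}: {p:.2f}")
```

With these made-up numbers, ruling out the coat pocket raises the kitchen counter from 0.30 to 0.60 and the desk from 0.20 to 0.40. The prior knowledge still shapes the estimate, but the new data has shifted it.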

However, the understanding of probability that dominates in the biological sciences is known as Frequentism; most p-value statistics in biological research are computed using this school’s methods. According to frequentist statistics, the probability of a given event is simply the long-run frequency with which it occurs. To give a simple example: if a coin is flipped 100 times and lands “heads” on 58 flips, the observed frequency of heads, and therefore the estimated probability of the coin’s landing heads, is 0.58. If the coin is fair, then as the number of flips approaches infinity, the observed frequency of heads will approach the “true” probability of 0.5. Frequentism is based on the notion that repeated randomized trials, or experiments, will in the long run approximate the true probability of an event.
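The long-run idea can be illustrated with a short simulation. This is only a sketch: it assumes a fair coin, uses Python's pseudo-random generator in place of real flips, and the function name `heads_frequency` and the seed are arbitrary choices.

```python
import random


def heads_frequency(n_flips, p_heads=0.5, seed=0):
    """Simulate n_flips coin flips and return the observed frequency of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips


# The observed frequency wanders for small samples and settles near 0.5
# as the number of flips grows.
for n in (100, 10_000, 1_000_000):
    print(n, heads_frequency(n))
```

A run of 100 flips can easily land several points away from 0.5, which is why a result such as 58 heads is not, on its own, strong evidence of a biased coin.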

When running an experiment in the lab, a biologist may want to know the probability of her hypothesis being true, given the experimental data she observes. A p-value calculated using a standard t-test, however, would tell her something close to the converse: the probability of observing the experimental data, given that the null hypothesis is true. A common experimental “null hypothesis” is a statement of no relationship between the variables under observation (e.g. the two data sets come from populations with equal means). The p-value is therefore the probability of observing the experimental data, or a data set more extreme, when assuming that this null hypothesis is correct – a lower p-value makes a stronger case for rejecting this null hypothesis.
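To make “the experimental data or a data set more extreme” concrete, here is a minimal sketch that reuses the earlier coin example of 58 heads in 100 flips. Note that it swaps in an exact binomial test rather than the t-test mentioned above, because the binomial case can be computed directly from the definition. The null hypothesis is assumed to be a fair coin, and the function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(heads, flips, p_null=0.5):
    """Probability, under the null hypothesis, of a head count at least as
    far from the expected count as the one observed (in either direction)."""
    expected = flips * p_null
    observed_distance = abs(heads - expected)
    return sum(
        binomial_pmf(k, flips, p_null)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )


p = two_sided_p_value(58, 100)
print(f"p-value for 58 heads in 100 flips: {p:.3f}")
```

The result is roughly 0.13, so 58 heads out of 100 would not be unusual for a fair coin. This number is the probability of data this extreme if the coin is fair; it is not the probability that the coin is fair.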

There are a few things that the p-value statistic definitely does not tell a scientist. First, do experimental results with a low p-value tell a scientist that her hypothesis is correct? No. Rejecting the statistical null hypothesis is not equivalent to accepting her particular biological hypothesis. Is the p-value the probability that the null hypothesis is correct? Again, no. Biologists and statisticians use the term “hypothesis” very differently. When the statistician and evolutionary biologist Ronald Fisher popularized use of the p-value in the 1920s, it was never intended as a metric for confirming or refuting biological hypotheses. It was meant to be a general heuristic for judging whether a data set might warrant a second look or follow-up experiments; the p-value itself does not decisively settle any experimental questions.

What should researchers do to avoid p-hacking? One recent paper on this topic recommends choosing the experimental sample sizes in advance, detailing the removal of any outlier data points, and allowing other researchers access to the raw data. P-value statistics can be useful when employed properly, but they are not the whole story. As scientists face continued pressure to report “significant” findings and publish in high-tier journals, understanding procedures for proper data interpretation will be increasingly important. Hopefully, the trend towards open access publication will encourage greater transparency and scrutiny of experimental data reporting, along with a better understanding of p-value statistics and their applications.
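The advice to fix sample sizes in advance can be motivated with a small simulation of one common form of p-hacking: checking the p-value as data come in and stopping as soon as it dips below 0.05. This is a sketch under simplifying assumptions, not a reconstruction of any analysis in the papers mentioned. The data are pure noise (the null hypothesis is true by construction), the test is a z-test with a known standard deviation of 1 rather than a t-test, and the batch size, maximum sample size and number of simulated experiments are arbitrary.

```python
import random
from statistics import NormalDist


def z_test_p_value(sample):
    """Two-sided p-value for H0: mean = 0, assuming a known standard deviation of 1."""
    n = len(sample)
    z = (sum(sample) / n) * n**0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(peek, n_experiments=2000, max_n=100, step=10,
                        alpha=0.05, seed=1):
    """Fraction of null experiments declared "significant".

    peek=False: test once, at the pre-planned sample size max_n.
    peek=True:  test after every `step` observations and stop at the first
                p-value below alpha (optional stopping).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        data = []
        while len(data) < max_n:
            data.extend(rng.gauss(0, 1) for _ in range(step))
            if peek and z_test_p_value(data) < alpha:
                break  # stop collecting as soon as the result looks "significant"
        hits += z_test_p_value(data) < alpha
    return hits / n_experiments


print("fixed sample size:", false_positive_rate(peek=False))
print("peeking every 10: ", false_positive_rate(peek=True))
```

With a fixed sample size, about 5% of these no-effect experiments come out “significant”, as the threshold promises. With repeated peeking the rate is several times higher, even though nothing about the underlying data has changed.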

## From Bed[side] to Bench: Involving Patients and the Public in Biomedical Research

By Celine Cammarata

Many of us doing biomedical science never really see patients, the very people our work will hopefully one day help. But what if we did – what if those individuals who will eventually be using our research on a daily basis were in fact involved in the work from the start? How would research change?

This is the concept underlying the movement toward Patient and Public Involvement or PPI, a title that (logically enough) refers to efforts by researchers and institutions to engage patients and members of the public in the process of biomedical research and, in doing so, fundamentally change the way scientific information is created and disseminated. Traditionally, the flow of information between science and society was seen as relatively unidirectional, with researchers passing scientific knowledge down to an uninformed, receptive public. More recently, however, there has been a growing recognition that information flow from the end-users of research back to investigators is also critical.

One way to accomplish this is to directly incorporate those users – broadly defined as patients, caregivers, and members of the public rather than clinicians or practitioners – into the research process. A prominent definition of PPI is “research being carried out ‘with’ or ‘by’ members of the public rather than ‘to’, ‘about’ or ‘for’ them” (INVOLVE). Individual instances of PPI can be quite variable, though most engage users in some form of advisory role, often through interviews, surveys, focus groups, or seats alongside researchers on regularly meeting advisory groups (Domecq et al., 2014). PPI is represented at all stages of research, from inception of project ideas through the data collection process to implementation and evaluation of findings, and is most prevalent in research that is directly related to health or to social issues and services.

A primary driving force behind PPI is the belief that input from users will push research toward questions that are more relevant to those users. Individuals with first-hand experience of an illness or other condition are thought to hold a particular kind of expertise and are therefore able to craft more immediately relevant research questions than an academic investigator in the field might.

One important way in which patients and the public are having an impact is by working with funding agencies to establish research priorities. For instance, the UK’s NHS Health Technology Assessment program involves users alongside clinicians and researchers in the development and prioritization of research questions. Members of the public have been engaged in several different stages of the process, from initial suggestion of research ideas through to selecting topics that would be developed into solicitations for research. Analysis revealed that overall these lay members exerted an influence on the research agenda approximately equal to that of academic and clinical professionals (Oliver, Armes, & Gyte, 2009).

PPI can also increase the relevance of individual studies, with specific examples including: users of mental health services shifting outcome measurement in a study of therapies to improve cognition away from psychological tests in favor of measuring performance on daily activities; the investigation of environmental factors such as radiation, which researchers originally considered negligible, in a study of breast cancer; and the development of new assessment tools to measure the mental and psychological condition of stroke victims in a study that initially planned to focus only on physical health outcomes (Staley, 2009).

Users may express particular suspicions or hunches about their condition that they believe should receive further investigation, may increase pressure on investigators to clearly state how their work will contribute to the public, and may challenge whether a project is even conceptualized in a way relevant to those who are experiencing the situation in question, helping to determine whether a research problem is truly a “problem” at all. An excellent example of the impact of PPI in research commissioning is the Head Up project, an entirely user-driven project in which patients with motor neuron disease working with one of the CCF programs pushed for research on an improved supportive neck collar.

PPI may also help increase the uptake of research findings because users are generally able to relate to and communicate with other users and practitioners in a uniquely meaningful way. Patients and members of the public may help to write up study findings, present at conferences or, importantly, bring findings directly to the user community.

Of course, nothing comes without a cost. A number of challenges in conducting PPI have consistently been identified, including: insufficient time and funding; tension over roles on the project and difficult relations between academic researchers and users; lack of training for both users and researchers; and a tokenistic attitude toward PPI on the part of investigators. Still relatively little is known about the precise effects of PPI or best practices. However, these are active areas of scholarship. Also of note is the relative lack of PPI in basic science research; PPI is predominantly relegated to applied health and social research. An important step in furthering PPI would be to establish who the “users” of basic research are, whether PPI in basic research is likely to be beneficial, and how the practice could be implemented.

Overall, the evidence shows that the end-users of research can be incorporated into setting the research agenda, designing studies and communicating results, and it suggests that such user engagement can increase the relevance of research and the dissemination and adoption of findings.

## Open access: the future of science publishing?

### By Florence Chaverneff

On the eve of receiving the Nobel Prize in Physiology or Medicine in 2013, Randy Schekman shook the scientific world in an altogether different manner when he announced in the Guardian newspaper that he and his group would boycott the three leading scientific journals. These bastions of scientific publishing have long been held on a pedestal by the research community the world over and regarded as depositories of excellence in science. Their reputation is tightly associated with high ‘impact factors’, a parameter determined by article citations, which Schekman judges to be a “gimmick” and a “deeply flawed measure, pursuing which has become an end in itself – and is damaging to science”. Yet career advancement in academic research is heavily – if not exclusively – reliant on individuals getting their work published in these high impact scientific journals, which Schekman calls “luxury journals”, comparing them to bonuses common on Wall Street, and from which “science must break [away]”. He deems that “the result [of such a change] will be better research that better serves science and society”. The Nobel Prize awardee touts the open access model for scientific publishing, presenting it as all-around anti-elitist, which…it is.

In 2001, the Budapest Open Access Initiative defined open access for peer-reviewed journal articles as their “free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself”.

This is how open access makes for a more level playing field: by allowing immediate dissemination of scientific findings without restrictions, and by accepting articles without highly demanding criteria, while maintaining sound peer-review practices. This comes in sharp contrast to the 300-year-old model of subscription-based scientific publishing, which accepts a limited number of articles in each issue and sets exceedingly demanding standards for acceptance. This results in significant publication delays and considerable time and effort spent polishing articles for publication. Time which could be spent… doing research.

While many in the community will agree on the benefits granted by this still recent and evolving model of science publishing, open access journals, being less established than older household names and mostly lacking an impact factor, may not appear as a prime choice for researchers. The question can then be posed: what would it take to bring about a shift in attitudes where open access publishing would be favored? Granting agencies and academic institutions, which contribute to setting the standards for scientific excellence, need to start being more accepting of non-traditional models of scientific publication, and to judge on quality of research, not solely on journal impact factor. National policies encouraging open access publishing are also paramount to support such a shift. Moves in that direction are underway in the UK with a policy formulated by the Research Councils, and in the European Union with the Horizon 2020 Open Research Data Pilot project, OpenAire. In the US, the Fair Access to Science and Technology Research Act and the Public Access to Public Science Act, which aim “to ensure public access to published materials concerning scientific research and development activities funded by Federal science agencies”, would be a step in the right direction if passed. All else that is needed might be a little time.

## How the Flintstones can Help the Jetsons: History Lessons for Modern Medicine

### By Lori Bystrom, PhD

Many of us look forward to a future of convenience with magical gadgets and miracle cures, perhaps something akin to the lifestyle of the cartoon characters on The Jetsons. The show’s optimistic portrayal of the future depicts our fascination with modern technology – an interest that stems not only from our desire for new and improved modes of transportation and communication, but also from our desire for new and better medicine.

The future of medicine may seem promising, but understanding the past may be vital for making medical dreams come true. Just as the stone-age characters from The Flintstones are capable of helping the futuristic characters of The Jetsons fix their time machine (see The Jetsons Meet The Flintstones clip from 1:00 to 1:17), so too can our long-departed ancestors help us in ways that will benefit us in the future (perhaps in less barbaric ways than hitting something with a club). In other words, medical advancements, although conventionally based on research using modern technology, can also be derived from medical information of the ancient past.

Nowhere is this better exemplified than in the recent discovery of a plant-based eye infection remedy found in a 1,000-year-old medical text. This finding was recently presented at the British Society for General Microbiology Annual Conference by researchers at the University of Nottingham in England and Texas Tech University in the United States. They found that the 9th-century Anglo-Saxon book known as Bald’s Leechbook contained a remedy for an eye infection that consisted of a mixture of garlic, onion or leeks, wine, and bile (from a cow’s stomach) that was boiled and fermented in a brass vessel. Amazingly, the recreation of this ancient remedy proved to be effective against the resilient methicillin-resistant Staphylococcus aureus (MRSA), both in vitro and on wounds. In fact, it was found to be more effective than one of the antibiotics (vancomycin) currently used to treat the modern-day superbug (see this article). Although clinical trials need to be conducted to confirm the beneficial effects of this medicinal preparation, this is an extraordinary start for a potential drug.

Should we be surprised that some of these ancient remedies actually have therapeutic value? Back in the day, when clinical trials did not exist and ethical practices were not necessarily enforced, there was probably a great deal of trial and error as people tried medicines on each other. The only medicines that were recorded were probably those that worked, while ineffective treatments may or may not have been noted. Interestingly, some of the traditional medicines may have been inspired by how animals treated their ailments (an area of study known as zoopharmacognosy). There also may have been minimal repercussions for failed treatments (no lawsuits?), and therefore maybe more freedom for finding medical cures. Moreover, if a treatment was found to be effective, probably nobody had to wait for approval from an organization such as the Food and Drug Administration (FDA).

Regardless of what happened in the past, it is apparent there are valuable lessons we can learn from our ancestors. For instance, the ancient practice of fecal transplantation is now gaining acceptance in modern medicine. As far back as the 4th century, Ge Hong, a traditional Chinese medicine doctor, used fecal material to treat his patients with food poisoning or severe diarrhea. Just recently, the FDA approved the use of fecal transplants for specific gastrointestinal problems. The use of leeches for the treatment of venous congestion, among other ailments, is another example of modern medicine embracing old technology (see this article). There are numerous conventional medications that also have roots in the distant past (e.g. aspirin). Any book on the history of medicine will provide more information on this subject matter.

All of these examples suggest that medical research is limited if it turns a blind eye to the past. Moreover, the medical community needs to address the polar opposite views on traditional/natural medicines: those who think all natural products/traditional remedies are safe and those who think all traditional medicines/natural therapies are inherently bad. What it really comes down to is what is effective, not what resonates better with different patients or doctors. More scientific research is needed to assess whether these treatments are safe and effective, while identifying those that may be snake oil. The journalist and information designer David McCandless beautifully illustrates some of these differences on his website.

Modern medicine should keep an open mind while researchers continue to investigate ancient remedies and sort the good from the bad. It is appropriate that a small division of the National Institutes of Health, formerly known as the National Center for Complementary and Alternative Medicine, was renamed the National Center for Complementary and Integrative Health. Unconventional or traditional medicines that are effective are not the ‘alternative’, but perhaps the best option or one that can be integrated with other medical treatments.

As we move forward in medicine, we might want to keep digging up the past so we are prepared to combat new diseases and improve current treatments. The future of medicine may just need, as George Jetson puts it nicely, “a little stone-age technology.”