From 1 to 100,000 human genomes: the challenges faced



Sophia David

In 2003, the Human Genome Project was completed and the final version of the first human genome sequence became available as a free scientific resource. The project took 13 years to complete and cost a total of US$2.7 billion. Since then, rapid technological advancements have allowed genome sequencing costs to plummet and, just ten years later, a human genome can now be fully sequenced for less than $10,000. Moreover, the process takes just days, or even hours. Costs are expected to fall even further, to just $1,000 per genome, within the next few years.


Of course, as the costs have fallen, scientists have sequenced an increasing number of genomes.  The 1000 Genomes Project, launched in 2008, has allowed scientists to characterise the variation within human genome sequences at a high resolution.


Remarkably, just ten years after the first human genome was sequenced, several projects have now been set up with the ambition of sequencing 100,000 human genomes. One such project is the Personal Genome Project, set up back in 2005 by George Church, a Professor of Genetics at Harvard Medical School. Although initially a US-based project, it is expanding and now also has branches in the UK and Canada. Combined, they hope to sequence the genomes of 100,000 volunteers over the next ten years. The data will be freely available for all scientists to use.


Meanwhile, the UK government has also pledged £100 million to sequence the genomes of 100,000 patients of the NHS (National Health Service) by the end of 2017. The data obtained from this project will be primarily for clinical use, making the UK one of the first countries to move genomics into the clinic on a large scale. The government has set up a company called Genomics England that is responsible for undertaking this ambitious task.


While the translation of genomic data into useful clinical information has been slower than once expected, genome sequencing is undoubtedly impacting on medical diagnoses and treatments. For example, by examining the mutational changes within a patient’s tumour cells, clinicians are able to better characterise some cancers and consequently provide a more appropriate treatment.


However, the developments in genome sequencing have not come without their challenges. Currently, a major debate revolves around how open or private genomic data should be. On the one hand, sharing data is hugely important for enabling scientific discoveries. Thousands of studies have been published as a result of data made freely available by the Human Genome Project and the 1000 Genomes Project. On the other hand, a whole genome sequence can reveal a large amount of information about the respective individual, as well as their family members. While such data may be published anonymously with no personal information attached, a study by scientists at MIT showed earlier this year that it is still possible to link genomic data back to the individual using Y-chromosome data and genealogy websites. There are fears that genomic information linked to a specific individual could be used in malicious ways.


Scientists at the Personal Genome Project are only too aware of this dilemma and have devised their own solution. They state on their website that, “We feel the most ethical and practical solution to this dilemma is to turn the privacy problem on its head and collaborate with individuals who are willing to share their data publicly with the understanding that re-identification is possible.”


Thus, data produced by the Personal Genome Project will be freely available to everyone. However, before taking part in the study, participants must pass tests to prove that they fully understand the risks of having their genomic information shared with the world.


Meanwhile, genome sequence data from NHS patients will not be publicly available and will instead be stored inside the NHS firewall. It will be linked to patient records for clinical use, and anonymised data will also be made available to scientists under restricted access for research purposes.


Another potential problem that arises from the analysis of genomic data concerns how incidental or secondary findings are managed. These can occur when a genome sequence is used to answer a particular question about a patient’s health but something unrelated and unexpected arises. While incidental findings are nothing new in medicine, the risk of such occurrences in genomics is probably higher than in most other areas of medicine. Earlier this year, the American College of Medical Genetics and Genomics released its recommendations on incidental findings that occur through genome sequencing. It suggests that all labs performing clinical sequencing should test for well-studied mutations in 57 genes that have a strong association with disease. These include BRCA1 and BRCA2 mutations that are linked to hereditary breast and ovarian cancers. The College believes that people should not be able to opt out of knowing these results unless they refuse clinical sequencing altogether.


There is also the risk that genomic information could actually cause harm to patients through misdiagnoses, or that clinicians or scientists could fail to identify clinically useful genetic variants. However, it is likely that these risks will diminish as more genomes are sequenced and clinicians and scientists gain experience in the application of genomic data to medicine.


Finally, one of the great challenges may lie in managing expectations. While the last decade has seen remarkable progress in genomics, the application of genomics to medicine will be a much longer road. Politicians must understand that they will not see a quick return on their investment, and patients offered genome sequencing should not always expect a straightforward cure. Likewise, clinicians and scientists should not expect to see medicine transformed overnight by genomics.