Big data: The impact of the Human Genome Project

The Human Genome Project led to a paradigm shift in the way science is conducted and data is shared, says Rehma Chandaria.
Guest contributor Rehma Chandaria
In 1996, an international group of scientists came together in Bermuda to discuss how sequence data from the Human Genome Project (HGP) should be released. The meeting concluded with the formation of the ‘Bermuda Principles’, a set of rules ensuring that the data would be shared on publicly accessible databases as soon as it was generated. This ground-breaking accord contravened the conventional practice of releasing data only after publication in scientific journals. It changed the way we see data sharing and, ultimately, the way scientific research is conducted.
Its success demonstrated how a global community of scientists could collectively produce and use data far more efficiently than any individual could. This greatly benefited scientific progress and led to many important new insights and discoveries. For example, information on 30 genes associated with disease was published before the draft sequence appeared in 2001.
Because open data sharing can accelerate progress, there is now an enormous push for all scientists to make raw data publicly available for others to analyse and use. It is becoming increasingly common for journals and funding bodies to insist that data is shared openly as a prerequisite for publication or for receiving grants.
The tools and infrastructure to support this are improving all the time. Shared data are only valuable if they are searchable and usable. Numerous data repositories for different types of data have been set up. General data repositories, such as Dryad and Figshare, can be used for any data type. The addition of digital object identifiers (DOIs) to resources in these public repositories can also make searching for specific entries easier. Furthermore, a DOI means that data can be cited in a way that properly credits the producers of the data. The majority of journals and publishers, F1000 Research for example, welcome research articles that report analysis and conclusions based on previously published datasets with a DOI. Ensuring that the scientists who produce and share their data are rewarded with citations is crucial for encouraging this practice.
The HGP took 13 years and US$3 billion to sequence the first human genome. It is now possible to sequence a human genome in a matter of days for only US$1,000. This rise of ‘big data’ (the rapid collection of large volumes of information) requires researchers from different backgrounds and specialities to work together. There are now interdisciplinary research centres like Cambridge Big Data, which bring together laboratory-based life science researchers, computer scientists and mathematicians. This combination of expertise is essential for processing, analysing, storing and utilising vast quantities of data.