Results from Uncovering the Genome Mystery Project, The Story of Life in 30 TB


Gridcoin Research is a Proof-of-Stake cryptocurrency that rewards research done on the distributed BOINC platform. Today, 26 projects can be attached and earn rewards for work done, and World Community Grid is one of them.

Uncovering Genome Mysteries Project

At the end of November, the now-completed project released an update on the research collected from World Community Grid users: a soaring 30 TB of highly compressed, processed and compared genome data.

Brief Background

The project was launched on the BOINC platform of World Community Grid in late 2014 and ran for two years, ending in November of 2016.

The primary goal of the project was to screen decoded DNA genome data for natural protein structures with possible medical and industrial applications. All living material around us, from microorganisms to plants and animals, has over millions of years developed its own defenses against diseases and other outside threats.

Thanks to developments in recent years, scientists can now decode DNA from nature with far less time-consuming and much cheaper methods. As a result, there was a boom in available decoded DNA sequences that needed to be screened for their potential functions. Each gene specifies a sequence of amino acids that is assembled into a molecular chain and then folded into a protein molecule, also known as a protein sequence, each with its own specific function.
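
To make the gene-to-protein idea concrete, here is a minimal toy sketch in Python. The codon table is deliberately truncated to a handful of entries, so this illustrates the principle only and is not a real translation tool.

```python
# Toy illustration: a gene's DNA sequence encodes a chain of amino acids.
# The codon table is deliberately truncated; a real translation uses all 64 codons.
CODON_TABLE = {
    "ATG": "M",                            # Methionine (start)
    "TTT": "F", "TTC": "F",                # Phenylalanine
    "GGA": "G", "GGC": "G",                # Glycine
    "TGG": "W",                            # Tryptophan
    "TAA": "*", "TAG": "*", "TGA": "*",    # Stop codons
}

def translate(dna: str) -> str:
    """Translate a DNA sequence into a protein sequence, codon by codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" = unknown codon
        if amino_acid == "*":
            break  # a stop codon ends the protein chain
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGTTTGGATGGTAA"))  # -> "MFGW"
```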

The task quickly becomes enormous, as even a small sample of water or soil can harbor tens of thousands of organisms, each with thousands of unique genes. And it needs to be handled fast, as the earth is losing organisms every day due to climate change, human development, and other factors.

The Race for Screening

The team behind the project quickly realized that this "big data" from the hundreds of millions of genes being decoded needed to be screened, and fast.

The project expected to examine 200 million genes from a vast variety of life forms. The samples could be anything from seaweed off Australia to microbes from the Amazon. The search compared whether two genes were similar; if the function of one gene was already known, that similarity allowed scientists to make an educated guess about the purpose of the other gene.
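
The project's own comparison software is not reproduced here, but as a rough illustration of the idea, a naive percent-identity check between two protein sequences looks something like the sketch below. Real tools use proper alignment algorithms (Smith-Waterman or BLAST-style heuristics) that handle insertions, deletions and substitution scores.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Naive similarity: fraction of matching positions over the shorter sequence.
    Only a toy stand-in for a real alignment-based comparison."""
    length = min(len(seq_a), len(seq_b))
    if length == 0:
        return 0.0
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / length

# If an uncharacterised gene scores highly against a gene of known function,
# scientists can make an educated guess about its purpose.
known   = "MKTAYIAKQR"   # hypothetical protein of known function
unknown = "MKTAYLAKQR"   # newly decoded protein
print(f"identity: {percent_identity(known, unknown):.0%}")  # -> identity: 90%
```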

With about 20 quadrillion comparisons to be made, an estimated 40,000 years of computer time, the team decided to use World Community Grid and its distributed platform.
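
As a back-of-the-envelope check of those figures: comparing every gene against every other gene grows quadratically with the number of genes, which is where the 20 quadrillion comes from. The machine count in the sketch below is purely illustrative, not a World Community Grid figure.

```python
# Back-of-the-envelope check of the figures quoted above.
genes = 200_000_000                 # ~200 million genes expected
pairs = genes * (genes - 1) // 2    # all-against-all gene comparisons
print(f"{pairs:.2e} comparisons")   # -> 2.00e+16, i.e. about 20 quadrillion

# 40,000 years of single-computer time shrinks when split across volunteers.
machines = 50_000                   # purely illustrative machine count
print(f"roughly {40_000 / machines:.1f} years of wall-clock time")
```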

They had a specific set of goals for the project:

  • To create a database of protein sequence comparison information, based on the DNA found from diverse sources, for all scientists to reference.
  • To discover new gene functions, augmenting our knowledge about biochemical processes in general.
  • To find out how organisms interact with each other and their environment.
  • To document the current baseline microbial diversity, allowing us to understand how microorganisms change under environmental stresses, such as climate change.
  • To better understand and model complex microbial systems.

Only the Beginning

The Uncovering Genome Mysteries project is only the beginning of a long knowledge journey. The results are expected to help identify, design and produce new antibiotics and drugs, as well as to create industrial applications from enzymes.

So, what happened with the 30 TB of data?

Last month, in November, the project released an update on the results of the distributed work. The whole dataset has been stored on a bioinformatics server at the Oswaldo Cruz Foundation (Fiocruz) in Brazil and sent to Dr. Torsten Thomas and his team at the Centre for Marine Bio-Innovation & the School of Biological, Earth and Environmental Sciences at the University of New South Wales in Sydney, Australia.

The dataset, still 30 TB even highly compressed, took a few months to transfer from Brazil to Australia!

University of New South Wales

At the University of New South Wales, the results from the protein comparisons will help interpret analyses of marine bacterial ecosystems, where micro-organisms, coral reefs, sponges and many other intriguing creatures interact and form living communities.

The Fiocruz team

The Fiocruz team is further processing the primary output and has to deal with the rapidly growing amount of data involved in deciphering the raw results. Their task is to associate the data with the correct inter-genome comparison, check for errors, and tabulate and transform the data into meaningful information.
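
The update does not spell out Fiocruz's pipeline, but as a loose sketch of what "tabulate and transform" might look like, assuming each raw record names two genes, their source genomes and a similarity score (a made-up layout for illustration only):

```python
from collections import defaultdict

# Hypothetical raw records: (gene_a, genome_a, gene_b, genome_b, similarity score).
# The project's real output format is not described in the update; this layout is assumed.
records = [
    ("geneA1", "seaweed_AU", "geneB7", "microbe_AMZ", 0.92),
    ("geneA1", "seaweed_AU", "geneC3", "microbe_AMZ", 0.31),
    ("geneA2", "seaweed_AU", "geneB9", "soil_BR",     0.88),
]

# Tabulate: count strong hits per genome pair, dropping weak matches as likely noise.
THRESHOLD = 0.80
hits_per_pair = defaultdict(int)
for gene_a, genome_a, gene_b, genome_b, score in records:
    if score >= THRESHOLD:
        hits_per_pair[(genome_a, genome_b)] += 1

for (genome_a, genome_b), hits in sorted(hits_per_pair.items()):
    print(f"{genome_a} vs {genome_b}: {hits} strong match(es)")
```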

One goal is to build a database interface that appeals to members of the general public interested in biodiversity; the results are not meant to be a scientists-only resource.

Some data from the project is already being put to practical use in vaccine and drug design against viruses such as Zika, dengue, and yellow fever. It is also an asset for understanding how bacteria interact with their environment and how this is reflected in their metabolic pathways.

Fiocruz is looking for partnerships that would add additional data analytics and artificial intelligence to the project.

Further updates are expected as the data analysis progresses.

Sources

Article: Analysis Underway on 30 Terabytes of Data from the Uncovering Genome Mysteries Project
Research: Uncovering Genome Mysteries: Project Details


Join Gridcoin Research+BOINC and Change the World

BOINC (Berkeley Open Infrastructure for Network Computing, pronounced /bɔɪŋk/ – rhymes with "oink") is a distributed work platform that has been around since 2002 (15 years now) and rewards all participants with a credit score. It has over 500,000 active users and many more computers, and is a popular platform for researchers who need large amounts of distributed work done. The infrastructure is already in place, and no payment is required. Participation is purely voluntary, and users can attach any project of their choosing.

Gridcoin Research is a Proof-of-Stake (PoS) cryptocurrency that rewards Proof-of-Research (PoR) for BOINC computation, based on the BOINC RAC (Recent Average Credit) score. There are some requirements for projects to be listed on the platform, but they are relatively easy to comply with. There are currently more than 20 whitelisted projects for which the network rewards work done by users.
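
Gridcoin's exact reward formula is beyond the scope of this post, but the RAC it builds on is, roughly, an exponentially decaying average of granted credit with a half-life of about a week. A minimal sketch of that behaviour (an approximation, not BOINC's actual code):

```python
import math

HALF_LIFE_DAYS = 7.0  # RAC decays with a half-life of roughly one week

def update_rac(rac: float, new_credit: float, days_since_update: float) -> float:
    """Rough approximation of RAC: old credit decays exponentially,
    freshly granted credit is blended in. Not BOINC's exact implementation."""
    decay = math.exp(-days_since_update * math.log(2) / HALF_LIFE_DAYS)
    return rac * decay + (1 - decay) * (new_credit / days_since_update)

rac = 0.0
for day in range(30):
    rac = update_rac(rac, new_credit=1000, days_since_update=1.0)  # 1000 credits/day
print(f"RAC after 30 days of steady crunching: {rac:.0f}")
# -> roughly 949, approaching the steady-state value of 1000
```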


Vote for me as Witness

Enjoy what I contribute to the community? Consider voting for my Witness on Steemit or BitShares. You can also vote for me as a Steemit Proxy Voter.

By voting for me as a witness, you will support an active witness on Steem and BitShares. I believe a witness should keep up-to-date on current happenings and be a conduit between the many users and the system.

Read my Witness Posts: BitShares, Steemit

Support my Projects: Gridcoinstats.eu, Crypto.fans

Available on: BTS Discord, Gridcoin Slack and Gridcoin Telegram

Vote for sc-steemit on Steemit

Vote for sc-ol on BitShares

Proud Supporter of Gridcoin and BitShares


It's always good to read something about a project when it's finished so we can see our help really matters! Thanks for the article.

It's as important as the crunching period. Just because there is no more work left doesn't mean it's all done 😄

I find updates like these very interesting to know about, showing that the work done contributes to something.

Before reading the source I didn't know that the results were already being used for cures. That's amazing.

Great post there, keep up the good work!

This reply was created using STEEMER.NET Alpha (support the STEEMER.NET Transactor / Wallet / Exchange Project here: https://steemit.com/investors-group/@cryptomonitor/steemer-net-steem-blockchain-transactor-for-windows-android-app-funding-update-243-1200-sbd-28-12-2017)

"The dataset, even highly compressed and still 30 TB, took a few months to be transferred from Brazil to Australia!"
Wow thats pretty... bad. We have 2017 where you can get residential internet with 1gb/s upload so real transfer at 100mb/s so you can transfer 30TB in around 83 hours. Few months is pretty bad result. :/

True, but that's what they said it took. Not sure how bad the internet access at these universities and research centers is, though :)

It's possible they were doing excessive redundancy checks to verify the integrity of each packet, which hypothetically could double or triple transfer time. Then they may have had to keep it throttled back to save bandwidth for all other needs. This is probably the big one here.

That sounds exciting! I might jump over to do some genome crunching!
