Affordable DNA sequencing at scale has allowed the genomes of hundreds of thousands of people around the world to be sequenced. This has led to a better understanding of the causes of complex diseases, improved diagnosis and early disease detection, and more tailored treatment options.

To achieve these outcomes, genomic information from one individual needs to be compared with many other genomes from similar cases, forming cohorts large enough to produce statistically meaningful results. This is often done across multiple efforts and jurisdictions, at a national or global scale, and requires the genomic data to be findable, searchable, shareable, and linkable to analytical capabilities.

Despite a desire to share data for research use, human genome data in Australia is held in many siloed collections, each often inaccessible to outside users. Our Human Genomes Platform Project prepared foundational infrastructure that paves the way for human genome data in Australia to be findable, searchable, shareable, and linkable to analytical capabilities, while ensuring the privacy of individuals is protected and data processing is performed ethically, securely and safely.

In the virtual cohorts subproject, a system will be established to identify cohorts of individuals across the participating repositories. The ability to run computation against distributed virtual cohorts is explicitly out of scope, but will be considered as a stretch goal.