Material Detail

Variant prioritization by genomic data fusion

Variant prioritization by genomic data fusion

This video was recorded at Marie Curie Initial Training Network on Machine Learning for Personalized Medicine (MLPM) 1st Summer School, Tübingen, 2013. NGS has rapidly increased our ability to discover the cause of many previously unresolved rare monogenic disorders by sequencing rare exomic variation. However, after standard filtering against nonsynonymous single nucleotide variants (nSNVs) and loss-of-function mutations that are not present in healthy populations or unaffected samples, many potential candidate mutations are often retained and we need predictive methods to prioritize variants for further validation. Several computational methods have been proposed that take into account biochemical, evolutionary and structural properties of mutations to assess their potential deleteriousness. However, most of these methods suffer from high false positive rates when predicting the impact of rare nSNVs. A plausible explanation for this poor performance is that many of these predicted variants are mildly deleterious, but in no way specific to the disease of interest. We therefore propose a genomic data fusion methodology that integrates multiple strategies to detect deleteriousness of mutations and prioritizes them in a phenotype-specific manner. A key innovation is that we incorporate into our strategy a computational method for gene prioritization, which scores mutated genes based on their similarity to known disease genes by fusing heterogeneous genomic information. We also integrate haploinsufficiency prediction scores that predict the probability that the function of a gene is affected if present in a functionally haploid state. To integrate or fuse these data sources, we develop a machine-learning model using the Human Genome Mutation Database (HGMD) of human disease-causing mutations compared to three control sets: common polymorphisms and two independent sets of rare variation. Benchmarking on HGMD demonstrates that this integrative phenotype-specific variant prioritization significantly outperforms state-of-the-art predictors, such as SIFT or PolyPhen-2.


  • Editor Reviews
  • User Rating
  • Comments
  • Learning Exercises
  • Bookmark Collections
  • Course ePortfolios
  • Accessibility Info

More about this material


Log in to participate in the discussions or sign up if you are not already a MERLOT member.