Website

https://github.com/yale-p2med/PBS-R

Description

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and the relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. In this study, we developed a novel computational method to assess the biological differences based on pathways by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both simulated data and real data when compared to traditional methods. It also has comparable or better performance compared to Pathifier.

Share

COinS
 

A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and the relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. In this study, we developed a novel computational method to assess the biological differences based on pathways by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both simulated data and real data when compared to traditional methods. It also has comparable or better performance compared to Pathifier.

https://elischolar.library.yale.edu/dayofdata/2018/posters/8