Machine learning methods for effective proteomics image analysis

This video was recorded at Learning and Inference in Computational Systems Biology (LICSB), Warwick 2010. Two-dimensional gel electrophoresis (2DGE) remains the most widely used method for protein identification and differential expression analysis, owing to its lower cost and the existence of mature commercial software tools for 2DGE image analysis, even though non-gel-based methods are gaining in popularity. Although several software packages promise to automate the whole protein spot detection and quantification process, the hard reality remains today [1] as Fey and Larsen stated it in 2001: "There is no program that is remotely automatic when presented with complex 2-DE images" ... "most programs require often more than a day of user hands-on time to edit the image before it can be fully entered into the database" [2].
To address these limitations and develop an automated 2DGE image analysis workflow, we developed in previous work an image analysis methodology that first denoises the 2DGE image using the Contourlet transform [3] and then separates the parts of the denoised image that contain true protein spots, called Regions of Interest (ROIs), from the background-only areas by using Active Contours (AC) without edges [4]. In this work we complete the image analysis workflow by adding a well-tuned pipeline of operations, based on unsupervised machine learning methods, that further analyzes each isolated ROI in order to "fish" out the centers and estimate the quantities of the individual "hidden" spots within it. One-dimensional mixture modeling of the ROI pixel intensity histogram is applied first to identify and remove any remaining background pixels.
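As a rough illustration of this background-removal step, the sketch below fits a two-component 1D Gaussian mixture to the ROI pixel intensities with plain EM and keeps the pixels assigned to the brighter component. The abstract does not specify the mixture family, the number of components, or the decision rule, so all of these are assumptions, as are the function names `fit_two_gaussians_1d` and `foreground_mask`. It also assumes the image has been oriented so that protein signal is brighter than background.

```python
import numpy as np

def fit_two_gaussians_1d(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1D Gaussian mixture to x with plain EM."""
    x = np.asarray(x, dtype=float).ravel()
    # Crude initialisation (an assumption): split the data at its median.
    med = np.median(x)
    low, high = x[x <= med], x[x > med]
    means = np.array([low.mean(), high.mean()])
    variances = np.array([low.var(), high.var()]) + 1e-6
    weights = np.array([0.5, 0.5])
    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component for every pixel.
        dens = weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
        dens /= np.sqrt(2.0 * np.pi * variances)
        total = dens.sum(axis=1) + 1e-300
        resp = dens / total[:, None]
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
        loglik = np.log(total).sum()
        if abs(loglik - prev_loglik) < tol:
            break
        prev_loglik = loglik
    return weights, means, variances

def foreground_mask(roi):
    """Boolean mask of pixels assigned to the brighter mixture component.

    Assumes protein signal is brighter than background in `roi`.
    """
    roi = np.asarray(roi, dtype=float)
    weights, means, variances = fit_two_gaussians_1d(roi)
    dens = weights * np.exp(-0.5 * (roi[..., None] - means) ** 2 / variances)
    dens /= np.sqrt(2.0 * np.pi * variances)
    bright = int(np.argmax(means))
    return dens.argmax(axis=-1) == bright
```

Pixels where `foreground_mask(roi)` is `False` would be treated as residual background and excluded from the later stages.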
Then the surviving ROI pixels are used as "molecule generators" to convert the processed ROI image, through appropriate random sampling, into an isomorphic dataset representing the distribution of molecules of the underlying protein species (which are "projected" as spots on the gel image). This reverse-engineering step, rooted in machine learning, constitutes a unique innovation of this work that, to the best of our knowledge, has not been applied before in 2DGE image analysis. The candidate protein spot centers are then located by applying hierarchical clustering. Finally, the individual spot boundaries are delineated by fitting 2D Gaussian models to the data using generalized mixture modeling, with the Minimum Message Length (MML) criterion controlling model complexity. An extensive evaluation of this novel spot modeling methodology using both real and synthetic 2DGE images reveals that it is more precise and more specific than PDQuest in terms of spot detection, while both methods achieve comparably high sensitivity. Furthermore, it estimates the volumes of the extracted spots more reliably, even in the presence of substantial noise and in areas of the image where faint and overlapping (or saturated) spots are located close to each other. It should be noted that the end-to-end workflow that we have developed for 2DGE image analysis does not require any recalibration of parameters every time a new gel image is presented for analysis. This desirable characteristic makes it a suitable candidate for the automatic processing of image stacks, as needed for high-throughput proteomics analysis to support systems biology projects.
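The "molecule generator" idea can be sketched as follows: each pixel is treated as a source that emits points in proportion to its intensity, so the ROI image becomes a 2D point cloud on which clustering and mixture fitting can operate. This is a minimal sketch and not the authors' implementation. The function name `roi_to_molecules`, the sample count, and the uniform jitter inside each pixel are assumptions that do not come from the abstract.

```python
import numpy as np

def roi_to_molecules(roi, n_samples, rng=None):
    """Draw 2D points with probability proportional to pixel intensity.

    Returns an (n_samples, 2) array of (row, col) coordinates.
    """
    rng = np.random.default_rng() if rng is None else rng
    roi = np.asarray(roi, dtype=float)
    # Negative values (e.g. after background subtraction) get zero weight.
    weights = np.clip(roi, 0.0, None).ravel()
    total = weights.sum()
    if total <= 0:
        raise ValueError("ROI has no positive-intensity pixels to sample from")
    # Pick pixel indices according to the normalised intensities.
    flat_idx = rng.choice(weights.size, size=n_samples, p=weights / total)
    rows, cols = np.unravel_index(flat_idx, roi.shape)
    # Assumption: uniform jitter inside each pixel, so that the points
    # do not all sit on the integer pixel lattice.
    jitter = rng.random((n_samples, 2))
    return np.column_stack([rows, cols]) + jitter
```

The resulting point cloud is the kind of dataset that hierarchical clustering and 2D Gaussian mixture fitting could then be applied to. Those later stages, including MML-based model selection, are not shown here.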

