
Machine learning methods for effective proteomics image analysis

This video was recorded at Learning and Inference in Computational Systems Biology (LICSB), Warwick 2010.

Two-dimensional gel electrophoresis (2DGE) remains the most widely used method for protein identification and differential expression analysis. This is due to its lower cost and the availability of mature commercial software for 2DGE image analysis, even though non-gel-based methods are gaining popularity. Several software packages promise to automate the whole protein spot detection and quantification process. Yet the reality today [1] is still as Fey and Larsen stated in 2001: "There is no program that is remotely automatic when presented with complex 2-DE images" ... "most programs require often more than a day of user hands-on time to edit the image before it can be fully entered into the database" [2].

To address these limitations and build an automated 2DGE image analysis workflow, in previous work we developed an image analysis methodology with two stages. It first denoises the 2DGE image using the Contourlet transform [3]. It then uses Active Contours (AC) without edges [4] to separate the parts of the denoised image that contain true protein spots (called Regions of Interest, ROIs) from background-only areas. In this work we complete the workflow by adding a well-tuned pipeline of operations based on unsupervised machine learning methods. This pipeline analyzes each isolated ROI further in order to "fish" out the centers and estimate the quantities of the individual "hidden" spots. First, one-dimensional mixture modeling of the ROI pixel intensity histogram is applied to identify and remove any remaining background pixels.
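The background-removal step can be illustrated with a two-component 1D Gaussian mixture fitted by expectation-maximization (EM). Pixels more probably drawn from the brighter component are kept. This is a minimal sketch under stated assumptions, not the authors' model. The abstract does not give the number of mixture components, the component family, or the decision rule. It also assumes intensities are oriented so that spots are brighter than background (gel images are often inverted first). The names `fit_gmm_1d` and `spot_pixel_mask` are hypothetical. The sketch fits raw pixel values, which is equivalent to fitting the intensity histogram weighted by bin counts.

```python
import numpy as np


def fit_gmm_1d(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1D Gaussian mixture to pixel intensities by EM."""
    x = np.asarray(x, dtype=float).ravel()
    # Crude initialisation: one component at the darkest value, one at the brightest.
    mu = np.array([x.min(), x.max()])
    var = np.full(2, x.var() + 1e-6)
    w = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted Gaussian densities and per-pixel responsibilities.
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        total = dens.sum(axis=1, keepdims=True) + 1e-300
        resp = dens / total
        ll = np.log(total).sum()
        # M-step: re-estimate mixing weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, var


def spot_pixel_mask(roi):
    """True where a pixel is more probably from the brighter (spot) component.

    Assumption: spots are brighter than background (invert the gel image first if needed).
    """
    w, mu, var = fit_gmm_1d(roi)
    hi = int(np.argmax(mu))
    x = np.asarray(roi, dtype=float)[..., None]
    dens = w * np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    # Comparing weighted densities is the same as thresholding the posterior at 0.5.
    return dens[..., hi] > dens[..., 1 - hi]


# Synthetic ROI: one bright spot on a noisy, roughly flat background.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:40, 0:40]
roi = 200 * np.exp(-((xx - 20) ** 2 + (yy - 20) ** 2) / (2 * 4.0 ** 2))
roi = roi + rng.normal(10, 2, size=roi.shape)
mask = spot_pixel_mask(roi)
```

A two-component mixture is the simplest choice for a "background vs. spot" split. A real ROI with a sloping background or several spot intensity levels may need more components and a different rule for which components count as background.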
Then the surviving ROI pixels are used as "molecule generators": random sampling converts the processed ROI image into an isomorphic dataset representing the distribution of molecules of the underlying protein species (which are "projected" as spots on the gel image). This reverse-engineering step, rooted in machine learning, is a unique innovation of this work that, to the best of our knowledge, has not been applied before in 2DGE image analysis. The candidate protein spot centers are then located by hierarchical clustering. Finally, the individual spot boundaries are delineated by fitting 2D Gaussian models to the data using generalized mixture modeling, with the Minimum Message Length (MML) criterion selecting the model complexity.

An extensive evaluation of this novel spot modeling methodology on both real and synthetic 2DGE images shows that it is more precise and more specific than PDQuest in spot detection, while both methods achieve comparably high sensitivity. Furthermore, it estimates the volumes of the extracted spots more reliably. This holds even in the presence of substantial noise and in image areas where faint and overlapping (or saturated) spots lie close to each other. Notably, the end-to-end workflow we developed for 2DGE image analysis does not require any re-calibration of parameters each time a new gel image is presented for analysis. This makes it a suitable candidate for the automatic processing of image stacks, as needed for high-throughput proteomics analysis in support of systems biology projects.
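The "molecule generator" idea and the clustering step can be sketched together in plain numpy. Points are drawn with probability proportional to pixel intensity inside the spot mask. Agglomerative clustering then merges them until a chosen number of clusters remains, and the cluster centroids serve as candidate spot centers. This is an illustrative sketch only. The abstract does not specify the sample size, the linkage, or how the dendrogram is cut. The helpers `sample_molecules` and `ward_centers` are hypothetical, and so are the choice of Ward linkage, the fixed `n_clusters` cut, and the uniform within-pixel jitter. The final MML-controlled 2D Gaussian mixture fit is not shown.

```python
import numpy as np


def sample_molecules(image, mask, n_samples, rng):
    """Draw 2D points with probability proportional to pixel intensity inside mask."""
    rows, cols = np.nonzero(mask)
    weights = np.clip(image[rows, cols].astype(float), 0.0, None)
    idx = rng.choice(rows.size, size=n_samples, p=weights / weights.sum())
    # Assumption: jitter uniformly within each pixel so repeated draws do not coincide.
    jitter = rng.uniform(-0.5, 0.5, size=(n_samples, 2))
    return np.column_stack([rows[idx], cols[idx]]).astype(float) + jitter


def ward_centers(points, n_clusters):
    """Agglomerative clustering with Ward's merge cost; returns cluster centroids."""
    centroids = [p.copy() for p in np.asarray(points, dtype=float)]
    sizes = [1] * len(centroids)
    while len(centroids) > n_clusters:
        c = np.array(centroids)
        s = np.array(sizes, dtype=float)
        d2 = ((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
        # Ward cost: increase in within-cluster sum of squares if two clusters merge.
        cost = s[:, None] * s[None, :] / (s[:, None] + s[None, :]) * d2
        np.fill_diagonal(cost, np.inf)
        i, j = sorted(np.unravel_index(np.argmin(cost), cost.shape))
        n = sizes[i] + sizes[j]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroids[i] = (sizes[i] * centroids[i] + sizes[j] * centroids[j]) / n
        sizes[i] = n
        del centroids[j], sizes[j]
    return np.array(centroids)


# Synthetic ROI with two well-separated spots of different intensity.
rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:40, 0:40]
img = (100 * np.exp(-((yy - 12) ** 2 + (xx - 10) ** 2) / 8.0)
       + 80 * np.exp(-((yy - 26) ** 2 + (xx - 30) ** 2) / 8.0))
pts = sample_molecules(img, img > 1.0, 300, rng)
centers = ward_centers(pts, n_clusters=2)
```

This naive merge loop is cubic in the number of points, so it only suits small samples. A library implementation of hierarchical clustering would be the practical choice for full-size ROIs. Cutting at a fixed `n_clusters` is a simplification. In the described workflow, the candidate centers only seed the subsequent Gaussian mixture fit, whose complexity is chosen by MML.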

Keywords:: videolectures, ocwc, oec

Disciplines:

Science and Technology / Computer Science


More about this material

Material Type:: Presentation
Date Added to MERLOT:: February 10, 2015
Date Modified in MERLOT:: February 10, 2015
Author:: Elias S. Manolakos, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens Panepistimiopolis
Submitter:: The Open Education Consortium
Primary Audience:: College General Ed, College Lower Division, College Upper Division
Technical Format:: Video

Mobile Compatibility:: Not specified at this time
Language:: English
Cost Involved:: No
Source Code Available:: No
Creative Commons:: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License