[Solomonov Seminar] 215. Solomonov seminar

Marko Grobelnik marko.grobelnik at ijs.si
Mon Feb 15 01:41:18 CET 2010


On Monday, 15 February (!!!non-standard time slot!!!), the 215th Solomonov seminar will take place at 13:00 in the Orange Lecture Hall
(second floor of the IJS main building).
Recordings of past seminars are available at http://videolectures.net/solomon/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nives Skunca, Institut Rudjer Boskovic, Zagreb
Paralogs considerably improve accuracy of phylogenetic profiling for in silico functional annotation

Phylogenetic profiling is a genomic context method that predicts gene function by correlating gene occurrence patterns across a
selected set of organisms [1]. The intuition is that genes found and lost together (i.e. inherited together) in different genomes
are likely to share a function, either because 1) they are involved in the same biological pathway (which is incomplete in a given
genome unless all of its members are present), or 2) they are crucial for survival in a particular environment, so their presence
is mandatory in every organism with that phenotype.
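
As a rough illustration of the idea above, the sketch below compares toy presence/absence profiles with a Pearson correlation. The
gene names, the six genomes and the choice of Pearson correlation are assumptions made for the example; the abstract does not say
which similarity measure or data were used.

```python
from math import sqrt

# Hypothetical presence (1) / absence (0) profiles across six genomes.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 0],  # found and lost together with geneA
    "geneC": [0, 1, 1, 1, 0, 1],  # largely complementary to geneA
}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A gene present in all or in no genomes carries no signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


def most_similar(gene):
    """Return the other gene whose profile correlates best with `gene`."""
    others = [g for g in profiles if g != gene]
    return max(others, key=lambda g: pearson(profiles[gene], profiles[g]))


print(most_similar("geneA"))
```

In this toy data geneA and geneB have identical profiles, so a profiling method would propose that they share a function.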

We used a recently developed machine learning approach based on decision trees for Hierarchical Multi-label Classification
(HMC) [2] to predict Gene Ontology (GO) assignments of Orthologous Matrix (OMA) groups [3]. The HMC extension of the decision tree
classifier takes the hierarchical layout of GO into account and considerably improves computational efficiency and accuracy by
considering the whole set of class labels simultaneously when constructing the decision trees, instead of learning each class label
separately. A standard decision tree recursively splits the training data into subsets ('branches') on the values of an attribute
so as to decrease the entropy of a class label within the subsets after the split. The HMC approach has to deal with multiple
class labels, so it computes a weighted average of the decrease in entropy over all labels when deciding on a split point. The
weights are inversely proportional to the depth of a class in the GO, giving more significance to high-level, more general GO
terms.
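
The split criterion described above can be sketched as follows. This is a minimal illustration of the wording in this abstract
(entropy decrease per label, averaged with weights 1/depth), not the implementation from [2]; the attribute names, label names,
depths and data are hypothetical.

```python
from math import log2


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def entropy_decrease(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


def split_score(rows, attribute, depth):
    """Weighted average of the per-label entropy decrease for a binary split.

    rows  : list of (features, labels) pairs, both dicts with 0/1 values
    depth : label -> depth in the class hierarchy (1 = most general)
    """
    left = [r for r in rows if r[0][attribute] == 1]
    right = [r for r in rows if r[0][attribute] == 0]
    weights = {label: 1.0 / d for label, d in depth.items()}  # inverse depth
    score = 0.0
    for label, w in weights.items():
        score += w * entropy_decrease(
            [r[1][label] for r in rows],
            [r[1][label] for r in left],
            [r[1][label] for r in right],
        )
    return score / sum(weights.values())


# Hypothetical data: attributes are presence in two genomes, labels are a
# general term (depth 1) and a more specific child term (depth 2).
depth = {"general": 1, "specific": 2}
rows = [
    ({"g1": 1, "g2": 0}, {"general": 1, "specific": 1}),
    ({"g1": 1, "g2": 1}, {"general": 1, "specific": 0}),
    ({"g1": 0, "g2": 0}, {"general": 0, "specific": 0}),
    ({"g1": 0, "g2": 1}, {"general": 0, "specific": 0}),
]
best = max(["g1", "g2"], key=lambda a: split_score(rows, a, depth))
print(best)
```

Here the split on g1 separates the general label perfectly, and because that label carries the larger weight, g1 is preferred over
g2.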

We inspected the effects of stepwise addition of putative paralogs on computational learning of the function of orthologous groups
(and consequently of genes). By introducing paralogous genes into the learning process, we substantially increase its success and
show that gene function prediction from sequence information alone, when encoded as a paralog-containing phylogenetic profile, is a
promising approach to narrowing the space of possible functions for a particular protein.
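
The abstract does not specify how paralogs are encoded in the profile. One plausible reading, shown purely as an assumption, is to
replace the 0/1 presence value with a per-genome gene copy count and to raise the cap on that count step by step. The function
name, the cap parameter and the counts are hypothetical.

```python
def encode_profile(copies_per_genome, max_copies):
    """Cap the number of gene copies per genome at `max_copies`.

    max_copies=1 gives the usual presence/absence profile; larger values
    let additional putative paralogs contribute to the profile.
    """
    return [min(count, max_copies) for count in copies_per_genome]


# Hypothetical copy counts of one orthologous group in four genomes.
copies = [0, 1, 3, 2]

# Stepwise addition of paralogs: allow one, two, then three copies.
for cap in (1, 2, 3):
    print(cap, encode_profile(copies, cap))
```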

References:
1. Pellegrini, M., et al., Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles.
   Proceedings of the National Academy of Sciences of the United States of America, 1999. 96(8): p. 4285-4288.
2. Vens, C., et al., Decision trees for hierarchical multi-label classification. Machine Learning, 2008. 73(2): p. 185-214.
3. Roth, A.C.J., G.H. Gonnet, and C. Dessimoz, Algorithm of OMA for large-scale orthology inference. BMC Bioinformatics,
   2008. 9: p. 518-528.
 


