[Solomonov Seminar] 215. Solomonov seminar
Marko Grobelnik
marko.grobelnik at ijs.si
Mon Feb 15 01:41:18 CET 2010
On Monday !!!non-standard time slot!!!, February 15, at 13:00 in the Orange Lecture Hall
(second floor of the main IJS building), the 215th Solomonov seminar will take place.
Recordings of past seminars are available at http://videolectures.net/solomon/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nives Skunca, Institut Rudjer Boskovic, Zagreb
Paralogs considerably improve accuracy of phylogenetic profiling for in silico functional annotation
Phylogenetic profiling is a genomic context method that predicts gene function by correlating gene occurrence patterns in selected
organisms [1]. The intuition behind phylogenetic profiling is that genes gained and lost together (i.e., inherited together) in
different genomes are likely to share function, either by 1) being involved in the same biological pathway (which is therefore
incomplete without all members in a given genome), or 2) being crucial for survival in a particular environment, so their presence
is mandatory throughout the phenotype.
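To make the idea concrete, here is a minimal sketch of profile comparison. It assumes each gene is encoded as a binary presence/absence vector over the same ordered set of genomes and uses Jaccard similarity as the correlation measure; the gene names, the toy data, and the choice of Jaccard similarity are illustrative assumptions, not details taken from the talk.

```python
from itertools import combinations

# Hypothetical toy data: presence (1) or absence (0) of each gene in five genomes.
profiles = {
    "geneA": [1, 0, 1, 1, 0],
    "geneB": [1, 0, 1, 1, 0],
    "geneC": [0, 1, 0, 1, 1],
}

def jaccard(p, q):
    """Genomes containing both genes divided by genomes containing either."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0

def rank_pairs(profiles):
    """Return (similarity, gene, gene) tuples, most similar profiles first."""
    scored = [
        (jaccard(profiles[g], profiles[h]), g, h)
        for g, h in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, reverse=True)

ranked = rank_pairs(profiles)
for score, g, h in ranked:
    print(f"{g} {h} {score:.2f}")
```

Pairs at the top of the ranking, such as the identically distributed geneA and geneB in this toy data, would be the candidates for shared function under the intuition described above.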
We have used a recently developed machine learning approach based on decision trees for Hierarchical Multi-label Classification
(HMC) [2] to predict Gene Ontology (GO) assignments of Orthologous Matrix (OMA) groups [3]. The HMC extension of the decision tree
classifier takes into account the hierarchical layout of GO and considerably improves computational efficiency and accuracy by
taking into account a set of class labels simultaneously when constructing the decision trees, instead of learning each class label
separately. A standard decision tree would recursively split the training data into subsets ('branches') on values of an attribute
in such a manner as to decrease a measure of entropy of a class label within the subsets after the split. The HMC approach has to
deal with multiple class labels, and would compute a weighted average of decrease in entropy over all labels when deciding on a
split point. The weights here are inversely proportional to the depth of a class in the GO, giving more significance to high-level,
more general GO terms.
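The split criterion described above can be sketched as follows. This is an illustration of the description in this abstract only, under stated assumptions: each label is a binary column, a candidate split is a boolean mask over the training examples, and the weight of a label is 1 / depth. The term names, depths, and data are hypothetical, and the published method in [2] may differ in its details.

```python
import math

def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def entropy_decrease(labels, mask):
    """Entropy before the split minus the size-weighted entropy after it."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    after = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - after

def weighted_gain(label_columns, depths, mask):
    """Weighted average of per-label entropy decrease, with weight = 1 / depth."""
    weights = {term: 1.0 / depths[term] for term in label_columns}
    total = sum(weights.values())
    return sum(
        weights[t] * entropy_decrease(label_columns[t], mask)
        for t in label_columns
    ) / total

# Hypothetical labels for four training examples and two terms of a hierarchy.
label_columns = {"general_term": [1, 1, 0, 0], "specific_term": [1, 0, 0, 0]}
depths = {"general_term": 1, "specific_term": 2}

# Two candidate splits, e.g. presence of the group in two different genomes.
split_a = [True, True, False, False]
split_b = [True, False, True, False]

gain_a = weighted_gain(label_columns, depths, split_a)
gain_b = weighted_gain(label_columns, depths, split_b)
best = "split_a" if gain_a > gain_b else "split_b"
```

In this toy case split_a separates the general term perfectly, so it receives the larger weighted gain even though both splits reduce the entropy of the specific term by the same amount.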
We have examined how the stepwise addition of putative paralogs affects computational learning of the function of orthologous
groups (and, consequently, of genes). By introducing paralogous genes into the learning process, we substantially increase its
success and show that gene function prediction from sequence information alone, when encoded as a paralog-containing
phylogenetic profile, is a promising approach to narrowing the space of possible functions for a particular protein.
References:
1. Pellegrini, M., et al., Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles.
Proceedings of the National Academy of Sciences of the United States of America, 1999. 96(8): p. 4285-4288.
2. Vens, C., et al., Decision trees for hierarchical multi-label classification. Machine Learning, 2008. 73(2): p.
185-214.
3. Roth, A.C.J., G.H. Gonnet, and C. Dessimoz, Algorithm of OMA for large-scale orthology inference. BMC Bioinformatics,
2008. 9: p. 518-528.