[Solomonov Seminar] 190th Solomon seminar

Sun Sep 9 00:44:08 CEST 2007

You are invited to the 190th Solomon seminar, which will take place on Tuesday, September 11,
at 13:00 in Meeting Room E8 (Orange Lecture Room), second floor of the main IJS building.
Recordings of past seminars are available at http://videolectures.net/solomon/

This time our guest Fabrice Colas from Leiden University (NL) will speak about
the use of support vector machines (SVM) in text classification.

----------------------------------------
Fabrice Colas, Leiden University 
       Explanation of SVM's behaviour in text classification

We are concerned with the problem of learning classification rules in text categorization,
where many authors have presented Support Vector Machines (SVM) as the leading classification
method. A number of studies, however, have repeatedly pointed out that in some situations SVM
is outperformed by simpler methods such as naive Bayes or the nearest-neighbor rule.
In this paper, we aim at developing a better understanding of SVM behaviour in typical
text categorization problems represented by sparse bag-of-words feature spaces.
We study in detail the performance and the number of support vectors when varying
the training set size, the number of features and, unlike existing studies, also the SVM free
parameter C, which is the upper bound on the Lagrange multipliers in the SVM dual. We show
that SVM solutions with small C are high performers. However, most training documents
are then bounded support vectors sharing the same weight C. Thus, SVM reduces to a
nearest mean classifier; this raises an interesting question about the merits of SVM in sparse
bag-of-words feature spaces. Additionally, SVM suffers from performance deterioration
for particular combinations of training set size and number of features.
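
The reduction to a nearest mean classifier can be illustrated with a small sketch. It is not
taken from the paper: the toy bag-of-words counts, the value of C, and the helper names
weight_vector and class_mean are all assumptions made for illustration. It assumes a linear
kernel, equal class sizes, and that every training document is a bounded support vector
(alpha_i = C), in which case the dual expansion w = sum_i alpha_i y_i x_i becomes a scaled
difference of the two class means. The bias term is not covered here.

```python
# Sketch under stated assumptions: linear kernel, balanced classes,
# and alpha_i = C for every training document (all bounded support vectors).

def weight_vector(X, y, alphas):
    """Linear SVM dual expansion: w = sum_i alpha_i * y_i * x_i."""
    dim = len(X[0])
    w = [0.0] * dim
    for x, label, alpha in zip(X, y, alphas):
        for j in range(dim):
            w[j] += alpha * label * x[j]
    return w

def class_mean(X, y, label):
    """Component-wise mean of the documents carrying the given label."""
    rows = [x for x, l in zip(X, y) if l == label]
    dim = len(X[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

# Hypothetical term-count vectors, two documents per class.
X = [[2, 0, 1, 0], [1, 0, 2, 0], [0, 3, 0, 1], [0, 1, 0, 2]]
y = [1, 1, -1, -1]
C = 0.01
alphas = [C] * len(X)  # every document sits at the upper bound C

# With equal class sizes the dual constraint sum_i alpha_i * y_i = 0 holds.
constraint = sum(a * l for a, l in zip(alphas, y))

w = weight_vector(X, y, alphas)
mu_pos = class_mean(X, y, 1)
mu_neg = class_mean(X, y, -1)
n_per_class = 2

# Direction used by a nearest mean classifier, scaled by C * n_per_class.
mean_diff = [C * n_per_class * (p - q) for p, q in zip(mu_pos, mu_neg)]
```

Here w and mean_diff agree component by component, so the SVM separating direction is the
one a nearest mean classifier would use. With unequal class sizes not every alpha_i can
equal C, and the correspondence is only approximate.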