[Solomonov Seminar] 190. Solomonov seminar
Marko Grobelnik
marko.grobelnik at ijs.si
Sun Sep 9 00:44:08 CEST 2007
You are invited to the 190th Solomonov seminar, which will take place on Tuesday, 11 September,
at 13h in Meeting Room E8 (Orange Lecture Hall), on the second floor of the main IJS building.
Recordings of past seminars are available at http://videolectures.net/solomon/
This time the speaker will be our guest Fabrice Colas from Leiden University (NL), on the topic of
using support vector machines (SVM) for text classification.
----------------------------------------
Fabrice Colas, Leiden University
Explanation of SVM's behaviour in text classification
We are concerned with the problem of learning classification rules in text categorization,
where many authors have presented Support Vector Machines (SVM) as the leading classification
method. A number of studies, however, have repeatedly pointed out that in some situations SVM
is outperformed by simpler methods such as naive Bayes or the nearest-neighbor rule.
In this paper, we aim to develop a better understanding of SVM behaviour in typical
text categorization problems represented by sparse bag-of-words feature spaces.
We study in detail the performance and the number of support vectors when varying
the training set size, the number of features and, unlike existing studies, also the SVM free
parameter C, which is the upper bound on the Lagrange multipliers in the SVM dual. We show
that SVM solutions with small C are high performers. However, most training documents
are then bounded support vectors sharing the same weight C. Thus, the SVM reduces to a
nearest mean classifier; this raises an interesting question about the merits of SVM in sparse
bag-of-words feature spaces. Additionally, SVM suffers from performance deterioration
for particular combinations of training set size and number of features.
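
The step from "bounded support vectors sharing the same weight C" to "nearest mean classifier" can be checked on a toy example. The sketch below is not taken from the talk: it assumes the limiting case in which every training document is a bounded support vector (alpha_i = C) and the two classes are balanced, and it uses made-up term counts. Under those assumptions the linear SVM weight vector w = sum_i alpha_i * y_i * x_i is a scaled difference of the two class means, which is the direction used by a nearest mean classifier. The bias term is not covered here.

```python
# Toy check (illustrative assumption): all alpha_i equal C, classes are balanced.

def svm_weight_all_bounded(X, y, C):
    """Linear SVM weight vector w = sum_i alpha_i * y_i * x_i with alpha_i = C."""
    dim = len(X[0])
    w = [0.0] * dim
    for xi, yi in zip(X, y):
        for j in range(dim):
            w[j] += C * yi * xi[j]
    return w

def class_mean(X, y, label):
    """Component-wise mean of the rows of X whose label equals `label`."""
    rows = [xi for xi, yi in zip(X, y) if yi == label]
    dim = len(X[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

# Hypothetical bag-of-words counts: 4 documents, 3 terms, 2 documents per class.
X = [[2, 0, 1], [1, 0, 0], [0, 3, 1], [0, 1, 0]]
y = [+1, +1, -1, -1]
C = 0.01  # small C, chosen arbitrarily for the illustration

w = svm_weight_all_bounded(X, y, C)
mu_pos = class_mean(X, y, +1)
mu_neg = class_mean(X, y, -1)
mean_diff = [p - n for p, n in zip(mu_pos, mu_neg)]

# With n_pos = n_neg = n / 2, w should equal C * (n / 2) * (mu_pos - mu_neg).
scale = C * len(X) / 2
same_direction = all(abs(wj - scale * dj) < 1e-12 for wj, dj in zip(w, mean_diff))
print("w:", w)
print("scaled mean difference:", [scale * dj for dj in mean_diff])
print("w is a scaled mean difference:", same_direction)
```

In the abstract only "most" documents are bounded support vectors, so in practice the match would be approximate rather than exact.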