[Solomonov Seminar] 130. Solomonov seminar

Marko Grobelnik marko.grobelnik at ijs.si
Fri Oct 31 00:57:15 CET 2003


You are invited to the 130th Solomonov seminar, which will take place on
Tuesday, November 4, 2003 at 13:00 in the Main Lecture Hall (Velika predavalnica) of IJS.
Recordings and materials from past seminars are available
at http://solomon.ijs.si

At the seminar, Jasminka Dobsa from Varazdin will present
the topic of her doctoral dissertation in the field of Information Retrieval.
She will present two approaches that are used in text analysis
and that allow a more compact representation of a large set of documents.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jasminka Dobsa (Fakultet organizacije i informatike, Varazdin)
          Comparison of information retrieval techniques: 
          Latent semantic indexing (LSI) and Concept indexing (CI)

Information retrieval in the vector space model is based on literal 
matching of terms in the documents and the queries. The model is 
implemented by creating the term-document matrix, which is formed 
from the frequencies of terms in documents. Literal matching of 
terms does not necessarily retrieve all relevant documents. Synonymy 
(multiple words having the same meaning) and polysemy (words having 
multiple meanings) are two major obstacles to efficient information retrieval. 
Latent semantic indexing (LSI) and concept indexing (CI) are information 
retrieval techniques embedded in the vector space model that address the 
problems of synonymy and polysemy. 
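
To make the synonymy problem concrete, here is a minimal sketch of literal
term matching in the vector space model. The toy corpus, the helper names
(term_vector, cosine, score) and the use of raw term counts with cosine
similarity are assumptions made for illustration, not details taken from
the talk.

```python
import math

# Toy corpus (hypothetical, for illustration only).
docs = [
    "car engine repair",
    "automobile engine service",
    "banana bread recipe",
]

# Vocabulary: one row of the term-document matrix per distinct term.
terms = sorted({w for d in docs for w in d.split()})

def term_vector(text):
    """Raw term-frequency vector over the fixed vocabulary."""
    words = text.split()
    return [words.count(t) for t in terms]

# Term-document matrix stored column-wise: one frequency vector per document.
doc_vectors = [term_vector(d) for d in docs]

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def score(query):
    """Similarity of the query to every document, in corpus order."""
    q = term_vector(query)
    return [cosine(q, d) for d in doc_vectors]

# The query shares a literal term only with the first document.
scores = score("car")
```

In this sketch the second document, which uses "automobile" instead of
"car", receives a score of exactly zero even though it is about the same
subject. That is the kind of miss that LSI and CI are meant to reduce.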
 
LSI is an information retrieval technique that uses a low-rank 
singular value decomposition (SVD) of the term-document matrix. Although 
LSI is empirically successful, its low-rank approximation lacks a clear 
interpretation and, consequently, offers few controls for 
accomplishing specific tasks in information retrieval. The CI method uses 
centroids of clusters, the so-called concept decomposition (CD), to lower 
the rank of the term-document matrix. Here we compare SVD/LSI and 
CD/CI in terms of matrix approximations and precision of information retrieval.
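
The two rank-lowering approaches can be compared on a small example. The
following is a rough sketch using NumPy, not the method used in the talk:
the matrix A, the rank k and the cluster labels are hypothetical, and the
clustering is assigned by hand instead of being computed. The concept
decomposition is taken here to be the least-squares projection of A onto
the normalized cluster centroids, which is one common formulation.

```python
import numpy as np

# Toy 5-term x 4-document frequency matrix (hypothetical data).
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
# Normalize each document (column) to unit Euclidean length.
A = A / np.linalg.norm(A, axis=0)

k = 2  # target rank (assumed)

# SVD/LSI: keep the k largest singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# CD/CI: centroids of an assumed, hand-assigned clustering of the documents.
labels = np.array([0, 0, 1, 1])
C = np.column_stack([A[:, labels == j].mean(axis=1) for j in range(k)])
C = C / np.linalg.norm(C, axis=0)  # unit-length concept vectors

# Least-squares coefficients Z minimizing ||A - C Z|| in the Frobenius norm.
Z, *_ = np.linalg.lstsq(C, A, rcond=None)
A_cd = C @ Z

# Frobenius-norm approximation errors of the two rank-k approximations.
err_svd = np.linalg.norm(A - A_svd)
err_cd = np.linalg.norm(A - A_cd)
```

For the same rank k, the truncated SVD error can never exceed the concept
decomposition error, because the truncated SVD is the best rank-k
approximation in the Frobenius norm. The columns of C, on the other hand,
are averages of actual documents and so are easier to interpret.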


