[Solomonov Seminar] 152. Solomonov seminar

Marko Grobelnik marko.grobelnik at ijs.si
Mon Oct 4 06:16:49 CEST 2004


You are invited to the 152nd Solomonov seminar, which will take place on Tuesday, 5 October 2004 at
13:00 in the main lecture hall (Velika predavalnica) of IJS. Recordings of past seminars are available at http://solomon.ijs.si/

At the seminar, a guest from Berlin will present statistical methods for identifying
objects in inconsistent databases. The problem is interesting and topical in situations
where we try to merge several databases that describe the same objects but record them
differently. The techniques that Hans Lenz will present help to resolve such
inconsistencies in the data.

The next seminar will be on Friday, 8 October at 11:00. The speaker will be Airan Zweger
from Brussels, a project officer for FP6-IST projects. The title of the talk
is "IST - behind the scenes". He will talk about things that applicants should know
when submitting project proposals but that are not explicitly written down anywhere.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Object Identification by Statistical Methods
Hans-J. Lenz, Free University, Berlin

Abstract

Numerical data fusion, or the merging of overlapping data files, becomes a hard problem if no globally unique identifying keys exist in the
corresponding data sets. Typical examples are the linkage of address files supplied from different sources for commercial purposes
(a money-making area), the merging of special offers in various media (cf. duplicate detection), or an administrative record census
(ARC) as planned in Germany, where several autonomous, heterogeneous registers are to be merged.
We present a three-step procedure: conversion of attributes, comparison of the values of a pair of objects, and
classification (the 'matching problem') of pairs as either "same" ("matched") or "not same" ("not matched").
We pay special attention to the quality and the efficiency of the methodology. We briefly discuss questions like correctness and
completeness as well as pre-selection techniques like 'blocking' to reduce the computational complexity of pairwise comparisons.
The approach is illustrated on data from carefully composed benchmark data sets.
We assume some basic knowledge of computer science and classification (supervised learning).
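
For readers unfamiliar with the terminology, the three steps and the 'blocking' pre-selection named in the abstract
can be sketched roughly as follows. This is only an illustration under assumptions and not the speaker's method: the
attribute names (name, city, zip), the use of difflib's string similarity, the fixed 0.8 threshold (a stand-in for a
trained classifier), and the sample records are all made up for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def convert(record):
    """Step 1 (conversion): bring attributes into a comparable form."""
    return {
        "name": " ".join(record["name"].lower().replace(".", " ").split()),
        "city": record["city"].strip().lower(),
        "zip": record["zip"].strip(),
    }


def compare(a, b):
    """Step 2 (comparison): one similarity score in [0, 1] per attribute."""
    return {
        "name": SequenceMatcher(None, a["name"], b["name"]).ratio(),
        "city": SequenceMatcher(None, a["city"], b["city"]).ratio(),
    }


def classify(scores, threshold=0.8):
    """Step 3 (classification): assumed fixed threshold on the mean score."""
    mean = sum(scores.values()) / len(scores)
    return "matched" if mean >= threshold else "not matched"


def candidate_pairs(records, key="zip"):
    """Blocking: only pair up records that share the blocking key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec[key]].append(idx)
    for ids in blocks.values():
        yield from combinations(ids, 2)


def link(raw_records):
    """Run conversion, blocking, comparison and classification in order."""
    records = [convert(r) for r in raw_records]
    return {
        (i, j): classify(compare(records[i], records[j]))
        for i, j in candidate_pairs(records)
    }


# Made-up sample records; there is no shared unique key between them.
raw = [
    {"name": "Jon A. Smith", "city": "Berlin", "zip": "14195"},
    {"name": "jon a smith", "city": " berlin ", "zip": "14195"},
    {"name": "Petra Novak", "city": "Berlin", "zip": "14195"},
    {"name": "Jon A. Smith", "city": "Ljubljana", "zip": "1000"},
]
result = link(raw)
for pair, label in sorted(result.items()):
    print(pair, label)
```

With blocking on the zip attribute, only the three pairs inside the "14195" block are compared instead of all six
possible pairs. The price is that a true match whose blocking key differs (for example because of a typing error in
the zip code) is never examined, which is one way the completeness question mentioned in the abstract arises.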


