[SlovLit] Marja Boršnik and Simon Jenko -- Annotation in digital humanities -- Očetov naših imenitna dela -- Language challenges of the courts
Miran Hladnik gmail
miran.hladnik at gmail.com
Wed Sep 16 08:52:07 CEST 2015
On Tuesday, 29 Sept. 2015, at 18:00, a round table marking the
publication of the monograph on Marja Boršnik will be held in the
Ullrich House (Ullrichova hiša) next to Khislstein Castle in Kranj
(Tomšičeva 42). (http://mailman.ijs.si/pipermail/slovlit/2015/005258.html)
On 13 Oct. at 18:00, the Kranj City Library (Mestna knjižnica Kranj)
will open an exhibition and a symposium marking the 180th anniversary
of the birth of Simon Jenko (1835--1869).
From: "Fišer, Darja" <Darja.Fiser at ff.uni-lj.si>
To: "sdjt-l at ijs.si" <sdjt-l at ijs.si>, "slovlit at ijs.si" <slovlit at ijs.si>
Date: Mon, 14 Sep 2015 18:08:48 +0000
Subject: [JOTA: Opening of the new season]
Dear all, you are warmly invited to the first episode of the new JOTA
season, which will take place on Wednesday, 23 Sept. 2015, at 15:30 in
the Orange Room (Oranžna soba) at IJS. The speaker will be Dr.
Maarten Janssen (http://maarten.janssenweb.net/) of the University of Lisbon.
Best regards, Darja Fišer
TEITOK: tagging and digital humanities (lecture abstract)
Historical corpora typically fall into two different
types: on the one hand those corpora created by
philologists, which are annotated with textual information concerning
line breaks, letter types, deletions, additions, etc. and on the other
hand those created by corpus linguists, which are annotated with
linguistic information concerning part-of-speech, lemmatisation, and
(shallow) syntactic information. Corpora that contain both are rare,
although they do exist, such as the Sainte Graal of the university in
Lyons, or the IMP corpus of the university of Ljubljana. One reason
for the lack of such corpora is that people working with textual
markup are often not interested in the linguistic markup, and
vice-versa. However, another important reason is that there are no
tools to help in the hard task of creating corpora combining both
types of information. In this presentation, I will demonstrate the
TEITOK system, which aims to do exactly that:
it allows adding and modifying layers of linguistic annotation on top
of documents containing textual annotation. It is a system for both
the creation and maintenance, and the distribution of such corpora
combining two types of annotation. The system supports corpus queries
using the rich CQP query language and displays the results in a way that
closely resembles the original format. The presentation will give a
general overview of the philosophy and architecture of the system,
with a focus on those aspects that are most relevant for historical
corpora: how the system allows multiple orthographic variants of
each word to be displayed and searched, most notably the original
orthography and the normalized orthography; how the system deals with
tokenisation mismatches where for instance the grammatical word does
not align with the orthographic word, or where the original
orthography does not align with the normalized orthography; and how
the system allows users to switch from a more text-oriented view to a
more manuscript-oriented view.
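
The two ideas in the abstract, several orthographic layers per word and
tokenisation mismatches, can be illustrated with a small Python sketch.
It is not TEITOK's actual data model or API: the class names (Token,
GrammaticalWord), the layer names ("orig", "norm"), and the three-word
Early Modern English sample are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GrammaticalWord:
    """One linguistic word with its annotation (hypothetical structure)."""
    form: str
    lemma: str
    pos: str


@dataclass
class Token:
    """One orthographic word as written in the source (hypothetical structure)."""
    orig: str                    # original orthography
    norm: Optional[str] = None   # normalized orthography; None means "same as orig"
    # Filled only when one orthographic word covers several grammatical words.
    parts: List[GrammaticalWord] = field(default_factory=list)

    def form(self, layer: str) -> str:
        """Return the form on the requested layer, falling back to the original."""
        if layer == "norm" and self.norm is not None:
            return self.norm
        return self.orig


def render(tokens: List[Token], layer: str) -> str:
    """Display the text in one orthographic layer."""
    return " ".join(t.form(layer) for t in tokens)


def search(tokens: List[Token], layer: str, wanted: str) -> List[int]:
    """Return positions of tokens whose form on the given layer equals `wanted`."""
    return [i for i, t in enumerate(tokens) if t.form(layer) == wanted]


def grammatical_words(tokens: List[Token]) -> List[str]:
    """Flatten to grammatical words, splitting tokens that carry `parts`."""
    words: List[str] = []
    for t in tokens:
        if t.parts:
            words.extend(p.form for p in t.parts)
        else:
            words.append(t.form("norm"))
    return words


# Illustrative sample: "Ile" is one orthographic word but two grammatical words.
tokens = [
    Token(orig="Ile", norm="I'll",
          parts=[GrammaticalWord("I", "I", "PRON"),
                 GrammaticalWord("will", "will", "AUX")]),
    Token(orig="loue", norm="love"),
    Token(orig="thee"),
]

print(render(tokens, "orig"))
print(render(tokens, "norm"))
print(search(tokens, "norm", "love"))
print(grammatical_words(tokens))
```

Keeping the normalized form as an optional field on the same token means
a search on either layer returns positions in the same token sequence,
so a hit can be shown in whichever orthography the reader prefers.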
From: Matej Krajnc <krajnc75 at gmail.com>
Date: 15 September 2015 23:07
Subject: Očetov naših imenitna dela
Miran, hello, here is another short contribution to interdisciplinarity
for Slovlit. ;-) Best, Matej
https://youtu.be/YvJWYfh2pe0 (text: F. Prešeren, music and
performance: M. Krajnc)
From: Cobec Aleksander <Aleksander.Cobec at rtvslo.si>
To: "slovlit at ijs.si" <slovlit at ijs.si>
Date: Wed, 16 Sep 2015 06:08:07 +0000
Subject: Language challenges of the courts
In court, actual speech is often far from standard pronunciation and
sometimes even from intelligibility. It is also not uncommon for a
party to court proceedings to have no interest in telling the whole
truth. Their statements may therefore contain fillers, repetitions,
unclear pronunciation, stuttering and other language errors. We spoke
with Dr. Hotimir Tivadar about how this affects the judges' assessment
itself and what language challenges Slovenian courts face:
