<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body style="font-size: 10pt; font-family: Verdana,Geneva,sans-serif">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<table width="640" cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td>
<table width="640" cellspacing="0" cellpadding="0"
border="0" bgcolor="#ffffff" align="center">
<tbody>
<tr>
<td>
<table
style="width: 640px; min-width:
640px;" width="640" cellspacing="0"
cellpadding="0" border="0"
bgcolor="#ffffff" align="center">
<tbody>
<tr>
<td>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td style="padding: 0px 40px;"
align="center">
<table width="100%"
cellspacing="0"
cellpadding="0" border="0"
align="center">
<tbody>
<tr>
<td width="110"
valign="middle"
align="left"><img
src="cid:part1.MAGAn2Ua.2v8GVPgD@ijs.si" alt="" class="" width="101"
height="30"></td>
<td width="20"
height="1"> </td>
<td valign="middle"
align="right">
<table width="100%"
cellspacing="0"
cellpadding="0"
border="0"
align="center">
<tbody>
<tr>
<td
style="font-family:
Poppins,sans-serif; font-size: 21px; line-height: 31.5px; font-weight:
bold; color:
#0080ad;"
align="right">CLASSLA
Mailing List</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td
style="line-height: 10px;
min-height: 10px;"
height="10"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table width="640" cellspacing="0" cellpadding="0"
border="0" bgcolor="#ffffff" align="center">
<tbody>
<tr>
<td>
<table
style="width: 640px; min-width:
640px;" width="640" cellspacing="0"
cellpadding="0" border="0"
bgcolor="#ffffff" align="center">
<tbody>
<tr>
<td>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td align="center">
<table
style="border-top:
3px double #ededf3;
border-collapse: initial;"
width="100%"
cellspacing="0"
cellpadding="0" border="0"
align="center">
<tbody>
<tr>
<td
style="line-height:
0px; min-height:
0px;" height="0"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table width="640" cellspacing="0" cellpadding="0"
border="0" bgcolor="#ffffff" align="center">
<tbody>
<tr>
<td>
<table
style="width: 640px; min-width:
640px;" width="640" cellspacing="0"
cellpadding="0" border="0"
bgcolor="#ffffff" align="center">
<tbody>
<tr>
<td>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td
style="line-height: 10px;
min-height: 10px;"
height="10"> </td>
</tr>
</tbody>
</table>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td style="padding: 0px 40px;"
align="center">
<table
style="border-radius:
2px;" width="560"
cellspacing="0"
cellpadding="0" border="0"
align="center">
<tbody>
<tr>
<td
style="padding:
0px 40px; border:
1px solid #e6e6e6;
border-radius: 2px;"
bgcolor="#FCFCFC"
align="center">
<table width="100%"
cellspacing="0"
cellpadding="0"
border="0"
align="center">
<tbody>
<tr>
<td
height="30"> </td>
</tr>
<tr>
<td
id="m_925030267947577449gmail-m_6504557075424313283gmail-m_-5089897522223699477bodyText-8"
style="font-family: Poppins,sans-serif; font-size: 14px; line-height:
21px; color:
#000000;">
<p>Dear all,</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">As
we wrap up
another
eventful year,
we would like
to share an
overview of
the key
developments
and activities
at the CLASSLA
Knowledge
Centre for
South Slavic
Languages
during 2025.</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;"><b>CLASSLA-web
corpora for
South Slavic
languages</b></p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">We
are excited to
announce that
we have
released the
second version
of the
CLASSLA-web
corpora,
comprising
texts that
were collected
from the web
in 2024. You
can already
query
the new
corpora on the
CLARIN.SI
concordancer (<a
href="https://www.clarin.si/ske/#dashboard?corpname=classlaweb2_bs"
moz-do-not-send="true">Bosnian</a>, <a
href="https://www.clarin.si/ske/#dashboard?corpname=classlaweb2_bg"
moz-do-not-send="true">Bulgarian</a>, <a
href="https://www.clarin.si/ske/#dashboard?corpname=classlaweb2_hr"
moz-do-not-send="true">Croatian</a>, <a
href="https://www.clarin.si/ske/#dashboard?corpname=classlaweb2_mk"
moz-do-not-send="true">Macedonian</a>, <a
href="https://www.clarin.si/ske/#dashboard?corpname=classlaweb2_cnr"
moz-do-not-send="true">Montenegrin</a>, <a
href="https://www.clarin.si/ske/#dashboard?corpname=classlaweb2_sr"
moz-do-not-send="true">Serbian</a>, and <a
href="https://www.clarin.si/ske/#dashboard?corpname=classlaweb2_sl"
moz-do-not-send="true">Slovenian</a> corpora) or find more information
about both
versions (1.0
and 2.0) of the
CLASSLA-web
corpora on a
new website: <a
href="https://clarinsi.github.io/classla-web/"
class="moz-txt-link-freetext" moz-do-not-send="true">https://clarinsi.github.io/classla-web/</a></p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">Although
collected from
the same
national
domains as
version 1.0
from 2021 and
2022, the new
release is
substantially
larger and
contains
mostly new
material:
around 50%
more texts and
words,
totalling 38
million texts
and 17 billion
words across
seven South
Slavic
languages. The
corpora are
linguistically
annotated with
an <a
href="https://zenodo.org/records/13936406" moz-do-not-send="true">improved
CLASSLA-Stanza</a>
tool (<a
href="https://clarin.si/oznacevalnik/eng" moz-do-not-send="true">available
as a service
here</a>) and
a multilingual
genre
classifier <a
href="https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier"
moz-do-not-send="true">X-GENRE</a>. In addition, version 2.0 now also
includes topic
labels based
on our <a
href="https://huggingface.co/classla/multilingual-IPTC-news-topic-classifier"
moz-do-not-send="true">multilingual news topic classifier</a>. Soon, the
corpora will
also be
available on
the CLARIN.SI
repository in
JSONL and
linguistically annotated
VERT formats.</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;"><b>CLASSLA-Express
workshop
series</b></p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">Our
<a
href="https://www.clarin.si/info/k-centre/workshops/classla-express/"
moz-do-not-send="true">CLASSLA-Express workshop</a> programme expanded
both in
content and
geography.
This year,
seven
workshops were
held across
four countries
– Austria,
Bulgaria,
Croatia, and
Slovenia – led
primarily by
Ivana
Filipović
Petrović and
Jelena
Parizoska,
with
contributions
from Petya
Osenova and
local
organisers. In
addition to
demonstrating
the use of
CLARIN.SI
concordancers
and the
CLASSLA-web
corpora, the
workshops
introduced new
topics with a
strong focus
on applying
modern AI
methods in
linguistic
research. We
are delighted
by the
continued
interest and
encourage you
to explore the
<u> <a
href="https://www.clarin.si/info/k-centre/workshops/"
moz-do-not-send="true">detailed workshop reports</a></u> available on
our website.
You are warmly
invited to
stay tuned:
CLASSLA-Express
3.0, with a
new focus on
spoken
corpora, is
already on the
horizon.</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;"><b>Benchmarking
large language
models for
South Slavic
languages and
dialects</b></p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">Evaluation
of large
language
models (LLMs)
continued to
be one of our
key
activities.
This year, we
participated
in the development
of multiple
South Slavic
benchmarks for
LLM
evaluation,
including the
<a
href="https://arxiv.org/abs/2510.24081" moz-do-not-send="true">Global-PIQA</a>
test set, a
multilingual
commonsense
reasoning
benchmark
developed by
335 co-authors
and covering
116 languages
and dialects,
including
standard South
Slavic
languages, as
well as the
Torlak,
Chakavian, and
Slovenian
Cerkno
dialects.</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">In
parallel, we
launched an <a
href="https://llm-benchmarks-classla.streamlit.app/"
moz-do-not-send="true">interactive platform presenting evaluation
results for
South Slavic
languages and
dialects
across six
tasks</a>: two
commonsense
reasoning
benchmark
families (COPA
and PIQA),
sentiment
classification,
news topic
classification,
and automatic
genre
identification.
The platform
enables
researchers
and developers
to compare
large language
model
performance,
identify
strengths and
weaknesses,
and follow
developments
over time. To
support
further
experimentation
and
application,
we provide an
<u> <a
href="https://arxiv.org/abs/2511.07989" moz-do-not-send="true">accompanying
paper with an
overview of
current model
performance</a></u>
as well as <a
href="https://github.com/TajaKuzman/Benchmarking-Text-Classification-on-South-Slavic"
moz-do-not-send="true">open-source code</a> for running evaluations and
adapting LLMs
to new tasks.
We are excited
to continue
our
benchmarking
activities as
part of the <a
href="https://www.cjvt.si/llm4dh/en/" moz-do-not-send="true">LLM4DH</a>
and <a
href="https://alt-edic.eu/projects/llms4eu/" moz-do-not-send="true">LLMs4EU</a>
projects,
which will
extend over
the next few
years.</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;"><b>Speech
corpora and
technologies</b></p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">Our
efforts in
speech
resources
advanced
significantly
this year,
with a major
focus on
expanding and
enriching
parliamentary
speech
corpora. A key
achievement
was the
release of <a
href="https://clarinsi.github.io/parlaspeech/" moz-do-not-send="true">ParlaSpeech
3.0</a>, a
multilingual
collection
covering
Croatian,
Serbian,
Czech, and
Polish
parliamentary
proceedings.
In the new
release,
ParlaSpeech
has been
extended with
five
annotation
layers:
linguistic
annotation,
sentiment
labels,
filled-pause
detection,
precise
word-level
alignments,
and primary
stress
information.
These
enrichment
layers have
been added
automatically
with
cutting-edge
models for
processing
speech and
text, most of
which can be
found on the <a
href="https://huggingface.co/classla" moz-do-not-send="true">CLASSLA
Hugging Face
page</a>. The
enrichments
enable
advanced
studies of
prosody,
disfluency
patterns, and
multimodal
aspects of
parliamentary
speech. In
addition to
the <a
href="http://hdl.handle.net/11356/1833" moz-do-not-send="true">CLARIN.SI
repository</a>,
the corpora
are now
accessible
through the
CLARIN.SI
concordancers
(<a
href="https://www.clarin.si/ske/#concordance?corpname=parlaspeech3_hr"
moz-do-not-send="true">Croatian</a>, <a
href="https://www.clarin.si/ske/#concordance?corpname=parlaspeech3_rs"
moz-do-not-send="true">Serbian</a>, <a
href="https://www.clarin.si/ske/#concordance?corpname=parlaspeech3_cz"
moz-do-not-send="true">Czech</a> and <a
href="https://www.clarin.si/ske/#concordance?corpname=parlaspeech3_pl"
moz-do-not-send="true">Polish</a>), accompanied by a <a
href="https://clarinsi.github.io/parlaspeech/concordancer/concordancer-guide.html"
moz-do-not-send="true">tutorial on how to query them</a>.</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;"><b>Supporting
SSH
researchers in
working with
large language
models</b></p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">As
part of the
newly
established <a
href="https://llms4ssh.clarin-pl.eu/" moz-do-not-send="true">LLMs4SSH</a>
CLARIN
Knowledge
Centre, we
contributed
expertise to
help
researchers in
the social
sciences and
humanities
navigate the
rapidly
evolving
landscape of
large language
models. Our
contributions
included an <a
href="https://www.clarin.si/info/k-centres/llms4ssh-clarin-k-centre-for-large-language-models-in-ssh/"
moz-do-not-send="true">overview of Slovenian activities, technologies,
and datasets
related to LLM
development</a>;
a <a
href="https://arxiv.org/abs/2510.24450" moz-do-not-send="true">proposal
for a new
taxonomy for
LLM evaluation
datasets</a>;
and a concept
for a European
database
offering a
clear map of
available
resources by
language and
evaluation
task.</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;"><b>Models
and datasets
on Hugging
Face</b></p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">This
year, numerous
new models and
datasets were
released on
the <a
href="https://huggingface.co/classla" moz-do-not-send="true">CLASSLA
Hugging Face
page</a>,
including the
first <a
href="https://huggingface.co/classla/multilingual-IPTC-news-topic-classifier"
moz-do-not-send="true">openly available multilingual IPTC news topic
classifier</a>,
which has
already
surpassed
600,000
downloads. We
are thrilled
to see such
strong uptake
and will
continue
expanding the
collection of
openly
accessible
tools and
corpora for
South Slavic
languages and
beyond.</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;"><b>Looking
ahead</b></p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">As
we reflect on
this year’s
achievements,
we extend our
sincere thanks
to all team
members and
collaborators
who have
contributed to
our
activities,
and to the
users who have
taken up our
resources.
Your
engagement and
feedback drive
our continued
commitment to
supporting
linguistic
research and
technology
development
for South
Slavic
languages.</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">We
look forward
to another
productive
year filled
with exciting
advances and
new
collaborations.
Wishing you a
successful and
inspiring year
ahead!</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">Best
wishes,</p>
<p dir="ltr"
style="line-height:1.38;text-align: justify;margin-top:12pt;margin-bottom:12pt;">Nikola,
Taja, and many
other
CLASSLAers</p>
</td>
</tr>
<tr>
<td
height="30"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td
style="line-height: 10px;
min-height: 10px;"
height="10"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table width="640" cellspacing="0" cellpadding="0"
border="0" bgcolor="#e6f4ff" align="center">
<tbody>
<tr>
<td>
<table
style="width: 640px; min-width:
640px;" width="640" cellspacing="0"
cellpadding="0" border="0"
bgcolor="#e6f4ff" align="center">
<tbody>
<tr>
<td>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td
style="line-height: 20px;
min-height: 20px;"
height="20"> </td>
</tr>
</tbody>
</table>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td style="padding: 0px 40px;"
align="center">
<table width="100%"
cellspacing="0"
cellpadding="0" border="0"
align="center">
<tbody>
<tr>
<td
style="font-family:
Poppins,sans-serif;
font-size: 14px;
font-weight: bold;
line-height: 21px;
color: #111111;"
align="left"><a
href="https://www.clarin.si/info/k-centre/" target="_blank"
rel="noopener
noreferrer"
moz-do-not-send="true">CLASSLA: The Knowledge Centre for South Slavic
Languages</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td height="10"> </td>
</tr>
</tbody>
</table>
<table
style="width: 640px;
min-width: 640px;" width="640"
cellspacing="0" cellpadding="0"
border="0" align="center">
<tbody>
<tr>
<td style="padding: 0px 40px;"
align="center">
<table width="100%"
cellspacing="0"
cellpadding="0" border="0"
align="center">
<tbody>
<tr>
<td align="center">
<table
style="width:
267px; min-width:
267px;"
width="267"
cellspacing="0"
cellpadding="0"
border="0"
align="left">
<tbody>
<tr>
<td
id="m_925030267947577449gmail-m_6504557075424313283gmail-m_-5089897522223699477footerText-10"
style="font-family: Poppins,sans-serif; font-size: 12px; line-height:
18px; color:
#111111;"
align="left">
<p
style="margin-top:
0px;
margin-bottom:
10px;"><a
href="http://clarin.si/" target="_blank" rel="noopener noreferrer"
moz-do-not-send="true">CLARIN.SI</a></p>
<p
style="margin-top:
0px;
margin-bottom:
10px;">Jožef
Stefan
Institute</p>
<p
style="margin-top:
0px;
margin-bottom:
0px;">Jamova
cesta 39,
Ljubljana<br>
Slovenia</p>
</td>
</tr>
<tr>
<td
height="25"> </td>
</tr>
</tbody>
</table>
<table
style="width:
267px; min-width:
267px;"
width="267"
cellspacing="0"
cellpadding="0"
border="0"
align="right">
<tbody>
<tr>
<td
id="m_925030267947577449gmail-m_6504557075424313283gmail-m_-5089897522223699477footerUnsubscribeText-10"
style="font-family: Poppins,sans-serif; font-size: 12px; line-height:
18px; color:
#111111;"
align="right">
<p
style="margin-top:
0px;
margin-bottom:
0px;"><br>
<span
style="font-size:
10px;"></span></p>
</td>
</tr>
<tr>
<td
height="10"> </td>
</tr>
<tr>
<td
style="font-family:
Poppins,sans-serif; font-size: 12px; line-height: 18px; color: #111111;"
align="right"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div dir="ltr"> </div>
</div>
</div>
</body>
</html>