<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body style='font-size: 10pt; font-family: Verdana,Geneva,sans-serif'>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<table border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td>
<table border="0" width="640" cellspacing="0" cellpadding="0" align="center" bgcolor="#ffffff">
<tbody>
<tr>
<td>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center" bgcolor="#ffffff">
<tbody>
<tr>
<td>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="padding: 0px 40px;" align="center">
<table border="0" width="100%" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td align="left" valign="middle" width="110"><img id="m_925030267947577449gmail-m_6504557075424313283gmail-m_-5089897522223699477logoBlock-4" class="gmail_canned_response_image" style="display: block;" src="https://bucket.mlcdn.com/a/3476/3476114/images/d542a766ebbbc112d5bc5d9e40be271b526a92c6.jpeg" width="110" border="0" /></td>
<td width="20" height="1"> </td>
<td align="right" valign="middle">
<table border="0" width="100%" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="font-family: Poppins,sans-serif; font-size: 21px; line-height: 31.5px; font-weight: bold; color: #0080ad;" align="right">CLASSLA Mailing List</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="line-height: 10px; min-height: 10px;" height="10"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="0" width="640" cellspacing="0" cellpadding="0" align="center" bgcolor="#ffffff">
<tbody>
<tr>
<td>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center" bgcolor="#ffffff">
<tbody>
<tr>
<td>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td align="center">
<table style="border-top: 3px double #ededf3; border-collapse: initial;" border="0" width="100%" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="line-height: 0px; min-height: 0px;" height="0"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="0" width="640" cellspacing="0" cellpadding="0" align="center" bgcolor="#ffffff">
<tbody>
<tr>
<td>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center" bgcolor="#ffffff">
<tbody>
<tr>
<td>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="line-height: 10px; min-height: 10px;" height="10"> </td>
</tr>
</tbody>
</table>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="padding: 0px 40px;" align="center">
<table style="border-radius: 2px;" border="0" width="560" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="padding: 0px 40px; border: 1px solid #e6e6e6; border-radius: 2px;" align="center" bgcolor="#FCFCFC">
<table border="0" width="100%" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td height="30"> </td>
</tr>
<tr>
<td id="m_925030267947577449gmail-m_6504557075424313283gmail-m_-5089897522223699477bodyText-8" style="font-family: Poppins,sans-serif; font-size: 14px; line-height: 21px; color: #000000;">
<p><span style="font-weight: 400;">Dear all,</span></p>
<p><span style="font-weight: 400;">We would like to share our recent results on speech processing, which we announced as one of our focus areas for 2022.</span></p>
<p><span style="font-weight: 400;">We released two speech datasets. The first is the Croatian </span><a href="http://hdl.handle.net/11356/1494" target="_blank" rel="noopener noreferrer"><span style="font-weight: 400;">ParlaSpeech-HR dataset</span></a><span style="font-weight: 400;">, which contains 1,816 hours of recordings with accompanying transcriptions and speaker metadata. It is based on the </span><a href="http://hdl.handle.net/11356/1432" target="_blank" rel="noopener noreferrer"><span style="font-weight: 400;">ParlaMint corpus</span></a><span style="font-weight: 400;"> of Croatian parliamentary proceedings. The second is the Serbian </span><a href="http://hdl.handle.net/11356/1679" target="_blank" rel="noopener noreferrer"><span style="font-weight: 400;">JuzneVesti-SR dataset</span></a><span style="font-weight: 400;">, “only” 50 hours in size. It consists of audio recordings and transcripts from the Južne Vesti website and its show </span><a href="https://www.juznevesti.com/Tagovi/Intervju-15-minuta.sr.html" target="_blank" rel="noopener noreferrer"><span style="font-weight: 400;">15 minuta</span></a><span style="font-weight: 400;">, also with speaker metadata. Alongside each dataset, we also released automatic speech recognition (ASR) models on HuggingFace: </span><a href="https://huggingface.co/models?search=parlaspeech" target="_blank" rel="noopener noreferrer"><span style="font-weight: 400;">four Croatian ASR models</span></a><span style="font-weight: 400;"> for the ParlaSpeech-HR dataset, with an excellent (though in-domain) word error rate of only 4%, and, for now, </span><a href="https://huggingface.co/classla/wav2vec2-xls-r-juznevesti-sr" target="_blank" rel="noopener noreferrer"><span style="font-weight: 400;">one Serbian ASR model</span></a><span style="font-weight: 400;"> for the JuzneVesti-SR dataset. You are more than welcome to use any of the models or data (all are available under CC-BY-SA).
Interestingly, our speech-related efforts were also quickly picked up by industry: our speech and text technologies were featured </span><a href="https://www.neos.hr/neos-blog-can-ai-understand-croatian-parliment-asr-model/" target="_blank" rel="noopener noreferrer"><span style="font-weight: 400;">in a recent blog post</span></a><span style="font-weight: 400;">.</span></p>
<p><span style="font-weight: 400;">We also published two papers: one on the </span><a href="http://www.lrec-conf.org/proceedings/lrec2022/workshops/ParlaCLARINIII/pdf/2022.parlaclariniii-1.16.pdf" target="_blank" rel="noopener noreferrer"><span style="font-weight: 400;">overall approach to building the ParlaSpeech-HR dataset</span></a><span style="font-weight: 400;">, and another on </span><a href="https://nl.ijs.si/jtdh22/pdf/JTDH2022_Ljubesic-et-al_The-ParlaSpeech-HR-benchmark-for-speaker-profiling-in-Croatian.pdf" target="_blank" rel="noopener noreferrer"><span style="font-weight: 400;">benchmarking speaker profiling on the ParlaSpeech-HR dataset</span></a><span style="font-weight: 400;">.</span></p>
<p><span style="font-weight: 400;">Having recently secured funding for further research on spoken data, in the coming years we will be working on many super-interesting speech-related tasks, including:</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">word-level clustering of types of pronunciation and extraction of prototypical pronunciations</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">linguistic processing of transcripts of spoken data, potentially informed by the speech signal itself</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">disfluency identification and classification</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">dialogue act classification</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">identifying ways to build large and cheap spoken corpora of South Slavic languages</span></li>
</ul>
<p><span style="font-weight: 400;">Please do get in touch if you are interested in, or already working on, speech. We also invite similar e-mails outlining planned activities from others. Different efforts need to be coordinated, something we discussed at great length in our <a href="https://www.degruyter.com/document/doi/10.1515/9783110767377-017/html" target="_blank" rel="noopener noreferrer">recently published book chapter</a>.</span></p>
<p><span style="font-weight: 400;">Best regards,</span></p>
<p><span style="font-weight: 400;">Nikola and Taja</span></p>
</td>
</tr>
<tr>
<td height="30"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="line-height: 10px; min-height: 10px;" height="10"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="0" width="640" cellspacing="0" cellpadding="0" align="center" bgcolor="#e6f4ff">
<tbody>
<tr>
<td>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center" bgcolor="#e6f4ff">
<tbody>
<tr>
<td>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="line-height: 20px; min-height: 20px;" height="20"> </td>
</tr>
</tbody>
</table>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="padding: 0px 40px;" align="center">
<table border="0" width="100%" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="font-family: Poppins,sans-serif; font-size: 14px; font-weight: bold; line-height: 21px; color: #111111;" align="left"><a href="https://www.clarin.si/info/k-centre/" target="_blank" rel="noopener noreferrer">CLASSLA: The Knowledge Centre for South Slavic Languages</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td height="10"> </td>
</tr>
</tbody>
</table>
<table style="width: 640px; min-width: 640px;" border="0" width="640" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td style="padding: 0px 40px;" align="center">
<table border="0" width="100%" cellspacing="0" cellpadding="0" align="center">
<tbody>
<tr>
<td align="center">
<table style="width: 267px; min-width: 267px;" border="0" width="267" cellspacing="0" cellpadding="0" align="left">
<tbody>
<tr>
<td id="m_925030267947577449gmail-m_6504557075424313283gmail-m_-5089897522223699477footerText-10" style="font-family: Poppins,sans-serif; font-size: 12px; line-height: 18px; color: #111111;" align="left">
<p style="margin-top: 0px; margin-bottom: 10px;"><a href="http://clarin.si/" target="_blank" rel="noopener noreferrer">CLARIN.SI</a></p>
<p style="margin-top: 0px; margin-bottom: 10px;">Jožef Stefan Institute</p>
<p style="margin-top: 0px; margin-bottom: 0px;">Jamova cesta 39, Ljubljana<br />Slovenia</p>
</td>
</tr>
<tr>
<td height="25"> </td>
</tr>
</tbody>
</table>
<table style="width: 267px; min-width: 267px;" border="0" width="267" cellspacing="0" cellpadding="0" align="right">
<tbody>
<tr>
<td id="m_925030267947577449gmail-m_6504557075424313283gmail-m_-5089897522223699477footerUnsubscribeText-10" style="font-family: Poppins,sans-serif; font-size: 12px; line-height: 18px; color: #111111;" align="right">
</td>
</tr>
<tr>
<td height="10"> </td>
</tr>
<tr>
<td style="font-family: Poppins,sans-serif; font-size: 12px; line-height: 18px; color: #111111;" align="right"> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div dir="ltr"> </div>
</div>
</div>
</body></html>