<!DOCTYPE html><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<br>
Great to see this summary, thanks a lot, Nikola!<br>
<br>
Is anybody planning to evaluate YugoGPT on the <a href="https://www.clarin.si/repository/xmlui/handle/11356/1404">COPA-HR</a>
and <a href="https://www.clarin.si/repository/xmlui/handle/11356/1708">COPA-SR</a>
datasets so that we can add it to this chart (by Nikola)?<br>
<br>
<b id="docs-internal-guid-efc48c3a-7fff-03ce-506c-85556b40f253" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none; font-weight: normal;"><img src="https://lh7-us.googleusercontent.com/DHmnEj0P8zb1G1VixwEZcQDI4Ikvhg0ruQAg3c6tEAC8Du2XOG6lcA3Ga-eHjpRs_mBAc8G3TA_E1EudPWrhnI0ZfwM4dluoxMPQKBdtJ1VA3ux3LcPHkfwnc4_46HBghDd5AnKBJYioPX-uwhQq28DZLQ=s2048" width="494" height="307"></b><br>
<br>
<div class="moz-cite-prefix"><br>
And how about testing these models on Aleksa's eval?<br>
<br>
I think it's important for the community to have comparable
scores. <br>
<br>
Happy holidays!<br>
<br>
Tanja <br>
<br>
<br>
On 25.12.23 10:44, Aleksa Gordić wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAB=Ynf3H0Z88egbuO1a-KNm-fCCuWsdt1H3uhyc6Z3uPhsKfuQ@mail.gmail.com">
<div dir="ltr">Nikola it was a huge pleasure to work with you!
Thank you for the kind words, we're just getting started! :))
<div><br>
</div>
<div>I'm still in crunch mode but you might be able to play with
YugoGPT already a bit later today.</div>
<div><br>
</div>
<div>Btw, I had a call with Andrija, who's fine-tuning Whisper.
His story is quite interesting, as he works in a library and
he's a philosopher "by trade". He picked up ML along the way
by going through a HuggingFace course.</div>
<div><br>
</div>
<div>This is a really cool report, thanks for including me.</div>
<div><br>
</div>
<div>-Aleksa</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Dec 24, 2023 at
1:58 PM Nikola Ljubešić &lt;<a href="mailto:nljubesi@gmail.com" moz-do-not-send="true" class="moz-txt-link-freetext">nljubesi@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">This year has been extremely packed with
activities, hence this very last-minute cheer - we are well on
track to become much less of a less-resourced language
family! Here are a few examples that come to mind first.
<div><br>
</div>
<div>- <a href="https://universaldependencies.org/treebanks/mk_mtb/index.html" target="_blank" moz-do-not-send="true">Macedonian has
arrived in Universal Dependencies 🥳</a>🥳🥳 thanks to
Vladimir Cvetkoski. This may be "only" 155 sentences and
1,360 tokens, but, hey - it is infinitely more than there
was before. Bravo, Vladimir!</div>
<div><br>
</div>
<div>- CLASSLA followed Vladimir's great example and
decided to <a href="http://hdl.handle.net/11356/1886" target="_blank" moz-do-not-send="true">publish
SETimes.MK</a> in its current state as version 0.1, 570
sentences and 13,310 tokens in size, annotated at the XPOS,
UPOS, FEATS and LEMMA levels, to give additional momentum
to the positively developing situation for Macedonian.</div>
<div><br>
</div>
<div>- In Slovenia, <a href="https://www.cjvt.si/povejmo/" target="_blank" moz-do-not-send="true">the PoVeJMo
project</a> has started, focused on adapting an LLM to
the Slovenian language in general, as well as to a
series of industrial use cases.</div>
<div><br>
</div>
<div>- Andrija Sagić, a multimedia enthusiast, is seriously
biting into the speech apple, <a href="https://huggingface.co/Sagicc/whisper-large-v3-sr-cmb" target="_blank" moz-do-not-send="true">additionally
fine-tuning the really great whisper-large-v3 model</a>
on all the data he can scrape together for Serbian, which
mostly includes <a href="http://hdl.handle.net/11356/1679" target="_blank" moz-do-not-send="true">our Južne Vesti dataset</a>. We
are now working with Andrija on improving the dataset
(quite a few typos in the human transcripts!) and are
looking forward to jointly publishing a version 2.0. This
is the type of collaboration we very much need!</div>
<div><br>
</div>
<div>- The ReLDI team has started, together with ICEF,
Belgrade, the industry-funded (you do not see many of
those!) <a href="https://icef-nlp.github.io/COMtext.SR/" target="_blank" moz-do-not-send="true">ComText.SR
project</a> on collecting, curating, annotating and
publicly releasing textual data for various domains of
special interest to the industry.</div>
<div><br>
</div>
<div>- The JeRTeh society has started publishing transformer
models for Serbian, the first two models being named <a href="https://huggingface.co/jerteh/gpt2-vrabac" target="_blank" moz-do-not-send="true">Vrabac</a> and <a href="https://huggingface.co/jerteh/gpt2-orao" target="_blank" moz-do-not-send="true">Orao</a>.
Guess which is the bigger one. :-) We were told there will
be additional models coming from that direction and we are
very much looking forward to those!</div>
<div><br>
</div>
<div>- You might have followed on social media the most
productive project I have ever seen - the <a href="https://www.linkedin.com/posts/aleksagordic_well-its-official-yugogpt-7b-significantly-activity-7143209223722627072-0s9Y/" target="_blank" moz-do-not-send="true">yugoGPT model</a>
- the work of Aleksa Gordić. We were happy to be able to
support Aleksa, at least on the data front and with some
discussion. It was not easy to keep up with that guy! Wow! We
really hope this is not Aleksa's (first? and) last HBS LLM
rodeo!</div>
<div><br>
</div>
<div>We wish you calm and relaxing holidays!</div>
<div><br>
</div>
<div>The CLASSLA team</div>
<div><br>
</div>
</div>
</blockquote>
</div>
<br>
<fieldset class="moz-mime-attachment-header"></fieldset>
</blockquote>
<br>
</body>
</html>