<div dir="ltr">Nikola, it was a huge pleasure to work with you! Thank you for the kind words, we're just getting started! :))<div><br></div><div>I'm still in crunch mode, but you might already be able to play with YugoGPT a bit later today.</div><div><br></div><div>Btw, I had a call with Andrija, who's fine-tuning Whisper. His story is quite interesting: he works in a library and is a philosopher "by trade". He picked up ML along the way by going through a HuggingFace course.</div><div><br></div><div>This is a really cool report, thanks for including me.</div><div><br></div><div>-Aleksa</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Dec 24, 2023 at 1:58 PM Nikola Ljubešić &lt;<a href="mailto:nljubesi@gmail.com">nljubesi@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">This year has been extremely packed with activities, hence this very last-minute cheer - we are on a good track to become much less of a less-resourced language family! Here are a few examples that come to mind first.<div><br></div><div>- <a href="https://universaldependencies.org/treebanks/mk_mtb/index.html" target="_blank">Macedonian has arrived in Universal Dependencies 🥳</a>🥳🥳 thanks to Vladimir Cvetkoski. This may be "only" 155 sentences and 1,360 tokens, but, hey - it is infinitely more than there was before. 
Bravo, Vladimir!</div><div><br></div><div>- CLASSLA followed Vladimir's great example and decided to <a href="http://hdl.handle.net/11356/1886" target="_blank">publish SETimes.MK</a> in its current state as version 0.1, 570 sentences and 13,310 tokens in size, annotated at the XPOS, UPOS, FEATS and LEMMA levels, to give additional momentum to the positive developments for Macedonian.</div><div><br></div><div>- In Slovenia, <a href="https://www.cjvt.si/povejmo/" target="_blank">the PoVeJMo project</a> has started, focused on adapting an LLM to the Slovenian language in general, as well as to a series of industrial use cases.</div><div><br></div><div>- Andrija Sagić, a multimedia enthusiast, is taking a serious bite of the speech apple, <a href="https://huggingface.co/Sagicc/whisper-large-v3-sr-cmb" target="_blank">additionally fine-tuning the really great whisper-large-v3 model</a> on all the data he can scrape together for Serbian, which mostly includes <a href="http://hdl.handle.net/11356/1679" target="_blank">our Južne Vesti dataset</a>. We are now working with Andrija on improving the dataset (quite a few typos in the human transcripts!) and are looking forward to jointly publishing a version 2.0. This is the type of collaboration we are very much in need of!</div><div><br></div><div>- The ReLDI team has started, together with ICEF, Belgrade, the industry-funded (you do not see many of those!) <a href="https://icef-nlp.github.io/COMtext.SR/" target="_blank">ComText.SR project</a> on collecting, curating, annotating and publicly releasing textual data for various domains of special interest to the industry.</div><div><br></div><div>- The JeRTeh society has started publishing transformer models for Serbian, the first two models being named <a href="https://huggingface.co/jerteh/gpt2-vrabac" target="_blank">Vrabac</a> and <a href="https://huggingface.co/jerteh/gpt2-orao" target="_blank">Orao</a>. Guess which is the bigger one. 
:-) We were told there will be additional models coming from that direction and we are very much looking forward to those!</div><div><br></div><div>- You might have followed on social media the most productive project I have ever seen - the <a href="https://www.linkedin.com/posts/aleksagordic_well-its-official-yugogpt-7b-significantly-activity-7143209223722627072-0s9Y/" target="_blank">YugoGPT model</a> - the work of Aleksa Gordić. We were happy to be able to support Aleksa at least on the data and discussion fronts. It was not easy to keep up with that guy! Wow! We really hope this is not Aleksa's (first? and) last HBS LLM rodeo!</div><div><br></div><div>We wish you calm and relaxing holidays!</div><div><br></div><div>The CLASSLA team</div><div><br></div></div>
</blockquote></div>