[CLASSLA] Some last-minute holiday greetings

Tanja Samardzic tanja.samardzic at uzh.ch
Fri Dec 29 13:31:57 CET 2023


Great to see this summary, thanks a lot, Nikola!

Is anybody planning to evaluate YugoGPT on the COPA-HR 
<https://www.clarin.si/repository/xmlui/handle/11356/1404> and COPA-SR 
<https://www.clarin.si/repository/xmlui/handle/11356/1708> datasets so 
that we can insert it into this chart (by Nikola)?

[chart by Nikola scrubbed from the HTML attachment]
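
For anyone who wants to run it, here is a minimal sketch of zero-shot 
COPA scoring by summed log-likelihood. The model ID, the file name, and 
the file format (standard COPA fields: premise, choice1, choice2, 
question, label) are my assumptions, not something fixed in this thread:

    import json
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "gordicaleksa/YugoGPT"  # hypothetical Hugging Face model ID

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto")
    model.eval()

    @torch.no_grad()
    def sequence_logprob(text):
        # Sum of token log-probabilities of `text` under the model.
        ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
        loss = model(ids, labels=ids).loss  # mean NLL per predicted token
        return -loss.item() * (ids.shape[1] - 1)

    correct = total = 0
    with open("copa_hr_test.jsonl", encoding="utf-8") as f:  # placeholder name
        for line in f:
            ex = json.loads(line)
            # Join premise and candidate with "because"/"so" ("jer"/"pa"),
            # depending on whether the item asks for a cause or an effect.
            conn = " jer " if ex["question"] == "cause" else " pa "
            scores = [sequence_logprob(ex["premise"].rstrip(".") + conn + ex[c])
                      for c in ("choice1", "choice2")]
            correct += int(scores.index(max(scores)) == ex["label"])
            total += 1

    print(f"COPA accuracy: {correct/total:.3f} ({correct}/{total})")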


And how about testing these models on Aleksa's eval?

I think it's important for the community to have comparable scores.

Happy holidays!

Tanja


On 25.12.23 10:44, Aleksa Gordić wrote:
> Nikola, it was a huge pleasure to work with you! Thank you for the kind 
> words, we're just getting started! :))
>
> I'm still in crunch mode, but you might already be able to play with 
> YugoGPT a bit later today.
>
> Btw, I had a call with Andrija, who's fine-tuning Whisper. His story is 
> quite interesting: he works in a library and is a philosopher "by 
> trade". He picked up ML along the way by going through a Hugging Face 
> course.
>
> This is a really cool report, thanks for including me.
>
> -Aleksa
>
> On Sun, Dec 24, 2023 at 1:58 PM Nikola Ljubešić <nljubesi at gmail.com> 
> wrote:
>
>     This year has been extremely packed with activities, hence this
>     very last-minute cheer - we are well on track to becoming much
>     less of a less-resourced language family! Here are a few examples
>     that come to mind first.
>
>     - Macedonian has arrived in Universal Dependencies
>     <https://universaldependencies.org/treebanks/mk_mtb/index.html>
>     🥳🥳🥳 thanks to Vladimir Cvetkoski. This may be "only" 155
>     sentences and 1,360 tokens, but, hey - it is infinitely more than
>     there was before. Bravo, Vladimir!
>
>     - CLASSLA followed Vladimir's great example and decided to
>     publish SETimes.MK <http://hdl.handle.net/11356/1886> in its
>     current state as version 0.1, 570 sentences and 13,310 tokens in
>     size, annotated at the XPOS, UPOS, FEATS and LEMMA levels, to give
>     additional momentum to the positive developments for Macedonian.
>
>     - In Slovenia the PoVeJMo project <https://www.cjvt.si/povejmo/>
>     has started, focused on adapting an LLM to the Slovenian language
>     in general, as well as to a series of industrial use cases.
>
>     - Andrija Sagić, a multimedia enthusiast, is taking a serious
>     bite of the speech apple, further fine-tuning the really great
>     whisper-large-v3 model
>     <https://huggingface.co/Sagicc/whisper-large-v3-sr-cmb> on all the
>     data he can scrape together for Serbian, most of which comes from
>     our Južne Vesti dataset <http://hdl.handle.net/11356/1679>. We are
>     now working with Andrija on improving the dataset (quite a few
>     typos in the human transcripts!) and are looking forward to
>     jointly publishing a version 2.0. This is the type of
>     collaboration we very much need! (A minimal usage sketch of
>     Andrija's model follows this list.)
>
>     - The ReLDI team, together with ICEF, Belgrade, has started the
>     industry-funded (you do not see many of those!) COMtext.SR project
>     <https://icef-nlp.github.io/COMtext.SR/> on collecting, curating,
>     annotating and publicly releasing textual data for various domains
>     of special interest to industry.
>
>     - The JeRTeh society has started publishing transformer models for
>     Serbian, the first two being named Vrabac
>     <https://huggingface.co/jerteh/gpt2-vrabac> and Orao
>     <https://huggingface.co/jerteh/gpt2-orao>. You can guess which one
>     is bigger. :-) We were told that additional models will be coming
>     from that direction, and we are very much looking forward to
>     those!
>
>     - You might have followed on social media the most productive
>     project I have ever seen - the YugoGPT model
>     <https://www.linkedin.com/posts/aleksagordic_well-its-official-yugogpt-7b-significantly-activity-7143209223722627072-0s9Y/>,
>     the work of Aleksa Gordić. We were happy to be able to support
>     Aleksa, at least on the data front and in some discussions. It was
>     not easy to keep up with that guy! Wow! We really hope this is not
>     Aleksa's (first? and) last HBS LLM rodeo!
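>
>     For anyone who wants to try Andrija's fine-tuned Whisper model
>     mentioned above, a minimal usage sketch with the transformers ASR
>     pipeline (the audio file name is a placeholder):
>
>         from transformers import pipeline
>
>         # Load the fine-tuned Serbian Whisper model from the HF Hub.
>         asr = pipeline("automatic-speech-recognition",
>                        model="Sagicc/whisper-large-v3-sr-cmb")
>
>         # Transcribe a local audio file (placeholder name; needs ffmpeg).
>         result = asr("sample_sr.wav", return_timestamps=True)
>         print(result["text"])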
>
>     We wish you calm and relaxing holidays!
>
>     The CLASSLA team
>
>