[CLASSLA] Some last-minute holiday greetings
Tanja Samardzic
tanja.samardzic at uzh.ch
Fri Dec 29 13:31:57 CET 2023
Great to see this summary, thanks a lot, Nikola!
Is anybody planning to evaluate YugoGPT on the COPA-HR
<https://www.clarin.si/repository/xmlui/handle/11356/1404> and COPA-SR
<https://www.clarin.si/repository/xmlui/handle/11356/1708> data sets so
that we can add it to this chart (by Nikola)?
And how about testing these models on Aleksa's eval?
I think it's important for the community to have comparable scores.
Happy holidays!
Tanja
On 25.12.23 10:44, Aleksa Gordić wrote:
> Nikola it was a huge pleasure to work with you! Thank you for the kind
> words, we're just getting started! :))
>
> I'm still in crunch mode but you might be able to play with YugoGPT
> already a bit later today.
>
> Btw, I had a call with Andrija who's fine-tuning Whisper, his story is
> quite interesting as he works in a library and he's a philosopher "by
> trade". He picked up ML along the way by going through a HuggingFace
> course.
>
> This is a really cool report, thanks for including me.
>
> -Aleksa
>
> On Sun, Dec 24, 2023 at 1:58 PM Nikola Ljubešić <nljubesi at gmail.com>
> wrote:
>
> This year has been extremely packed with activities, hence this
> very last-minute cheer - we are on a good track to become much
> less of a less-resourced language family! I give a few examples
> that come to mind first.
>
> - Macedonian has arrived in Universal Dependencies 🥳
> <https://universaldependencies.org/treebanks/mk_mtb/index.html>🥳🥳
> thanks to Vladimir Cvetkoski. This may be "only" 155 sentences and
> 1,360 tokens, but, hey - it is infinitely more than there was
> before. Bravo, Vladimir!
>
> - CLASSLA followed Vladimir's great example and decided to
> publish SETimes.MK <http://hdl.handle.net/11356/1886> in its
> current state as version 0.1, 570 sentences and 13,310 tokens in
> size, annotated at the XPOS, UPOS, FEATS and LEMMA levels, to give
> additional momentum to the positively developing situation for
> Macedonian.
>
> - In Slovenia the PoVeJMo project <https://www.cjvt.si/povejmo/>
> has started, focused on adapting an LLM to the Slovenian language in
> general, as well as adapting it to a series of industrial use cases.
>
> - Andrija Sagić, a multimedia enthusiast, is seriously biting into
> the speech apple, additionally fine-tuning the really great
> whisper-large-v3 model
> <https://huggingface.co/Sagicc/whisper-large-v3-sr-cmb> on all the
> data he can scrape together for Serbian, which mostly includes our
> Južne Vesti dataset <http://hdl.handle.net/11356/1679>. We are now
> working with Andrija on improving the dataset (quite a few typos in
> the human transcripts!) and are looking forward to jointly
> publishing a version 2.0. This is the type of collaboration we are
> very much in need of!
>
> - The ReLDI team has started, together with ICEF, Belgrade, the
> industry-funded (you do not see many of those!) ComText.SR project
> <https://icef-nlp.github.io/COMtext.SR/> on collecting, curating,
> annotating and publicly releasing textual data for various domains
> of special interest to the industry.
>
> - The JeRTeh society has started publishing transformer models for
> Serbian, the first two models being named Vrabac
> <https://huggingface.co/jerteh/gpt2-vrabac> and Orao
> <https://huggingface.co/jerteh/gpt2-orao>. You can guess which is the
> bigger one. :-) We were told there will be additional models
> coming from that direction and we are very much looking forward to
> those!
>
> - You might have followed on social media the most productive
> project I have ever seen - the YugoGPT model
> <https://www.linkedin.com/posts/aleksagordic_well-its-official-yugogpt-7b-significantly-activity-7143209223722627072-0s9Y/>
> - the work of Aleksa Gordić. We were happy to be able to support
> Aleksa at least on the data front and in some discussions. It was not
> easy to keep up with that guy! Wow! We really hope this is not
> Aleksa's (first? and) last HBS LLM rodeo!
>
> We wish you calm and relaxing holidays!
>
> The CLASSLA team
>
>