<!DOCTYPE html><html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <br>

    Great to see this summary, thanks a lot, Nikola!<br>

    <br>

    Is anybody planning to evaluate YugoGPT on the <a href="https://www.clarin.si/repository/xmlui/handle/11356/1404">COPA-HR</a> and

    <a href="https://www.clarin.si/repository/xmlui/handle/11356/1708">COPA-SR</a>

    datasets so that we can add it to this chart (by Nikola)?<br>

    <br>

    <b id="docs-internal-guid-efc48c3a-7fff-03ce-506c-85556b40f253" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none; font-weight: normal;"><img src="https://lh7-us.googleusercontent.com/DHmnEj0P8zb1G1VixwEZcQDI4Ikvhg0ruQAg3c6tEAC8Du2XOG6lcA3Ga-eHjpRs_mBAc8G3TA_E1EudPWrhnI0ZfwM4dluoxMPQKBdtJ1VA3ux3LcPHkfwnc4_46HBghDd5AnKBJYioPX-uwhQq28DZLQ=s2048" width="494" height="307"></b><br>

    <br>

    <div class="moz-cite-prefix"><br>

      And how about testing these models on Aleksa's eval?<br>

      <br>

      I think it's important for the community to have comparable

      scores. <br>

      <br>

      Happy holidays!<br>

      <br>

      Tanja <br>

      <br>

      <br>

      On 25.12.23 10:44, Aleksa Gordić wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:CAB=Ynf3H0Z88egbuO1a-KNm-fCCuWsdt1H3uhyc6Z3uPhsKfuQ@mail.gmail.com">

      <div dir="ltr">Nikola, it was a huge pleasure to work with you!

        Thank you for the kind words, we're just getting started! :))

        <div><br>

        </div>

        <div>I'm still in crunch mode but you might be able to play with

          YugoGPT already a bit later today.</div>

        <div><br>

        </div>

        <div>Btw, I had a call with Andrija, who's fine-tuning Whisper.

          His story is quite interesting: he works in a library and

          he's a philosopher "by trade". He picked up ML along the way

          by going through a Hugging Face course.</div>

        <div><br>

        </div>

        <div>This is a really cool report, thanks for including me.</div>

        <div><br>

        </div>

        <div>-Aleksa</div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Sun, Dec 24, 2023 at

          1:58 PM Nikola Ljubešić &lt;<a href="mailto:nljubesi@gmail.com" moz-do-not-send="true" class="moz-txt-link-freetext">nljubesi@gmail.com</a>&gt;

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <div dir="ltr">This year has been extremely packed with

            activities, hence this very last-minute cheer - we are on

            track to become much less of a less-resourced language

            family! Here are a few examples that come to mind first.

            <div><br>

            </div>

            <div>- <a href="https://universaldependencies.org/treebanks/mk_mtb/index.html" target="_blank" moz-do-not-send="true">Macedonian has

                arrived in Universal Dependencies 🥳</a>🥳🥳 thanks to

              Vladimir Cvetkoski. This may be "only" 155 sentences and

              1,360 tokens, but, hey - it is infinitely more than there

              was before. Bravo, Vladimir!</div>

            <div><br>

            </div>

            <div>- CLASSLA followed Vladimir's great example and

              decided to <a href="http://hdl.handle.net/11356/1886" target="_blank" moz-do-not-send="true">publish

                SETimes.MK</a> in its current state as version 0.1, with 570

              sentences and 13,310 tokens, annotated at the XPOS,

              UPOS, FEATS and LEMMA levels, to give additional momentum

              to the positively developing situation for Macedonian.</div>

            <div><br>

            </div>

            <div>- In Slovenia, <a href="https://www.cjvt.si/povejmo/" target="_blank" moz-do-not-send="true">the PoVeJMo

                project</a> has started, focused on adapting an LLM to

              the Slovenian language in general, as well as adapting it to a

              series of industrial use cases.</div>

            <div><br>

            </div>

            <div>- Andrija Sagić, a multimedia enthusiast, is seriously

              biting into the speech apple, <a href="https://huggingface.co/Sagicc/whisper-large-v3-sr-cmb" target="_blank" moz-do-not-send="true">additionally

                fine-tuning the really great whisper-large-v3 model</a>

              on all the data he can scrape together for Serbian, which

              mostly includes <a href="http://hdl.handle.net/11356/1679" target="_blank" moz-do-not-send="true">our Južne Vesti dataset</a>. We

              are now working with Andrija on improving the dataset

              (quite a few typos in the human transcripts!) and are

              looking forward to jointly publishing a version 2.0. This

              is the type of collaboration we are very much in need of!</div>

            <div><br>

            </div>

            <div>- The ReLDI team has started, together with ICEF,

              Belgrade, the industry-funded (you do not see many of

              those!) <a href="https://icef-nlp.github.io/COMtext.SR/" target="_blank" moz-do-not-send="true">ComText.SR

                project</a> on collecting, curating, annotating and

              publicly releasing textual data for various domains of

              special interest to the industry.</div>

            <div><br>

            </div>

            <div>- The JeRTeh society has started publishing transformer

              models for Serbian, the first two models being named <a href="https://huggingface.co/jerteh/gpt2-vrabac" target="_blank" moz-do-not-send="true">Vrabac</a> and <a href="https://huggingface.co/jerteh/gpt2-orao" target="_blank" moz-do-not-send="true">Orao</a>. Guess

              which is the bigger one. :-) We were told there will

              be additional models coming from that direction and we are

              very much looking forward to those!</div>

            <div><br>

            </div>

            <div>- You might have followed on social media the most

              productive project I have ever seen - the <a href="https://www.linkedin.com/posts/aleksagordic_well-its-official-yugogpt-7b-significantly-activity-7143209223722627072-0s9Y/" target="_blank" moz-do-not-send="true">YugoGPT model</a>,

              the work of Aleksa Gordić. We were happy to be able to

              support Aleksa at least on the data and discussion

              fronts. It was not easy to keep up with that guy! Wow! We

              really hope this is not Aleksa's (first? and) last HBS LLM

              rodeo!</div>

            <div><br>

            </div>

            <div>We wish you calm and relaxing holidays!</div>

            <div><br>

            </div>

            <div>The CLASSLA team</div>

            <div><br>

            </div>

          </div>

        </blockquote>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>