<div dir="ltr">Dear all,<div><br></div><div>We have been rather quiet for some time now, as many developments required our full capacity. But you can expect reports on interesting resources, tools, and experiments in the coming months!</div><div><br></div><div>We were, however, not the only ones who have been very busy recently. Philipp Wasserscheidt has recently published the PDRS web corpus of the Serbian language, 715 million tokens in size. You can find more details on the corpus in the <a href="http://CLARIN.SI">CLARIN.SI</a> repository entry (<a href="http://hdl.handle.net/11356/1752">http://hdl.handle.net/11356/1752</a>), where the corpus is available for download. The corpus is also available via the <a href="http://CLARIN.SI">CLARIN.SI</a> concordancers (the NoSkE link is <a href="https://www.clarin.si/noske/run.cgi/corp_info?corpname=pdrs10&struct_attr_stats=1">https://www.clarin.si/noske/run.cgi/corp_info?corpname=pdrs10&struct_attr_stats=1</a>).</div><div><br></div><div>Philipp is also making sure that future users know how to use the corpus. This is slightly last-minute, but maybe still not too late for some of you - a workshop on using the PDRS web corpus will be held from this Thursday to Saturday in Belgrade. More information is available at <a href="https://javnidiskurs.rs/poziv-na-radionicu-pdrs-1-0/">https://javnidiskurs.rs/poziv-na-radionicu-pdrs-1-0/</a>.<br></div><div><br></div><div>Since we are on the topic of web corpora, we have two pieces of news of our own to share right away:</div><div><br></div><div>1. I have taken one of the leading roles in the ACL Special Interest Group on Web as Corpus (SIGWAC). If you are interested in this area of research, you should join the SIG by signing up to the mailing list at <a href="http://devel.sslmit.unibo.it/mailman/listinfo/sigwac">http://devel.sslmit.unibo.it/mailman/listinfo/sigwac</a>.</div><div><br></div><div>2. 
We are in the process of releasing the MaCoCu datasets, which are web crawls of various national top-level domains, including those of Slovenia, Croatia, Bosnia and Herzegovina, Montenegro, Serbia, Macedonia and Bulgaria. For now, we are sharing only the link to the Macedonian dataset - <a href="http://hdl.handle.net/11356/1801">http://hdl.handle.net/11356/1801</a>. Linguistic processing of the datasets has just started and will result in the CLASSLA web corpora, to be updated on a biyearly basis.</div><div><br></div><div>Warm greetings from Zagreb (if anyone is in Dubrovnik next week for EACL, let me know; we might do a meet-up, similar to the one at JTDH in Ljubljana),</div><div><br></div><div>Nikola</div></div>