[CLASSLA] Parliamentary ParlaCAP Dataset and CAP Topic Classifier
Taja Kuzman Pungeršek
taja.kuzman at ijs.si
Fri Oct 17 11:13:04 CEST 2025
CLASSLA Mailing List
CLARIN.SI is pleased to announce the release of the ParlaCAP dataset
<https://doi.org/10.23669/1ZTELP>: an extension of the ParlaMint 5.0
<https://hdl.handle.net/11356/2004> collection enriched with sentiment
and topic annotations, as well as extended metadata on parties and
democracies. The dataset contains around 8 million speeches from 28
European parliaments and is provided in a tabular format, enhancing the
usability of the ParlaMint corpora for social and political science
research. As part of the OSCARS ParlaCAP project
<https://oscars-project.eu/projects/parlacap-comparing-agenda-settings-across-parliaments-parlamint-dataset>,
the dataset was published through the Croatian CESSDA node CROSSDA
<https://www.crossda.hr/>, thereby promoting collaboration between
infrastructures. We have also released a multilingual topic classifier
<https://huggingface.co/classla/ParlaCAP-Topic-Classifier> that uses the CAP
(Comparative Agendas Project) labels, and tutorials for analysing
ParlaCAP data in Python
<https://github.com/clarinsi/ParlaCAP-Analysis-Tutorials>. More
information is available here
<https://www.clarin.eu/sites/default/files/18-Bazaar-Ljubesic.pdf>.
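
Because the dataset is tabular, a first look at topic distributions needs
little more than the Python standard library. The following is only a minimal
sketch, not taken from the tutorials: the column names `parliament` and
`cap_topic`, the tab delimiter, and the toy rows are assumptions made for
illustration, so please check the dataset documentation for the actual layout.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical column names, assumed for illustration only.
PARLIAMENT_COL = "parliament"
TOPIC_COL = "cap_topic"


def topic_shares(tsv_file):
    """Return {parliament: {topic: share of that parliament's speeches}}."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        counts[row[PARLIAMENT_COL]][row[TOPIC_COL]] += 1

    shares = {}
    for parliament, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[parliament] = {t: n / total for t, n in topic_counts.items()}
    return shares


# Made-up toy rows standing in for a real file, which would be opened with
# open(path, encoding="utf-8", newline="") instead.
toy = io.StringIO(
    "parliament\tcap_topic\n"
    "SI\tHealth\n"
    "SI\tHealth\n"
    "SI\tEducation\n"
    "HR\tMacroeconomics\n"
)
shares = topic_shares(toy)
print(shares["SI"]["Health"])
```

Reading row by row keeps memory use low, which may matter for a table with
around 8 million speeches; the linked tutorials remain the authoritative
starting point for real analyses.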
CLASSLA: The Knowledge Centre for South Slavic Languages
<https://www.clarin.si/info/k-centre/>
CLARIN.SI <http://clarin.si/>
Jožef Stefan Institute
Jamova cesta 39, Ljubljana
Slovenia