[CLASSLA] Parliamentary ParlaCAP Dataset and CAP Topic Classifier

Taja Kuzman Pungeršek taja.kuzman at ijs.si
Fri Oct 17 11:13:04 CEST 2025


		

CLARIN.SI is pleased to announce the release of the ParlaCAP dataset 
<https://doi.org/10.23669/1ZTELP>: an extension of the ParlaMint 5.0 
<https://hdl.handle.net/11356/2004> collection enriched with sentiment 
and topic annotations, as well as extended metadata on parties and 
democracies. The dataset contains around 8 million speeches from 28 
European parliaments and is provided in a tabular format, which makes the 
ParlaMint corpora easier to use in social and political science 
research. As part of the OSCARS ParlaCAP project 
<https://oscars-project.eu/projects/parlacap-comparing-agenda-settings-across-parliaments-parlamint-dataset>, 
the dataset was published through the Croatian CESSDA node CROSSDA 
<https://www.crossda.hr/>, thereby promoting collaboration between 
infrastructures. We have also released a multilingual topic classifier 
<https://huggingface.co/classla/ParlaCAP-Topic-Classifier> that uses the CAP 
(Comparative Agendas Project) labels, along with tutorials for analysing 
ParlaCAP data in Python 
<https://github.com/clarinsi/ParlaCAP-Analysis-Tutorials>. More 
information is available here 
<https://www.clarin.eu/sites/default/files/18-Bazaar-Ljubesic.pdf>.

CLASSLA: The Knowledge Centre for South Slavic Languages 
<https://www.clarin.si/info/k-centre/>

CLARIN.SI <http://clarin.si/>

Jožef Stefan Institute

Jamova cesta 39, Ljubljana
Slovenia
