[CLASSLA] Second CfP: 13th Web-as-Corpus (WaC-13) Workshop @EMNLP2026, Budapest, Hungary, 29 Oct, 2026

Nikola Ljubešić nljubesi at gmail.com
Fri Jun 5 16:37:25 CEST 2026


Second Call for Papers
13th Web-as-Corpus (WaC-13) Workshop @EMNLP2026, Budapest, Hungary,
29 October 2026
https://wackyworkshop.org

The World Wide Web has evolved from a resource for building linguistic
corpora into the central data infrastructure powering modern natural
language processing and Large Language Models (LLMs). As web-scale data
increasingly shapes AI systems’ knowledge and capabilities, understanding
its quality, representativeness, and ethical implications has become
critical.

At the same time, the “more is better” paradigm is being challenged by
issues such as machine-generated content, data toxicity, limited metadata,
and the under-representation of many languages and domains. These
challenges call for a shift toward Data-Centric AI, focusing on the
curation, analysis, and responsible use of web-derived data.

The 13th Web-as-Corpus (WaC-13) workshop provides a multidisciplinary forum
for research addressing the full lifecycle of web data. We invite
submissions on methods, resources, and applications related to web corpora,
with special emphasis on multilingual data and less-resourced languages.

Topics of interest include (but are not limited to):

* Creation and evaluation of high-quality datasets for foundation models
(e.g., data collection, filtering, enrichment, language identification)
* Use of web data in empirical linguistic research
* Analysis of web-scale corpora for quality, representativeness, and
societal insights
* Ethical and legal aspects of collecting, sharing, and using web data

By bringing together researchers from NLP, linguistics, and the social
sciences, WaC aims to advance best practices for one of the field’s most
influential data sources.

Important dates

Direct paper submission deadline: 7 August 2026
Pre-reviewed ARR commitment deadline: 1 September 2026
Notification of acceptance: 5 September 2026
Camera-ready paper due: 20 September 2026
Workshop date: 29 October 2026

Submissions

Submit your papers directly through OpenReview:
https://openreview.net/group?id=EMNLP/2026/Workshop/WaC-13

Papers already reviewed through ARR can be committed at:
https://openreview.net/group?id=EMNLP/2026/Workshop/WaC-13_ARR_Commitment

Workshop Organizers

Nikola Ljubešić, Jožef Stefan Institute, Slovenia
Yves Scherrer, University of Oslo, Norway
Laurie Burchell, Common Crawl Foundation
Veronika Laippala, University of Turku, Finland
Pedro Ortiz Suarez, Common Crawl Foundation
Thom Vaughan, Common Crawl Foundation
Vuk Dinić, Jožef Stefan Institute, Slovenia