Roland Schäfer (2016). On Bias-free Crawling and Representative Web Corpora. In Proceedings of the 10th Web as Corpus Workshop (WAC-X). [BibTeX]
Category Archives: Papers
Accurate and Efficient General-Purpose Boilerplate Detection for Crawled Web Corpora (2016)
CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws (2016)
Processing and Querying Large Web Corpora with the COW14 Architecture (2015)
Roland Schäfer (2015). Processing and Querying Large Web Corpora with the COW14 Architecture. In Proceedings of Challenges in the Management of Large Corpora (CMLC-3) (IDS publication server). 28–34. [BibTeX]
Die Kurzformen des Indefinitartikels im Deutschen (2014)
Roland Schäfer & Ulrike Sayatz (2014) Die Kurzformen des Indefinitartikels im Deutschen (Cliticization of the indefinite article in German). Zeitschrift für Sprachwissenschaft (ZS) 33(2). [BibTeX]
Focused Web Corpus Crawling (2014)
The Good, the Bad, and the Hazy: Design Decisions in Web Corpus Construction (2013)
Scalable Construction of High-quality Web Corpora (2013)
Building Large Corpora from the Web Using a New Efficient Tool Chain (2012)
Roland Schäfer & Felix Bildhauer (2012) Building Large Corpora from the Web Using a New Efficient Tool Chain. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). 486–493. [BibTeX]
Please cite this paper if you use the COW corpora up to version COW16.