Roland Schäfer & Ulrike Sayatz (2014) Die Kurzformen des Indefinitartikels im Deutschen (Cliticization of the indefinite article in German). Zeitschrift für Sprachwissenschaft (ZS) 33(2). [BibTeX]
Tag Archives: Web corpora
Focused Web Corpus Crawling (Proc WAC)
Token-level noise in large Web corpora and non-destructive normalization for linguistic applications (2013)
Felix Bildhauer & Roland Schäfer: Token-level noise in large Web corpora and non-destructive normalization for linguistic applications. Corpus Analysis with Noise in the Signal (CANS 2013). Corpus Linguistics 2013, Lancaster.
Linguistic research with large annotated web corpora
Linguistic research with large annotated web corpora (2013). Pre-conference tutorial at the 20th International Conference on Head-Driven Phrase Structure Grammar, Berlin, August 26, 2013, 9:30–16:00.
COW Tutorial: Scripts
COW Tutorial: Slides
COW Tutorial: Worksheet
The Good, the Bad, and the Hazy: Design Decisions in Web Corpus Construction (Proc WAC)
Web Corpus Construction (Morgan & Claypool)
Roland Schäfer & Felix Bildhauer (2013) Web Corpus Construction. Morgan and Claypool. [BibTeX]
Websites: Morgan & Claypool (official), Companion web site (additional information, errata, etc.)
Reviews: Serge Sharoff in Computational Linguistics 41(1) (2015); Mats Wirén in Nordic Journal of Linguistics 37(3) (2014)
Scalable Construction of High-quality Web Corpora (JLTCL)
Building Large Corpora from the Web Using a New Efficient Tool Chain (Proc LREC)
Roland Schäfer & Felix Bildhauer (2012) Building Large Corpora from the Web Using a New Efficient Tool Chain. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). 486–493. [BibTeX]
Please cite this paper if you use the COW corpora up to version COW16.
Why only web corpora provide solutions to certain linguistic problems (2012)
Roland Schäfer & Felix Bildhauer. Why only web corpora provide solutions to certain linguistic problems. Annual conference of SLE 2012, Stockholm.
Building large corpora from the web (ESSLLI 2012)
Building large corpora from the web. Foundational course at the European Summer School in Logic, Language and Information (ESSLLI) 2012, Opole.
Building large corpora from the web (for printing)
Building large corpora from the web (for screen reading)