Tag Archives: Web crawling
Bildhauer & Schäfer: Working with web corpora (Corpus Linguistics 2015 workshop)
Sehr große Webkorpora – Aufbau, Zusammensetzung und Anwendung (2014)
Felix Bildhauer & Roland Schäfer: Sehr große Webkorpora – Aufbau, Zusammensetzung und Anwendung (“Very large web corpora – construction, composition, and application”). Invited talk at Institut für Deutsche Sprache (IDS), Mannheim.
Focused Web Corpus Crawling (2014)
Web Corpus Construction (2013)
Roland Schäfer & Felix Bildhauer (2013) Web Corpus Construction. Morgan & Claypool.
Websites: Morgan & Claypool (official), companion website (additional information, errata, etc.)
Reviews: Serge Sharoff in Computational Linguistics 41(1) (2015); Mats Wirén in Nordic Journal of Linguistics 37(3) (2014)
Scalable Construction of High-quality Web Corpora (2013)
Building Large Corpora from the Web Using a New Efficient Tool Chain (2012)
Roland Schäfer & Felix Bildhauer (2012) Building Large Corpora from the Web Using a New Efficient Tool Chain. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pp. 486–493.
Please cite this paper if you use the COW corpora up to version COW16.
Building large corpora from the web (ESSLLI 2012)
Building large corpora from the web. Foundational course at the European Summer School in Logic, Language and Information (ESSLLI) 2012, Opole.
Building large corpora from the web (for printing)
Building large corpora from the web (for screen reading)