This publication is in the INCUBATOR section.
Roland Schäfer (in preparation) Statistische Inferenz in der Linguistik (“Statistical Inference in Linguistics”). To be submitted to Language Science Press once complete. An English version is planned to follow the German version.
The Git repository is here (Roland Schäfer Statistical Inference in Linguistics Git repository), but there is little activity at the moment. The document is still mostly empty.
Moved to GitHub as of 1 May 2016 (from SourceForge rev. 622).
This is the workhorse web page cleaning system behind the COW. It turns crawled HTML documents into clean XML corpus documents. It is released under a permissive 2-clause BSD license.
Felix Bildhauer & Roland Schäfer: Sehr große Webkorpora – Aufbau, Zusammensetzung und Anwendung (“Very large web corpora – construction, composition, and application”). Invited talk at Institut für Deutsche Sprache (IDS), Mannheim.
Roland Schäfer, Adrien Barbaresi & Felix Bildhauer: The Good, the Bad, and the Hazy: Design Decisions in Web Corpus Construction. 8th Web as Corpus Workshop (WAC-8). Corpus Linguistics 2013, Lancaster. Go to proceedings.
Endorsed by ACL SIGWAC, co-located with EACL 2014, April 26, 2014 (Gothenburg, Sweden).
Organized by Felix Bildhauer and Roland Schäfer.
Visit the official WAC-9 homepage for details. Visit the WAC-9 proceedings page.