Roland Schäfer

Professor of Linguistics | German Grammar and Lexicon | Friedrich-Schiller-Universität Jena

Tag Archives: Parallel computing

CommonCOW: Massively Huge Web Corpora from CommonCrawl Data and a Method to Distribute them Freely under Restrictive EU Copyright Laws (2016)

Roland Schäfer (2016). CommonCOW: massively huge web corpora from CommonCrawl data and a method to distribute them freely under restrictive EU copyright laws. In Proceedings of LREC 2016. 4500–4504. [BibTeX]

texrex web page cleaning system

Moved to GitHub as of 1 May 2016 (from SourceForge rev. 622).

This is the workhorse web page cleaning system behind the COW. It turns crawled HTML documents into clean XML corpus documents. It is released under a permissive 2-clause BSD license.

Processing and Querying Large Web Corpora with the COW14 Architecture (2015)

Roland Schäfer (2015). Processing and Querying Large Web Corpora with the COW14 Architecture. In Proceedings of Challenges in the Management of Large Corpora (CMLC-3) (IDS publication server). 28–34. [BibTeX]

Scalable Construction of High-quality Web Corpora (2013)

Chris Biemann, Felix Bildhauer, Stefan Evert, Dirk Goldhahn, Uwe Quasthoff, Roland Schäfer, Johannes Simon, Leonard Swiezinski & Torsten Zesch (2013). Scalable Construction of High-quality Web Corpora. In Journal for Language Technology and Computational Linguistics 18. 23–60. [BibTeX]

My Einführung in die grammatische Beschreibung has been downloaded 85,546 times and is the most-downloaded monograph of LangSci Press as of 30 December 2022. The fourth edition is forthcoming.

Office Address

Prof. Dr. Roland Schäfer
Germanistische Sprachwissenschaft
Fürstengraben 30
07743 Jena

Secretary: Stephanie Hanemann