Currently bundled with texrex on GitHub.
ClaraX (funded by the German Research Council through grant SCHA1916/1-1 Linguistic web characterization) is the companion of the planned (but delayed) HeidiX (Heidi is a crawler system) software. It performs parametrized random-walk crawls of the web graph and integrates the full web page cleaning functionality of texrex. It is purely experimental: it is designed for conducting experiments and fundamental research, and it is in no way suitable for large-scale production crawling. It is released under a permissive 2-clause BSD license.
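To illustrate what a parametrized random walk over a link graph can look like, here is a minimal Python sketch on a toy in-memory graph. It is not ClaraX code and does not reflect ClaraX's actual parameters or interface: the function name `random_walk`, the `restart_prob` parameter, and the toy graph are all assumptions made for illustration.

```python
import random

def random_walk(graph, start, steps, restart_prob=0.15, seed=None):
    """Walk a toy link graph (hypothetical helper, not part of ClaraX).

    graph maps a page name to the list of pages it links to.
    restart_prob is an assumed parameter: the chance of jumping back
    to the start page instead of following an outgoing link.
    """
    rng = random.Random(seed)
    current = start
    visited = [current]
    for _ in range(steps):
        links = graph.get(current, [])
        # Restart on a dead end, or at random with probability restart_prob.
        if not links or rng.random() < restart_prob:
            current = start
        else:
            current = rng.choice(links)
        visited.append(current)
    return visited

# Toy graph; page "d" has no outgoing links and is unreachable from "a".
toy_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": [],
}

walk = random_walk(toy_graph, "a", steps=10, restart_prob=0.2, seed=42)
print(walk)
```

A real crawler would fetch pages and extract links instead of reading them from a dictionary; the sketch only shows how a walk parameter such as a restart probability steers which part of the graph gets sampled.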