Assessing compatibility and stability of individual grammars through replication (DGfS 2020 AG 15, Hamburg)

Accepted presentation at the workshop Modelling gradient variability in grammar (AG 15 at the Annual Meeting of the German Linguistic Society, DGfS 2020, Hamburg). Abstract: Roland Schäfer (FU Berlin), Assessing compatibility and stability of individual grammars through replication (PDF).

Gradient variability is studied prominently in Probabilistic Grammar (Bresnan 2007). A major strand of research in Probabilistic Grammar focuses on cases where speakers choose between two or more superficially interchangeable options (alternations) and the choices are controlled probabilistically by contextual features and soft constraints (e.g., Bresnan 2007, Gries 2017). Often, corpus evidence is used, but some studies also use experimental data, either exclusively or in order to corroborate the corpus findings. Some studies show that corpus data and experimental data converge (Bresnan et al. 2007, Durrant & Doherty 2010, Gries & Wulff 2005, Gries et al. 2005), but a number of other studies have found diverging or only partially converging results between corpus and experimental evidence (Arppe & Järvikivi 2007, Dąbrowska 2014, Mollin 2009). More importantly, Dąbrowska (2008, 2012, 2015) presented evidence showing that individual speakers have partially incompatible grammars (not reducible to dia- or sociolectal variation). This poses problems for the use of massively pooled data in corpora. In psycholinguistics, participant-level variables (such as age) are often used to model between-speaker variation (e.g., Huettig & Janse 2016), but Verhagen & Mos (2016) address the possibility that there might be random between-speaker variability in the sense of Dąbrowska (incompatibility), and that individual speaker grammars might also be subject to random fluctuations (instability). There are thus at least six potential sources of variability to consider in alternation research: [1] soft context-sensitive semantic/syntactic constraints, [2] functional within-speaker variation (register), [3] incompatibility, [4] instability, [5] noise, including performance effects, [6] artefacts of the experiment.

The research presented here addresses sources [3] through [6]. I show results of repeated replications of two previously published experiments (split-100 and self-paced reading [SPR]) on binary morphosyntactic alternations in German. In both cases, the objective of the original study was to determine context-sensitive semantic and morphosyntactic constraints influencing the alternation, and the point of the replications was to find out whether and how incompatibility and instability mar the inferences about these constraints. The repeated replication consisted of two replications of the original studies with the same highly homogeneous groups of participants, two months apart, allowing for a detailed analysis of incompatibility and instability effects. Both original experiments were reported alongside parallel corpus studies, showing strong convergence (split-100) and weak but significant convergence (SPR) between corpus and experimental evidence. We thus have: [1] the corpus study, [2] the original experiment, [3] the first replication, [4] the second replication with the same participants as in [3]. The analysis shows incompatibility effects (as in Dąbrowska, op. cit.) inasmuch as large separate groups of participants consistently and strongly prefer either variant A over B or variant B over A in the split-100 task. The same effect seems to be observable in the SPR data, with variant A or B (depending on the speaker) incurring a reading time delay regardless of other factors (further analysis pending). Turning to instability effects, split-100 decisions are stable between the replications for roughly two thirds of the participants. The other third reacts more or less randomly across the two experiment runs. I argue that this is likely due to problems with the split-100 task itself.
More dramatically, over half of the participants in the SPR experiment show no stable behaviour in the two replications across all target stimuli, which means this cannot be due purely to performance effects. I discuss this with respect to the usability of SPR in alternation research. I also discuss the sensitivity of SPR to specifics of the experiment (cf. the wildly diverging SPR results from ten papers on Chinese relative clause processing in Vasishth 2015: 8). Importantly, however, the fundamental findings about the probabilistic semantic and morphosyntactic constraints controlling the alternation turn out to be robust across the corpus studies and all experiments. I discuss how this robustness might come about in the face of incompatibility and instability, and how this affects linguistic theory, experimental practice and corpus studies, as well as the statistical analysis of the findings (building on the discussion in Barr et al. 2013, Bates et al. 2015, Matuschek et al. 2017).
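
For illustration only, the notion of instability across the two replications can be made concrete as a per-participant test-retest measure. The Python sketch below is not the analysis used in the study. It assumes that each participant rated the same items in both runs, that a split-100 response is recorded as the points (0 to 100) assigned to variant A, and that a participant counts as stable when the Pearson correlation between the two runs reaches a threshold. The data layout, the function names `pearson` and `classify`, the threshold of 0.5 and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A participant who gives identical responses to all items in a run
        # has no variance, so the correlation is undefined.
        return None
    return sxy / sqrt(sxx * syy)


def classify(run1, run2, threshold=0.5):
    """Label each participant 'stable' or 'unstable' across two runs.

    run1 and run2 map participant IDs to per-item responses in the same
    item order. The threshold is an arbitrary illustrative choice.
    """
    labels = {}
    for pid in run1:
        r = pearson(run1[pid], run2[pid])
        stable = r is not None and r >= threshold
        labels[pid] = "stable" if stable else "unstable"
    return labels


# Invented toy data: points (0-100) assigned to variant A for four items.
run1 = {"p1": [80, 70, 90, 60], "p2": [20, 30, 10, 40], "p3": [50, 80, 20, 60]}
run2 = {"p1": [85, 65, 95, 55], "p2": [25, 35, 15, 45], "p3": [70, 30, 60, 40]}

labels = classify(run1, run2)
print(labels)
```

In this toy data, p1 and p2 also differ in their overall preference (p1 favours variant A, p2 favours variant B) while both remaining consistent across runs, which is the kind of pattern the abstract describes as incompatibility without instability. A realistic analysis would more plausibly use mixed-effects models along the lines discussed in the cited statistical literature.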