Introduction

Recent research concerning text comprehension has emphasized readers' monitoring of the consistency, coherence, and congruence of an ongoing message. The contributing processes are frequently referred to as discourse validation (Richter & Rapp, 2014; Schroeder et al., 2008; Singer, 2013; Singer et al., 1992). There is considerable evidence that validation is immediate (Staub et al., 2007) and passive or nonstrategic (Isberner & Richter, 2013; Singer, 2006), and that it serves as a criterion for representational updating (Ferretti et al., 2013; Schroeder et al., 2008; Singer, 2006, 2013). Validation comprises complex processes that result in both successful and failed comprehension.

The phenomenon perhaps most commonly interpreted to reflect readers' text validation is the consistency effect (CE; Albrecht & O'Brien, 1993; Hakala & O'Brien, 1995; Huitema et al., 1993; O'Brien & Albrecht, 1992). It comprises greater reading time for target sentences inconsistent in their discourse context, compared with equivalent consistent targets. We examined numerous early studies (Albrecht & O'Brien, 1993; Huitema et al., 1993; O'Brien et al., 1998; O'Brien & Albrecht, 1992) and more recent ones (O'Brien & Cook, 2016; Smith & O'Brien, 2012; Williams et al., 2018) that documented CEs. In all instances, interpretation and conclusions emphasized readers' nonstrategic sensitivity to text discrepancies.

In their early seminal study, for example, O'Brien and Albrecht (1992, p. 782) asserted that "incoming information is CONTINUALLY checked" (all-caps added) against the existing discourse mental model. A simple interpretation of this analysis is that the passive processes that afford the consistency effect result in the promotion of text discrepancies to awareness. Alternatively, those discrepancies might not rise to the level of consciousness. It is striking that none of the aforementioned studies examined whether readers systematically become aware of consistency-effect discrepancies. This issue was the main focus of the present study.

The latter conclusions, however, ostensibly contradict considerable existing evidence of readers' deficient validation – ranging from readers' substandard comprehension monitoring (Glenberg et al., 1987) to their overlooking text contradictions, such as a character making sandwiches in the absence of bread (Cohen, 1979). Systematic comprehension flaws, often collectively labeled "misinformation effects" (Rapp & Braasch, 2014), are addressed by theoretical analyses such as good-enough processing (Ferreira et al., 2002) and the scenario-mapping and focus theory (Sanford & Garrod, 2005).

To reconcile the CE with instances of deficient validation, this study further scrutinized the character and implications of the effect. Experiment 1 replicated the CE with two well-known and intermixed material sets. In Experiment 2, we applied a critical, new procedure in which subjects made continual responses about the consistency of text segments. We considered that this would clarify whether readers can consciously detect the consistency-effect inconsistencies. Experiment 3 addressed the possibility that failures to detect inconsistencies result from the reader's lack of knowledge about the corresponding ideas.

In the Introduction, we next present an overview of text validation, with emphasis on (a) the consistency effect and (b) misinformation effects that might challenge the implications of the CE. We then outline a theoretical framework whose mechanisms might account for flawed validation. Finally, the plan for our empirical investigation is described.

Monitoring text congruence

The consistency effect (CE)

In the "consistency paradigm" (e.g., Long & Chong, 2001; also, "contradiction paradigm," e.g., Lassonde et al., 2012), the CE is usually quantified by comparing self-paced reading time for identical text segments that are either inconsistent or consistent in their discourse context. To cite a classic example, readers would initially encounter one version of As Kim stood inside/outside the health club, she felt a little sluggish. Reading time later in the text for the phrase, She decided to go outside, was greater as an inconsistent continuation than as a consistent one (O'Brien & Albrecht, 1992). The CE applies to inconsistencies in many situational relations (van Dijk & Kintsch, 1983), reflecting readers' monitoring of coherence both at the level of verbatim text form and at deeper situational levels (Graesser et al., 1994; Zwaan et al., 1995; Zwaan et al., 1998). Thus, CEs, typically ranging from 150 ms to 400 ms, are detected for information bearing on narrative protagonists (e.g., Cook et al., 2007; O'Brien et al., 1998; O'Brien et al., 2010; Huitema et al., 1993) and both conspicuous and subtle causal inconsistencies (Lutz & Radvansky, 1997). They are measured for spatial (O'Brien & Albrecht, 1992; Rinck et al., 2001), temporal (Magliano & Schleich, 2000; Rinck et al., 2001), and narrative goal and motivational relations (Albrecht & Myers, 1995; Huitema et al., 1993; Klin et al., 1999).

Like other comprehension effects, the CE frequently "spills over" beyond the text segment that initiates it. Thus, spillover CEs are detected both for surface contradictions and for situational dimensions, including inconsistencies pertaining to goals (Huitema et al., 1993), protagonist traits (O'Brien et al., 1998), and spatial relations (Smith & O'Brien, 2012). In some instances, spillover CEs may appear in the absence of the CE at the preceding target (Levine & McCully, 2017; Williams et al., 2018).

However, validation success within the consistency paradigm is regulated by numerous text variables. For example, readers are more likely to detect a contradiction between a character lifting a heavy object and his advanced age when the age had been extensively elaborated in the antecedent text than when it had not (Albrecht & O'Brien, 1993; O'Brien et al., 1990). Other such variables include the similarity and message-distance between the current segment and its antecedents (Albrecht & Myers, 1995, 1998; O'Brien et al., 1990); and the typicality (Myers et al., 2000, Experiment 3), distinctiveness, strength (O'Brien & Albrecht, 1991; O'Brien et al., 1995; McKoon & Ratcliff, 1992), and degree of plausibility of those antecedents (Richter & Maier, 2017). The mechanisms by which these variables exert their effects are considered in the Theoretical framework section.

Misinformation effects

Certain implications of the consistency effect seem contradicted by systematic validation shortcomings. Diverse misinformation effects reflect readers' difficulty in integrating and reconciling ideas within a text, between texts, and with general knowledge. Instances such as deficient comprehension monitoring (Glenberg et al., 1987) and overlooking text contradictions (Cohen, 1979) were mentioned earlier.

Other well-known instances of understanders missing text and discourse anomalies involve them replying "two" to the question, How many animals of each kind did Moses take on the ark? (it was actually Noah, not Moses; Erickson & Mattson, 1981) and being misled by trick riddles, such as When an airplane crashes, where should the SURVIVORS be buried? (Barton & Sanford, 1993). Likewise, when asked, How fast was the car going past THE BARN along the country road? people incorrectly agree that a non-existent barn had appeared in a prior video (Loftus & Zanni, 1975). These outcomes reflect a variety of mechanisms. The Moses illusion and the buried-survivor results may reflect shallow, passive, or partial processing on the part of the reader (Barton & Sanford, 1993; Daneman et al., 2007; Reder & Kusbit, 1991). Both the Moses and the barn stimuli embed the critical term in the question presupposition (Halliday, 1967), which understanders insufficiently scrutinize (Hornby, 1974). Encountering false presuppositions nevertheless results in erroneous information being encoded in the sentence or message representation (Loftus & Zanni, 1975).

In addition, readers may accept text ideas despite their being incorrect or later discredited (Fazio et al., 2013). Readers insufficiently update text information (Rapp & Kendeou, 2009; Wilkes & Leatherbarrow, 1988). They may acquire or be influenced by text ideas that conflict with their existing knowledge, such as the Atlantic rather than the Pacific being the largest ocean (Fazio et al., 2013). Even warnings to be wary of inaccurate text ideas do not always inoculate readers against these effects (Ecker et al., 2010; Fazio et al., 2013; Marsh & Fazio, 2006). Readers' susceptibility to misinformation effects has been characterized as the "biggest challenge" to proposals about the routine validation of text coherence (Isberner & Richter, 2013).

Theoretical framework

There is growing consensus that consistency evaluation is subsumed in comprehension processing rather than forming a separate and subsequent processing stage. Several analyses (e.g., Ferretti et al., 2013; O'Brien & Cook, 2016; Richter & Maier, 2017) invoke Kintsch's (1988, 1998) construction-integration model (CI). According to that view, the reader first constructs a preliminary network of text content, enabled by the passive memorial resonance (Ratcliff, 1978) of either general knowledge or text ideas antecedent to the current clause. This results in relevant and even irrelevant ideas appearing in the network. However, at the integration stage, activation settles computationally on the most highly interconnected network elements, effectively eliminating irrelevant ideas (Kintsch, 1988, 1998; Rumelhart & McClelland, 1986). Like construction, integration is a passive, "dumb" process.
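To make the settling mechanism concrete, the following R sketch spreads activation over a tiny constructed network in the spirit of Kintsch's (1988) integration stage. The network, its connection weights, and the number of iterations are purely illustrative assumptions, not Kintsch's actual parameters.

    # Schematic illustration of the integration phase: activation spreads
    # over the constructed network until it settles. Nodes 1-3 are mutually
    # interconnected (relevant) ideas; node 4 is an isolated (irrelevant) idea.
    W <- matrix(c(0, 1, 1, 0,
                  1, 0, 1, 0,
                  1, 1, 0, 0,
                  0, 0, 0, 0), nrow = 4, byrow = TRUE)
    a <- rep(1, 4)          # all constructed ideas start equally active

    for (i in 1:50) {
      a <- W %*% a          # spread activation along network links
      a <- a / max(a)       # normalize to prevent unbounded growth
    }
    round(as.vector(a), 3)  # interconnected nodes retain activation;
                            # the isolated node settles to zero

In this toy run, the three interconnected nodes stabilize at full activation while the unconnected node drops to zero, mirroring the passive elimination of irrelevant ideas described above.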

In this context, theorists posit a distinct processing stage of validation, which comprises the evaluative dimension of comprehension with reference both to prior text ideas and to general knowledge (e.g., O'Brien & Cook, 2016; Richter & Maier, 2017; Richter & Rapp, 2014). Validation is considered to succeed when text meets a threshold or standard of coherence adopted by the reader (van den Broek et al., 2011). Successful validation sanctions the updating of the discourse situation model (Ferretti et al., 2013; Rapp & Kendeou, 2009; Schroeder et al., 2008).

Most importantly, these models view this validation stage as nonstrategic and passive (Isberner & Richter, 2013; Singer, 2006). Passive validation is especially supported when the appearance of the CE is delayed until the reader encounters a subsequent, spillover sentence. This sometimes occurs in instances of "weak" inconsistencies, such as when a vegetarian orders a tuna salad (as opposed to a cheeseburger; Cook & O'Brien, 2014; see also Levine & McCully, 2017, and Williams et al., 2018). That result profile implies that validation has passively persisted beyond the sentence presenting the inconsistency.

The models differ in their treatment of the relation between the integration and validation stages of processing, their implementation of the coherence threshold, and their analysis of transitions from passive to strategic (or aware) validation. It is not the purpose of this study to resolve these differences. Rather, the models are invoked to provide a theoretical context for these investigations. The General discussion considers the bearing of this framework on the empirical results.

Current study

The main objective was to quantify, for the first time, the degree of readers' awareness of the within-text inconsistencies that are diagnosed by the consistency effect (CE). One mechanism that could keep such inconsistencies from reaching awareness is a coherence threshold that operates below the level of the reader's conscious awareness (O'Brien & Cook, 2016). Richter and Maier's (2017) default option of not subjecting coherence deficiencies to controlled processing would likewise mask text anomalies. Such mechanisms would produce a striking dissociation: Text inconsistencies would be reflected by phenomena such as the consistency effect while not being available to the reader's awareness.

To this end, as discussed earlier, we first replicated the CE (Experiment 1). In Experiment 2, subjects continually evaluated the discourse consistency of each sentence in the same passages as Experiment 1. We propose that the joint results of Experiments 1 and 2 clarify the implications of the CE. Experiment 3 probed an alternative interpretation of the outcome of Experiment 2.

Experiment 1

Experiment 1 was designed to replicate the familiar CE, in order to provide a framework for interpreting our new experimental procedures. It simultaneously scrutinized two well-known material sets from this domain. This required lengthy and complex stimulus lists that interwove the two sets. As a result, the subjects encountered stimulus passages that varied considerably in their structure, length, and content. The rationale was that if the CE could be replicated in this context, then it would provide a convincing backdrop for interpreting the results of Experiments 2 and 3.

Method

Subjects

The subjects were 47 female and male native-English speaking students of Introductory Psychology at the University of Manitoba (Footnote 1). Sample size was based on that of O'Brien et al. (1998). The subjects participated in partial fulfillment of a course requirement. All experiments in this study complied with the American Psychological Association guidelines for interacting with human subjects.

Materials

Two sets of experimental narrative passages were respectively derived from those of (a) Singer et al. (2017; see also Singer, 2006; O'Brien et al., 1990) and (b) O'Brien et al. (1998). There were 28 passages in Singer et al.'s (2017) materials. These focused on the consistency of a critical Concept throughout the passage. There were 22 passages in O'Brien et al.'s materials, which focused on the consistency of a character Trait. Twenty passages were chosen at random from each set.

Sample passages of each set are shown in Table 1. The Concept passages were nine sentences long. Sentence 2 identified two candidate concepts for an action, such as camel and mule. Sentence 3, the critical antecedent, specified which of these was ultimately relevant. A filler section of four sentences (sentences 4–7) was locally coherent but focused on other ideas. Then, in sentence 8, a narrative character was described as expressing an assertion or belief about one of those alternatives (e.g., camel). As a result, sentence 8 could be either consistent or inconsistent with its antecedent. Finally, sentence 9 served as the spillover sentence.

Table 1 Sample experimental concept and trait passages of Experiment 1

The Trait passages of O'Brien et al. (1998) were 14 to 17 sentences long. Each passage started with two to three sentences that introduced a scenario and a main character. Next, a section of variable length introduced and elaborated upon a trait of the character, such as his age (e.g., 25 (young), 81 (elderly)). A filler section of six sentences continued the narrative but did not discuss the elaborated trait. A critical target sentence then described a protagonist action that was either consistent or inconsistent with his or her trait (He quickly ran and picked the boy up). The next sentence constituted the spillover sentence. The passage concluded with a two- to three-sentence Closing section.

The Concept passages only required matching the critical concept to its antecedent. In contrast, the Trait passages demanded that the reader draw inferences and access world knowledge in order to evaluate consistency. Such processes are needed to expose, for example, that running and lifting are inconsistent with being old and frail. To summarize, the two material sets placed different demands on readers while inspecting their sensitivity to text consistency.

In both the Concept and Trait passages, the filler sections served to displace the critical antecedent from working memory. Even the four filler sentences of the Concept passages are sufficient to accomplish this while still affording the CE (O'Brien et al., 1998). Under these conditions, the CE requires the reinstatement to working memory of the antecedent from the reader's long-term memory for the text.

A simple comprehension question was composed for every passage. For both the Concept and Trait passages, half of these questions were randomly assigned to have the correct answer "yes."

The form of the stimulus texts was intended to promote rather than thwart CEs. That is, previous studies that used these materials indicated that the critical antecedents were not so distant, dissimilar, or atypical as to impede retrieval and comparison. Therefore, overlooking text discrepancies in a consistency-judgment task (Experiment 2) would not be attributable to those text factors.

Across two experimental lists of 40 passages, each passage appeared alternately in the consistent and inconsistent condition. List 1 was constructed as follows: Each passage was randomly assigned to list position subject to two restrictions: First, no two consecutive passages represented the same one of the four conditions (Consistency × Set (concept-trait)). Second, each half of the list included exactly five passages in each condition. As a result, the subjects encountered a total of ten passages in each Consistency × Set condition. List 2 was identical except that each passage reversed its consistent-inconsistent condition. Finally, each list was preceded by four practice passages, one representing each condition. The passages that were randomly assigned to the same condition in each list comprised a Verbal Group.
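For concreteness, the list construction just described amounts to rejection sampling over orderings. The R sketch below assumes a data frame passages with 40 rows and a condition factor crossing Consistency (2) × Set (2), each condition appearing ten times; all names are illustrative rather than the authors' actual procedure.

    # Constrained randomization for one list: resample until (1) no two
    # consecutive passages share a condition and (2) each list half
    # contains exactly five passages per condition.
    build_list <- function(passages) {
      repeat {
        ord  <- passages[sample(nrow(passages)), ]
        half <- rep(1:2, each = 20)
        ok_runs   <- all(head(ord$condition, -1) != tail(ord$condition, -1))
        ok_halves <- all(table(ord$condition, half) == 5)
        if (ok_runs && ok_halves) return(ord)  # both restrictions satisfied
      }
    }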

The lists included no filler passages (cf. Singer et al., 2017). This is because we judged that the considerable variability of the Concept and Trait passages and the position of their inconsistencies would mask any regularities among the materials.

Procedure

Experimental sessions were conducted with groups of one to four subjects. Each subject was tested in a separate, closed room. The subject stations consisted of a PC, keyboard, and screen. The screen was positioned 22 cm from the edge of its table, but subjects were otherwise free to adjust their chairs to a comfortable position.

The subjects were randomly assigned to view list 1 or list 2, subject to the restriction that approximately equal numbers of subjects view each list. They read printed instructions describing the task and providing a sample passage followed by three example comprehension questions, two of which had the correct answer "yes." The sample passage differed in structure from the experimental passages.

The session comprised four practice and 40 experimental trials. On each trial, the signal "press READY for next story" appeared left-adjusted on row 1 of a 20-row screen. When the subject pressed the space bar (READY key), the first sentence of the passage appeared left-adjusted half-way down the screen (row 10). The subjects were instructed to press READY to signal their understanding of each successive sentence. Each response resulted in the erasure of the current sentence and the presentation of the next sentence, in the same screen location.

The subject's response to the final sentence of a passage was followed by a 2.5-s interval. Next, the fixation character X appeared for 0.5 s at row 6, column 1 of the screen. Then, the comprehension question was presented, starting at the fixation position. The keyboard "." and "x" keys, respectively, served as the Yes and No response buttons. The message "ERROR" was presented for 1 s after incorrect responses. After the response or the error message, there was a 3-s intertrial interval, followed by the reappearance of the READY signal. After the last passage, a screen message thanked the subject for her or his participation.

Statistical analyses

Here and throughout, the data were analyzed using linear mixed-effect modelling, conducted with the lme4 package (Bates et al., 2015) in R (R Development Core Team, 2016). The p-values were obtained using the lmerTest package in R (Kuznetsova et al., 2017). For timed measures only, the data were log-transformed, and the p-values used the Satterthwaite approximation. The significance criterion was α = 0.05.

Subjects and items (the experimental passages) functioned as random effects throughout. The accuracy analyses comprised generalized linear models with binomial errors. For timed measures, the models included random slopes for the fixed effect of Consistency (the target sentence matching vs. mismatching its antecedent) for both subjects and items. Whenever those random slopes were ultimately excluded because the model in question failed to converge, that exclusion is noted.
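As a concrete illustration of this analysis scheme, the following R sketch fits the models just described. The data frame and column names (rt_data, subject, item, consistency, set, rt, accuracy) are assumptions for illustration only, not the authors' actual script.

    # Minimal sketch, assuming trial-level data: subject, item, consistency,
    # set, rt (reading time, ms), and accuracy (0/1).
    library(lme4)
    library(lmerTest)  # supplies Satterthwaite-approximated p-values for lmer

    rt_data$log_rt <- log(rt_data$rt)   # timed measures are log-transformed

    # Full model: random slopes of Consistency for both subjects and items
    m_full <- lmer(log_rt ~ consistency * set +
                     (1 + consistency | subject) + (1 + consistency | item),
                   data = rt_data)

    # If the full model fails to converge, drop random slopes selectively
    # (e.g., retain them for items only, as in the target-sentence analysis)
    m_items <- lmer(log_rt ~ consistency * set +
                      (1 | subject) + (1 + consistency | item),
                    data = rt_data)
    summary(m_items)

    # Accuracy analyses: generalized linear mixed model with binomial errors
    m_acc <- glmer(accuracy ~ consistency * set + (1 | subject) + (1 | item),
                   data = rt_data, family = binomial)
    summary(m_acc)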

Results

One subject frequently signaled understanding of story sentences in under 1 s and was therefore excluded. All analyses were applied to the data of the remaining 46 subjects. The mean accuracy for the comprehension questions was 81.4%.

Target sentence

The data of main interest were the reading times at the target sentence and the following (spillover) sentence. The mean reading times as a function of Consistency and Set appear in Table 2 for both sentences.

Table 2 Mean reading times (ms; SE in parentheses) as a function of consistency and set in Experiment 1

An initial linear mixed-effect model that included random slopes of consistency for both subjects and items failed to converge, so random slopes of consistency were retained for items only. Reading time was significantly greater in the inconsistent condition than in the consistent condition, β = 0.07, SE = 0.02, t = 2.88, p < 0.01. Reading time was also significantly greater for the concept than the trait passages, β = 0.46, SE = 0.07, t = 6.65, p < 0.01. The Consistency × Set interaction did not approach significance, t = 0.51, p = 0.61.

Spillover sentence

The spillover sentences were inherently neither consistent nor inconsistent with their passages. Their consistency classification stemmed only from their association with the immediately preceding target sentence.

The model that included random slopes for consistency did not converge for subjects or items, so those slopes were excluded. There were again significant effects of consistency, β = 0.05, SE = 0.02, t = 2.00, p = 0.05, and set, β = 0.41, SE = 0.07, t = 5.78, p < 0.01. The Consistency × Set interaction was not significant, t = 0.57, p = 0.33.

Discussion

Experiment 1 revealed robust CEs for each of two familiar material sets. These effects were detected even though passages from the two sets were randomly intermixed. The CEs spilled over to the sentence immediately following the target (Albrecht & O'Brien, 1993; Hakala & O'Brien, 1995; Huitema et al., 1993). Reading times were greater for the concept than the trait passages, both at the target and spillover sentences. This is attributable to the greater length of the concept sentences, at both positions.

In summary, Experiment 1 heightens confidence in the comparability of the present materials to those previously examined. As such, it served as a preamble to Experiment 2, in which subjects made sentence-by-sentence judgments about text consistency.

Experiment 2

As discussed throughout, consistency effects are routinely detected at target and spillover sentences for a wide range of situational discrepancies. This might imply that readers become aware of those discrepancies, but there has been little or no investigation of this issue. Furthermore, that conclusion ostensibly conflicts with well-documented deficiencies in readers' text validation. This raises the possibility that CE discrepancies might not reliably become available to the reader's awareness.

Experiment 2 was designed to scrutinize people's text-consistency monitoring. To accomplish this, we presented subjects with the materials of Experiment 1, but asked them whether each successive sentence was consistent with the entire preceding text. Their accuracy in this endeavor bears on the reliability of people's validation monitoring.

Method

Subjects

The subjects were 78 new individuals from the same population that was sampled in Experiment 1. Sample size in Experiments 2 and 3 was based on values that have dependably yielded significant consistency effects in our studies (Singer & Spear, 2020; Singer et al., 2017), ensuring adequate power (Footnote 2).

Materials

The text materials were identical to those of Experiment 1. As before, each of two lists included equal numbers of passages representing the four Consistency × Set conditions in each list-half. There were again no filler passages.

Questionnaire

A questionnaire was administered after experimental testing, with two purposes. First, it probed the possible impact on the subjects' reading strategies of repeatedly encountering text inconsistencies. Second, it assessed whether, upon detecting an inconsistency, the reader favored the first or second of two contradictory concepts. The latter aim addressed the misinformation effects discussed throughout. Previous questionnaire data of ours indicated that readers are reluctant to unequivocally embrace novel text information that contradicts previous ideas (Singer et al., 2017). The present questionnaire further scrutinized that issue.

The questionnaire reminded subjects that they had been advised that some of the passages would present inconsistencies (see Procedure, below). Considering that the questionnaire focused on inconsistencies, the reminder served to make all subjects aware of that emphasis from the outset. Next, four questions asked whether the subjects were conscious of inconsistencies as they proceeded, and probed the strategies that they may have developed to cope with inconsistencies. The last of these four questions asked which conflicting concept(s) the subject may have favored. The questionnaire appears in Online Supplemental Material (OSM), Appendix A.

Procedure

Assignment of subjects to rooms and subject stations was the same as in Experiment 1. Testing comprised the four practice and 40 experimental trials of the lists.

Prior to the experiment, the subjects received detailed, printed instructions about it. These included a sample passage, each sentence of which was labeled as consistent or inconsistent in its context. The instructions showed the correct "yes" or "no" consistency judgment for each sentence. The sample passage used the structure of the Concept passages. However, the instructions also mentioned an example of an inconsistency of the sort present in the Trait passages. The subjects were advised that some but not all passages would include inconsistencies.

The subjects read the stimulus stories one sentence at a time. They were instructed to reply "yes" ("." on the keyboard) if the meaning of the current sentence was consistent with all that preceded it and "no" ("x") if they detected any inconsistency. The consistency judgment and its latency were recorded for the target sentence. As in Experiment 1, the spillover sentences were not distinctly consistent or inconsistent. Therefore, answer times were recorded but the specific consistency judgments were not (Footnote 3).

The signal "press READY for next story" preceded each story. When the subject responded, the first sentence of the story appeared at row 10, column 1 of the screen. As soon as a "yes" or "no" response was registered, the next sentence appeared in the same position. Presentation continued in this manner until the end of the story. The target and spillover sentences were not explicitly highlighted for the subject. There was no answer-time limit for consistency judgments and no error feedback was provided.

Immediately after the last sentence of a passage, the signal QUESTION appeared for 2 s at row 6, column 1 of the screen. It was followed at the same position by a fixation X for 0.5 s. The comprehension question for that passage then replaced the fixation point. There was no answer-time limit. The message "ERROR" appeared for 1 s after incorrect responses. After either the subject's response or the error feedback, there was a 2-s blank intertrial interval. The READY signal then initiated the next trial. After completing the experimental task, the subjects wrote their answers to the questionnaire items.

Results

The data of four subjects who never replied "no" in judging text consistency and two subjects who frequently registered consistency judgments in less than 1 s were disregarded. All analyses were based on the remaining 72 subjects. The mean accuracy for the comprehension questions was 77.6%.

Target sentence accuracy

Mean proportion target accuracies as a function of consistency and set appear in Table 3. Generalized linear mixed-effect modelling with binomial errors was applied, using the same design as in Experiment 1. The analysis included random slopes of consistency for items, but not for subjects, because that model failed to converge. Accuracy rates of .928 and .532 were observed in the consistent and inconsistent conditions, respectively, a significant difference: β = 3.06, SE = 0.27, z = 11.16, p < .01. Neither the set effect (β = 0.11, SE = 0.33, z = 0.32, p = 0.75) nor the Consistency × Set interaction (β = 0.04, SE = 0.37, z = 0.10, p = 0.92) approached significance.

Table 3 Mean target consistency-judgment accuracy (proportion; SE in parentheses) as a function of consistency and set in Experiment 2

Target sentence answer times

Table 4 shows the mean consistency-judgment answer times as a function of the fixed effects consistency, set, and subjects' answers. The answers "no" and "yes" were incorrect in the consistent and inconsistent conditions, respectively.

Table 4 Mean answer times (ms; SE in parentheses) as a function of set, consistency, and answer in Experiment 2

Mean answer times were 2,533 ms and 2,633 ms for "yes" and "no" replies, respectively, a significant difference, β = 0.08, SE = 0.03, t = 3.38, p < 0.01. Answer times were greater for the concept than the trait passages, β = 0.26, SE = 0.05, t = 5.08, p < 0.01. The main effect of consistency was not significant, β = 0.03, SE = 0.02, t = 1.77, p = 0.08.

The analysis also revealed a Set × Answer interaction, β = 0.07, SE = 0.04, t = 2.01, p = 0.04. Follow-up tests of simple main effects indicated that "no" judgment times significantly exceeded "yes" times both in the consistent condition, β = 0.13, SE = 0.02, z = 6.77, p < 0.01, and in the inconsistent condition, β = 0.14, SE = 0.02, z = 8.93, p < 0.01. No other effects reached significance.

Spillover sentence answer times

As noted earlier, for the spillover sentences, subjects' answer times but not their answers were recorded. The answer times were alternately analyzed as a function of (a) set and consistency and (b) set and answer (the subject's "yes" or "no" answer to the preceding target sentence).

Mean answer times as a function of Consistency × Set appear in Table 5. The analysis revealed that inconsistent answer times exceeded consistent ones, β = 0.07, SE = 0.02, t = 2.79, p = 0.01. Answer time was greater for the concept than the trait passages, β = 0.29, SE = 0.06, t = 4.91, p < 0.01. There was a Consistency × Set interaction, β = 0.15, SE = 0.04, t = 3.66, p < 0.01. Tests of simple main effects revealed an effect of consistency for the trait passages, β = 0.14, SE = 0.03, t = 5.50, p < 0.01, but not for the concept passages, t = 0.28, p = 0.78.

Table 5 Mean spillover sentence answer time (ms; SE in parentheses) as a function of consistency and set in Experiment 2

Mean answer times as a function of Answer × Set appear in Table 6. That analysis was conducted because subjects' replies to the preceding target sentence were interpreted as diagnosing whether, upon reading the spillover sentence, they might still be reconciling a perceived inconsistency. Answer time was significantly greater following "no" than "yes" responses, β = 0.11, SE = 0.02, t = 5.47, p < 0.01. Answer time was also greater for the concept than the trait passages, β = 0.34, SE = 0.06, t = 5.61, p < 0.01. The Set × Answer interaction was significant, β = 0.09, SE = 0.04, t = 2.18, p = 0.03. Tests of simple main effects indicated that answer time was greater after "no" than "yes" replies for both concept passages, β = 0.07, SE = 0.03, t = 2.46, p = 0.01, and trait passages, β = 0.16, SE = 0.03, t = 5.47, p < 0.01.

Table 6 Mean spillover sentence answer time (ms; SE in parentheses) as a function of answer and set in Experiment 2

Questionnaire

The proportions of affirmative replies to the questionnaire appear in Table 7. Although the experiment instructions mentioned inconsistencies, only 92% of subjects reported detecting any. In response to inconsistencies, 28% and 72% of subjects, respectively, reported adopting special strategies (question 2) and engaging in memorization (question 3). These values indicate that many subjects did not view memorization as a special strategy.

Table 7 Percent affirmative answers to the Experiment 2 questionnaire items

Of greatest interest were the subjects' expressed preferences between text contradictions (question 4): Sixty-nine percent of subjects favored the earlier introduced of two conflicting concepts and 11% of subjects preferred the later one. Another 18% of subjects considered that either of two conflicting alternatives might be true.

Discussion

Target sentence accuracy

The prevalence of consistency effects might imply that readers who intentionally monitor text accuracy will achieve high rates of discrepancy detection. To the contrary, accuracy was only 53% in the inconsistent condition, compared with 92% in the consistent condition. Overlooking 47% of text inconsistencies amounts to considerably flawed validation.

Different orienting tasks, such as self-paced reading versus consistency monitoring, exert both obvious and subtle effects on reading (e.g., Cirilo, 1981; Mayer & Cook, 1981; Walker & Meyer, 1980). Consistency judgments require decisions based on the reader's recollective memory of the antecedent text. In such circumstances, many task and stimulus variables affect the responder's decision criterion to respond affirmatively or negatively (Brown et al., 1977; Hirshman, 1995). Therefore, signal detection calculations (e.g., Macmillan & Creelman, 2005) were performed to quantify the latter criterion. In signal detection analyses of two-choice memory-decision tasks, parameters of strength (or sensitivity; e.g., d') and decision criterion (e.g., C) are derived from the rates of hits (here, "yes" to a consistent target) and false alarms ("yes" to an inconsistent target). Experiment 2 yielded values of d' = 1.54 and C = -0.69. A negative C value denotes a "liberal" decision criterion, amounting to a bias to anticipate congruent text continuations. This could promote frequent acceptance of inconsistent target sentences. This account is consistent with Kamas et al.'s (1996) proposal that variations in response criteria underlie the profile of people's responses to Moses-illusion stimuli. A liberal bias in the present paradigm would mesh with the cooperative principles of Grice (1975), according to which, among other things, readers may reasonably expect authors to be accurate.
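The reported parameters can be reproduced from the stated rates under the standard equal-variance Gaussian model (e.g., Macmillan & Creelman, 2005). In the R sketch below, the hit rate is the .928 accuracy for consistent targets, and the false-alarm rate is the complement of the .532 accuracy for inconsistent targets.

    # Signal detection parameters from Experiment 2's response rates
    hit_rate <- 0.928        # "yes" to consistent targets
    fa_rate  <- 1 - 0.532    # "yes" to inconsistent targets = .468

    d_prime   <- qnorm(hit_rate) - qnorm(fa_rate)         # sensitivity d'
    criterion <- -(qnorm(hit_rate) + qnorm(fa_rate)) / 2  # criterion C

    round(d_prime, 2)    # 1.54
    round(criterion, 2)  # -0.69: a liberal bias toward judging "consistent"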

The concept and trait passages yielded virtually identical accuracy values. The discrepancies of the concept passages turned on conspicuous text contradictions. Detecting the discrepancies of the trait passages, in contrast, depended on knowledge-based inferences such as that ordering a cheeseburger is inconsistent with eating a healthy diet. Similar result profiles for the two passage sets suggest that these results likely generalize to yet other domains of text consistency.

It is especially important that Experiments 1 and 2 scrutinized materials stemming from many prior demonstrations of the CE. Other researchers have documented that text anomalies yielding CEs may be unavailable to conscious detection (e.g., Cook et al., 2016). However, the latter study scrutinized discrepancies between sentences and general knowledge, such as Dogs usually don't like the sounds of LIGHTNING during a storm. The focus here, in contrast, was conspicuous within-text contradictions, as captured by materials that have afforded numerous instances of the CE.

Target sentence answer times

Most noteworthy was that "no" replies took longer than "yesses" regardless of whether the target sentence matched its antecedent or not. This pattern was similar across the four Consistency × Set conditions. We attribute this outcome to the reader's need to cognitively reconcile perceived inconsistencies. Perceiving a discrepancy can occur both for inconsistent and even consistent targets (as evidenced by the modest rate of "no" replies to consistent targets in Experiment 2), although this will occur much more frequently for the former.

Augmenting the difference between "no" and "yes" answer times is subjects' default expectation for the correct answer to be "yes" in dichotomous judgment tasks. As a result, it takes a measurable amount of time to change an internal response index from "yes" to "no" before registering an answer to negative test items (Carpenter & Just, 1975; see also Singer, 1984).

It is possible that in routine, self-paced reading (e.g., Experiment 1), some text inconsistencies remain undetected by both passive and strategic validation. In the consistency paradigm, however, it seems almost certain that the proportion of perceived discrepancies would be greater in the inconsistent than the consistent condition. This would be sufficient to result in observed answer-time consistency effects.

Spillover sentence answer times

Spillover answer times were greater after inconsistent than consistent targets, but this outcome was restricted to the trait passages. That result indicates that inferential inconsistencies (e.g., someone who prefers health food ordering a cheeseburger) posed more ongoing difficulty at the spillover position than did explicit contradictions (camel vs. mule). Such a pattern is arguably sensible but it is not one that we predicted.

The Set × Answer analysis revealed significantly slower answers when subjects had responded "no" (inconsistent) than "yes" to the immediately preceding target, regardless of the actual consistency of that target. This indicates that the readers found it more difficult to process and integrate the spillover sentence when grappling with a perceived inconsistency, whether real or imagined. This pattern meshes with extensive demonstrations of reading spillover effects, as discussed in the Introduction.

Questionnaire

Of greatest interest were responses to question 4, which indicated that subjects usually either endorsed the accuracy of the first of two contradictory concepts from a passage or else were equivocal regarding the two. This agrees closely with questionnaire data reported by Singer et al. (2017). In that study, strategy information was solicited with open-ended questions, to which only a small proportion of subjects provided clear answers. However, of those subjects who expressed preferences between two conflicting text ideas, over 85% either preferred the first-mentioned concept or equally preferred the two (vs. a total of 87% in Experiment 2). Virtually none of Singer et al.'s subjects definitively preferred the second concept.

Richter and Maier (2017) noted that some prior studies of misinformation effects likewise indicate that readers favor previously acquired ideas over subsequent, conflicting information (Anderson et al., 1980; Johnson & Seifert, 1994; Wilkes & Leatherbarrow, 1988). The literature also provides instances of the opposite trend, in which new or updating information tends to override earlier ideas (Ecker et al., 2010; Fazio et al., 2013). We do not view these opposite preferences for original versus novel information as mutually incompatible. Rather, the specific relations among (a) the discourse context and (b) two or more contradictory ideas likely determine which of the latter will predominate. In the discourse context that we have inspected (Experiment 2; Singer et al., 2017), it is the first of two contradictory ideas that is favored.

Experiment 3

Subjects' failure to detect a high proportion of target inconsistencies in Experiment 2 contradicts the possible implication of the CE that readers will reliably become consciously aware of text discrepancies. We considered two alternative explanations of the latter outcome. First, upon reading the target sentence, the subjects may not have known the correct antecedent fact; either because they had not encoded it or because they had forgotten it. Second, they may have known the antecedent fact but overlooked the antecedent-target conflict. Experiment 3 was designed to distinguish these alternatives.

To accomplish this, the target sentences of the previous experiments were replaced by questions about the crucial antecedent fact. For example, for the Concept passage of Table 1, the target sentence was replaced by the test question, Did the old man sell Harold a camel? Low accuracy for these questions would suggest that, upon reaching the target sentence, the Experiment 2 subjects did not know the antecedent fact. High accuracy, in contrast, would favor the view that Experiment 2 subjects knew the relevant fact but overlooked its discrepancy from its antecedent.

Method

Subjects

The subjects were 64 individuals from the same population as before, who had not participated in Experiment 1 or Experiment 2.

Materials

The materials comprised two experimental lists derived from those of Experiment 1. For each experimental passage, two changes were made. First, all passage sentences from the former target through the final sentence were deleted. Second, the simple comprehension questions of Experiment 1 were replaced by a question about the critical concept. For the Concept passage of Table 1, this question was Did the old man sell Harold a camel? For the Trait passage of Table 1, the question was Was Bill a young man? These test questions used the term or phrase of the corresponding target sentences of Experiment 1. As a result, they had the correct answer "yes" or "no," depending on which alternative had appeared in the critical antecedent sentence of the passage.

In all other regards, the materials were identical to those of Experiment 1. Across two counterbalanced lists, the passages alternately presented questions with the correct responses "yes" and "no." To facilitate comparing the results with those of Experiment 2, we retained the condition labels consistent and inconsistent, depending on whether the critical term in the question matched (consistent) or mismatched its antecedent. As a result of the passage deletions, the Concept passages were seven sentences long; and the Trait passages varied from nine to 13 sentences in length.

Procedure

The procedure was similar to that of Experiment 1. The subjects again read printed instructions. These included a complete sample passage, different in structure from the experimental materials. It was followed by three sample questions, two of which had the correct answer "yes."

The subjects then engaged in sentence-by-sentence reading of each passage, but only through the sentence preceding the former target. After a 2.5-s interval, a fixation X appeared for 0.5 s at row 6, column 1 of the screen. It was followed immediately at the same position by a question about the crucial antecedent fact, as described in the Materials. The subjects were instructed to respond yes (".") and no ("x") for questions describing a fact that, relative to the current passage, was true or false, respectively. There was no answer-time limit and incorrect responses were followed by a 1-s "ERROR" message. After a further 3.0-s interval, the READY prompt initiated the next trial.

Results

No post-passage comprehension questions were presented. The data of two subjects who were not native-English speaking were disregarded. All analyses were applied to the responses of the remaining 62 subjects.

Due to experimenter error, the final sentence of the "filler" section of one of the Trait passages was omitted. This sentence did not bear on the status of the critical trait, so the data of this story were not excluded from the analyses.

The data of principal concern were the accuracy rates as a function of consistency and set. The answer times were also of interest. However, for the Concept passages, there were too many empty cells (0 incorrect answers by numerous subjects) to permit a meaningful interpretation of answer times so that measure will be considered only for the Trait passages.

Accuracy

Mean accuracy as a function of Consistency and Set appears in Table 8. Random slopes of consistency were included for items but not for subjects, as the latter model failed to converge. None of the effects was significant, as follows: Consistency (β = 0.65, SE = 0.44, z = 1.48, p = 0.14); set (β = 0.20, SE = 0.54, z = 0.37, p = 0.72); Consistency × Set (β = 0.95, SE = 0.63, z = 1.51, p = 0.13).

Table 8 Mean accuracy (proportion; SE in parentheses) as a function of consistency and set in Experiment 3

Answer times

The mean answer times for the Trait passages appear in Table 9. The analysis yielded no significant effects, as follows: consistency, β = 0.01, SE = 0.04, t = 0.08, p = 0.94; answer, β = 0.04, SE = 0.03, t = 1.26, p = 0.21; Consistency × Answer, β = 0.10, SE = 0.06, t = 1.50, p = 0.13.

Table 9 Mean answer times (ms; SE in parentheses) for the trait passages as a function of consistency and subjects' answer in Experiment 3

Comparison of Experiments 2 and 3

Accuracy was greater for consistent than inconsistent target sentences when subjects made consistency judgments (Experiment 2), but there was little difference when the critical concept was directly questioned (Experiment 3). The latter result tends to exclude the possibility that, in Experiment 2, the critical information was unknown when the target sentence was encountered. We next directly compare the accuracy rates of the two experiments. To corroborate a greater CE with continual consistency judgments (Experiment 2) than with direct questioning (Experiment 3), the analysis ought to yield an Experiment × Consistency interaction.

The accuracy profiles of the concept and trait material sets were highly similar in Experiments 2 and 3. Therefore, the accuracy comparisons collapsed across the Concept and Trait material sets. Most importantly, there was a Consistency × Experiment interaction, β = 2.84, SE = 0.32, z = 9.00, p < 0.01. This was corroborated by tests of simple main effects. The CE was significant for Experiment 2, β = 2.46, SE = 0.12, z = 19.96, p < 0.01, but not for Experiment 3, β = 0.21, SE = 0.13, z = 1.65, p = 0.10. Analysis also revealed main effects of consistency, β = 2.25, SE = 0.22, z = 10.44, p < .01, and of experiment, β = 0.58, SE = 0.18, z = 3.30, p < .01. Both of those main effects reflect the low accuracy rates of the inconsistent condition of Experiment 2.
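This comparison can be sketched in R as a single generalized linear mixed model over the stacked trial-level accuracy data of the two experiments; the data frame acc_data and its experiment factor are illustrative assumptions, not the authors' actual script.

    # Cross-experiment accuracy model; the consistency:experiment term
    # tests the predicted Consistency x Experiment interaction
    library(lme4)
    m_cross <- glmer(accuracy ~ consistency * experiment +
                       (1 | subject) + (1 | item),
                     data = acc_data, family = binomial)
    summary(m_cross)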

Discussion

In Experiment 2, subjects performing continual consistency judgments overlooked 47% of inconsistencies. This was at odds with the possible implication of the CE that readers will become consciously aware of text inconsistencies at a high rate. One candidate explanation of the Experiment 2 result was that the subjects either had not encoded the critical ideas or else had forgotten them by the time that the target sentence was encountered.

Experiment 3 addressed this possibility by interrogating the critical idea at the usual point of appearance of the target sentence. Whereas the Experiment 2 consistency judgments were much less accurate in the inconsistent than the consistent condition, answer accuracy in those conditions was approximately equal in Experiment 3. Thus, Experiment 3 indicates that readers of these materials satisfactorily encode and remember critical information over moderate text distances. However, they frequently fail to report discrepancies even when they are specifically instructed to monitor consistency, as was the case in Experiment 2.

Readers' successful encoding and retention of comparable narrative events was likewise documented by O'Brien et al. (2010). Subjects in norming studies correctly answered questions such as Did Bobby find his hammer? after reading, in different conditions, that the hammer (a) was lost, (b) was lost but later found, or (c) was not lost at all. The norming procedure differed from the procedure of Experiment 3 in that reading in O'Brien et al.'s study was from hard copy and questioning was untimed. Experiment 3 provides the assurance that even under the constraints of self-paced reading and timed question answering, subjects exhibit knowledge of the critical antecedents.

It might be proposed that the high accuracy rates of Experiment 3 do not reflect the active availability of the relevant antecedent idea. Rather, the critical questions may reactivate the antecedent or else facilitate its reconstruction. This possibility aligns with our own interpretation of the results. The Materials section of Experiment 1 indicated that the content of the filler sections of the passages was expected to purge the antecedent from working memory. The detection of the CE (e.g., Experiment 1) requires that the target sentence access the antecedent and restore it to working memory. Likewise, correct replies in Experiment 3 depend on access to and reactivation of the backgrounded antecedent by the question content.

Analysis of the answer times for the trait passages revealed an interactive pattern of consistency and answer: namely, answer times were greater when subjects answered incorrectly ("no" to consistent items and "yes" to inconsistent ones), an arguably sensible profile. However, linear mixed-effect modelling yielded no significant effects, perhaps due to insufficient observations in some experimental conditions.

General discussion

The consistency effect (CE) of text comprehension diagnoses readers' sensitivity to discrepancies pertaining both to verbatim form and to numerous situational dimensions. A simple inference from the CE might be that readers will become aware of those discrepancies. However, that proposal conflicts with flawed text validation, as exposed by numerous misinformation effects. To scrutinize that issue, we first replicated the CE using familiar but diverse materials (Experiment 1). In Experiment 2, however, subjects making consistency judgments about the same materials overlooked a large proportion of the inconsistencies. Experiment 3 ruled out the possibility that these errors resulted from the readers' lack of knowledge about the critical antecedents.

How, then, do CE inconsistencies fail to become apparent to readers? It was observed earlier that text factors such as insufficient similarity, distinctiveness, typicality, and degree of elaboration of a text antecedent may thwart its passive retrieval, resulting in overlooked discrepancies. However, the discussion of Experiment 2 emphasized that we intentionally selected experimental materials previously demonstrated to afford rather than impede the CE.

Rather, it is the immediate, passive quality of the validation processing stage that best accounts for subjects frequently overlooking text inconsistencies in Experiment 2. Theorists offer specific mechanisms consistent with that interpretation. As mentioned earlier, in O'Brien and Cook's RI-Val model (2016), the coherence threshold of comprehension (van den Broek et al., 2011) is below the level of the reader's conscious awareness. As a result, insufficient coherence does not ensure a transition to strategic validation and thus the detection of inconsistency. In contrast, Richter and Maier's (2017) two-stage model holds that deficient coherence at the first, passive, integrative stage of validation might, by default, direct processing to the next text segment without resolving inconsistencies. That route meshes with the proposal that, under conditions of incoherence, comprehension may be suspended, with the tacit expectation that later text will resolve contradictions (van den Broek et al., 1996). Alternatively, initial incoherence may initiate, at Richter and Maier's second stage, strategic elaborative processing to resolve discrepancies. Further research will be needed to distinguish among these and other competing hypotheses concerning the mechanisms of passive validation.

For the Concept materials of these experiments, overlooking text inconsistencies may be further enhanced by the similarity between a target and its antecedents, such as camel versus mule. There is evidence that, under certain conditions, readers perform only a partial match between a target and accessed information (O'Brien & Cook, 2016; Reder & Kusbit, 1991). Incomplete matches could result in readers overlooking anomalies such as paddling around the largest ocean, the ATLANTIC (Fazio et al., 2013) and How many animals of each kind did MOSES take on the ark? (Erickson & Mattson, 1981).

In a study comparable to the present one, Marsh and Fazio (2006) contrasted the performance of self-paced readers with those who made continual judgments about the consistency between text and their world knowledge. The "judgment" subjects overlooked 67% of critical incorrect facts, with similar values for easy (more familiar) and hard (less familiar) items. Thus, readers are very prone to miss text discrepancies, whether within-text (present Experiment 2) or with reference to world knowledge. We speculate that the lower miss rate of Experiment 2 (47% vs. Marsh & Fazio's 67%) reflects contradictions that were both within-text and frequently blatant.

Marsh and Fazio's (2006, Experiment 3) subjects also performed a cued-recall test after their reading task. The rate of producing previously presented inaccurate text ideas was lower for consistency judges than self-paced readers. This suggests that although monitoring text for misinformation only weakly promotes the detection of contradictions, it may inoculate the reader against their acquisition. Instructing subjects to correct text inaccuracies during reading has a similar impact (Rapp et al., 2014).

It is noteworthy that theoretical proposals of passive validation all admit the option of strategic resolution of text discrepancies. Some specify the circumstances that initiate those operations (Long & Lea, 2005; Richter & Maier, 2017), whereas others focus almost exclusively on the principles of passive processing (O'Brien & Cook, 2016, p. 271; see also Ferretti et al., 2013).

In Experiment 2, target answer latencies were greater for "no" than "yes" replies regardless of actual consistency. We proposed that the difference reflects the time needed to (a) reconcile perceived inconsistencies and (b) change an internal response index from "yes" to "no." We interpret those results to stem from instances of controlled, aware detection of text discrepancies.

The questionnaire data of Experiment 2 are informative regarding certain misinformation effects. Following the consistency-judgment task, the subjects expressed a clear preference for the first of two conflicting concepts. Off-line comprehension measures are unreliable indicators of the quality of comprehension processing. However, the questionnaire data are consistent with those of Singer et al. (2017), which stemmed from a self-paced reading task. They also mesh with a variety of misinformation effects that implicate the predominance of prior over subsequent information (Anderson et al., 1980; Johnson & Seifert, 1994; Wilkes & Leatherbarrow, 1988).

Conscious and unconscious inconsistency detection

The results raise the conundrum of what function or value passive validation might serve, if readers remain unaware of its products. However, that question meshes with long-standing considerations of the relationship between conscious and unconscious cognitive processes. In this regard, Reingold and Toth (1996) observed that (a) tasks are unlikely to be process-pure and (b) tasks requiring an explicit reply are subject to the respondent's response bias. Those claims are relevant to the present concerns.

First, consider process-purity. Reingold and Toth (1996) noted that although tasks such as cued recall and speeded recognition are respectively associated with explicit and implicit memory processes, it is likely that each task is supported by a combination of those processes. This would accord with hypotheses about process mixtures in numerous cognitive domains (Reder, 1987; Schacter, 1987; Wixted, 2007). The official or tacit requirements of a task do not prevent people from relying on alternative processing modes and so "contaminating" the dominant processes (Reingold & Toth, 1996).

In the present context, self-paced reading versus consistency-judgment might respectively be predominantly associated with passive validation (e.g., Cook & O'Brien, 2014) and controlled validation. However, the high rate of overlooked inconsistencies of Experiment 2 implicates the involvement of passive validation even in consistency judgment. Conversely, the role of conscious validation processing in routine reading was addressed although not elaborated by O'Brien and Cook (2016, p. 271). A given reading orienting task may be dominated by either controlled or passive validation while not precluding a contribution of the alternative process.

Second, Reingold and Toth (1996, p. 162) noted that tasks that require explicit responses, including those designed to diagnose processing mixtures, require consideration of the criterion or response bias. That proposal dovetails with our signal detection calculations of Experiment 2. We reported that the subjects of Experiment 2 applied a liberal response criterion in judging target consistency. Factors well known to regulate the decision criterion could complement below-threshold detections to generate the high miss rate of Experiment 2. The manipulation of such factors readily impacts people's preference for affirmative versus negative replies in two-choice answering tasks (Singer, 1984).

Conclusions

Prevalent consistency effects might imply that readers will reliably become aware of the diagnosed inconsistencies. That implication is denied by consistency judges' low accuracy rate for text discrepancies, as measured in Experiment 2. That result suggests that passive validation processes significantly influence the course of comprehension even when the reader intends to monitor text coherence.

The distinction between passively and strategically detected text inconsistencies delineates a considerable research agenda. First, it will be important to distinguish when inconsistencies are either tolerated or resolved. For example, Richter and Maier's (2017) two-step model posits that bypassing the controlled scrutiny of an inconsistency is particularly favored when there are available, prior beliefs that override it. This would prevent passively detected text discrepancies from becoming available to conscious awareness. Although the two-step model specifically emphasizes belief-based inconsistencies between documents, its principles might be applied to other, within-text, contradictions. Conversely, there are circumstances in which readers might tolerate even conscious, actively detected anomalies. For example, researchers note that the reader may "suspend comprehension," on the assumption that further text will clarify text discrepancies (van den Broek et al., 1996; see also Richards & Singer, 2001).

Second, the role of the reader's response criterion is relevant, especially for explicitly intentional tasks such as consistency monitoring. Third, the literature on conscious versus unconscious processing raises questions about process-purity even for explicitly passive or active tasks (Reingold & Toth, 1996). Therefore, the mixing of validation processes in different reading orienting tasks needs to be considered.

It was noted earlier that certain theories tend to highlight readers' validation deficiencies, and so address misinformation effects. According to the good-enough processing analysis, readers' resource limitations coupled with complex grammatical constructions generate erroneous and incomplete representations (Ferreira et al., 2002). Lack of completeness, in turn, is reflective of discourse focus: Specifically, elements that are not in focus may participate inadequately in text encoding (Sanford & Garrod, 2005). For example, Sanford and Garrod proposed that, regarding the sentence, In fact, the man with the hat was arrested, if the man with the hat does not distinguish among a set of men, then hat will be incompletely encoded.

It is promising that the latter theories appreciably converge with those derived from construction-integration (Kintsch, 1988, 1998), as discussed throughout. Singer and Spear (2020) noted that both approaches embrace (a) a parallel and incremental view of comprehension processes, (b) the passive activation of both relevant and irrelevant text antecedents, and (c) the impact of readers' resource limitations on encoding. These commonalities offer the promise of a theory that is unified regarding validation in particular and text comprehension more generally.