Visual word recognition: Evidence for a serial bottleneck in lexical access

White, Alex L.; Palmer, John; Boynton, Geoffrey M.

doi:10.3758/s13414-019-01916-z


Open access
Published: 12 December 2019

Volume 82, pages 2000–2017, (2020)

Attention, Perception, & Psychophysics




Abstract

Reading is a demanding task, constrained by inherent processing capacity limits. Do those capacity limits allow for multiple words to be recognized in parallel? In a recent study, we measured semantic categorization accuracy for nouns presented in pairs. The words were replaced by post-masks after an interval that was set to each subject’s threshold, such that with focused attention they could categorize one word with ~80% accuracy. When subjects tried to divide attention between both words, their accuracy was so impaired that it supported a serial processing model: on each trial, subjects could categorize one word but had to guess about the other. In the experiments reported here, we investigated how our previous result generalizes across two tasks that require lexical access but vary in the depth of semantic processing (semantic categorization and lexical decision), and across different masking stimuli, word lengths, lexical frequencies and visual field positions. In all cases, the serial processing model was supported by two effects: (1) a sufficiently large accuracy deficit with divided compared to focused attention; and (2) a trial-by-trial stimulus processing tradeoff, meaning that the response to one word was more likely to be correct if the response to the other was incorrect. However, when the task was to detect colored letters, neither of those effects occurred, even though the post-masks limited accuracy in the same way. Altogether, the results are consistent with the hypothesis that visual processing of words is parallel but lexical access is serial.


Introduction

When listening to a story, the sensory signal is defined by change across time, and the words are presented sequentially. But when reading a story, the sensory signal is defined by change across space, and many words are available simultaneously. The visual system is capable of parallel processing across space, starting with the simultaneous retinal transduction of the entire incoming image. Therefore, it is theoretically possible that multiple written words can be processed in parallel.

The degree of parallel processing in natural reading is the subject of a long-running debate. The debate has been mostly fueled by measures of oculomotor behavior. For instance, readers fixate the majority of words directly, but they begin processing the next word (n+1) while still fixating on the current word (n) (Rayner, 2009). That can be shown by surreptitiously changing word n+1 during the saccade to it, which results in a slowdown of processing in the next fixation. But does that mean the two words (n and n+1) were processed in parallel? Some researchers argue affirmatively, based on a range of experimental data fit with computational models (Engbert, Nuthmann, Richter, & Kliegl, 2005; Snell, van Leipsig, Grainger, & Meeter, 2018b). Others argue, to the contrary, that word recognition is necessarily serial: attention shifts to begin processing word n+1 only after word n is completed (Reichle, Liversedge, Pollatsek, & Rayner, 2009; Reichle, Pollatsek, & Rayner, 2006).

The debate has recently extended beyond oculomotor measures during reading (Snell & Grainger, 2019b). For instance, several studies have shown that with relatively short displays (≤ 200 ms) word recognition performance is influenced by surrounding words and sentence context (Snell, Declerck, & Grainger, 2018a; Snell & Grainger, 2017; Snell, Meeter, & Grainger, 2017). This could be taken as evidence that multiple words are processed in parallel, although questions remain about the precise temporal dynamics of multiple word recognition.

In two recent studies, we took a related approach to ask a fundamental question: Can people recognize two words at exactly the same time? We used backward masking to control the amount of time available to process each word. Specifically, we presented subjects with pairs of nouns, one to the left and one to the right of fixation. The nouns were flashed briefly and immediately preceded and followed by masks of random consonants. There were two main conditions: (1) In the single-task condition, the subject was pre-cued to the location of the one word they had to recognize, so they could focus attention on it and ignore the other (while fixating centrally). (2) In the dual-task condition, the subject was pre-cued to both locations, so they had to divide attention and try to recognize both words simultaneously. At the end of the trial they were prompted to judge both words independently. In both conditions, the subject had to report whether each attended word belonged to a specific semantic category (e.g., “animals”).

Importantly, we set the duration of the inter-stimulus intervals (ISIs) between the words and the masks to each subject’s threshold, such that in the single-task condition they could categorize one word with ~80% accuracy. The question was, in that same amount of time, could they recognize both words? The answer was no: with the same stimulus timing, in the dual-task condition accuracy was sufficiently degraded that it ruled out two standard parallel models and supported an “all-or-none” serial processing model. This serial model assumes that only one word can be fully recognized at a time, and due to the limited time available, only one word can be recognized on each trial. If the subject is asked about the other word, they have to guess. Hence the name “all-or-none”: each word is either processed completely, or no task-relevant information is extracted at all.
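The arithmetic behind the all-or-none prediction can be sketched as follows. This is an illustration only, not the study's analysis code: it assumes that each of the two words is equally likely to be the one processed (p_attend = 0.5), that the processed word is judged with single-task accuracy, and that guessing in a two-category task yields 50% correct. The function name and its parameters are hypothetical.

```python
def all_or_none_dual_accuracy(single_acc, p_attend=0.5, guess_acc=0.5):
    """Predicted dual-task accuracy for one word under an all-or-none serial model.

    single_acc: accuracy when the word is processed (single-task level)
    p_attend:   assumed probability that this word is the one processed
    guess_acc:  accuracy when the subject has to guess
    """
    # Either the word was processed (single-task accuracy) or it was not (guess)
    return p_attend * single_acc + (1.0 - p_attend) * guess_acc


predicted = all_or_none_dual_accuracy(0.80)
print(round(predicted, 3))
```

With single-task accuracy near 0.80, the predicted dual-task accuracy is roughly 0.65 per word, which is the size of deficit that separates this model from the parallel alternatives.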

We also found a trial-by-trial stimulus processing tradeoff in the dual-task condition: subjects were more likely to respond to one word correctly if they responded incorrectly to the other word. This tradeoff pattern also suggests that the subjects can’t recognize both words on each trial, and therefore provides further support for the all-or-none serial model.
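A small simulation can illustrate why an all-or-none serial model produces this tradeoff. The sketch below makes its own assumptions: on each trial one of the two words is chosen at random to be processed, that word is judged correctly with probability 0.8, and the other is guessed at 50%. All names and parameter values are hypothetical.

```python
import random


def simulate_all_or_none(n_trials=20000, single_acc=0.8, seed=1):
    """Count joint outcomes (side 0 correct?, side 1 correct?) across simulated trials."""
    rng = random.Random(seed)
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for _ in range(n_trials):
        processed = rng.randrange(2)  # the one word that gets processed this trial
        outcome = []
        for side in (0, 1):
            p_correct = single_acc if side == processed else 0.5
            outcome.append(int(rng.random() < p_correct))
        counts[tuple(outcome)] += 1
    return counts


def conditional_accuracy(counts):
    """Accuracy on side 0 given that the side 1 response was correct vs. incorrect."""
    given_other_correct = counts[(1, 1)] / (counts[(1, 1)] + counts[(0, 1)])
    given_other_wrong = counts[(1, 0)] / (counts[(1, 0)] + counts[(0, 0)])
    return given_other_correct, given_other_wrong


counts = simulate_all_or_none()
given_correct, given_wrong = conditional_accuracy(counts)
print(given_correct, given_wrong)
```

Under these assumptions, accuracy for one word is higher when the response to the other word was wrong, because an error on one side makes it more likely that the other side was the processed one.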

However, when subjects viewed exactly the same stimulus sequences but had to judge the color of the letters, rather than the meaning of the words, dual-task accuracy was equivalent to single-task accuracy. Each dual-task response was more likely to be correct if the other was correct, unlike the stimulus processing tradeoff pattern we observed in semantic judgments. Overall, color detection performance was consistent with unlimited-capacity parallel processing, while semantic categorization performance suggested that a serial bottleneck lies somewhere in the word recognition system (White, Palmer, & Boynton, 2018).

In a subsequent study, we investigated the source of that bottleneck in the brain’s reading circuitry. We recorded brain activity with fMRI while participants performed a semantic categorization task with masked words to the left and right of fixation, similar to the experiment described above (White et al., 2018). We observed evidence of parallel processing of the two words throughout visual cortex. But in an anterior sub-region of the left hemisphere “visual word form area,” activity was consistent with serial processing of single words (White, Palmer, Boynton, & Yeatman, 2019).

In the experiments reported here, we sought to answer five of the questions left unanswered by our previous studies. First, is the serial bottleneck specific to high-level semantic judgments, or does it apply to any task that requires lexical access? Lexical access is the stage at which a written word activates an entry stored in long-term memory. Lexical access is often studied using the lexical decision task: the subject is presented with letter strings and reports whether they are real words or not. No further semantic processing is required. In Experiment 1, we assessed parallel vs. serial processing with a semantic categorization task (distinguishing living things from non-living things), and in Experiment 2 we used a simpler lexical decision task (distinguishing real English words from pseudowords).

Second, is the serial bottleneck specific to words presented in opposite hemifields? With one word in the left hemifield and the other in the right, we previously observed a marked asymmetry: semantic categorization accuracy was much higher for words to the right than left of fixation (White et al., 2018), consistent with many decades of prior studies (e.g., Mishkin & Forgays, 1952). It is possible that the inherent asymmetry induced a strategy of only attending to the right word in the dual-task condition. Therefore, in the three experiments here, we presented the words directly above and below fixation. Accuracy for those two locations is more balanced, and the letters are all closer to fixation and easier to resolve.

Third, is the serial bottleneck apparent only for some types of post-masks? Our prior results may have depended on masks composed of letters that caused interference at the level of orthographic processing. In Experiment 1, we directly compared two different masks: letters, and noise patches made by phase-scrambling images of letters. The scrambled masks were matched to the letters in spatial frequency and orientation content, size, and luminance contrast, but contained no objects. In Experiments 2 and 3 we used upside down non-letter characters as masks. These masks were composed of letter-like features arranged into objects that nonetheless aren’t recognizable letters.

Fourth, can two words pass through the bottleneck together if they are very short and common in the language? Short and common words may require fewer processing resources and therefore be processed in parallel. To test that possibility, in all three experiments we used a wider range of word lengths and lexical frequencies and binned the trials accordingly. Lexical frequency is a measure of how often a word occurs in large corpora of text, and correlates with familiarity and ease of recognition.

Fifth and finally, does a serial bottleneck constrain performance in any task as long as the stimuli are properly masked? In other words, is the deficit in the dual-task condition for semantic tasks due to the masking itself? We addressed that question in Experiment 3, using a color-detection task with the mask timing set to constrain accuracy in the same way as it did for the lexical and semantic judgments. In our previously published color detection experiments (White et al., 2018), the time between the words and the masks was matched to the semantic categorization condition, and was not set to the single-task threshold for color detection. The inter-stimulus interval (ISI) may therefore have been long enough to allow serial switching of attention to detect color in both words within one trial. Experiment 3 rectifies that concern.

To preview the results: performance in the semantic categorization and lexical decision tasks consistently ruled out the two standard parallel models and supported the all-or-none serial model. In contrast, the color-detection task supported a parallel model and was inconsistent with the all-or-none serial model, despite the strong masking. In the Discussion we consider several challenges to our interpretation of the data, including one related to the necessity of conscious awareness (Snell & Grainger, 2019a).

Methods

Experiment 1

Subjects

Ten volunteers (six female, ages 20–34 years, mean = 23.1 years) with normal or corrected-to-normal visual acuity participated in exchange for fixed monetary payment. Each subject gave informed consent in accordance with the Declaration of Helsinki and the University of Washington Institutional Review Board. All subjects were right-handed, naïve as to the purposes of the experiment, and had learned English as their first language. On the composite TOWRE-II Test of Word Reading Efficiency (Torgesen, Rashotte, & Wagner, 1999), all scored near or above the norm of 100 (M = 114, SEM = 4).

The sample size was chosen in advance of data collection on the basis of previous experiments with similar design (White et al., 2018). A power analysis suggested that to distinguish fixed-capacity parallel and all-or-none serial models with 95% power, on the basis of dual-task deficits and stimulus processing tradeoffs, we needed at least six participants. We rounded that up to 10, to be conservative and consistent with our prior experiments.

Stimuli

We used custom MATLAB software (MathWorks, Natick, MA, USA) and the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) to present stimuli on a linearized CRT monitor (1,024 × 640 pixels; 120 Hz refresh rate; maximum luminance 90 cd/m²). The stimuli consisted of: a medium gray background (47 cd/m²), a small black fixation cross with dimensions 0.25 × 0.25 degrees of visual angle (°); and black letter strings in Courier font (28 pt; 4 cd/m²). The words were drawn from two semantic categories (“non-living” and “living”), each with 190 English nouns (available in the public repository for this study). Lexical frequency ranged from 0.06 to 539 per million with a median of 7.4 per million, according to the Clearpond database (Marian, Bartolotti, Chabal, & Shook, 2012). The words ranged from four to six characters in length, subtending 2.6–4.4° in width, and 0.6–1.1° in height. In addition, we used two types of post-masks: (a) strings of six random consonants, also black; (b) phase-scrambled images of consonant strings. Each phase-scrambled image was created by computing the Fourier transform of an image of consonants, replacing the phases with random values, and reverse transforming. The two mask types were thus matched in size (4.1–4.4° in width; 0.95–1.1° in height), root-mean-square luminance contrast, and spatial frequency content, but the phase-scrambled images contained no letters.
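The phase-scrambling step described above can be sketched in a few lines. This is not the study's MATLAB code. It is a NumPy illustration that assumes the random phases are taken from the Fourier transform of a uniform noise image, which is one common way to keep the scrambled result real-valued. The toy image of vertical bars is a hypothetical stand-in for an image of consonants.

```python
import numpy as np


def phase_scramble(image, seed=0):
    """Keep the amplitude spectrum of `image` but replace its phases with random ones."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.fft2(image)
    # Phases of a real noise image are conjugate-symmetric, so the inverse
    # transform below is real up to rounding error. The noise is non-negative,
    # which leaves the mean of a non-negative input image unchanged.
    noise_phase = np.angle(np.fft.fft2(rng.random(image.shape)))
    scrambled = np.fft.ifft2(np.abs(spectrum) * np.exp(1j * noise_phase))
    return scrambled.real


# Toy "letter string": evenly spaced vertical bars on a blank background
demo = np.zeros((32, 96))
demo[12:20, 10:86:8] = 1.0
scrambled_demo = phase_scramble(demo)
print(demo.std(), scrambled_demo.std())
```

Because only the phases change, the scrambled image keeps the amplitude spectrum (spatial frequency and orientation content) and the root-mean-square contrast of the original, while its spatial structure is destroyed.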

Trial sequence

As illustrated in Fig. 1a, each trial began with a 1,000-ms pre-cue: two vertical lines 0.15° long, one above and one below fixation, each with one end 0.05° from the center of the fixation mark. On dual-task trials, both pre-cue lines were black. On single-task trials, one was blue and one was green. Half the subjects were assigned to the blue cue, and half to the green. The line with the assigned color indicated the side (top or bottom) that would be post-cued on single-task trials. After a 500-ms blank interval containing only the fixation cross, the two words were flashed for 17 ms. The words were centered at 1.1° directly above and below fixation. Each word was equally likely to be drawn from either of the two semantic categories (living and non-living), independent of each other. The only constraints were that the words on the two sides could not be identical, and neither word could have appeared in the previous trial.

The words were followed by an ISI containing only the fixation mark, with duration set to the subject’s 80% correct single-task threshold. Table 1 lists the mean threshold ISIs used in each experiment. Details on how thresholds were determined are described in the Procedure section below. After the ISI, the two post-masks were presented for 250 ms, centered at the same locations as the preceding words. The mask type (consonants or phase-scrambled) varied randomly across trials, but both masks on each trial were of the same type. After another 100-ms blank interval, a post-cue appeared: two lines like the pre-cue lines, one green and one blue. After a 500-ms delay, a 25-ms click was played, which prompted the subject to press a key to report the category of the word on the side indicated by the post-cue line in their assigned color (blue or green). Key-presses before the click were not recorded.

Table 1 Inter-stimulus intervals (ISIs) between the words and the masks in each experiment. These ISIs were set to achieve 80–90% correct in the single-task conditions. The second column is the mean ISI across subjects. The third column indicates the range across subjects, computed by first taking the mean ISI across trials for each subject


The task was semantic categorization: to report whether the post-cued word was a living thing or a non-living thing, along with confidence in the judgment. The subject pressed one of four keys with their left hand (a, s, d or f) when the post-cue pointed to the top side, or one of four keys with their right hand (m, <, >, or ?) when the post-cue pointed to the bottom side. With each hand, the left-most key indicated “sure non-living” and the right-most key indicated “sure living.” The middle two keys indicated “guess non-living” and “guess living,” respectively, for when confidence was lower.

On single-task trials, the post-cue matched the pre-cue, prompting the subject to judge the category of the one attended word. As soon as the subject pressed a key, a 100-ms feedback tone was played: high pitch (600 Hz) if the response was correct, or low pitch (180 Hz) if the response was incorrect. Feedback was determined only by the reported category and not the confidence level. Then after a 1,000-ms inter-trial interval (ITI), the next trial began.

On dual-task trials, the subject had to judge the words on both sides, in a random order. Importantly, the categories of the two words were independent, so the correct answer for one side did not predict the correct answer for the other. After the post-mask, the post-cue pointed to one side, and the subject pressed one key. Then the post-cue reversed to point to the other side, and 300 ms later another click prompted the second response. After that, two feedback tones were played: one for the first response and another for the second response. Then came the ITI and the next trial.

Eye-tracking

We monitored the right eye’s gaze position with an Eyelink 1000 eye-tracker (SR Research). Fixation was established during the ITI at the start of each trial. The trial only advanced if the estimated gaze position was within 1.5° horizontally and 2° vertically of the fixation cross for at least 200 ms. We allowed more vertical tolerance to accommodate drifts due to pupil size changes. The gaze position averaged over the next ten samples was defined as the current trial’s fixation position. A fixation break was then defined as a deviation of gaze position more than 1° horizontally or 1.25° vertically from that fixation position. If a fixation break occurred between the pre-cue offset and post-mask offset, the trial was immediately terminated. The subject had to press a button to continue the next trial. Terminated trials were repeated at the end of the block, unless fewer than three trials remained. As described in the Analysis section below, we also detected fixation breaks greater than 1° vertically in offline analysis of the eye traces, and excluded those trials as well.

Procedure

Completing the experiment required seven to ten sessions each lasting one hour. In sessions 1–2 the subjects received instructions, read the list of words used in the experiment, practiced the task, and then ran a staircase procedure to estimate their ISI thresholds for both types of post-masks. The staircase was run in blocks of 20 trials, alternating between the single-task top condition and the single-task bottom condition (no dual-task trials in the staircase). During each run, the word-mask ISI in units of log₁₀(seconds) was adjusted by a weighted 1-up/1-down staircase procedure controlled by the Palamedes toolbox (Prins & Kingdom, 2009). The step size down was always one-third of the step size up, which makes the staircase converge on the 75% correct threshold. Two staircases were randomly interleaved across trials, and blocks continued until both staircases had reversed direction ten times. The threshold ISI was the mean value across all reversals. This whole procedure was run twice for each mask type, with the mask types tested separately in a random order, and threshold estimates were averaged across runs.
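The convergence point of the weighted staircase follows from a balance argument: the expected step is zero when p × (step down) = (1 − p) × (step up), so p = up / (up + down), which equals 0.75 when the down step is one-third of the up step. The sketch below checks this with a simulated observer. It does not use the Palamedes toolbox, and the psychometric function, step sizes, and starting level are all hypothetical.

```python
import math
import random


def target_proportion(step_up, step_down):
    """Proportion correct at which a weighted 1-up/1-down staircase has zero expected drift."""
    return step_up / (step_up + step_down)


def run_staircase(n_trials=4000, step_up=0.15, step_down=0.05, seed=3):
    """Simulate the staircase on log10(ISI) and return (proportion correct, final level)."""
    rng = random.Random(seed)
    log_isi = -1.0  # hypothetical starting level in log10(seconds)
    n_correct = 0
    for _ in range(n_trials):
        # Hypothetical observer: accuracy rises from 0.5 to 1.0 as the ISI grows
        p_correct = 0.5 + 0.5 / (1.0 + math.exp(-(log_isi + 1.5) / 0.1))
        if rng.random() < p_correct:
            n_correct += 1
            log_isi -= step_down  # correct response: shorten the ISI (harder)
        else:
            log_isi += step_up  # error: lengthen the ISI (easier)
    return n_correct / n_trials, log_isi


proportion_correct, final_level = run_staircase()
print(target_proportion(0.15, 0.05), proportion_correct)
```

Over a long run the simulated proportion correct settles close to 0.75, regardless of the exact shape assumed for the observer.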

During the main experimental blocks (20 trials each), both mask types were randomly interleaved across trials, but the attention condition was blocked. Blocks were run in sets of four: two dual-task, one single-task top, and one single-task bottom, in a random order. Testing sessions continued until each subject had completed a total of 96 blocks (1,920 trials, half of which were dual-task). During each session, for each mask type, the ISI was constant across all conditions (dual-task and single-task).

The ISIs were initially set to the staircase threshold estimates but adjusted from session to session as necessary to keep single-task accuracy between 70% and 90% correct. Any run of four to 12 blocks with an ISI that was either too high (accuracy >90% correct) or too low (accuracy <70% correct) was discarded and re-run. This applied to 12 blocks for three subjects, and four blocks for one other.

Averaging across trials for each subject, the ISIs ranged from 17–49 ms (mean = 35 ± 3 ms) for consonant masks, and 4–31 ms (mean = 15 ± 2 ms) for the phase-scrambled masks. For all subjects, the ISIs were lower for the phase-scrambled masks than the consonant masks (mean difference = 20 ± 2 ms).

Finally, after the main experimental trials were finished, each subject ran 16 blocks of an “easy” condition with 400 ms ISI for both mask types. We used these easy blocks to assess accuracy when the masks were ineffective.

Experiment 2

Subjects

Ten volunteers participated (three female, mean age 25.6 years, ranging from 19 to 36 years). As in Experiment 1, all had normal or corrected-to-normal visual acuity, gave informed consent, and participated in exchange for fixed monetary payment. Two had also participated in Experiment 1. With the exception of one left-handed author (AW), all subjects were right-handed and naïve as to the purposes of the experiment. With the exception of one bilingual speaker of Urdu, all had learned English as their first language. All scored above the norm of 100 on the TOWRE-II reading test (M = 112, SEM = 3).

Stimuli and procedure

All stimuli and procedures were identical to Experiment 1 except as described here. The display background was white (90 cd/m²), and all characters were black (4 cd/m²). In an effort to make fixation easier, the fixation mark was more complex: a black cross 0.3° wide, with a 0.1° white dot at its center, and a thin black ring around it (0.3° diameter).

The stimulus set was composed of 702 real English words and 702 pronounceable pseudowords (available in the public repository for this study). Both categories were divided equally into strings of three, four and five letters long. We used a lower range of lengths here than in Experiment 1 to test the hypothesis that two very short words could be recognized in parallel. The real words came from all syntactic categories, ranging in lexical frequency from 3.4 to 873 occurrences per million. The four- and five-letter pseudowords had matched constrained trigram statistics to real words, and the three-letter pseudowords had matched constrained bigrams (Medler & Binder, 2005). Therefore, the pseudowords were pronounceable, with phonemic characteristics similar to real words. The masks were strings of non-letter characters drawn randomly from the set: ¢, ß, æ, ¥, ©, £, @, #, %, &. We generated a set of 702 unique masks with the same length distribution as the words. The masks were presented upside-down.

On each trial, two letter strings were presented simultaneously, one above and one below fixation, centered at 1.5° eccentricity. We increased the eccentricity in this experiment (compared to 1.1° in Experiment 1) to make it easier to process the two stimuli independently and avoid looking directly at either one. The two strings were the same length, and each had an independent 50% chance of being a real word. The masks were matched in length to the preceding letter strings, and presented upside down at the same locations.

During each trial, a fixation break was defined as a deviation of the right eye’s gaze position more than 1° horizontally or 1° vertically. This criterion was made more conservative than in Experiment 1 out of an abundance of caution, to ensure that all fixation breaks were detected.

The task was lexical decision: to report whether the post-cued letter string was a pseudoword or a real word. As in Experiment 1, the subjects pressed one of four keys for each post-cued side, to report the stimulus category and their level of confidence (from “sure pseudoword” to “sure real word”).

Given that there was only one mask type, we only had to estimate one ISI threshold for each subject, using the same staircase procedure. The across-trial average ISIs ranged from 33 to 92 ms (mean = 61 ± 7 ms). The fact that these ISI thresholds were longer than in Experiment 1 could be explained by the greater retinal eccentricity (1.5° vs. 1.1°), which made the stimuli somewhat more difficult to perceive.

Each subject completed a total of 60 blocks (1,200 trials), over four to five 1-h sessions. No blocks had to be excluded and re-run due to the difficulty level being out of range. Unlike Experiment 1, there was no “easy” condition with a long ISI.

Experiment 3

Subjects

Ten volunteers participated (three female, mean age 25.4 years, ranging from 19 to 35 years). As in Experiments 1 and 2, all had normal or corrected-to-normal visual acuity, gave informed consent, and participated in exchange for fixed monetary payment. Two had also participated in Experiment 2, and two were left-handed. With the exception of one author (AW), all subjects were naïve as to the purposes of the experiment. With the exception of the same bilingual speaker of Urdu from Experiment 2, all had learned English as their first language. All participants were screened for normal color vision using Ishihara color plates.

Stimuli and procedure

All stimuli and procedures were identical to Experiment 2 except as described here. We used the same set of real words and pseudowords as in Experiment 2, except their luminance was set to 17% of the maximum (18.2 cd/m²; 83% Weber contrast). On each trial, each letter string had an independent 50% chance of being a color target: its letters alternated in color between red and green (with the first color randomized). The non-target letter strings were all dark gray, and roughly equiluminant with the reds and greens.

The task was color detection: to report whether the post-cued letter string was colored or gray. As in Experiment 2, the subjects pressed one of four keys for each post-cued side, to make a rating from “sure gray” to “sure colored”.

Adjusting the stimulus difficulty for each subject proceeded in two stages: first, we adjusted the saturations of the red and green colors to be roughly equally salient and to allow for >90% correct detection with 300 ms ISI. To adjust the saturations while keeping luminance roughly constant, we used the measured luminance outputs of each monitor gun. Starting with the baseline dark gray, we incremented the intensity of one gun (green or red) and decremented the other two by however much was necessary to keep the total luminance constant. This allowed for 132 red colors and 20 green colors, varying from gray to the maximum saturation available (corresponding to when the other two guns were at 0).
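The constant-luminance adjustment can be sketched as simple arithmetic on the gun outputs. The example below is an assumption-laden illustration: it treats the display as linear and additive, lowers the two non-target guns by equal amounts (the text does not say how the decrement was split), and uses made-up gun luminances. The function names are hypothetical.

```python
def saturate(base_rgb, gun_lum, target, delta):
    """Raise one gun by `delta` and lower the other two equally to hold luminance constant.

    base_rgb: gun intensities (0 to 1) of the baseline gray
    gun_lum:  luminance in cd/m^2 of each gun at full intensity (hypothetical values)
    target:   index of the gun to increment (0 = red, 1 = green)
    """
    others = [k for k in range(3) if k != target]
    # Luminance added by the target gun must be removed from the other two guns
    decrement = delta * gun_lum[target] / sum(gun_lum[k] for k in others)
    rgb = list(base_rgb)
    rgb[target] += delta
    for k in others:
        rgb[k] -= decrement
    if min(rgb) < 0.0 or max(rgb) > 1.0:
        raise ValueError("requested saturation is outside the monitor gamut")
    return rgb


def luminance(rgb, gun_lum):
    """Total luminance of a color under the linear, additive assumption."""
    return sum(value * lum for value, lum in zip(rgb, gun_lum))


gun_lum = [20.0, 60.0, 10.0]  # hypothetical red, green, blue gun luminances
gray = [0.2, 0.2, 0.2]
reddish = saturate(gray, gun_lum, target=0, delta=0.1)
print(luminance(gray, gun_lum), luminance(reddish, gun_lum))
```

Under this scheme the maximum saturation is reached when the two decremented guns hit zero, which matches the limit described in the text.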

We express those saturation levels as proportions of the maximum while maintaining constant luminance. The mean (± SEM) red saturation proportion was 0.68 ± 0.05, and the mean green saturation proportion was 0.94 ± 0.03. One participant (S3) struggled to perform the task even with maximum saturations, so for that participant the duration of the letter strings was increased from 17 ms to 25 ms.

Then, with the color levels fixed, we adjusted the ISI to threshold, to achieve roughly 80% correct performance in the single-task condition. This was done by hand in practice blocks, rather than with a full staircase procedure. Across subjects, the threshold ISIs ranged from 17 to 51 ms (mean = 31 ± 3 ms).

During each trial, a fixation break was defined as a deviation of the right eye’s gaze position more than 1° horizontally or 1.25° vertically.

As in Experiment 1, we included some “easy” blocks with a long ISI (300 ms). To ensure that we set the color saturation levels appropriately, eight easy blocks were run before any of the main experimental blocks. Twelve more easy blocks were run at the end of the last session. In total, each subject completed 60 main experimental blocks (1,200 trials) and 20 easy blocks (400 trials), in five to nine sessions. No blocks had to be excluded and re-run due to the difficulty level being out of range.

Analysis

Behavioral accuracy

In all three experiments, the subject’s task was to report which of two categories a letter string belonged to, along with a confidence rating. To analyze the subjects’ sensitivity, we re-labelled one category as “targets” and the other “non-targets.” A “target-present” trial was then defined as a trial in which the post-cued stimulus was from the target category. We then re-coded each response as a 1–4 rating from “sure target absent” to “sure target present.” The target categories in Experiments 1, 2, and 3 are: “living” words, real words, and colored letter strings, respectively.

As a bias-free measure of accuracy in each condition, we computed the area under the receiver operating characteristic (ROC) curve, A_g (Pollack & Hsieh, 1969). The ROC plots hit rates (HR) as a function of false alarm rates (FR). To compute these rates from the subjects’ response ratings, we varied an index i from 0 to 4. At each index level we coded responses greater than i as “yes” responses. For each value of i, HR(i) is the proportion of “yes” responses on target-present trials and FR(i) is the proportion of “yes” responses on target-absent trials. For instance, when i = 3, only response ratings of 4 (highest confidence) on target-present trials are considered hits, and only response ratings of 4 on target-absent trials are considered false alarms. The five pairs of HR(i) and FR(i) trace out a curve, the area under which (A_g) is a measure of accuracy. A_g ranges from 0.5 (chance) to 1.0 (perfect). One can think of A_g as an unbiased estimate of proportion correct.
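The A_g computation described above can be sketched as follows. This is an illustration rather than the study's code: ratings are assumed to be coded 1 to 4, the area is obtained by joining the five ROC points with straight lines (the trapezoidal rule), and the function name and example ratings are hypothetical.

```python
def roc_area(target_ratings, nontarget_ratings, n_levels=4):
    """Area under the ROC curve (A_g) from confidence ratings coded 1..n_levels."""
    points = []
    for i in range(n_levels + 1):  # criterion index i = 0, 1, ..., n_levels
        # Responses greater than i count as "yes"
        hit_rate = sum(r > i for r in target_ratings) / len(target_ratings)
        fa_rate = sum(r > i for r in nontarget_ratings) / len(nontarget_ratings)
        points.append((fa_rate, hit_rate))
    points.sort()  # order the points from (0, 0) up to (1, 1)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between neighboring points
    return area


# Hypothetical ratings: 4 = "sure target present", 1 = "sure target absent"
target_trials = [4, 4, 3, 4, 2, 3, 4, 1]
nontarget_trials = [1, 2, 1, 3, 1, 2, 4, 1]
print(roc_area(target_trials, nontarget_trials))
```

Identical rating distributions on target-present and target-absent trials give an area of 0.5, and perfectly separated ratings give 1.0, matching the range stated above.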

Gaze fixation

During the experiments, fixation breaks were detected online and those trials were immediately terminated (and therefore excluded from the analysis). To be sure that we included no trials in which subjects may have looked directly at a word, we also analyzed the eye traces offline. First, for each trial in a block, we computed the median gaze position (across measurement samples) in the 300 ms before the pre-cue onset (excluding intervals with blinks). Then we defined the “central gaze position” for the block as the across-trial median of those initial gaze positions. This analysis corrects for any error in the eye-tracker calibration by assuming that subjects were fixating correctly in the interval before the pre-cue, when only the fixation mark was visible.

Then, for each trial, we analyzed gaze positions in the interval between the onset of the words and the offset of the post-masks. We defined an “offline fixation break” as a deviation that was more than 3° horizontally or 1° vertically from the central gaze position and that lasted more than 30 ms. In the analysis, we excluded all trials with offline fixation breaks. That led to an average loss of 4.9 ± 1.2% of the data in Experiment 1, 2.6 ± 0.9% in Experiment 2, and 3.5 ± 1.8% in Experiment 3.
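The two-step offline check described above can be sketched as follows. This is a sketch under stated assumptions rather than the authors' code: gaze samples are represented as hypothetical (time in ms, x in degrees, y in degrees) tuples with blink samples already removed, and the duration of a deviation is measured from its first out-of-window sample.

```python
from statistics import median


def median_gaze(samples):
    """Median (x, y) across samples; each sample is (t_ms, x_deg, y_deg)."""
    return (median([s[1] for s in samples]), median([s[2] for s in samples]))


def central_gaze(pre_cue_samples_by_trial):
    """Across-trial median of each trial's median pre-cue gaze position."""
    per_trial = [median_gaze(s) for s in pre_cue_samples_by_trial]
    return (median([p[0] for p in per_trial]), median([p[1] for p in per_trial]))


def has_fixation_break(samples, center, max_dx=3.0, max_dy=1.0, min_dur_ms=30.0):
    """True if gaze deviates beyond the window for more than min_dur_ms.

    `samples` should cover word onset to post-mask offset. Thresholds follow
    the text: 3 deg horizontally, 1 deg vertically, 30 ms duration.
    """
    run_start = None  # time of the first sample in the current deviation
    for t, x, y in samples:
        outside = abs(x - center[0]) > max_dx or abs(y - center[1]) > max_dy
        if outside:
            if run_start is None:
                run_start = t
            if t - run_start > min_dur_ms:
                return True
        else:
            run_start = None
    return False
```

Trials for which `has_fixation_break` returns true would then be excluded from the analysis.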

Bootstrapping

Throughout the text we report bootstrapped 95% confidence intervals (CIs) for average measurements. To compute these, we generated a distribution of 5,000 resampled means. Each of those is the mean of ten values sampled with replacement from the original set of ten subjects’ means. The CI is the range from the 2.5th to 97.5th percentile of the distribution of resampled means, with an “accelerated” bias correction (Efron, 1987).
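As an illustration of this procedure, the sketch below resamples subject means with replacement and applies a bias-corrected and accelerated (BCa) adjustment to the percentile limits. It follows the textbook BCa formulation with a jackknife estimate of the acceleration; whether this matches the authors' implementation in every detail is an assumption, and the function name and example values are hypothetical.

```python
import random
from statistics import NormalDist


def bca_ci(values, n_boot=5000, alpha=0.05, seed=0):
    """BCa bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    theta_hat = sum(values) / n

    # Distribution of resampled means (n values drawn with replacement)
    boot = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))

    norm = NormalDist()
    # Bias correction: where the observed mean falls in the bootstrap distribution
    prop_below = sum(b < theta_hat for b in boot) / n_boot
    prop_below = min(max(prop_below, 1 / n_boot), 1 - 1 / n_boot)  # avoid 0 or 1
    z0 = norm.inv_cdf(prop_below)

    # Acceleration from leave-one-out (jackknife) means
    total = sum(values)
    jack = [(total - v) / (n - 1) for v in values]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted_percentile(z):
        return norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))

    lo_p = adjusted_percentile(norm.inv_cdf(alpha / 2))
    hi_p = adjusted_percentile(norm.inv_cdf(1 - alpha / 2))
    lo = boot[min(n_boot - 1, max(0, int(lo_p * n_boot)))]
    hi = boot[min(n_boot - 1, max(0, int(hi_p * n_boot)))]
    return lo, hi


# Hypothetical per-subject means for ten subjects (illustrative values only)
subject_means = [0.18, 0.22, 0.25, 0.19, 0.21, 0.17, 0.24, 0.20, 0.23, 0.21]
ci_low, ci_high = bca_ci(subject_means)
```

Without the adjustment (z0 = 0 and zero acceleration), the limits reduce to the plain 2.5th and 97.5th percentiles of the resampled means.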

Results

Dual-task deficits and attention operating characteristics

In this paradigm, the primary evidence for a processing capacity limit is a dual-task deficit: lower accuracy in the dual-task condition than in the single-task condition. Table 2 lists the mean (and SEM) accuracies in each condition of the three experiments, collapsing across top and bottom sides. Accuracy is in units of area under the ROC curve (A_g). All three experiments had significant dual-task deficits (p < 0.01, CI excludes 0), but they were roughly three times larger in the semantic and lexical tasks than in the color-detection task. In Experiment 1 (semantic categorization), the dual-task deficit was slightly larger with masks made of consonants than with phase-scrambled consonants, but not significantly so (mean difference in deficit = 0.02 ± 0.01; t(9)=1.94, p=0.084; CI = [-0.002 0.036]). Experiments 2 (lexical decision) and 3 (color detection) used very similar stimuli, so we compared them directly. The dual-task deficit in Experiment 2 (0.21) was significantly larger than in Experiment 3 (0.06): t(18)=7.15, p<10^-5, CI of difference = [0.11 0.18].

Table 2 Mean accuracies (in units of A_g) and dual-task deficits in the three experiments, with Experiment 1 (semantic categorization task) divided by the two mask types (N = 10)


We also examined any differences in accuracy between the first and second responses in dual-task trials. In all three experiments, the mean differences (second – first) were small and not statistically significant: Experiment 1: -0.011 ± 0.009 A_g (CI = [-0.026 0.006]); Experiment 2: -0.026 ± 0.012 (CI = [-0.049 0.004]); Experiment 3: 0.008 ± 0.009 (CI = [-0.009 0.025]). Therefore, the large dual-task deficits in Experiments 1 and 2 cannot be explained by a failure to remember both words.

To compare the dual-task deficits to model predictions, we plot our data on attention operating characteristics (AOCs; Sperling & Melchner, 1978). The mean AOCs for each experiment are in Fig. 2: accuracy for words above fixation is plotted against accuracy for words below fixation. The single-task conditions are pinned to their respective axes. The accuracy levels in the dual-task condition form a single point (open circle) in that 2-D space. We compared that point to the predictions of three specific models of capacity limits (Bonnel & Prinzmetal, 1998; Scharff, Palmer, & Moore, 2011; Shaw, 1980; Sperling & Melchner, 1978; White et al., 2018):

1. Unlimited-capacity parallel processing: Two stimuli can be fully processed simultaneously just as well as one stimulus, so there is no dual-task deficit. In the AOC, this model predicts that the dual-task point falls at the intersection of the dashed lines.
2. Fixed-capacity parallel processing: The perceptual system extracts a fixed amount of information from the whole display per unit time. Therefore, processing resources must be shared between both stimuli in the dual-task condition, which lowers sensitivity. As the proportion of resources given to one stimulus increases from 0 to 1, this model traces out the black curve in the AOC plot.
3. All-or-none serial processing: Only one stimulus can be processed per trial, with the same sensitivity as in the single-task condition. The subject does not have time to even start processing the other stimulus and therefore must guess when asked about it. As the proportion v of trials in which a given stimulus is processed increases from 0 to 1, this model traces out the diagonal black line in the AOC plot.
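To make the three predictions concrete, the sketch below computes the dual-task point each model would predict from a pair of single-task accuracies. It is a sketch under stated assumptions and does not reproduce the authors' calculation. The serial prediction is taken to be a linear mixture of single-task accuracy and chance (0.5), which is what a straight diagonal line in the AOC implies. The fixed-capacity prediction assumes an equal-variance Gaussian link between A_g and d′ and a sample-size rule in which d′ scales with the square root of the share of resources. All function names are hypothetical.

```python
from statistics import NormalDist

_NORM = NormalDist()
CHANCE = 0.5


def unlimited_capacity_point(single_a, single_b):
    """No dual-task deficit: dual-task accuracy equals single-task accuracy."""
    return (single_a, single_b)


def serial_point(single_a, single_b, v):
    """All-or-none serial model; v = proportion of trials on which A is processed.

    The processed stimulus gets single-task accuracy; the other is a guess.
    """
    dual_a = v * single_a + (1 - v) * CHANCE
    dual_b = (1 - v) * single_b + v * CHANCE
    return (dual_a, dual_b)


def area_to_dprime(area):
    # ASSUMPTION: equal-variance Gaussian model, A = Phi(d' / sqrt(2)); area < 1
    return 2 ** 0.5 * _NORM.inv_cdf(area)


def dprime_to_area(dprime):
    return _NORM.cdf(dprime / 2 ** 0.5)


def fixed_capacity_point(single_a, single_b, p):
    """Fixed-capacity parallel model; p = share of resources given to A.

    ASSUMPTION: d' is proportional to the square root of the resource share.
    """
    dual_a = dprime_to_area(area_to_dprime(single_a) * p ** 0.5)
    dual_b = dprime_to_area(area_to_dprime(single_b) * (1 - p) ** 0.5)
    return (dual_a, dual_b)


# Hypothetical single-task accuracies of 0.8 for both stimuli, equal split
unlimited = unlimited_capacity_point(0.8, 0.8)
fixed = fixed_capacity_point(0.8, 0.8, p=0.5)
serial = serial_point(0.8, 0.8, v=0.5)
```

Under these assumptions the fixed-capacity point falls between the serial and unlimited-capacity points, which matches the ordering of the curve, the diagonal line, and the intersection of the dashed lines described above.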

More information, including how the prediction curves were calculated, is in the Appendix. In addition, the Supplementary Material contains AOCs for individual subjects in all three experiments (Figures S1–S3).