Introduction

Visual working memory (VWM) is a short-term memory system that acquires relevant information via visual pathways and retains it, manipulating this information even after the visual stimuli have disappeared [1,2,3]. This memory system plays a key role in facilitating various complex cognitive activities [4, 5]. However, the capacity of VWM is notably limited [6,7,8,9,10]. This limited capacity not only restricts the quantity of information processed concurrently but also impacts the efficiency of different higher-level cognitive activities. Given the stimulus-rich environment that humans inhabit, where individuals frequently encounter potentially distracting stimuli, the active mitigation of the impact of distractors on VWM emerges as a critical strategy for enhancing memory performance. This necessitates prioritizing target items and suppressing irrelevant ones during the encoding and consolidation phases [11,12,13].

Due to the brief nature of encoding and consolidation processes in VWM, coupled with the potential overlap in the timeline of VWM resource allocation and the suppression of irrelevant items, traditional behavioral indicators (e.g., accuracy) have limitations in distinguishing individual resource allocation and filtering mechanisms. However, recent studies have used event-related potential (ERP) technology to investigate the filtering processes of distractors in VWM [11, 14,15,16,17]. An ERP component, distractor positivity (PD), was used to detect the filtering processes of distractors during the consolidation phases of a VWM task. The PD component is a neural marker associated with the suppression of irrelevant stimuli during visual tasks. It manifests as a positive deflection in the ERP waveform after stimulus presentation. The PD component is observed at posterior electrode sites and is contralateral to the location of the distractor [18, 19]. Research has shown that the PD component is crucial for understanding the mechanisms of attentional control, particularly in how the brain filters out distractors to maintain focus on task-relevant stimuli. This component is believed to reflect active suppression processes that help enhance perceptual and cognitive efficiency [20]. The amplitude and latency of the PD component can vary depending on the difficulty of the task, the salience of the distractor, and the individual’s attentional capacity [21, 22]. Studies using the PD component have provided significant insights into the neural underpinnings of selective attention, demonstrating that distractor suppression is not merely a passive process but involves active neural mechanisms that enable efficient visual processing [23, 24]. For instance, in the study by Feldmann-Wustefeld and Vogel [17], participants were required to focus on remembering the location and color of the target items while disregarding distractors in a lateralized change detection task. 
Under these conditions, the unilateral memory of target items elicited a contralateral delay activity (CDA) component—an ERP marker tracking VWM load [11, 25, 26]. Critically, the researchers observed that distractors elicited posterior contralateral positivity (Ppc) and PD components, with amplitudes that increased as the number of distractors increased and correlated positively with the individual’s VWM capacity. The Ppc component was argued to reflect the initial processing of stimuli or to represent physical salience, which might also be linked to predefined feature-filtering weights that are influenced by the participants’ task expectations [27,28,29]. By contrast, the PD component reflected the process used for suppression of to-be-ignored items, as well as the negative attentional weights on salience maps, thereby inhibiting the consolidation of distractors into the VWM system [17]. This study compellingly demonstrated active suppression of irrelevant items during VWM consolidation, preventing their entry into the VWM system.

The consolidation processes involved in VWM have been identified to include two distinct phases of resource allocation. During the early consolidation phase, individuals involuntarily allocate VWM resources in a stimulus-driven manner across visual stimuli. If the stimuli are presented for a sufficiently long duration, the individuals create low-resolution VWM representations for as many visual stimuli as possible. This initiates the transition into the late consolidation phase, wherein VWM resources are reallocated according to task demands, focusing on task-relevant visual stimuli [30,31,32,33]. Moreover, previous research has revealed that information is sequentially consolidated into the system in a bandwidth-limited manner [34, 35]. In addition, Vogel, et al. [36], using a masked change detection paradigm, demonstrated that the number of memory items participants could retain increased with the stimulus presentation duration up to the limit of their VWM capacity. This finding indicates that consolidating a single color item requires roughly 50 ms.

The bandwidth-limited sequential consolidation pattern in VWM raises an interesting question: Do irrelevant distractors require continuous suppression during their presence, or is suppression only necessary if they persist after the consolidation of task-relevant information? A critical aspect in resolving this question is determining whether distractor suppression processing still occurs when consolidation time is inadequate. Feldmann-Wustefeld and Vogel [17] did not provide a definitive answer in this context, given that their study used a fixed stimulus presentation duration of 200 ms. According to Vogel, et al. [36], this duration should be adequate for consolidation of more than the two color items required in the task used by Feldmann-Wustefeld and Vogel [17]. Hence, the aim of the present study was to manipulate the stimulus presentation duration (i.e., short or long duration) to further explore the temporal dynamics between distractor suppression processing and the consolidation of task-relevant information. Our research strategy involved presenting memory stimuli under two presentation duration conditions (50 ms and 200 ms) while recording the ERP components of distractors as participants completed the task. We hypothesized that 200 ms represents a sufficient duration for consolidating task-relevant information, whereas 50 ms does not. Under the 200 ms condition, we anticipated observing results consistent with those of Feldmann-Wustefeld and Vogel [17]. More importantly, if the suppression processing of distractors occurs only after the consolidation of task-relevant information, we expected that the PD component elicited by distractors would be observable only under the long presentation duration condition and not under the short presentation duration condition.
Conversely, if distractor suppression processing is independent of consolidation completion, then we expected to see similar PD component magnitudes under both stimulus presentation durations.
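The reasoning behind these two presentation durations can be made concrete with a deliberately simplified sketch. It assumes strictly sequential consolidation at roughly 50 ms per color item [36]; the function name and the capacity value of three items are illustrative assumptions, not parameters taken from the study.

```python
def items_consolidated(duration_ms, ms_per_item=50, capacity=3):
    """Upper bound on items consolidated under a strictly sequential model.

    ms_per_item follows the estimate of roughly 50 ms per color item [36];
    capacity is a hypothetical VWM limit chosen only for illustration.
    """
    return min(capacity, duration_ms // ms_per_item)


for duration in (50, 200):
    n = items_consolidated(duration)
    # The task requires two target items to be consolidated.
    print(duration, "ms ->", n, "item(s); enough for two targets:", n >= 2)
```

Under these assumptions, a 50 ms display leaves time for only one of the two targets, whereas a 200 ms display leaves time to spare, which is the contrast the design relies on.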

In addition to using the PD component to reflect distractor suppression, similar to the study by Feldmann-Wustefeld and Vogel [17], we also used the Ppc, N2pc, and CDA components to examine other cognitive processes in participants. The Ppc component can be elicited by targets or distractors [16, 37], and it reflects the initial processing of stimuli [27]. The N2pc component is characterized by contralateral negativity [23, 38,39,40,41,42], and has been widely used to investigate the deployment of attention [23, 40,41,42]. A relatively common practice is to interpret the N2pc as an index of the deployment of covert spatial visual attention [43] or the onset of attentional engagement [44]. The CDA component is a sustained negative potential widely used as an ERP marker of the visual information load stored in VWM [17, 45, 46]. By analyzing these ERP components separately, we can further investigate whether there are differences in the initial processing, attention allocation, and memory storage of targets or distractors under different stimulus presentation duration conditions.

Methods

Participants

We ensured sufficient statistical power for comparisons by conducting an a priori power analysis using G*Power 3.1.9.2 [47], referencing the effect sizes of the PD results reported in the study by Feldmann-Wustefeld and Vogel [17] (η2 = 0.274, 0.315, 0.367). We anticipated a similar effect size in our experimental design and adopted the smallest of these values (η2 = 0.274), aiming for a statistical power of (1 – β) = 0.80 at a significance level of 0.05. This analysis suggested a minimum sample size of 26 participants. Based on our previous experience with EEG data collection in distractor filtering studies [15, 16, 48], we estimated that up to 25% of participants might be excluded due to excessive artifacts or high error rates. Therefore, to ensure valid data from at least 26 participants, we set our recruitment target at 35 participants.
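The arithmetic behind the recruitment target can be sketched as follows. The conversion from η2 to Cohen's f is the standard formula for the effect size that G*Power takes as input for F-tests; whether the authors entered the effect size this way is an assumption, and the function names are illustrative. The minimum sample size of 26 comes from G*Power itself and is not recomputed here.

```python
import math


def eta_squared_to_f(eta_sq):
    """Convert (partial) eta squared to Cohen's f."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def recruitment_target(n_required, expected_exclusion_rate):
    """Smallest number to recruit so that n_required remain after exclusions."""
    return math.ceil(n_required / (1.0 - expected_exclusion_rate))


print("Cohen's f:", round(eta_squared_to_f(0.274), 3))
print("Recruitment target:", recruitment_target(26, 0.25))
```

With 26 required participants and an expected exclusion rate of 25%, the second function returns 35, matching the stated target.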

Thirty-five students from Sichuan Normal University were recruited for the study. After artifact rejection and the removal of trials with response errors, five participants were excluded for having fewer than 100 trials per condition. The remaining 30 participants (aged 18–27 years, M = 20.6, SD = 2.54, including 9 males) were included in the data analysis. This sample size aligned closely with the sample size (N = 26) used in the study by Feldmann-Wustefeld and Vogel [17]. All participants were right-handed, had normal or corrected-to-normal vision, no color blindness or weakness, no psychiatric disorders, and had not previously participated in similar experiments. Participation was voluntary, and all participants were compensated based on their performance upon completion of the study. Our study was approved by the ethical committee of Sichuan Normal University and followed the guidelines of the Declaration of Helsinki (2008).

Materials

The experiment was programmed in MATLAB 2018b using Psychtoolbox, and the same program was used for data collection. The lab’s computer screen had a resolution of 1920 × 1080 pixels and a refresh rate of 60 Hz. A 64-channel electroencephalography (EEG) cap was used to collect the EEG signals. During the experiment, the distance between the participant’s eyes and the monitor was approximately 60 cm, with the participant’s eyes directly facing the center of the screen. The experiment was conducted in a quiet room, with all light sources other than the monitor turned off, ensuring clear visibility of the stimuli.

The stimuli were presented on the memory display and included colored squares (0.9° × 0.9°), colored circles, and gray circles (all circles 1° in diameter). The colors of the stimuli were randomly chosen from nine options (RGB values: red [255-0-0], green [0-255-0], blue [0-0-255], yellow [255-255-0], magenta [255-0-255], cyan [0-255-255], dark green [20-80-20], purple [50-0-100], or orange [255-128-0]), with no color repeats. The gray circles had an RGB value of 128-128-128. The stimuli were presented in four groups: two colored squares, four colored circles, and two groups of gray circles (one containing two circles and the other containing four), as shown in Fig. 1A. Two groups of stimuli were positioned on the vertical median (above and below), and the other two were placed on the horizontal median (left and right), with their centers located 3.4° away from the screen center (measured from the center of the screen to the center of each group). The two colored squares represented the target items, the four colored circles represented the distractors, and the two groups of gray circles were neutral distractors. In the target lateral condition (50% of the trials), target items were presented on the horizontal median (either left or right, with equal probability), and the distractors were presented on the vertical median (either top or bottom, with equal probability). In the distractor lateral condition (the remaining 50% of the trials), the target items were presented on the vertical median (either top or bottom, with equal probability), and the distractors were presented on the horizontal median (either left or right, with equal probability). The remaining positions were filled with neutral distractor groups, with the set size of each neutral group always matching the number of targets or distractors at the opposite position.
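Stimulus sizes given in degrees of visual angle have to be converted to on-screen sizes for a given viewing distance. The sketch below uses the reported viewing distance of approximately 60 cm; the pixel density `PX_PER_CM` is a hypothetical value, because the physical dimensions of the monitor are not reported, and the function names are illustrative.

```python
import math


def visual_angle_to_cm(angle_deg, viewing_distance_cm=60.0):
    """Physical size on screen that subtends angle_deg at the given distance."""
    return 2.0 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_to_px(angle_deg, px_per_cm, viewing_distance_cm=60.0):
    """Size in pixels; px_per_cm depends on the monitor and is hypothetical."""
    return visual_angle_to_cm(angle_deg, viewing_distance_cm) * px_per_cm


PX_PER_CM = 36.0  # hypothetical pixel density, not reported in the study
for label, deg in (("square side", 0.9),
                   ("circle diameter", 1.0),
                   ("eccentricity", 3.4)):
    print(label, round(visual_angle_to_px(deg, PX_PER_CM)), "px")
```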

In the probe array, the location at which the memory stimulus appeared showed an outline, and only one of the target stimuli at that location was filled with color. In 50% of the trials, the color was identical to the color shown in the memory display at that location. In the remaining 50% of the trials, the color was chosen randomly from one of the other eight colors.

Experimental Procedure

As illustrated in Fig. 1B, each trial began with a 500 ms fixation cross at the center of the screen, signaling the upcoming stimuli. To minimize the impacts of eye movements and blinks, participants were instructed to focus on the cross throughout the experiment and to blink, if necessary, only during key presses. After 500 ms, the memory array appeared for either 50 ms or 200 ms. Participants were tasked with remembering the color and position of the colored squares (targets) while ignoring the circles (distractors). During the probe phase, their task was to determine whether the color and position of the probe square matched that of a square in the memory array. The probe array had a different color than the memory array in 50% of the trials and was identical in the remaining trials. Participants responded by pressing “F” for a match and “J” for a non-match. The trials were separated by a variable interval of 800–1000 ms.
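On a 60 Hz monitor, a stimulus can only be shown for a whole number of video frames, so both presentation durations correspond to frame counts. The sketch below shows that conversion; it assumes frame-based timing, which is common practice but is not described in the study, and the function name is illustrative.

```python
def duration_to_frames(duration_ms, refresh_hz=60):
    """Number of whole video frames that best approximates duration_ms."""
    frame_ms = 1000.0 / refresh_hz
    return round(duration_ms / frame_ms)


for duration in (50, 200):
    frames = duration_to_frames(duration)
    print(duration, "ms ->", frames, "frames at 60 Hz")
```

Both durations map onto an integer number of frames at 60 Hz (3 and 12), so neither condition requires rounding the intended presentation time.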

The experiment consisted of 20 blocks, each containing 32 trials, totaling 640 trials. The stimulus presentation durations (50 ms or 200 ms) were randomly interspersed, each occurring 320 times. After each block, the average accuracy for that set of trials was displayed. At least 32 practice trials were performed prior to recording the test performance. To prevent fatigue, the participants were given a break after each block, and they continued the experiment upon feeling rested. The entire experiment lasted approximately one hour and twenty minutes.
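A balanced, randomized trial list with this structure can be sketched as follows. The sketch crosses the two presentation durations with the two laterality conditions and shuffles across the whole session; whether the original program balanced conditions within each block or only across the session is not stated, so that choice, like all names used here, is an assumption.

```python
import random
from collections import Counter
from itertools import product

DURATIONS_MS = (50, 200)
LATERALITY = ("target_lateral", "distractor_lateral")
N_BLOCKS, TRIALS_PER_BLOCK = 20, 32


def build_trial_list(seed=None):
    """Return 20 blocks of 32 trials with conditions balanced and shuffled."""
    rng = random.Random(seed)
    conditions = list(product(DURATIONS_MS, LATERALITY))
    total = N_BLOCKS * TRIALS_PER_BLOCK
    # Repeat the four condition cells equally often, then shuffle the session.
    trials = conditions * (total // len(conditions))
    rng.shuffle(trials)
    return [trials[i:i + TRIALS_PER_BLOCK]
            for i in range(0, total, TRIALS_PER_BLOCK)]


blocks = build_trial_list(seed=1)
counts = Counter(trial for block in blocks for trial in block)
print(counts)
```

Each of the four cells occurs 160 times, giving 320 trials per presentation duration and 640 trials in total.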

It is worth noting that our study included additional controls for presentation time, which increased the total number of conditions compared to the study by Feldmann-Wustefeld and Vogel [17]. To reduce fatigue effects and ensure that participants could complete the tasks within a manageable time frame, we reduced the number of trials per condition by 20% compared to the study by Feldmann-Wustefeld and Vogel [17]. Therefore, while the study by Feldmann-Wustefeld and Vogel [17] used 200 trials per condition, our study used 160 trials for each condition.

Fig. 1

(A) Memory displays utilized in the WM filtering task. The left image demonstrates memory displays where targets are displayed laterally, and distractors are presented on the vertical midline. These trials enabled the separation of target-related processing and the extraction of target-elicited lateralized ERP components (N2pc, CDA). The right image illustrates memory displays in which distractors are displayed laterally, and targets are presented on the vertical midline. These trials enabled the isolation of distractor-related processing and the extraction of distractor-evoked lateralized ERP components (Ppc and PD). (B) Display of a trial sequence. Each trial commenced with a fixation cross, followed by the memory display. Subsequently, a blank screen (containing only a fixation cross) was presented before a probe emerged at one of the previous target locations. Participants were required to determine whether the probe exhibited the same color as the target previously displayed at the same location

EEG data recording

The ERP recording system from Brain Products (Munich, Germany) was employed, with the EEG recorded utilizing a 64-channel electrode cap based on the international standard 10–20 system. The REF served as the online reference electrode, while the GND was the ground electrode. Horizontal electrooculography (EOG) was collected through the electrode site (IO) located 1 cm from the participant’s right eye corner, while the vertical EOG was recorded through the electrode site FCz. The sampling frequency was set at 500 Hz, with an impedance between the electrodes and the scalp of less than 10 kΩ and an impedance of less than 5 kΩ for the electrode sites used in the analysis.

Continuous EEG data were analyzed offline using the MATLAB (2018b), EEGLAB (2023.0), and ERPLAB (v10.0) toolboxes. The average values of TP9 and TP10 (bilateral mastoids) were used for re-referencing, and low-pass filtering (30 Hz, 24 dB/octave) was applied. The average amplitude 200 ms before the memory array appeared was used for baseline correction, and the analysis time window extended to 1000 ms after the memory array presentation.
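The re-referencing and baseline-correction steps can be illustrated with a minimal sketch in plain Python. It is not the EEGLAB/ERPLAB implementation used in the study; the function names are illustrative, and the epoch is assumed to run from −200 ms to 1000 ms at the 500 Hz sampling rate.

```python
SAMPLING_RATE_HZ = 500
EPOCH_START_MS, EPOCH_END_MS = -200, 1000


def ms_to_sample(t_ms):
    """Index of the sample at time t_ms within an epoch starting at -200 ms."""
    return round((t_ms - EPOCH_START_MS) * SAMPLING_RATE_HZ / 1000)


def rereference(channel, tp9, tp10):
    """Re-reference one channel to the mean of the two mastoid channels."""
    return [v - (a + b) / 2.0 for v, a, b in zip(channel, tp9, tp10)]


def baseline_correct(epoch):
    """Subtract the mean of the 200 ms prestimulus interval from the epoch."""
    n_baseline = ms_to_sample(0)
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [v - baseline for v in epoch]


# Toy epoch (hypothetical data): a constant 5 uV offset over all samples.
epoch = [5.0] * ms_to_sample(EPOCH_END_MS)
corrected = baseline_correct(epoch)
print(ms_to_sample(0), len(epoch), corrected[0])
```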

Data analysis

Behavioral data

We conducted planned pairwise comparisons using two-tailed paired t-tests to assess differences in response time (RT) and accuracy (ACC) between the 50 ms and 200 ms presentation duration conditions.
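A paired comparison of this kind can be sketched with the standard library alone. The accuracy values below are hypothetical and serve only to make the example runnable; the choice of Cohen's d for paired data (mean difference divided by the standard deviation of the differences) is an assumption about which variant was reported, and p values are omitted because the standard library has no t distribution.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic, degrees of freedom, and Cohen's d of the differences."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff  # Cohen's d for paired data
    return t, n - 1, d


# Hypothetical accuracy values for five participants, for illustration only.
acc_200 = [0.92, 0.88, 0.95, 0.90, 0.85]
acc_50 = [0.85, 0.84, 0.90, 0.83, 0.80]
t, df, d = paired_t(acc_200, acc_50)
print(f"t({df}) = {t:.3f}, d = {d:.3f}")
```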

EEG data

Our EEG data analysis methodology was aligned with the approach of Feldmann-Wustefeld and Vogel [17]. In the main task, the EEG data were averaged offline over a 1200 ms epoch, which included a 200 ms prestimulus baseline, with the epochs time-locked to the onset of the memory array. Trials involving incorrect responses, blinks, or saccades between 0 ms and 800 ms were excluded from the analysis.

Blinks were identified when the absolute amplitude of the vertical EOG exceeded 100 µV, whereas saccades were defined by a horizontal EOG amplitude difference of more than 32 µV in a 200 ms window relative to the subsequent 100 ms window (step criterion). Individual channel segments with an absolute voltage exceeding 100 µV were excluded. Data from five participants were discarded because fewer than 100 trials per condition remained after the removal of trials with artifacts or incorrect responses. The remaining 30 participants had an average of 7.28% unusable trials (SD = 3.41%).
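The two rejection criteria can be sketched as follows. The blink check applies the absolute 100 µV threshold directly. The saccade check is one possible reading of the step criterion (the mean of a 200 ms window compared with the mean of the following 100 ms window, sliding one sample at a time); the toolbox routine actually used may differ in its details, and the function names are illustrative.

```python
SAMPLING_RATE_HZ = 500


def has_blink(veog_uv, threshold_uv=100.0):
    """True if the vertical EOG exceeds the absolute amplitude threshold."""
    return any(abs(v) > threshold_uv for v in veog_uv)


def has_saccade(heog_uv, threshold_uv=32.0, first_ms=200, second_ms=100):
    """Step criterion: compare the mean of a 200 ms window with the mean of
    the following 100 ms window, sliding one sample at a time."""
    n1 = first_ms * SAMPLING_RATE_HZ // 1000
    n2 = second_ms * SAMPLING_RATE_HZ // 1000
    for start in range(len(heog_uv) - n1 - n2 + 1):
        first = heog_uv[start:start + n1]
        second = heog_uv[start + n1:start + n1 + n2]
        if abs(sum(second) / n2 - sum(first) / n1) > threshold_uv:
            return True
    return False


# Hypothetical horizontal EOG traces: a flat signal and a 40 uV step.
flat = [0.0] * 400
step = [0.0] * 200 + [40.0] * 200
print(has_saccade(flat), has_saccade(step))
```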

The mean contralateral and ipsilateral ERP activity for each participant was calculated at electrode sites P7/P8, P5/P6, and PO7/PO8. This was done separately for each presentation duration (50 ms vs. 200 ms), for the laterality condition (contralateral vs. ipsilateral), and for sites contralateral and ipsilateral to the targets/distractors (target lateral condition/distractor lateral condition).
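The contralateral-minus-ipsilateral computation over the three electrode pairs can be sketched as below. The data structure (a dictionary mapping channel names to lists of amplitudes) and the function name are assumptions made for illustration; the sketch only encodes the convention that a stimulus in the left visual field is contralateral to right-hemisphere sites and vice versa.

```python
PAIRS = (("P7", "P8"), ("P5", "P6"), ("PO7", "PO8"))  # (left, right) sites


def lateralized_wave(erp_by_channel, stimulus_side):
    """Contralateral-minus-ipsilateral wave averaged over the three pairs.

    erp_by_channel maps a channel name to a list of amplitudes (one per
    sample); stimulus_side is "left" or "right".
    """
    n_samples = len(erp_by_channel[PAIRS[0][0]])
    diff = [0.0] * n_samples
    for left, right in PAIRS:
        # A left-side stimulus is contralateral to right-hemisphere sites.
        if stimulus_side == "left":
            contra, ipsi = right, left
        else:
            contra, ipsi = left, right
        for i in range(n_samples):
            pair_diff = erp_by_channel[contra][i] - erp_by_channel[ipsi][i]
            diff[i] += pair_diff / len(PAIRS)
    return diff
```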

We determined the epochs for statistical analyses of different ERP components by calculating the lateralized ERP (contralateral minus ipsilateral) for both the target lateral and distractor lateral conditions, resulting in four distinct waveforms (two for the 50 ms presentation duration conditions and two for the 200 ms presentation duration conditions) that were indicative of lateralized activity due to targets and distractors. Our previous research has shown that early ERP components (e.g., Ppc) generally last around 50–60 ms [16]. Therefore, we used 60-ms measurement windows for these early components. The target-Ppc and distractor-Ppc epochs were determined as ± 30 ms around the most positive peak between 100 ms and 200 ms in the lateral-distractor waveform. The target-N2pc and distractor-N2pc epochs were identified as ± 30 ms around the most negative peak between 200 ms and 300 ms in the lateral-targets waveform. The distractor-PD epoch was established as ± 30 ms around the most positive peak between 250 ms and 350 ms in the lateral-distractor waveform. The mean amplitude for those time windows was calculated separately for laterality (contralateral vs. ipsilateral), for each presentation duration (50 ms vs. 200 ms). This resulted in four values for each ERP component per participant. In addition, as anticipated, visual inspection revealed that the CDA in the target lateral condition lacked a clear peak. Thus, we selected a measurement window of 350–750 ms for the CDA analysis, as used by Feldmann-Wustefeld and Vogel [17]. To examine the presence of each ERP component under different conditions, we first used one-tailed t-tests to compare the mean amplitude of each ERP component within its corresponding measurement window to zero for each presentation duration condition. 
If the ERP component was present, we then conducted two-tailed t-tests to compare its amplitude between the 50 ms and 200 ms presentation duration conditions. All t-tests included Cohen’s d as the estimator of effect size. Bayes factor analysis was also used to mitigate the incidence of chance-based null results, as suggested by Rouder, et al. [49]. The Bayes factor (BF10) quantifies the relative evidence for the alternative versus the null hypothesis (values < 1 favor the null hypothesis and values > 1 favor the alternative hypothesis). For instance, a BF10 of 0.2 indicates that the data are five times more likely under the null hypothesis than under the alternative hypothesis.
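The peak-centered measurement windows described above can be sketched as follows. The toy waveform is hypothetical: its peak is placed at 318 ms so that the resulting ±30 ms window coincides with the 288–348 ms PD window reported in the Results. The function names and the 2 ms time axis (matching the 500 Hz sampling rate) are illustrative assumptions.

```python
def peak_window(times_ms, wave, search_start, search_end, polarity,
                half_width=30):
    """Return (start, end) of a window of +/- half_width ms around the most
    positive (polarity=+1) or most negative (polarity=-1) peak in the range."""
    candidates = [(polarity * v, t) for t, v in zip(times_ms, wave)
                  if search_start <= t <= search_end]
    _, peak_time = max(candidates)
    return peak_time - half_width, peak_time + half_width


def mean_amplitude(times_ms, wave, start, end):
    """Mean amplitude of the wave between start and end (inclusive)."""
    values = [v for t, v in zip(times_ms, wave) if start <= t <= end]
    return sum(values) / len(values)


# Hypothetical difference wave: a triangular positive deflection at 318 ms.
times = list(range(-200, 1000, 2))
wave = [max(0.0, 1.0 - abs(t - 318) / 50.0) for t in times]

start, end = peak_window(times, wave, 250, 350, polarity=+1)
print(start, end, round(mean_amplitude(times, wave, start, end), 3))
```

For a negative-going component such as the N2pc, the same function is called with `polarity=-1` and the corresponding search range.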

Results

Behavioral results

The behavioral results are presented in Fig. 2. No significant difference was evident in response times (RT) between the 50 ms and 200 ms stimulus presentation duration conditions, t(29) = 1.006, p = 0.323, Cohen’s d = 0.184, BF10 = 0.308. However, accuracy (ACC) differed significantly between the two stimulus presentation durations, with higher accuracy in the 200 ms condition, t(29) = 6.813, p < 0.001, Cohen’s d = 1.244, BF10 > 1000.
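As a quick consistency check, the reported effect sizes appear to follow the relation d = t/√N for paired designs, with N = 30 participants. The sketch below recomputes d from the reported t values; that this is the formula the authors used is an inference from the numbers, not something stated in the text.

```python
import math

N = 30  # participants included in the analysis

# (label, reported t, reported Cohen's d) taken from the behavioral results
reported = [("RT", 1.006, 0.184), ("ACC", 6.813, 1.244)]

for label, t, d_reported in reported:
    d_from_t = t / math.sqrt(N)
    print(label, round(d_from_t, 3), d_reported)
```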

Fig. 2

Behavioral results. The left panel presents the results of response times across different duration conditions, while the right panel depicts the accuracy results under different duration conditions. The error bars represent the standard errors of the mean values

ERP results

Target lateral condition

The grand-averaged difference waveforms (contralateral waveforms minus ipsilateral waveforms) are shown in Fig. 3A, and the amplitudes of each ERP component in the target lateral condition are depicted in Fig. 3B.

Ppc (111–171 ms)

The results showed that the mean amplitude of the Ppc was significantly more positive than zero under both the 50 ms presentation duration condition, t(29) = 1.980, p = 0.029, Cohen’s d = 0.362, BF10 = 2.079 (one-tailed test), and the 200 ms presentation duration condition, t(29) = 2.657, p = 0.006, Cohen’s d = 0.485, BF10 = 7.314 (one-tailed test). This indicates that targets elicited a significant Ppc component in both presentation duration conditions. However, there was no significant difference in the amplitude of the Ppc elicited by a stimulus presentation duration of 200 ms compared to 50 ms, t(29) = 1.062, p = 0.297, Cohen’s d = 0.194, BF10 = 0.325 (two-tailed test).

N2pc (228–288 ms)

The results showed that the mean amplitude of the N2pc was significantly more negative than zero under both the 50 ms presentation duration condition, t(29) = 2.594, p = 0.007, Cohen’s d = 0.474, BF10 = 6.451 (one-tailed test), and the 200 ms presentation duration condition, t(29) = 2.350, p = 0.013, Cohen’s d = 0.429, BF10 = 4.023 (one-tailed test). This indicates that the appearance of targets elicited a significant N2pc component in both presentation duration conditions. However, no significant difference was found in the amplitude of the N2pc elicited by a stimulus presentation duration of 200 ms compared to 50 ms, t(29) = 0.378, p = 0.708, Cohen’s d = 0.069, BF10 = 0.208 (two-tailed test).

CDA (350–750 ms)

The results showed that the mean amplitude of the CDA was significantly more negative than zero under both the 50 ms presentation duration condition, t(29) = 5.929, p < 0.001, Cohen’s d = 1.083, BF10 > 1000 (one-tailed test), and the 200 ms presentation duration condition, t(29) = 6.516, p < 0.001, Cohen’s d = 1.190, BF10 > 1000 (one-tailed test). This indicates that the appearance of targets elicited a significant CDA component in both presentation duration conditions. However, there was no significant difference in the amplitude of the CDA elicited by a stimulus presentation duration of 200 ms compared to 50 ms, t(29) = 0.687, p = 0.498, Cohen’s d = 0.125, BF10 = 0.242 (two-tailed test).

Fig. 3

(A) The target laterality difference wave was recorded at the P5/P6, P7/P8, and PO7/PO8 electrode sites, and the average amplitude of the waves was analyzed. The purple and gray lines represent conditions with stimulus presentation durations of 50 ms and 200 ms, respectively. The gray bars mark the time windows corresponding to the Ppc, N2pc, and CDA in the target lateral condition. (B) Results of the difference tests for the Ppc, N2pc, and CDA under the two conditions (stimulus presentation duration of 50 ms or 200 ms). The purple bars represent the 50 ms condition, and the gray bars represent the 200 ms condition. Error bars indicate the standard errors of the mean values

Distractor lateral condition

The grand-averaged difference waveforms (contralateral waveforms minus ipsilateral waveforms) are shown in Fig. 4A, and the amplitudes of each ERP component in the distractor lateral condition are depicted in Fig. 4B.

Ppc (111–171 ms)

The results showed that the mean amplitude of the Ppc was marginally more positive than zero under the 50 ms presentation duration condition, t(29) = 1.682, p = 0.052, Cohen’s d = 0.307, BF10 = 1.283 (one-tailed test), and significantly more positive than zero under the 200 ms presentation duration condition, t(29) = 2.075, p = 0.024, Cohen’s d = 0.379, BF10 = 2.443 (one-tailed test). This indicates that the appearance of distractors elicited a Ppc component in both presentation duration conditions. However, no significant difference was apparent in the amplitude of the Ppc elicited by a stimulus presentation duration of 200 ms compared to 50 ms, t(29) = 0.252, p = 0.803, Cohen’s d = 0.046, BF10 = 0.200 (two-tailed test).

N2pc (204–264 ms)

The results showed that under the 50 ms presentation duration condition, the mean amplitude in the N2pc measurement window was significantly more negative than zero, t(29) = 1.729, p = 0.047, Cohen’s d = 0.316, BF10 = 1.378 (one-tailed test). However, under the 200 ms presentation duration condition, the mean amplitude in the N2pc measurement window was not more negative than zero, t(29) = 1.360, p = 0.908, Cohen’s d = 0.248, BF10 = 0.090 (one-tailed test). These results indicate that the appearance of distractors elicited a significant N2pc component only with the 50 ms stimulus presentation, not with the 200 ms presentation. Additionally, the mean amplitude in the N2pc measurement window was significantly more negative under the 50 ms presentation duration condition than under the 200 ms presentation duration condition, t(29) = 2.533, p = 0.017, Cohen’s d = 0.462, BF10 = 2.885 (two-tailed test).

PD (288–348 ms)

The results showed that the mean amplitude of the PD was significantly more positive than zero under both the 50 ms presentation duration condition, t(29) = 2.225, p = 0.017, Cohen’s d = 0.406, BF10 = 3.190 (one-tailed test), and the 200 ms presentation duration condition, t(29) = 4.199, p < 0.001, Cohen’s d = 0.767, BF10 = 242.710 (one-tailed test). These results indicate that the appearance of distractors elicited a significant PD component in both presentation duration conditions. Additionally, the PD amplitude was significantly greater under the 200 ms condition compared to the 50 ms condition, t(29) = 2.376, p = 0.024, Cohen’s d = 0.434, BF10 = 2.145 (two-tailed test).

Fig. 4

(A) Distractor laterality difference waves were recorded at the P5/P6, P7/P8, and PO7/PO8 electrode sites, with the average amplitude of the waves analyzed. The purple and gray lines represent conditions with stimulus presentation durations of 50 ms and 200 ms, respectively. The gray bars mark the time windows corresponding to the Ppc, N2pc, and PD in the distractor lateral condition. (B) Results of the difference tests for the Ppc, N2pc, and PD under the two conditions (stimulus presentation duration of 50 ms or 200 ms). The purple bars represent the 50 ms condition, and the gray bars represent the 200 ms condition. Error bars indicate the standard errors of the mean values

Discussion

The results of our study revealed that when the number of target items was two, no significant difference was observed in the CDA amplitude elicited by the target items, regardless of the stimulus presentation duration (50 ms or 200 ms). However, extending the stimulus presentation duration resulted in a significant improvement in the individuals’ VWM accuracy. When distractors were presented laterally, they elicited an N2pc component only with the 50 ms presentation duration, not with the 200 ms duration. This suggests that when participants do not have sufficient time to consolidate target stimuli, distractors are more likely to capture their attention. More importantly, a significant PD component was elicited by distractors at both 50 ms and 200 ms presentation durations, but the PD amplitude was larger under the 200 ms presentation duration condition than under the 50 ms presentation duration condition. These findings indicate that when participants have sufficient time to consolidate the target stimuli, they can more effectively suppress the distractors. Thus, our current findings suggest that consolidation time is an important factor influencing the suppression of distractors.

Feldmann-Wustefeld and Vogel [17] found that increasing the number of distractors resulted in increases in the PD amplitude, leading them to suggest that the PD amplitude reflects the strength of “gating” in the gating theory. This theory posits that as the number of distractors, and with it their distracting effect, increases, a larger voluntary effort is required to close the “gate” to prevent distractors from entering VWM. In our study, the two target items had already been fully consolidated into VWM by 200 ms; therefore, when the stimulus presentation duration was 200 ms, this left more time for the late consolidation phase. At that point, individuals could further allocate their unused VWM resources for processing visual stimuli. However, this condition may inadvertently allow distractors to enter VWM and disrupt performance. To prevent this, individuals need to exert a larger voluntary effort to tightly close the “gate” and actively suppress the entry of distractors. In the current experiment, the longer stimulus presentation meant that distractors remained in the visual field for a longer time, thereby increasing the likelihood of their entry into the VWM system. Thus, reducing the exposure time of distractors in the memory array (i.e., decreasing the stimulus presentation duration) effectively minimized the chance of distractors being consolidated into VWM.

Seminal research on the PD component has demonstrated that individuals with high VWM capacity are more adept at actively suppressing distractors in a visual search task [22]. This capability enables them to maintain higher levels of attention on relevant stimuli. Conversely, individuals with low VWM capacity struggle to suppress these distractors effectively. Therefore, a close relationship exists between an individual’s VWM capacity and their ability to process visual distractor suppression. Future research could use the PD component to further investigate how individuals with varying levels of VWM capacity manage distractor suppression under different stimulus presentation durations.

Notably, we initially expected not to observe a significant PD component in the 50 ms presentation duration condition. However, a significant PD amplitude was already elicited in this condition. Moreover, neither condition elicited a CDA component, suggesting that distractors in both conditions required suppression and were successfully suppressed [50]. This indicates that when the memory load was low, participants successfully avoided storing distractors in VWM. When the memory consolidation time was limited, participants were likely in the early phase of memory consolidation and did not have sufficient time to consolidate distractors. Conversely, when the memory consolidation time was sufficient, participants were in the late consolidation phase and more actively suppressed the storage of distractors in VWM. We propose two possible explanations for this phenomenon: (1) The consolidation speed for target items may be faster than we initially assumed. Previous research has indicated that consolidating a single color item takes approximately 50 ms [36] and that two color items can be consolidated in parallel into VWM. Therefore, we hypothesize that 50 ms may be a sufficient duration for consolidating two color items into VWM. Indeed, the observation of a significant PD component in the 50 ms presentation duration condition may indicate that the time required to consolidate two target items into VWM is less than 50 ms, suggesting a faster consolidation speed than we had assumed. (2) The consolidation of target items and the suppression of distractors may not be a sequential process, but rather a parallel one. In other words, individuals may start suppressing distractors while consolidating target items; consequently, even with a very brief stimulus presentation duration, a PD component may still be observed. Future research could further investigate these two possibilities.

In addition, we observed the Ppc component in both target lateral and distractor lateral trials. However, the stimulus presentation duration did not modulate the amplitude of the Ppc component. This suggests that the stimulus presentation duration did not affect the initial processing of stimuli in our study. This finding is reasonable, as our trials with different stimulus presentation durations were randomly intermixed, leading participants to use the same initial process to encode the memory array regardless of whether the presentation duration was 50 ms or 200 ms. However, very few studies have systematically analyzed the Ppc component [17, 28, 37]. The functional significance of the Ppc component remains unclear, making it difficult to further interpret the Ppc-related results based on our findings. The Ppc component may be related to the stimulus-driven P1, which is sometimes larger in areas contralateral to the targets than in those ipsilateral to them. This lateralization of P1 has been attributed to low-level sensory processes [23]. Overall, our findings add to the growing body of literature on the Ppc component, underscoring the need for further research to elucidate its role in cognitive processing.

Regarding the N2pc component, we observed a significant N2pc component in target lateral trials regardless of the stimulus presentation duration, indicating that participants shifted their attention to target locations upon their appearance. However, in distractor lateral trials, an ambiguous N2pc component (BF10 = 1.378) was observed only at the 50 ms presentation duration, while no N2pc component (BF10 = 0.090) was observed at the 200 ms presentation duration. This suggests that distractors presented for shorter durations are more likely to capture participants’ attention than those presented for longer durations. As previously mentioned, this result may be attributed to the parallel processes of initial suppression of distractors and initial consolidation of target stimuli. When the consolidation time for the targets is insufficient, participants may allocate more cognitive resources to consolidating the targets, resulting in weaker suppression of distractors. Consequently, distractors can automatically capture attention, and participants involuntarily allocate some attentional resources to them. In contrast, with sufficient memory consolidation time, participants can actively suppress the distractor after completing the target consolidation, preventing the distractor from attracting more attentional resources. From this perspective, the N2pc results align well with our findings on the PD component: as the PD component strengthens, the N2pc component correspondingly weakens. This indicates that the mechanism of distractor suppression likely involves reducing the initial attentional capture by the distractor.
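
To make the reading of the two Bayes factors above concrete, the following minimal sketch labels a BF10 value using commonly cited Jeffreys-style cutoffs (3, 10, and 30, applied symmetrically to BF10 and its reciprocal). These cutoffs and the function name `describe_bf10` are assumptions for illustration; the thresholds actually adopted in this study may differ.

```python
def describe_bf10(bf10: float) -> str:
    """Label a Bayes factor BF10 with conventional, assumed evidence categories."""
    if bf10 <= 0:
        raise ValueError("BF10 must be positive")
    if bf10 == 1:
        return "no evidence either way"
    # Values below 1 favor the null; their strength is read from BF01 = 1 / BF10.
    favored = "H1" if bf10 > 1 else "H0"
    strength = bf10 if bf10 > 1 else 1.0 / bf10
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    else:
        label = "very strong or greater"
    return f"{label} evidence for {favored}"


# The two distractor lateral N2pc Bayes factors reported in the text.
for duration_ms, bf10 in [(50, 1.378), (200, 0.090)]:
    print(f"{duration_ms} ms: BF10 = {bf10} -> {describe_bf10(bf10)}")
```

Under these assumed cutoffs, BF10 = 1.378 falls in the anecdotal range, matching the description of the 50 ms N2pc as ambiguous, whereas BF10 = 0.090 corresponds to BF01 of roughly 11 and thus favors the absence of an N2pc at 200 ms.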

Moreover, our results with two target items demonstrated that the CDA amplitude was not significantly different between the two stimulus presentation durations (50 ms and 200 ms). Given that the CDA component tracks the number of items stored in VWM, this result indicates that while a longer presentation time may improve behavioral memory performance [51], it does not significantly increase the number of items stored in VWM. This finding is consistent with previous research and supports the notion of bandwidth-limited consolidation (i.e., parallel consolidation for two colors; see [34, 52]). While no significant difference was found in the CDA amplitude, the average accuracy was significantly higher with the 200 ms stimulus presentation than with the 50 ms presentation. This finding supports the slot-resource model [6, 53, 54], which suggests that while VWM resources have a finite total capacity, the precision of VWM representations depends on resource allocation under different task conditions. Moreover, according to our previously proposed two-phase resource allocation model [31, 32], the VWM consolidation process includes two distinct resource allocation phases [30, 33]. When the stimulus presentation duration is relatively limited, individuals automatically allocate resources to each memory stimulus as efficiently as possible, resulting in relatively low memory precision. If the stimuli continue to be presented until the early VWM consolidation phase is completed, then resource allocation enters a late consolidation phase, allowing individuals to further allocate unused VWM resources to memory representations, thereby enhancing the precision of the representations. In the current experiment, participants needed to remember only two target items, leaving them with sufficient resources to allocate to each. However, when the stimulus presentation duration was only 50 ms, even if individuals could allocate more resources to remembering target items, they could not achieve the same level of consolidation as was possible with the longer stimulus presentation duration of 200 ms. Therefore, our study provides new insights into the effect of presentation duration on the number of items stored in VWM.

It is worth noting that in our study, we controlled the time available for participants to consolidate memory of the target stimuli by directly manipulating the stimulus presentation duration. In contrast, some previous studies have used visual masks immediately following the disappearance of the memory array to more precisely control the consolidation time and prevent participants from further encoding the residual visual afterimages of the stimuli [10, 36, 55,56,57]. Our study did not use such masks for two main reasons. First, we believe that controlling the memory presentation duration alone, without the use of a mask, is a valid method for controlling the consolidation time. Although previous studies have shown that in the absence of a post-mask, visible persistence can retain stimulus information for at least 100 to 200 ms after stimulus offset due to retinal persistence [58,59,60,61], this visible persistence represents merely a gradual decay of visual information. Participants can continue to encode this visible persistence, but they cannot extract additional information to enhance the precision of their VWM representations. Thus, the total effective time for extracting information and forming high-precision memory representations primarily depends on the stimulus presentation duration. For this reason, our previous studies on memory consolidation also typically controlled only the presentation time without using masks [30,31,32]. Our recent research on early perceptual encoding benefits in VWM tasks also found consistent results and drew the same conclusions, regardless of whether we used a paradigm with or without masks [55]. Second, given that the focus of our current study is to investigate the suppression mechanisms of distractor stimuli, we needed to ensure that only the distractors we introduced would serve as stimuli requiring suppression during the task. Masks are known to disrupt the consolidation process for memory items. A previous study found that object-substitution masking in a VWM task can degrade the precision of VWM representations [62]. Thus, using masks would have introduced additional distractors. We did not want participants to allocate extra resources to suppressing the masks to prevent them from being consolidated into VWM, as this would have introduced more noise into the ERP components reflecting distractor suppression processing.

It is also important to note that both our research and previous studies investigating distractor processing in VWM tasks using ERP components typically used paradigms in which participants are asked to memorize or suppress stimuli presented in only one visual hemifield [17]. However, in real-life scenarios, we often need to simultaneously memorize or suppress visual information presented in both visual hemifields. Previous research has demonstrated that VWM performance is better when visual items are distributed across both the left and right visual fields than when they are confined to one hemifield [63, 64]. This improvement in VWM performance is most likely due to the allocation of more attentional resources when items are presented in both left and right visual fields [65]. Therefore, the process of suppressing distractor stimuli presented bilaterally might differ from that of suppressing stimuli presented unilaterally. Caution is therefore warranted when extending our conclusions to the memorization or suppression of visual information presented in both visual hemifields. Future research could combine ERP components with experimental paradigms that present stimuli bilaterally. By varying the number of items requiring suppression in both hemifields, future research could examine whether our findings can be generalized to tasks involving bilateral visual field presentations.

Previous studies often used ERP components (e.g., CDA) to examine the processing of distractors in VWM tasks, typically presenting distractors and targets simultaneously during the encoding phase of the memory array [11, 14, 17]. However, recent research has shown that the mechanisms involved in processing distractors that appear during the VWM encoding phase (when distractors and targets appear simultaneously) differ from those involved in processing distractors that appear during the VWM maintenance phase (when distractors appear after the targets have disappeared) [66, 67]. Individuals find it more challenging to actively suppress information that appears during the maintenance phase than to suppress distractors during the encoding phase. Recent findings on the different suppression mechanisms for distractors during encoding and maintenance phases have come primarily from behavioral studies [66, 67]. To our knowledge, there have been no ERP studies directly investigating this issue. Future research should distinguish between the suppression mechanisms for distractors during the memory encoding phase and those during the memory maintenance phase. Additionally, researchers should consider using ERP techniques to further examine the impact of different stimulus presentation durations on these distinct suppression mechanisms.

Conclusions

This study manipulated the duration of stimulus presentation to explore how the time available for consolidation influences individuals’ suppression of distractors. The results indicated that the suppression of distractors in VWM is modulated by the duration of stimulus presentation. Although individuals actively suppress distractors regardless of the stimulus presentation duration, a longer presentation duration results in stronger suppression of distractors.