Background

Breast cancer is the most prevalent cancer globally, accounting for 2.26 million new cases in 2020, and is the leading cause of cancer-related deaths in women [47]. While substantial progress has been made in understanding the etiology of breast cancer, encompassing genetic, hormonal, and lifestyle factors, these elements only partially explain its incidence. Air pollution, classified as carcinogenic to humans by the International Agency for Research on Cancer (IARC) [35], has emerged as a potential contributor to breast cancer risk [45, 50]. Air pollution is a complex mixture of multiple pollutants, including compounds with endocrine-disrupting properties such as polychlorinated biphenyls (PCBs) and polycyclic aromatic hydrocarbons (PAHs) [10]. Increased risks of breast cancer associated with exposure to PCBs and to benzo[a]pyrene (BaP), the most well-known PAH, have been reported [3, 15, 31, 34]. However, to date, the epidemiologic evidence on the effect of environmental exposure to these agents has been inconsistent.

PCBs are a group of synthetic chlorinated chemicals composed of various congeners. They were widely used in industry between the 1930s and the 1970s for their thermal, electrical, and non-flammable properties until their prohibition in the 1980s in many countries, including France [4, 31]. While they were mainly used as dielectric fluid in capacitors and transformers, they have also been used in construction materials and were released into the environment by leakage, volatilization or erosion. As persistent organic pollutants, PCBs are stable in various environmental matrices, i.e., biota, air, water, soil or sediment, and have been shown to bioaccumulate in the human body [29]. Among PCBs, PCB153 is the most abundantly detected in the environment and the primary contributor to the PCB body burden estimate [6, 14, 21, 52, 54]. Most previous epidemiological studies on breast cancer have estimated exposure to PCBs from blood or adipose tissue samples, particularly from breast tissue, with inconsistent results [31, 32]. While some studies found an association for some PCBs only [56], others found positive associations for a group of PCB congeners [28, 37, 44, 52], or even an inverse association [30]. A positive correlation between PCB153 and total PCBs has been observed in several studies [6, 14]. In the XENAIR case–control study nested within the French prospective E3N cohort [2], we assessed air pollutant exposure at the residential addresses up to 22 years prior to breast cancer diagnosis and found a positive association with long-term exposure to airborne PCB153 [15]. For each one standard deviation (SD, 55 pg/m3) increase in the cumulative airborne PCB153 concentration, the adjusted odds ratio (OR) was 1.19 (95% confidence interval (CI): 1.08–1.31) [15].

PAHs are mainly produced by domestic and vehicular emissions as well as industrial and natural combustion [5]. They have both estrogenic and anti-estrogenic effects [50, 51]. Benzo[a]pyrene (BaP) is one of the best known and well characterized PAH components [25], and is often used as a surrogate for estimating total PAH exposure [7]. The general population is exposed to it through ambient air, tobacco smoke, water, and food [18, 41]. Two epidemiological studies have found a positive association between breast cancer and BaP exposure from traffic [38, 39]. In the XENAIR case–control study, we also highlighted a positive association with long term exposure to airborne BaP concentrations at the residential addresses up to 22 years prior to breast cancer diagnosis: adjusted OR of 1.15 (95% CI: 1.04–1.27) for each increase of 1 interquartile range (IQR) (1.42 ng/m3) in the cumulative airborne BaP concentration levels [1].

None of the previous studies have explored the longitudinal trajectories of exposure to PCB153 or BaP preceding the diagnosis of breast cancer. Studies predominately considered summary estimates of the exposure history, such as time-averaged or cumulative exposure metrics which only partly reflect the evolution of the subjects’ exposure over time. Even in our two previous studies using the XENAIR case–control data [1, 15], the cumulative exposure metric did not reflect the temporal dynamic of exposure. Yet, breast cancer risk may vary depending on the timing of exposure and according to exposure trajectories for different agents [16, 20, 49]. Investigating the exposure trajectories may accelerate the understanding of the effects of airborne PCBs and BaP exposures on breast cancer risk.

In the present study, we address this need by identifying distinct trajectory patterns of outdoor exposure to PCB153 and BaP at the residential addresses, over up to 22 years before breast cancer diagnosis, and by estimating their association with the risk of breast cancer, using XENAIR case–control data and a latent class mixed model (LCMM) approach.

Methods

The E3N cohort

E3N is an ongoing French prospective cohort which aims to investigate risk factors for severe chronic conditions in women [11]. It enrolled 98 995 women living in France between June 1990 and November 1991, aged 40–65 years, and insured with a national health insurance covering workers from the French National Education System (MGEN). Participants were followed with a self-administered mailed questionnaire every 2 or 3 years, collecting data on their lifestyle (smoking and physical activity), reproductive factors (age at menarche and menopause, number of children, age at first full-term pregnancy, and breastfeeding), anthropometry (height, weight), medical history (benign breast disease) and familial history of cancer. To date, a total of 13 questionnaires have been sent to participants, with a participation rate around 80% [11]. Breast cancers were first self-declared in each questionnaire, and then confirmed by pathology reports for 93% of the cases. The proportion of false positives was less than 5%. In addition to the questionnaires, blood samples were collected from 25 000 participants between 1994 and 1999, and saliva samples from 47 000 participants between 2009 and 2011. Residential addresses were collected from questionnaires throughout the follow-up. Place of birth (postal code and municipality) was collected in the first questionnaire and used to categorize each participant’s birthplace as either urban or rural based on data from the closest national census [9].

The XENAIR nested case–control study

XENAIR is a case–control study nested within the E3N cohort [1]. The main objective of this study was to investigate chronic long-term effects of exposure to multiple ambient air pollutants on the risk of breast cancer. Cases were all incident cases of primary invasive breast cancer diagnosed from entry into the E3N cohort to the 10th follow-up questionnaire (2011), excluding Paget’s disease and phyllodes tumors as well as women who had any cancer diagnosis before cohort entry. Cases not ascertained by pathology report were not excluded because of the low false-positive proportion in self-reports (< 5%). For each case, one control was randomly selected using incidence density sampling among E3N cohort participants at risk of breast cancer at the time of the case’s diagnosis, using time since cohort entry as the time axis. Cases who gave a blood sample were matched to controls who also gave one on the following factors measured at blood collection: calendar date (± 3 months), age (± 1 year), menopausal status, and French department of residence. Cases who did not provide a blood sample were matched to controls on the same criteria, measured at cohort entry, as well as on the presence or absence of a saliva sample.
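The incidence density sampling step described above can be illustrated with a minimal Python sketch. It is not the study's procedure: it ignores the matching factors (calendar date, age, menopausal status, department, biological samples), and the field names `exit_time` and `is_case` as well as the toy cohort are hypothetical.

```python
import random

def sample_controls(cohort, seed=0):
    """Pick one control per case among women still at risk at the case's diagnosis.

    `cohort` maps a participant id to a dict with two hypothetical fields:
      - "exit_time": years since cohort entry at diagnosis or end of follow-up
      - "is_case":   True if follow-up ended with a breast cancer diagnosis
    """
    rng = random.Random(seed)
    pairs = {}
    for case_id, case in cohort.items():
        if not case["is_case"]:
            continue
        t = case["exit_time"]
        # Risk set: every other woman still followed and cancer-free at time t.
        # A future case is eligible as a control while she is still at risk.
        risk_set = [pid for pid, p in cohort.items()
                    if pid != case_id and p["exit_time"] > t]
        if risk_set:
            pairs[case_id] = rng.choice(risk_set)
    return pairs

# Toy cohort (invented values)
cohort = {
    "A": {"exit_time": 5.0, "is_case": True},
    "B": {"exit_time": 12.0, "is_case": True},
    "C": {"exit_time": 20.0, "is_case": False},
    "D": {"exit_time": 3.0, "is_case": False},
}
pairs = sample_controls(cohort)
```

In this toy cohort, woman D leaves follow-up before either diagnosis and is therefore never eligible, while woman B can serve as a control for case A before becoming a case herself.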

All cases’ and controls’ residential addresses that were collected in the E3N questionnaires, from cohort entry to the index date (diagnosis for cases, selection for controls), were geocoded (X and Y coordinates, addresses) using the ArcGIS Software (ArcGIS Locator version 10.0, Environmental System Research Institute – ESRI, Redlands, CA, USA) and its reference street network database (BD Adresse®) from the National Geographic Institute. Geocoding was performed by a trained technician blinded to the case–control status of the participants, using a validated method [19]. Management of missing and incomplete addresses has been described previously [15]. In brief, women who had at least one missing address in questionnaires, or at least one address outside mainland France, were excluded, as well as their matched cases or controls. In the present study, we further excluded all cases and controls with a follow-up shorter than one year, as our statistical analysis relied on mean annual exposure.

PCB153 and BaP exposure assessment

Annual atmospheric concentration levels of PCB153 and BaP were estimated at the subjects’ consecutive geocoded residential addresses from their cohort entry (1990–1991) to their index date (date of diagnosis for the case and of selection for her individually matched control, which could occur at any time between 1991 and 2011) using the CHIMERE model [1, 15], a regional chemistry-transport model developed by the National Institute for Industrial Environment and Risks [12, 26]. CHIMERE was designed to simulate pollutant transport using emission data, meteorological fields, and boundary conditions, at a spatial resolution of 0.125° × 0.0625° (approximately 7 × 7 km). This allowed us to compute, for each woman, individual exposure trajectories for each pollutant (Figure S1 in Supplementary Material). For BaP, we further described what proportion of annual concentrations were above the European target value of 10 ng.10⁻¹/m3, or above a previously estimated acceptable environmental risk level for cancer of 1.2 ng.10⁻¹/m3 [26].
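As an illustration of how an individual exposure trajectory can be assembled from a residential history, the sketch below looks up the modelled annual concentration of the grid cell occupied each year and reports the share of years above a threshold. It is a simplified sketch under stated assumptions: the function names, the cell identifiers, the concentration values and the threshold are all hypothetical, and a move is assumed to take effect at the start of a calendar year.

```python
def annual_trajectory(residences, concentrations, first_year, last_year):
    """Return {year: concentration} along a residential history.

    residences:     list of (move_in_year, cell_id), sorted by move_in_year
    concentrations: {(cell_id, year): modelled annual mean concentration}
    """
    trajectory = {}
    for year in range(first_year, last_year + 1):
        cell = None
        # Keep the most recent address occupied at the start of that year
        for move_in_year, cell_id in residences:
            if move_in_year <= year:
                cell = cell_id
        if cell is not None:
            trajectory[year] = concentrations[(cell, year)]
    return trajectory

def share_above(trajectory, threshold):
    """Proportion of annual values strictly above a threshold."""
    values = list(trajectory.values())
    return sum(v > threshold for v in values) / len(values)

# Invented example: one move in 1993, index date in 1994
residences = [(1990, "cell_1"), (1993, "cell_2")]
concentrations = {
    ("cell_1", 1990): 12.0, ("cell_1", 1991): 11.0, ("cell_1", 1992): 10.0,
    ("cell_2", 1993): 6.0, ("cell_2", 1994): 5.0,
}
trajectory = annual_trajectory(residences, concentrations, 1990, 1994)
share = share_above(trajectory, threshold=8.0)
```

The length of the resulting trajectory depends on the index date, which is why follow-up varies between women.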

Statistical analysis

To identify distinct profiles of exposure trajectories to PCB153 and BaP, we used a latent class mixed model (LCMM) for each pollutant [33, 42]. The LCMM assumes that the population is composed of G distinct non-observed subgroups (latent classes) of subjects, each subgroup being characterized by a specific pattern of trajectory. The LCMM was defined as follows. Each woman's membership in a latent class followed a categorical distribution with probabilities to be estimated. Given a specific class membership, the trajectory of the annual average air pollutant concentration estimates from 1990 to 2011 was modelled by a mixed model with class-specific fixed effects. Because neither PCB153 nor BaP annual average concentration estimates were normally distributed (Figure S2 in Supplementary Material), the mixed model included an internal transformation using I-spline functions with four knots at quintiles of the pollutant distribution (4.71, 6.93, 9.26 and 12.29 pg/m3 for PCB153; and 0.88, 1.25, 1.67 and 2.35 ng.10⁻¹/m3 for BaP). Because trajectories of exposure to each pollutant were nonlinear in time (Figure S1 in Supplementary Material), the class-specific mean trajectories were modelled using a natural spline function of time, with two knots at tertiles of follow-up (4.30 and 8.97 years). All spline functions and knots were chosen to maximize the fit to data according to the Akaike Information Criterion (AIC). Multivariate normal random effects on the intercept and each spline coefficient of the time function were added to account for the correlation between repeated pollutant measures of each woman. Their variance–covariance matrix was unstructured and common across classes. Each LCMM was estimated by maximum likelihood using 100 different sets of initial parameter values to ensure convergence to the global maximum [27].
Once the model was estimated, each woman was a posteriori classified in the class of exposure trajectory to which she had the highest probability of belonging, given her own observed trajectory. To determine the number of classes G for each pollutant, we estimated five LCMMs for each pollutant, each with a given number of classes (G = 1, 2, 3, 4, 5). We selected the best LCMM for each pollutant with respect to i) the fit to data using the Bayesian Information Criterion (BIC), ii) the discrimination between classes using entropy, and iii) a mix of both fit and discrimination using the Integrated Completed Likelihood (ICL) criterion [8].
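The a posteriori classification and the three selection criteria can be sketched in a few lines of Python. This is an illustrative sketch, not the lcmm implementation: it takes posterior class-membership probabilities as given (in practice they come from the fitted model), uses one common definition of relative entropy, and uses one common BIC-based approximation of the ICL. All function names and the toy probabilities are hypothetical.

```python
import math

def classify(posteriors):
    """Assign each subject to her most probable class (0-based index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]

def sum_p_log_p(posteriors):
    """Sum of p * log(p) over subjects and classes (always <= 0)."""
    return sum(p * math.log(p) for row in posteriors for p in row if p > 0)

def relative_entropy(posteriors):
    """1 = perfect separation between classes, 0 = no separation (needs G >= 2)."""
    n, g = len(posteriors), len(posteriors[0])
    return 1 + sum_p_log_p(posteriors) / (n * math.log(g))

def bic(loglik, n_params, n_subjects):
    """Bayesian Information Criterion (lower is better)."""
    return -2 * loglik + n_params * math.log(n_subjects)

def icl(loglik, n_params, posteriors):
    """BIC penalised by classification uncertainty (lower is better)."""
    return bic(loglik, n_params, len(posteriors)) - 2 * sum_p_log_p(posteriors)

# Toy posterior probabilities for three subjects and two classes
posteriors = [[0.98, 0.02], [0.05, 0.95], [0.50, 0.50]]
classes = classify(posteriors)
entropy = relative_entropy(posteriors)
```

Under these definitions, the ICL equals the BIC when every woman is classified with certainty and grows as the posterior probabilities become more ambiguous.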

Once we selected the best LCMM for each pollutant, we described demographic and lifestyle characteristics in each identified class. We further used conditional logistic regression to estimate the adjusted association between exposure trajectory class membership and the odds of being diagnosed with breast cancer. Because the entropy of the best LCMM was excellent (0.94 for PCB153 and 0.96 for BaP) and the sample size was large, we did not account for classification uncertainty [17]. In addition to matching factors, each model was adjusted for the level of education (secondary, 1 to 2-year university degree, ≥ 3-year university degree), which was the minimally sufficient adjustment set identified from the directed acyclic graph (Figure S3 in Supplementary Material [15]). We also adjusted for age at the index date (in years) to account for potential residual differences in age within each case–control pair. In a sensitivity analysis, we further adjusted for known and potential risk factors of breast cancer: urban residence at birth (yes, no), age at menarche (< 12, 12–13, ≥ 14 years), total physical activity (quartiles of METs-h/week), smoking status (never, current, former smoker), alcohol drinking (0, 1–6.6, ≥ 6.7 g/day), body mass index (< 25, 25–29, ≥ 30 kg/m2), parity and age at the first full-term pregnancy (no child/1–2 children and age < 30 years, 1–2 children and age ≥ 30 years, ≥ 3 children), ever breastfeeding (yes, no), use of oral contraceptives (yes, no), use of menopausal hormone replacement therapy (yes, no), and history of personal benign breast disease (yes, no). All variables were taken at cohort entry, except for alcohol consumption, which was collected at questionnaire 3 through the dietary questionnaire, and hormonal treatments, for which we used the information available in the last questionnaire before the index date.
Missing data were imputed by the modal class for risk factors with less than 5% of missing data, and a 'missing' category was created for risk factors with more missing data [15, 24].
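To give an intuition for the conditional logistic regression used above, the following Python sketch covers its simplest special case: 1:1 matched pairs and a single binary exposure with no adjustment, where the conditional maximum likelihood estimate of the OR reduces to the ratio of discordant pairs. The study's models compared several trajectory classes and included covariates, so this is only an illustration; the function name and the toy data are hypothetical.

```python
import math

def matched_pair_or(pairs, z=1.96):
    """Conditional OR for 1:1 matched pairs and one binary exposure.

    pairs: list of (case_exposed, control_exposed) booleans.
    Only discordant pairs are informative; concordant pairs cancel out
    of the conditional likelihood.
    """
    n10 = sum(1 for case, ctrl in pairs if case and not ctrl)
    n01 = sum(1 for case, ctrl in pairs if ctrl and not case)
    if n10 == 0 or n01 == 0:
        raise ValueError("OR is undefined without discordant pairs of both types")
    odds_ratio = n10 / n01
    se_log_or = math.sqrt(1 / n10 + 1 / n01)   # large-sample standard error
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Toy data: 30 + 20 discordant pairs and 50 concordant pairs
pairs = ([(True, False)] * 30) + ([(False, True)] * 20) + ([(True, True)] * 50)
odds_ratio, lower, upper = matched_pair_or(pairs)
```

Because the 50 concordant pairs do not contribute, the estimate depends only on the 30 versus 20 discordant pairs, which is why wide confidence intervals can arise even in large matched samples.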

All analyses were performed using the R software version 3.6.1, and the lcmm R package version 1.9.2 for the identification of classes of exposure trajectories [42].

Results

Cases and controls selection and characteristics

Among the 98,995 women included in the E3N cohort study, 6,298 incident breast cancer diagnoses were identified between 1990 and 2011 (Figure S4 in Supplementary Material [15]). After excluding 19 cases of Paget disease and phyllodes tumors, 3 cases with missing data on matching factors, and 1,054 cases or controls with missing data on addresses or addresses outside metropolitan France, the XENAIR case–control study included 5,222 cases and 5,222 matched controls [15]. For the present study, we further excluded 327 women with follow-up shorter than one year, leaving 5,058 cases and 5,059 controls for the trajectory analysis, and 5,058 case–control pairs for the conditional logistic regression analysis.

For both cases and controls, the median age at cohort entry was close to 49 years (IQR: 44, 54) and the median follow-up time from cohort entry to the index date was 11.5 years (IQR: 7.0, 15.7) (Table 1). The median of the mean annual concentration estimates over the follow-up period was 8.1 pg/m3 (IQR: 5.3, 11.5) for PCB153 and 1.5 ng.10⁻¹/m3 (IQR: 1.1, 2.1) for BaP in cases, and almost identical in controls. Almost all annual BaP concentration estimates were below the European target value of 10 ng.10⁻¹/m3, yet about 62% were above the estimated acceptable environmental risk level for cancer of 1.2 ng.10⁻¹/m3. As expected, cases tended to have fewer children than controls and more often had a personal history of benign breast disease and a family history of breast cancer (Table 1).

Table 1 Characteristics of cases and controls, XENAIR case–control study nested in the E3N cohort, France, 1990–2011

Classes of PCB153 trajectories and association with breast cancer

The best LCMM for PCB153 in terms of both fit to data and discrimination capacity between classes had five latent classes (Figure S5, Table S1 Supplementary materials). The mean exposure trajectories estimated in each of the five identified classes are shown in the left panel of Fig. 1. The right panel shows individual observed trajectories of women classified in each class, to illustrate the variability of exposure within each class. Except for Class 4, all classes had a decreasing mean trajectory of the annual average PCB153 concentrations, with varying slopes (Fig. 1). Class 1 included 158 (1.6%) women (67 cases, 91 controls) who were overall exposed to low PCB153 concentrations except in the early 1990s (median at 6.6 pg/m3, IQR: 4.2, 10.5, Table 2). Class 2 was by far the largest class with 9,627 (95.2%) women (4,811 cases, 4,816 controls) and had a linear and slowly decreasing mean trajectory of PCB153 concentrations over the whole study period (1991–2011) (Fig. 1) (median at 8.0 pg/m3, IQR: 5.3, 11.3, Table 2). Class 3 was the smallest class with only 43 (0.4%) women (22 cases, 21 controls). This class had the highest concentration levels in the early 1990s (more than 25 pg/m3, Fig. 1), but exhibited the steepest decline from 1990 to 1996, resulting overall in the lowest median PCB153 concentrations over the whole study period (5.6 pg/m3, IQR: 3.6, 12.5, Table 2). Class 4 was made of 91 (0.9%) women (50 cases, 41 controls) and was the only class with an increasing mean PCB153 trajectory from 1991 to 1996, likely due to a large number of residential moves (Table 2). With a mean trajectory that decreased thereafter but stayed at high concentration levels (Fig. 1), this class presented the second highest median concentrations over the whole period (11.3 pg/m3, IQR: 7.4, 16.1, Table 2). Class 5 was made of 198 (2.0%) women (108 cases, 90 controls) with high concentrations until the early 2000s (Fig. 1), and thus exhibited the highest median PCB153 concentration levels over the whole study period (16.7 pg/m3, IQR: 10.6, 22.9, Table 2).

Fig. 1 Trajectories of PCB153 concentrations, XENAIR case–control study, France, 1990–2011 (n = 5 058 cases, 5 059 controls). The left panel shows the estimated mean trajectory of the average annual PCB153 concentrations over the follow-up in the five classes identified by LCMM model. The right-hand panel shows the individual trajectories of subjects classified in each class, with the black bold line representing the estimated mean trajectory in the class with its 95% CI. The representation of each estimated mean trajectory is truncated at the 95th percentile of the distribution of observed exposure times in women a posteriori classified in the class

Table 2 Characteristics of the participants according to their PCB153 exposure trajectory, in the XENAIR case–control study nested in the E3N cohort, France, 1990–2011 (n = 5 058 cases, 5 059 controls)

Mean age was similar in the five classes of PCB153 trajectories (Table 2). As expected, Class 5, which had the highest median PCB153 concentrations over the whole time window (1990–2011), also had the highest proportion of women who lived in an urban area at cohort entry (1990–1991), and also at birth (Table 2). Compared to other classes, women from Class 5 were more likely to have higher education levels, lower physical activity, an older age at their first full-term pregnancy, and to report a family history of breast cancer (Table 2). Class 4, which had the only non-monotonic mean exposure trajectory over time, had a large proportion of women with residential mobility over the study period (Table 2).

Classes 2–5 tended to have a higher risk of breast cancer compared to Class 1, even if the 95% CI was wide for Class 3 because of fewer cases and controls (Table 3). In particular, Classes 4 and 5, with the highest concentrations, had more than 60% higher odds of breast cancer than Class 1 after adjustment for education level, age at the index date, as well as matching factors (e.g. OR = 1.69, 95% CI: 1.08, 2.64 for Class 5, Table 3). All results were very similar after further adjustment for risk factors of breast cancer (Table 3).
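The reported estimate for Class 5 can be unpacked with a short worked example. The sketch below converts an OR into a percentage increase in odds and recovers the standard error of the log OR from the confidence limits, assuming the interval is a Wald interval symmetric on the log scale (an assumption, since the source does not state how the interval was computed). The function name is hypothetical.

```python
import math

def log_or_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Summarise an OR and its 95% CI on the log scale.

    Assumes ci_low and ci_high are exp(log(OR) -/+ z * SE).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return {
        "pct_higher_odds": (odds_ratio - 1) * 100,
        "se_log_or": se,
        "z_stat": log_or / se,
    }

# Values reported for PCB153 Class 5 versus Class 1
summary = log_or_summary(1.69, 1.08, 2.64)
```

With these inputs the odds are 69% higher, the standard error of the log OR is roughly 0.23 and the z statistic is roughly 2.3, consistent with an interval that excludes 1.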

Table 3 Association between classes of PCB153 trajectories and breast cancer, XENAIR case–control study nested in the E3N cohort, France, 1990–2011 (n = 5,058 case–control pairs)

Classes of BaP trajectories and association with breast cancer

The best LCMM for BaP in terms of both fit to data and discrimination capacity between the classes had four latent classes (Figure S4 and Table S1 in supplementary materials). All four mean trajectories of BaP exposure declined from 1990 to 2011, but at different rates and with different average levels, ranging from the lowest concentrations for Class 1 (91.9% of women) to the highest concentrations for Class 4, which showed very high concentrations in the early 1990s (Fig. 2). The median average annual BaP concentration over the study period was 1.4 ng.10⁻¹/m3 (IQR: 0.9, 2.0) in Class 1, 1.6 ng.10⁻¹/m3 (IQR: 0.7, 2.7) in Class 2, 3.0 ng.10⁻¹/m3 (IQR: 1.9, 4.6) in Class 3, and 4.4 ng.10⁻¹/m3 (IQR: 2.7, 7.1) in Class 4 (Table 4). In Classes 3 and 4, more than 91% of all annual concentration levels were above the estimated acceptable risk level of 1.2 ng.10⁻¹/m3, compared to less than 62% in Classes 1 and 2.

Fig. 2 Trajectories of BaP concentrations, XENAIR case–control study, France, 1990–2011 (n = 5 058 cases, 5 059 controls). The left panel shows the estimated mean trajectory of the average annual BaP concentrations over the follow-up in the four classes identified by LCMM model. The right-hand panel shows the individual trajectories of subjects classified in each class, with the black bold line representing the estimated mean trajectory in the class with its 95% CI. The representation of each estimated mean trajectory is truncated at the 95th percentile of the distribution of observed exposure times in subjects a posteriori classified in the class

Table 4 Characteristics of the participants according to their BaP exposure trajectory, in the XENAIR case–control study nested in the E3N cohort, France, 1990–2011 (n = 5 058 cases, 5 059 controls)

Mean age at cohort entry was similar in all classes of BaP trajectories (Table 4). Like PCB153-Class 5, BaP-Class 4, which had the highest BaP concentrations, also had the highest proportion of women having lived in urban areas at birth and in 1990–1991. Compared to women in the other BaP trajectory classes, they had the highest physical activity levels, less frequently had a personal history of benign breast disease, and were more likely to have breastfed (Table 4). Among the 178 women in BaP-Class 4, only 6 were also in PCB153-Class 5, while 159 were in the large PCB153-Class 2 (Table S2, Supplementary materials).

The association between the classes of BaP trajectories and the risk of breast cancer was far less clear than for PCB153, whatever the set of adjustment factors (Table 5). Confidence intervals were indeed wide, even if they may indicate a tendency towards a slightly higher risk for Classes 3 and 4 compared to Class 1 (e.g. OR = 1.21, 95% CI: 0.91, 1.62 for Class 3, Table 5).

Table 5 Association between classes of BaP trajectories and breast cancer, XENAIR case–control study nested in the E3N cohort, France, 1990–2011 (n = 5,058 case–control pairs)

Discussion

Our trajectory-based analysis suggested an association between the risk of breast cancer and the trajectories of the outdoor PCB153 concentration levels estimated at the residential addresses over a period of up to 22 years prior to diagnosis. More specifically, we found five distinct classes of PCB153 exposure trajectories which mostly declined at different rates and levels over the 1990–2011 period, with a higher risk of breast cancer for trajectories with the highest concentrations over the whole follow-up compared to the class with the lowest concentrations. For BaP, we found four distinct classes of exposure trajectories that all declined over the 1990–2011 period at different rates, which were weaker than the rates of decline for PCB153. The confidence intervals for the association with breast cancer were too wide to conclude that trajectories with higher BaP concentrations carried a higher risk, even if a few signals emerged.

To our knowledge, our study is the first to investigate trajectories of long-term atmospheric pollutant exposure and their association with the risk of breast cancer. Overall, the evidence regarding the association between PCB exposure and breast cancer risk is not entirely conclusive: while some studies reported statistically significant associations between PCB exposure and increased breast cancer risk, conflicting results exist. Our results are consistent with previous studies that found associations between breast cancer and measures of PCBs in blood or adipose tissue samples [28, 37, 44, 52], and in particular with studies examining separately group II PCBs, including PCB153 [31, 53, 56]. Our results are also consistent with those of our previous study using the cumulative exposure estimates of PCB153 based on the same XENAIR case–control data [15]. Nevertheless, several other studies showed no evidence of an effect of exposure to PCBs on breast cancer risk [32]. However, our trajectory approach did not confirm the results of previous studies supporting an association between breast cancer and BaP exposure from traffic [38, 39], nor the results of our previous study using the cumulative dose of BaP based on XENAIR case–control data [1]. The median pollutant concentrations over the whole follow-up were indeed more similar between the classes of BaP than between the classes of PCB153. Of note, women in the class exposed to the highest levels of BaP concentrations over the study period had different characteristics: they had the highest physical activity level, less frequently had a personal history of benign breast disease, and were more likely to have breastfed. However, adjustment for these factors in the sensitivity analysis did not change the results.

Our study has three major strengths. The first strength comes from its large number of breast cancer cases and the extensive data collected prospectively in the E3N cohort. The second major strength lies in its comprehensive exposure assessment approach, considering participants' residential histories for a period up to 22 years prior to breast cancer diagnosis, accommodating residential mobility and spatio-temporal variations in pollutant emissions. This approach provides a more accurate representation of long-term exposure patterns and distinguishes this work from other studies that have relied on simpler exposure assessments, such as those that only consider current residential addresses. With the reconstruction of emissions data carried out by the reference institute in France (INERIS) and the use of a recognised chemistry-transport model [48], we are fortunate to have a dataset covering France and going back to 1990, which is unprecedented on a European scale. The third major strength of our study lies in its original trajectory-based analytical approach, which has several advantages. This approach provides a comprehensive visual description of all available individual exposure histories. It also allows similar exposure trajectory patterns to be grouped together, without making any assumptions about these groupings, and thus can distinguish for instance women who were exposed to high concentrations of PCB153 in the early 1990s only, from those who were exposed to high concentrations until the 2000s. In addition, among group-based approaches, the LCMM technique has the advantage of accounting for the correlation between the repeated exposure measures of each woman thanks to individual random effects, which is likely to provide much more robust results than group-based approaches that ignore such correlation [22, 36].
Finally, the grouping allowed us to estimate the association between the profile of exposure trajectory and the risk of breast cancer, while fully accounting for the dynamic and the timing of exposures via the trajectory modeling, as opposed to standard statistical analyses relying on individual average exposure over time, or on individual standard cumulative dose of exposure.

Our results should nevertheless be interpreted in light of certain limitations. The first major limitation is the lack of data on air pollution exposure prior to E3N cohort entry [55]. Because women entered the cohort at the age of 40–65 years with a median at nearly 50, this did not allow us to explore critical windows of susceptibility like puberty, pregnancy, and even menopause, which are nevertheless of major importance for understanding breast cancer etiology and latency [49]. Some of the women reached menopause during the study period, but the sample size of most identified classes of exposure trajectories was too small to further distinguish them and draw any conclusion. However, we believe that the trajectory approach used in the present study suffers less from the lack of exposure data prior to cohort entry than an analytical approach based on a standard cumulative dose derived from the cohort entry. Indeed, such a cumulative dose fully depends on the length of follow-up, which varied from 1 to 22 years depending on the time at breast cancer diagnosis. The trajectory approach based on the LCMM accounted for these differences in the duration of follow-up, thanks to the linear mixed modeling approach. The mixed modelling approach also allows all available exposure data to be accounted for, without requiring any lagging or weighting procedure as when investigating the cumulative dose. Moreover, using calendar time as the time axis, which actually corresponded to the time since the first exposure assessment at cohort entry in 1990, allows a natural and interpretable description of exposure trajectories over time, as well as a fair comparison between cases and their individually matched controls. Indeed, exposure level highly depends on calendar year as shown in our results, making calendar time an appropriate time axis when regrouping exposure trajectories into latent classes of similar patterns.
In addition, the association between the identified classes of exposures and breast cancer was adjusted for age at the index date and follow-up duration by design, resulting in an adjustment for age at cohort entry in 1990, year of birth, and thus total duration of exposure to air pollution. The association was therefore estimated in cases and controls who were exposed to air pollution at the same ages in the same calendar years over their lifetime. Future research could nevertheless build on our study by using exposure proxies for past periods for which direct exposure data are unavailable. We also experienced a second limitation due to the absence of data on workplace, indoor, and dietary exposure, which are crucial for a comprehensive exposure assessment. Residential exposures indeed represent only a part of the overall daily exposure. Indoor air pollution also contributes due to diverse indoor sources, and has been shown, via PAHs, to be potentially associated with breast cancer risk [23, 40, 51]. All of these sources should be accounted for in future studies. A third limitation lies in the exposure assessment. While the CHIMERE model is a valuable tool for estimating pollutant concentrations over long periods [12], it has limited spatial resolution (7 × 7 km in the present study), making it less effective in capturing variations in pollutant levels in urban and suburban areas where exposure can vary significantly [13], particularly for BaP, for which traffic is an important source of emission [5]. This may have resulted in exposure misclassification. However, even if the spatial resolution is coarse, we may assume this misclassification to be non-differential, since all the external emission sources are included and, moreover, approximately the same proportion of cases and controls (60%) lived in an urban area at cohort entry. Yet, it would be important to confirm our results using more refined exposure assessment approaches.
The fourth limitation is related to our analytical strategy. We indeed investigated each pollutant separately, without adjusting for other pollutants or considering any potential effect modification by other pollutants. Yet, atmospheric pollution is a complex mixture of substances with likely synergistic effects. Several statistical approaches exist to investigate multi-exposures [46]. However, most of them have not been adapted to longitudinal data. An extension of the trajectory-based approach that we used in the present study could actually be used to explore simultaneously the trajectories of several pollutants [43]. However, such modelling is complex and requires further work. Despite our effort to reduce the variability between individual trajectories within each class, we also observed important variability in terms of levels of exposure within each identified class, which might partly explain why most ORs comparing each class of trajectories to the least exposed class did not reach statistical significance. A method that gives at least as much importance to the level as to the shape of trajectories remains to be developed. In addition, we used a two-stage analytical approach where we first identified the classes of trajectories, and then estimated their association with the risk of breast cancer. In the second stage, we neglected the uncertainty of the classification obtained in the first stage because the discrimination between the classes was very good (entropy of at least 0.94). However, it would be of interest to confirm our results using a joint modelling approach allowing the two steps to be performed simultaneously [43], once the method is implemented for case–control data in the lcmm R package. Our analysis also assumed constant measures of association between the identified classes and the odds of breast cancer over the follow-up period.
It would be important to investigate how the association might differ across calendar times, using for example interaction terms between the latent classes and time. The last potential limitation of our study lies in the specificity of the studied population. The E3N cohort included women insured by a national health insurance plan that mainly covers teachers at all levels (primary, secondary, and higher education). This implies a higher level of education than in the general population of women [11], and possibly a tendency towards healthier behaviors, including diet. This may have induced an underrepresentation of strongly exposed women and thus an underestimation of the association with breast cancer.

Conclusions

This study provides a comprehensive description of the trajectories of residential outdoor exposure to PCB153 and BaP over a period of up to 22 years prior to breast cancer diagnosis. Our findings are consistent with a significantly increased risk of breast cancer among women who had high levels of exposure to PCB153 between 1991 and 2011, compared to women less exposed during that period. No clear association was found for BaP, but the analysis should be replicated by incorporating all sources of dietary and indoor exposures, as well as a proxy for earlier air pollution exposure. From a methodological perspective, our study illustrates how a trajectory-based approach may be used to describe and investigate protracted environmental exposures varying over time, and also opens avenues for new methodological developments.