TOOLBOX
Integration of multiple datasets
This part of the project is dedicated to collecting and integrating all pre-existing health and environmental data relevant to REMEDIA’s research objectives from the participating cohorts and registries.
COHORT PROFILES
Danish cohorts
The Diet, Cancer and Health (DCH) cohort was set up to investigate the associations between dietary habits, lifestyle, and cancer development. Participants were recruited between 1993 and 1997, when 160,725 invitations were sent. Eligible participants were men and women born in Denmark, living in the greater Copenhagen or Aarhus areas, aged 50-64 years, and with no previous cancer diagnosis…
Pelagie
PELAGIE (Perturbateurs Endocriniens: Étude Longitudinale sur les Anomalies de la Grossesse, l’Infertilité et l’Enfance; in English, Endocrine Disruptors: Longitudinal Study of Pregnancy, Infertility and Childhood Abnormalities) is a mother-child cohort that recruited pregnant women early in pregnancy (<19 weeks of gestation) who resided in the Brittany region, France, from 2002 to 2005…
Norwegian cohorts
The Tromsø Study is a population-based health survey of inhabitants of Tromsø municipality aged 20 to 97 years. Specific age groups of the population were invited to participate in one or more of seven surveys conducted between 1974 and 2016. The first survey included 6,595 men. The remaining six surveys included both men and women, with 8,130 to 27,158 participants…
VLM
The French Cystic Fibrosis (CF) Patient Registry, established in 1992 and coordinated by Vaincre la Mucoviscidose, served as a core data source within the REMEDIA project. This national longitudinal registry includes over 7,500 individuals, covering approximately 95% of the CF population in France, and provides annually collected clinical data on disease progression, treatment, and outcomes…
ALSPAC
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a UK-based birth cohort initiated in 1991, comprising over 14,000 children and their families. It provides rich longitudinal data on health, genetics, and environmental exposures, including detailed air pollution metrics—making it a valuable resource for investigating respiratory health across the life course…
Environmental data
The data available for calculating the exposure of the PELAGIE and VLM cohorts (France) are either measurement data from the national air quality monitoring network or reanalysed modelling data based on the CHIMERE chemistry-transport model at fine spatial resolution (3-4 km). As the spatial representativeness of the model is greater than that of the stations, we opted for the modelled data…
RESULTS
Source-specific air pollution and risk of chronic obstructive pulmonary disease: A pooled cohort study
Background: The evidence linking long-term exposure to air pollution and development of chronic obstructive pulmonary disease (COPD) is still controversial. Furthermore, most studies have investigated associations with particulate matter (PM) and nitrogen dioxide (NO2), disregarding their emission source and other relevant air pollutants, such as ultrafine particles (UFP) and elemental carbon (EC).
Objectives: This study aimed to assess associations between long-term residential exposure to PM2.5, NO2, UFP, and EC and risk of COPD, distinguishing the effects of air pollution from local traffic and other sources.
Methods: We pooled data from two large Danish cohorts – the Diet, Cancer, and Health cohort and the Danish National Health Survey. For all participants (N = 159,769), we estimated long-term air pollution exposure to total, local traffic, and other contributions, based on complete address histories. We used Cox proportional hazards models to estimate associations between 10-year time-weighted averaged air pollution and incident COPD, adjusting for demographic, socioeconomic, and lifestyle factors, including smoking. We evaluated possible modification of these associations by sex, smoking status, and previous asthma diagnosis.
Results: Long-term exposures to PM2.5, NO2, UFP, and EC were associated with higher risk of COPD. The highest hazard ratio (HR) per interquartile range of total contributions was observed for PM2.5 (HR: 1.11 [95% confidence interval: 1.05, 1.17]), followed by NO2 (1.08 [1.04, 1.13]), UFP (1.05 [0.99, 1.11]), and EC (1.02 [1.00, 1.05]), after full adjustment. PM2.5 from other sources than local traffic was more strongly associated with COPD than PM2.5 from local traffic, while for UFP and EC, the contributions from local traffic seemed most harmful. Effect modification analyses showed stronger associations among women, never smokers, and those with an asthma diagnosis.
Discussion: Our findings suggest that air pollution from both local traffic and other sources contributes to COPD risk, with variations depending on the pollutant type. Further research is needed to validate these findings across different populations and geographical settings.
Documents
Abstract #1
This study examined associations between prenatal exposure to PM2.5 and NO2 and the development of asthma, rhinitis, eczema, and their multimorbidity in children from the Pélagie cohort. Data from 1,322 children at the 6-year follow-up and 1,118 at the 12-year follow-up were analyzed using health questionnaires and pollutant estimates from a European Land Use Regression model. Prevalence rates were 11% for asthma, 23% for rhinitis, 22% for eczema, and 12% for multimorbidity. Associations were generally not statistically significant, but stronger deleterious effects were observed in urban areas, particularly at 6 years. Significant associations were found for PM2.5 and NO2 with eczema in urban areas. The findings suggest urban-rural differences in pollution effects that warrant further investigation.

Abstract #2
This study assessed the effects of prenatal exposure to surrounding residential greenness on childhood asthma, rhinitis, eczema, and their multimorbidity in children of the Pélagie cohort. Data from 1,325 and 1,119 children from the 6- and 12-year follow-ups, respectively, were analyzed using the Normalized Difference Vegetation Index (NDVI) within 300 m of the maternal residence. The prevalence of asthma, rhinitis, and eczema was around 10%, 20%, and 20%, respectively, at both follow-ups. Overall, no significant associations were found, except for a potential protective effect for eczema and for the single-disease category of multimorbidity at age 6. Most mothers lived in rural areas, where NDVI was higher than in urban settings. Stratified analysis suggested a potential protective role of greenness in urban areas. Overall, results were inconclusive, highlighting the need for further research.

Abstract #3
Despite declining air pollution levels, WHO NO2 thresholds are still frequently exceeded. This study assessed the associations between prenatal NO2 exceedances and childhood asthma, rhinitis, and eczema in 1,299 children from the 6-year follow-up of the Pélagie cohort. Exposure was estimated using the number of exceedance days and the cumulative exceedance during pregnancy. Eczema risk increased with both measures, especially in urban areas, but no associations were found for asthma or rhinitis. Rural exposures showed no significant effects. The results support stricter compliance with WHO air quality guidelines.
Documents
Uncovering Cystic Fibrosis Patient Profiles and Exposome Associations through Unsupervised Multidimensional Phenotyping
Background: Cystic fibrosis (CF) is a genetic disorder that affects the respiratory and digestive systems. CF patients exhibit considerable variation in their symptoms and disease progression, suggesting complex genotype-phenotype relationships that may involve environmental factors. This study aimed to use unsupervised clustering analyses to identify distinct lung function trajectories of CF patients, while also assessing their associations with various environmental factors.
Methods: We used data from the French CF Registry, which covers 90% of CF patients in France and provides comprehensive health information for monitoring and research purposes. Using dimensionality reduction and trajectory clustering (Functional Principal Components Analysis and hierarchical agglomerative clustering) applied to longitudinal lung function tests, patients were grouped and then described according to their clinical characteristics.
Results: Our findings revealed 4 distinct subgroups among children with CF, characterized by significant differences in overall health status, decline in lung function, comorbidities, incidence of infections, and exposure to environmental factors such as passive smoking. Additionally, associations were found between the cluster with the worst profile and air pollution at the geographic level of French departments.
Conclusion: Applying clustering techniques to large medical datasets reveals valuable insights into the impact of the environment on the physiological and pathological processes of CF. By uncovering distinct patient profiles, this approach can optimize treatment strategies and improve patient outcomes.

Exposure calculation
To best characterise the exposure of individuals in the cohorts (when the home address is available), it is important to determine the typology of the living environment, distinguishing between urban, suburban, rural and other environments. To do this, we developed a classification method with a spatial resolution of 1 km, based on the criteria used at national level (LCSQA) to determine the nature of a measuring station (urban background, urban traffic, rural background, etc.). The method cross-references several parameters: number of inhabitants, typology of urban areas, and land use. It yields a classification into five categories (urban, peri-urban, rural, airports, and ports), which is then used to calculate exposure.
Exposure to pollutants can vary greatly with proximity to sources. In a city, for example, it will be higher in a street with heavy traffic than in a park. However, the chemistry-transport model (CTM) is unable to reproduce the observed urban gradients, particularly near road traffic. To address this shortcoming, we worked on a statistical modulation of the signal (based on observation data) that corrects the modelled data so that urban roadside environments are better reproduced.
Application to the VLM cohort
The individual data provided for the VLM cohort are at departmental level. To meet this constraint, we aggregated the concentration fields provided by CHIMERE (which have a resolution of between 3 and 5 km) to the scale of the departments. We thus obtained daily average concentrations for each department over the periods covered by the modelling data. Modelling data are available for the 4 regulated pollutants (NO2, O3, PM10 and PM2.5) and cover different time periods: 2000-2019 for NO2 and O3, 2007-2019 for PM10, and 2009-2019 for PM2.5.
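The aggregation step can be illustrated with a minimal sketch. It assumes an unweighted mean over the grid cells of each department; the text does not say whether the actual aggregation was weighted (for example by area or population). The function name, the cell identifiers and the cell-to-department lookup are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from statistics import fmean


def department_daily_means(cell_values, cell_to_department):
    """Average gridded daily concentrations over the cells of each department.

    cell_values: iterable of (date, cell_id, concentration) tuples.
    cell_to_department: dict mapping a grid cell to a department code
        (hypothetical lookup, assumed to be built beforehand).
    Returns a dict {(department, date): mean concentration}.
    Assumption: plain unweighted mean over the cells of a department.
    """
    grouped = defaultdict(list)
    for date, cell_id, concentration in cell_values:
        department = cell_to_department.get(cell_id)
        if department is None:
            continue  # cell not assigned to any department, skip it
        grouped[(department, date)].append(concentration)
    return {key: fmean(values) for key, values in grouped.items()}


# Toy data: three grid cells, two departments, concentrations in µg/m3.
cells = {"c1": "35", "c2": "35", "c3": "29"}
records = [
    ("2015-01-01", "c1", 10.0),
    ("2015-01-01", "c2", 20.0),
    ("2015-01-01", "c3", 8.0),
    ("2015-01-02", "c1", 12.0),
    ("2015-01-02", "c2", 14.0),
]
means = department_daily_means(records, cells)
```

In this toy example, `means[("35", "2015-01-01")]` is the average of the two cells assigned to department 35 on that day.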
Application to the PELAGIE cohort
For the PELAGIE cohort, we have the location of individuals’ homes throughout their follow-up period (so an individual may have several addresses), as well as the dates of follow-up visits with physicians. Concentrations for this cohort can therefore be assigned in more detail, taking into account the type of living environment (urban, rural) and proximity to roads. For each individual in the cohort, we determined the type of home environment and then, depending on its location and proximity to a major road, assigned a daily concentration according to several criteria as described in the figure below. We then calculated the indicators for each life period of the individuals in the cohort (pregnancy, childhood, etc.).
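As an illustration of how an address history can be turned into exposure over a life period, here is a minimal sketch. It assumes that a daily concentration has already been assigned to each home location; the assignment criteria themselves (living environment, road proximity) are not reproduced here. The function name, the data layout and the location identifiers are hypothetical.

```python
from datetime import date, timedelta
from statistics import fmean


def daily_exposure_series(address_history, daily_conc, start, end):
    """Build the daily exposure series of one individual over [start, end].

    address_history: list of (move_in, move_out, location_id) tuples with
        inclusive dates (hypothetical layout).
    daily_conc: dict {(location_id, day): concentration} holding the daily
        concentration already assigned to each home location.
    Days with no known address or no concentration are skipped.
    """
    series = []
    day = start
    while day <= end:
        for move_in, move_out, location_id in address_history:
            if move_in <= day <= move_out:
                value = daily_conc.get((location_id, day))
                if value is not None:
                    series.append(value)
                break  # use the first address that covers this day
        day += timedelta(days=1)
    return series


# Toy example: one move during a four-day period, values in µg/m3.
history = [
    (date(2003, 1, 1), date(2003, 1, 2), "home_A"),
    (date(2003, 1, 3), date(2003, 1, 4), "home_B"),
]
conc = {("home_A", date(2003, 1, d)): 10.0 for d in (1, 2)}
conc.update({("home_B", date(2003, 1, d)): 20.0 for d in (3, 4)})

series = daily_exposure_series(history, conc, date(2003, 1, 1), date(2003, 1, 4))
period_mean = fmean(series)
```

The resulting series can then be summarised by any period-level indicator; a simple mean is used here only as an example.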

Patterns of exposure through statistical indicators
The first step was to propose exposure indicators that best discriminate between individuals, in order to assess the impact of exposure on health trajectories and to capture heavy or chronic exposure. We therefore selected the 90th percentile (P90) of daily values (in µg/m3), the number of days in the year on which the WHO 2021 threshold was exceeded, and the cumulative difference between the daily value and the WHO threshold, counted only when positive (in µg/m3).
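The three indicators can be sketched as follows. This is a minimal illustration, not the project's implementation: the percentile uses the nearest-rank definition, which is an assumption (the text does not specify the method), and the function name is hypothetical. The threshold is passed as a parameter; the value 25 µg/m3 in the example corresponds to the WHO 2021 24-hour guideline for NO2 and should be replaced by the guideline value of the pollutant studied.

```python
def exposure_indicators(daily_values, threshold):
    """Compute three exposure indicators from daily concentrations (µg/m3).

    Returns a dict with:
      p90: 90th percentile of daily values (nearest-rank definition,
           an assumption made for this sketch),
      exceedance_days: number of days strictly above the threshold,
      cumulative_exceedance: sum of (value - threshold) over those days.
    """
    values = sorted(daily_values)
    if not values:
        raise ValueError("daily_values must not be empty")
    # Nearest-rank percentile: rank = ceil(0.9 * n), in integer arithmetic.
    rank = (9 * len(values) + 9) // 10
    exceedances = [v - threshold for v in values if v > threshold]
    return {
        "p90": values[rank - 1],
        "exceedance_days": len(exceedances),
        "cumulative_exceedance": sum(exceedances),
    }


# Toy example: ten daily NO2 values and a 25 µg/m3 daily threshold.
daily_no2 = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
indicators = exposure_indicators(daily_no2, threshold=25)
```

Counting only days strictly above the threshold keeps the exceedance-day count and the cumulative exceedance consistent with each other: a day exactly at the threshold contributes to neither.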
These indicators can be calculated at different spatial scales, depending on the resolution of the air quality data available. Thus, for the VLM cohort, the indicators were calculated at departmental level, while for the PELAGIE data they were calculated at the level of the individual’s home address.