The Genetic Architecture of Multimodal Human Brain Age

Junhao Wen (junhaowe@usc.edu), Bingxin Zhao, Zhijian Yang, Guray Erus, Ioanna Skampardoni, Elizabeth Mamourian, Yuhan Cui, Gyujoon Hwang, Jingxuan Bao, Aleix Boquet-Pujadas, Zhen Zhou, Yogasudha Veturi, Marylyn D. Ritchie, Haochang Shou, Paul M. Thompson, Li Shen, Arthur W. Toga, Christos Davatzikos

Affiliations: Artificial Intelligence in Biomedical Imaging Laboratory (AIBIL), Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA; Biomedical Imaging Group, EPFL, Lausanne, Switzerland; Department of Biobehavioral Health and Statistics, Penn State University, University Park, PA, USA; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, USA; Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Marina del Rey, California, USA; Laboratory of AI and Biomedical Science (LABS), Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA; Laboratory of Neuro Imaging (LONI), Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, USA

doi: 10.1101/2023.06.08.23291168 (September 2023)

The complex biological mechanisms underlying human brain aging remain incompletely understood, involving multiple body organs and chronic diseases. In this study, we used multimodal magnetic resonance imaging and artificial intelligence to examine the genetic architecture of the brain age gap (BAG) derived from gray matter volume (GM-BAG, N=31,557 European ancestry), white matter microstructure (WM-BAG, N=31,674), and functional connectivity (FC-BAG, N=32,017). We identified sixteen genomic loci that reached genome-wide significance (P-value<5x10^-8). A gene-drug-disease network highlighted genes linked to GM-BAG for treating neurodegenerative and neuropsychiatric disorders and WM-BAG genes for cancer therapy. GM-BAG showed the highest heritability enrichment for genetic variants in conserved regions, whereas WM-BAG exhibited the highest heritability enrichment in the 5' untranslated regions; oligodendrocytes and astrocytes, but not neurons, showed significant heritability enrichment in WM and FC-BAG, respectively. Mendelian randomization identified potential causal effects of several exposure variables on brain aging, such as type 2 diabetes on GM-BAG (odds ratio=1.05 [1.01, 1.09], P-value=1.96x10^-2) and AD on WM-BAG (odds ratio=1.04 [1.02, 1.05], P-value=7.18x10^-5). Overall, our results provide valuable insights into the genetics of human brain aging, with clinical implications for potential lifestyle and therapeutic interventions. All results are publicly available at the MEDICINE knowledge portal: https://labs.loni.usc.edu/medicine.

Main

The advent of artificial intelligence (AI) has provided novel approaches to investigate various aspects of human brain health1,2, such as normal brain aging3, neurodegenerative disorders such as Alzheimer's disease (AD)4, and brain cancer5. Based on magnetic resonance imaging (MRI), AI-derived measures of the human brain age6-8 have emerged as a valuable biomarker for evaluating brain health. More precisely, the difference between an individual's AI-predicted brain age and chronological age, the brain age gap (BAG), quantifies an individual's brain health by measuring deviation from the normative aging trajectory. BAG has demonstrated sensitivity to several common brain diseases, clinical variables, and cognitive functions9, which makes it a promising candidate for capturing relevant pathological processes in the general population.

Brain imaging genomics10, an emerging scientific field advanced by both computational statistics and AI, uses imaging-derived phenotypes (IDP)11 from MRI and genetics to offer mechanistic insights into healthy and pathological aging of the human brain. Recent large-scale genome-wide association studies (GWAS)11-18 have identified a diverse set of genomic loci linked to gray matter (GM)-IDP from T1-weighted MRI, white matter (WM)-IDP from diffusion MRI [fractional anisotropy (FA), mean diffusivity (MD), neurite density index (NDI), and orientation dispersion index (ODI)], and functional connectivity (FC)-IDP from functional MRI. While previous GWAS19 have associated BAG with common genetic variants [e.g., single nucleotide polymorphism (SNP)], they primarily focused on GM-BAG9,20-22 or did not comprehensively capture the genetic architecture of the multimodal BAG19 via post-GWAS analyses that biologically validate the GWAS signals. It is crucial to holistically identify the genetic factors associated with multimodal BAGs (GM, WM, and FC-BAG), where each BAG reflects distinct and/or similar neurobiological facets of human brain aging. Furthermore, dissecting the genetic architecture of human brain aging may determine the causal implications, which is essential for developing gene-inspired therapeutic interventions. Finally, numerous risk or protective lifestyle factors and neurobiological processes may also exert independent, synergistic, antagonistic, sequential, or differential influences on human brain health. Therefore, a holistic investigation of multimodal BAGs is urgently needed to fully capture the genetics of human brain aging, including the genetic correlation, gene-drug-disease network, and potential causality. In this study, we postulate that AI-derived GM, WM, and FC-BAG can serve as robust, complementary endophenotypes23, close to the underlying etiology, for precise quantification of human brain health.

The present study sought to uncover the genetic architecture of multimodal BAG and explore the causal relationships between protective/risk factors and decelerated/accelerated brain age. To accomplish this, we analyzed multimodal brain MRI scans from 42,089 participants from the UK Biobank (UKBB) study24 and used 119 GM-IDP, 48 FA WM-IDP, and 210 FC-IDP to derive GM, WM, and FC-BAG, respectively. Refer to Method 1 for selecting the final feature sets for each BAG. We first compared the age prediction performance of different machine learning models using these IDPs. We then performed GWAS to identify genomic loci associated with GM, WM, and FC-BAG in the European ancestry population. In post-GWAS analyses, we constructed a gene-drug-disease network, estimated the genetic correlation with several brain disorders, assessed their heritability enrichment in various functional categories or specific cell types, and calculated the polygenic risk scores (PRS) of the three BAGs. Finally, we performed Mendelian Randomization (MR)25 to infer the causal effects of several clinical traits and diseases on the three BAGs.

Results

In the first section, we objectively compared the age prediction performance of four machine learning methods using these GM, WM, and FC-IDPs (Fig. 1A). To this end, we employed a nested cross-validation (CV) procedure in the training/validation/test dataset (N=4000); an independent test dataset (N=38,089)26,27 was held out, unseen until we finalized the models using only the training/validation/test dataset (Method 1). The four machine learning models included support vector regression (SVR), LASSO regression, multilayer perceptron (MLP), and a five-layer neural network (i.e., three linear layers and one rectified linear unit layer; hereafter, NN)28 (Method 3). The second section focused on the main GWASs using the European ancestry population (31,557<N<32,017) and their sensitivity checks in six scenarios (Method 4A). In the last section, we validated the GWAS findings in several post-GWAS analyses, including genetic correlation, gene-drug-disease network, partitioned heritability, PRS calculation, and Mendelian randomization (Method 4).
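The nested CV procedure described above can be illustrated with a minimal sketch in plain Python. It is not the study's pipeline: the one-feature ridge model, the fold counts, the penalty grid, and all function names are assumptions chosen only to show how the inner loop selects a hyperparameter while the outer loop estimates the MAE on data the inner loop never saw.

```python
import random
from statistics import mean

def fit_ridge_1d(xs, ys, penalty):
    """Fit y = a*x + b with an L2 penalty on the slope (closed form)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / (sxx + penalty)
    return a, my - a * mx

def mae(model, xs, ys):
    """Mean absolute error of a fitted (slope, intercept) model."""
    a, b = model
    return mean(abs(a * x + b - y) for x, y in zip(xs, ys))

def k_folds(indices, k):
    """Split a list of indices into k roughly equal folds."""
    return [indices[i::k] for i in range(k)]

def subset(values, idx):
    return [values[i] for i in idx]

def nested_cv(xs, ys, penalties=(0.0, 1.0, 10.0), k_outer=5, k_inner=3, seed=0):
    """Return one held-out MAE per outer fold (hypothetical settings)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    outer_mae = []
    for test_idx in k_folds(idx, k_outer):
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]

        def inner_score(penalty):
            # Inner loop: score a penalty using only the outer-training data.
            scores = []
            for val_idx in k_folds(train_idx, k_inner):
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                model = fit_ridge_1d(subset(xs, fit_idx), subset(ys, fit_idx), penalty)
                scores.append(mae(model, subset(xs, val_idx), subset(ys, val_idx)))
            return mean(scores)

        best = min(penalties, key=inner_score)
        model = fit_ridge_1d(subset(xs, train_idx), subset(ys, train_idx), best)
        outer_mae.append(mae(model, subset(xs, test_idx), subset(ys, test_idx)))
    return outer_mae
```

The independent test dataset in the text plays a further role that this sketch does not cover: it is evaluated only once, after the models are finalized.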

GM, WM, and FC-BAG derived from three MRI modalities

Several findings were observed based on the results from the independent test dataset (N=38,089, Method 1). First, GM-IDP (4.39<mean absolute error (MAE)<5.35; 0.64<r<0.66), WM-IDP (4.92<MAE<7.95; 0.42<r<0.65), and FC-IDP (5.48<MAE<6.05; 0.43<r<0.46) achieved progressively higher MAE and lower Pearson's correlation (r) (Fig. 1B, C, and D). Second, LASSO regression obtained the lowest MAE for GM, WM, and FC-IDP; linear models obtained a lower MAE than non-linear networks (Fig. 1B). Third, all models generalized well from the training/validation/test dataset (N=4000, Method 1) to the independent test dataset. However, simultaneously incorporating WM-IDP from FA, MD, NDI, and ODI resulted in severely overfitting models (Supplementary eTable 1A). The observed overfitting may be attributed to the large number of parameters (N=38,364) in the network or strong correlations among the diffusion metrics (i.e., FA, MD, ODI, and NDI). Fourth, the experiments stratified by sex did not exhibit substantial differences, except for a stronger overfitting tendency observed in females compared to males using WM-IDP incorporating the four diffusion metrics (Supplementary eTable 1B).

Detailed results of the CV procedure, including the training, validation, test performance, and sex-stratified experiments, are presented in Supplementary eTable 1. In all subsequent genetic analyses, we reported the results using BAG derived from the three LASSO models with the lowest MAE in each modality (Fig. 1A), with the "age bias" corrected as in De Lange et al.29.
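As an illustration of what an "age bias" correction does, the sketch below applies one commonly used linear correction: predicted age is regressed on chronological age, and the fitted trend is removed before the gap is taken. This is an assumption for illustration; the exact procedure of ref. 29 is not reproduced in this text and may differ, and the function names are hypothetical.

```python
from statistics import mean

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope*x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def corrected_bag(chronological_age, predicted_age):
    """Brain age gap after removing the linear dependence of predicted age on age.

    Assumed correction: corrected = predicted + (age - (alpha*age + beta)),
    so the corrected gap equals the residual of the regression.
    """
    alpha, beta = linear_fit(chronological_age, predicted_age)
    corrected = [p + (a - (alpha * a + beta))
                 for a, p in zip(chronological_age, predicted_age)]
    return [c - a for c, a in zip(corrected, chronological_age)]

# Toy values (not study data): the raw gap shrinks with age, the corrected gap does not.
ages = [50.0, 60.0, 70.0, 80.0]
predicted = [55.0, 61.0, 68.0, 74.0]
bag = corrected_bag(ages, predicted)
```

By construction, the corrected gap in this sketch has zero mean and zero linear association with chronological age in the sample used to fit the correction.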

In the literature, other studies30-33 have thoroughly evaluated age prediction performance using different machine learning models and input features. More et al.34 systematically compared the age prediction performance of 128 workflows (MAE between 5.23 and 8.98 years) and showed that voxel-wise feature representation (MAE of approximately 5-6 years) outperformed parcel-based features (MAE of approximately 6-9 years) using conventional machine learning algorithms (e.g., LASSO regression). Using deep neural networks, Peng et al.30 and Leonardsen et al.31 reported a lower MAE (nearly 2.5 years) with voxel-wise imaging scans. However, we previously showed that a moderately fitting convolutional neural network (CNN) obtained significantly higher differentiation (a larger effect size) between the disease and health groups than a tightly fitting CNN (a lower MAE)35. To summarize, our study's brain age prediction performance aligns with that reported in the existing literature, considering the use of low-dimensional hand-crafted IDPs and conventional machine learning algorithms34.

Finally, we calculated the phenotypic correlation (pc) between GM, WM, and FC-BAG using Pearson's correlation coefficient. GM-BAG and WM-BAG showed the highest positive correlation (pc=0.38; P-value<1x10^-10; N=30,733); GM-BAG (pc=0.09; P-value<1x10^-10; N=30,660) and WM-BAG (pc=0.10; P-value<1x10^-10; N=31,574) showed weak correlations with FC-BAG (Fig. 1E).

Figure 1. A) Multimodal brain MRI data were used to derive imaging-derived phenotypes (IDP) for T1-weighted MRI (119 GM-IDP), diffusion MRI (48 WM-IDP), and resting-state functional MRI (210 FC-IDP). IDPs for each modality are shown here using different colors based on predefined brain atlases or ICA for FC-IDP. B) Linear models achieved lower mean absolute errors (MAE) than non-linear models using support vector regression (SVR), LASSO regression, multilayer perceptron (MLP), and a five-layer neural network (NN). The MAE for the independent test dataset is presented, and the # symbol indicates the model with the lowest MAE for each modality. Error bars represent standard deviation (SD). C) Pearson's correlation (r) between the predicted brain age and chronological age is computed, and statistical significance (P-value<0.05), after adjustment for multiple comparisons using the FDR method, is denoted by the * symbol. Error bars represent the 95% confidence interval (CI). D) Scatter plot for the predicted brain age and chronological age. E) Phenotypic correlation (pc) between the GM, WM, and FC-BAG using Pearson's correlation coefficient (r).
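The pairwise phenotypic correlations are reported with different sample sizes for each pair. A minimal sketch of such a computation follows, assuming (this is not stated in the text) that each correlation is computed over the participants who have both BAGs available; the participant identifiers and values are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def phenotypic_correlation(bag_a, bag_b):
    """Correlate two BAGs over shared participants; returns (r, N)."""
    shared = sorted(set(bag_a) & set(bag_b))
    r = pearson_r([bag_a[i] for i in shared], [bag_b[i] for i in shared])
    return r, len(shared)

# Hypothetical participants: p4 lacks the second BAG, p5 lacks the first.
gm_bag = {"p1": 1.0, "p2": 2.0, "p3": 3.0, "p4": 0.0}
wm_bag = {"p1": 2.0, "p2": 4.0, "p3": 6.0, "p5": 1.0}
r, n = phenotypic_correlation(gm_bag, wm_bag)
```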

GM, WM, and FC-BAG are associated with sixteen genomic loci

In the European ancestry populations, GWAS (Method 4A) revealed 6, 9, and 1 genomic loci linked to GM (N=31,557), WM (N=31,674), and FC-BAG (N=32,017), respectively (Fig. 2A). The top lead SNP and mapped genes of each locus are presented in Supplementary eTable 2. We also calculated the genomic inflation factor (λ) and the linkage disequilibrium score regression (LDSC) intercept (b)36 to scrutinize the robustness of the GWAS of GM-BAG (λ=1.118; b=1.0016±0.0078), WM-BAG (λ=1.124; b=1.0187±0.0073), and FC-BAG (λ=1.046; b=1.0039±0.006). All LDSC intercepts were close to 1, indicating no substantial genomic inflation. The individual Manhattan and QQ plots of the three GWASs are presented in Supplementary eFigure 3 and are also publicly available at the MEDICINE knowledge portal: https://labs.loni.usc.edu/medicine. Using the genome-wide complex trait analysis (GCTA) software37, the three BAGs were significantly heritable (P-value<1x10^-10) after adjusting for multiple comparisons using the Bonferroni method. GM-BAG showed the highest SNP-based heritability (h2=0.47±0.02), followed by WM-BAG (h2=0.46±0.02) and FC-BAG (h2=0.11±0.02).
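For readers unfamiliar with the genomic inflation factor, the sketch below uses its conventional definition: the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.455). The text does not state how λ was computed in this study, so the function is an illustrative assumption rather than the authors' code.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided GWAS P-values in (0, 1]."""
    normal = NormalDist()
    # A two-sided P-value p corresponds to z = Phi^-1(p / 2); chi-square = z^2.
    chi2 = [normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Evenly spaced P-values mimic a null GWAS, so the factor should be close to 1.
null_p = [(i + 0.5) / 1001 for i in range(1001)]
lam = genomic_inflation(null_p)
```

Values somewhat above 1, as reported here for GM-BAG and WM-BAG, can reflect polygenicity rather than confounding, which is why the LDSC intercept is reported alongside λ.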

We performed a query in the GWAS Catalog38 for these genetic variants within each locus to understand the phenome-wide association of these identified loci in previous literature (Method 4C). Notably, the SNPs within each locus were linked to other traits previously reported in the literature (Supplementary eFile 1). Specifically, the GM-BAG loci were uniquely associated with neuropsychiatric disorders such as major depressive disorder (MDD), heart disease, and cardiovascular disease. We also observed associations between these loci and other diseases (including anemia), as well as biomarkers from various human organs (e.g., liver) (Fig. 2B). We then performed positional and functional annotations to map SNPs to genes associated with GM, WM, and FC-BAG loci (Method 4B). Fig. 2C-E shows the regional Manhattan plot of one genomic locus linked to each of GM, WM, and FC-BAG. A detailed discussion of these exemplary loci, SNPs, and genes is presented in Supplementary eText 1.

Finally, we calculated the genetic correlation (gc) between the GM, WM, and FC-BAG using the LDSC software. GM-BAG and WM-BAG showed the highest positive correlation (gc=0.49; P-value<1x10^-10); GM-BAG (gc=0.20; P-value=0.025) and WM-BAG (gc=0.29; P-value=0.005) showed weak correlations with FC-BAG (Fig. 2F). The genetic correlations largely mirror the phenotypic correlations, supporting the long-standing Cheverud's Conjecture39. We also verified that these genetic correlations were consistent between the two random splits (split1 and split2: 15,778<N<16,008), which share a similar age and sex distribution (Supplementary eFigure 2).

Sensitivity analyses for the genome-wide associations

We aimed to check the robustness of the main GWASs using the full sample sizes of the European populations (Fig. 2A). To this end, we performed six sensitivity analyses (Method 4A).

Applying the Bonferroni method to correct for multiple comparisons, we noted high concordance rates between the split1 (as discovery, 15,778<N<16,008) and split2 (as replication, 15,778<N<16,008) GWASs. Specifically, for GM-BAG, we observed a concordance rate of 99% [P-value<0.05/3092; 3092 significant SNPs passing the genome-wide P-value threshold (<5x10^-8) in the discovery data], and for WM-BAG, the concordance rate reached 100% (P-value<0.05/116). FC-BAG did not achieve significant genome-wide results in the split-sample GWASs (Supplementary eFigure 3 and Supplementary eFile 2).
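The concordance rate used in these sensitivity analyses can be sketched as follows: take the SNPs that pass the genome-wide threshold in the discovery GWAS, then count how many reach a Bonferroni-corrected P-value (0.05 divided by the number of discovery hits) in the replication GWAS. The SNP identifiers and P-values below are hypothetical, and whether the study also required a consistent effect direction is not stated here, so this sketch uses P-values only.

```python
def concordance_rate(discovery_p, replication_p, gw_threshold=5e-8, alpha=0.05):
    """Share of genome-wide significant discovery SNPs that replicate.

    Both arguments map SNP identifiers to P-values. Returns (rate, n_hits);
    rate is None when the discovery GWAS has no genome-wide significant SNP.
    """
    hits = [snp for snp, p in discovery_p.items() if p < gw_threshold]
    if not hits:
        return None, 0
    cutoff = alpha / len(hits)  # Bonferroni correction over discovery hits
    # SNPs missing from the replication data are counted as not replicated.
    replicated = [snp for snp in hits if replication_p.get(snp, 1.0) < cutoff]
    return len(replicated) / len(hits), len(hits)

# Hypothetical example: two discovery hits, one of which replicates at 0.05/2.
discovery = {"rs1": 1e-9, "rs2": 2e-8, "rs3": 1e-3}
replication = {"rs1": 1e-4, "rs2": 0.04, "rs3": 1e-10}
rate, n_hits = concordance_rate(discovery, replication)
```

With 3092 discovery hits, as for GM-BAG, the replication cutoff in this scheme is 0.05/3092, roughly 1.6x10^-5.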

In sex-stratified GWASs, the concordance rates were 100% (P-value<0.05/3072) for GM-BAG and 88.6% (P-value<0.05/116) for WM-BAG when comparing the male-GWAS (as replication, 14,969<N<15,127) to the female-GWAS (as discovery, 16,588<N<16,890). FC-BAG did not achieve significant genome-wide results (Supplementary eFigure 4 and Supplementary eFile 3).

The concordance rates of the GWASs using non-European ancestry populations (as replication, 4646<N<5091) were low compared to the main GWASs using the European population: only 13.78% for GM-BAG and 41.94% for WM-BAG (P-value<0.05) (Supplementary eFigure 5 and Supplementary eFile 4).

A mixed linear model fitted via fastGWA40 (as replication, 31,557<N<32,017) obtained 100% concordance rates for GM, WM, and FC-BAG compared to GWAS using PLINK linear regression (Supplementary eFile 5). The genetic loci, genomic inflation factor (λ), and the LDSC intercepts for GM, WM, and FC-BAG were similar between the PLINK and fastGWA analyses (Supplementary eFigure 6).

We found a 100% concordance rate of the SNPs identified for the GM-BAG GWAS using LASSO regression (as discovery, BAG MAE=4.39 years) and SVR (P-value<0.05/3382, as replication, BAG MAE=4.43 years) (Supplementary eFigure 7 and Supplementary eFile 6). The BAGs derived from the two machine learning models were highly correlated (r=0.99; P-value<1x10^-10).

We finally found a 92.43% concordance rate of the SNPs identified in the GM-BAG GWAS using the 119 MUSE ROIs41 (as discovery, BAG MAE=4.39 years) and voxel-wise RAVENS42 maps (as replication, P-value<0.05/3382, BAG MAE=5.12 years) (Supplementary eFigure 8 and Supplementary eFile 7). The BAGs derived from the two types of features were significantly correlated (r=0.74; P-value<1x10^-10). The brain age prediction performance using RAVENS showed marginal overfitting, with an MAE of 4.31 years in the training/validation/test dataset and an MAE of 5.12 years in the independent test dataset.

These findings suggest that our GWASs were robust across sex, random splits, imaging features, GWAS methods, and machine learning methods within European populations; however, their generalizability to non-European populations is limited. All subsequent post-GWAS analyses were conducted using the main GWAS results of European ancestry.

Figure 2. A) Genome-wide associations identified sixteen genomic loci associated with GM (6), WM (9), and FC-BAG (1) using a genome-wide P-value threshold [-log10(P-value) > 7.30]. The top lead SNP and the cytogenetic region number represent each locus. B) Phenome-wide association query from GWAS Catalog38. Independent significant SNPs inside each locus were largely associated with many traits. We further classified these traits into several trait categories, including biomarkers from multiple body organs (e.g., heart and liver), neurological disorders (e.g., Alzheimer's disease and Parkinson's disease), and lifestyle risk factors (e.g., alcohol consumption). C) Regional plot for a genomic locus associated with GM-BAG. SNPs are color-coded by their highest r2 to one of the nearby independent significant SNPs. Gray-colored SNPs are below the r2 threshold. The top lead SNP, lead SNPs, and independent significant SNPs are denoted as dark purple, purple, and red, respectively. Mapped, orange-colored genes of the genomic locus are annotated by positional, eQTL, and chromatin interaction mapping (Method 4B). D) Regional plot for a genomic locus associated with WM-BAG. E) The novel genomic locus associated with FC-BAG did not map to any genes. We used the Genome Reference Consortium Human Build 37 (GRCh37) in all genetic analyses. F) Genetic correlation (gc) between the GM, WM, and FC-BAG using the LDSC software. Abbreviation: AD: Alzheimer's disease; ASD: autism spectrum disorder; PD: Parkinson's disease; ADHD: attention-deficit/hyperactivity disorder.

The gene-drug-disease network highlights disease-specific drugs that bind to genes associated with GM and WM-BAG

We investigated the potential "druggable genes"43 from the mapped genes by constructing a gene-drug-disease network (Method 4F). The network connects genes with drugs (or drug-like molecules) targeting specific diseases currently active at any stage of clinical trials.

We revealed clinically relevant associations for 4 and 6 mapped genes associated with GM-BAG and WM-BAG, respectively. The GM-BAG genes were linked to clinical trials for treating heart, neurodegenerative, neuropsychiatric, and respiratory diseases. On the other hand, the WM-BAG genes were primarily targeted for various cancer treatments and cardiovascular diseases (Fig. 3). To illustrate, for the GM-BAG MAPT gene, several drugs or drug-like molecules are currently being evaluated for treating AD. Semorinemab (RG6100), an anti-tau IgG4 antibody that targets extracellular tau in AD to reduce microglial activation and inflammatory responses44, was being investigated in a phase-2 clinical trial (trial number: NCT03828747). Another drug is LMTM (TRx0237), a second-generation tau protein aggregation inhibitor currently being tested in a phase-3 clinical trial (trial number: NCT03446001) for treating AD and frontotemporal dementia45. Regarding WM-BAG genes, they primarily bind with drugs for treating cancer and cardiovascular diseases. For instance, the PDIA3 gene, associated with the folding and oxidation of proteins, has been targeted for developing several zinc-related FDA-approved drugs for treating cardiovascular diseases. Another example is the MAP1A gene, which encodes microtubule-associated protein 1A. This gene is linked to the development of estramustine, an FDA-approved drug for prostate cancer (Fig. 3). Detailed results are presented in Supplementary eFile 8.

Figure 3. The gene-drug-disease network derived from the mapped genes revealed a broad spectrum of targeted diseases and cancer, including brain cancer, cardiovascular system diseases, Alzheimer's disease, and obstructive airway disease, among others. The thickness of the lines represents the P-values (-log10) from the brain tissue-specific gene set enrichment analyses using the GTEx v8 dataset. Several drugs are highlighted in blue, bold text. Abbreviation: ATC: Anatomical Therapeutic Chemical; ICD: International Classification of Diseases.

Multimodal BAG is genetically correlated with AI-derived subtypes of brain diseases

We calculated the genetic correlation using the GWAS summary statistics from 16 clinical traits to examine genetic covariance between multimodal BAG and other clinical traits. The selection procedure and quality check of the GWAS summary statistics are detailed in Method 4D. These traits encompassed common brain diseases and their AI-derived disease subtypes, as well as education and intelligence (Fig. 4A and Supplementary eTable 3). The AI-generated disease subtypes were established in our previous studies utilizing semi-supervised clustering methods46 and IDP from brain MRI scans.

Our analysis revealed significant genetic correlations between GM-BAG and AI-derived subtypes of AD (AD1, ref. 4), autism spectrum disorder (ASD) (ASD1 and ASD3, ref. 47), schizophrenia (SCZ1, ref. 48), and obsessive-compulsive disorder (OCD)49; WM-BAG and AD1, ASD1, SCZ1, and SCZ2; and FC-BAG and education50 and SCZ1. Detailed results for rg estimates are presented in Supplementary eTable 4. These subtypes, in essence, capture more homogeneous disease effects than the conventional "unitary" disease diagnosis, hence serving as robust endophenotypes23.

Multimodal BAG shows specific enrichment of heritability in different functional categories and cell types

We conducted a partitioned heritability analysis51 to investigate the heritability enrichment of genetic variants related to multimodal BAG in the 53 functional categories (Method 4E). Our results revealed that GM and WM-BAG exhibited significant heritability enrichment across numerous annotated functional categories. Some categories displayed greater enrichment than others; we outline several of them below.

For GM-BAG, the regions conserved across mammals, as indicated by the label "conserved" in Fig. 4B, displayed the most notable enrichment of heritability: approximately 2.61% of SNPs were found to explain 0.43±0.07 of SNP heritability (P-value=5.80x10^-8). Additionally, transcription start site (TSS)52 regions employed 1.82% of SNPs to explain 0.16±0.05 of SNP heritability (P-value=8.05x10^-3). TSS initiates the transcription at the 5' end of a gene and is typically embedded within a core promoter crucial to the transcription machinery53. The heritability enrichment of histone H3 at lysine 4, denoted as "H3K4me3_peaks" in Fig. 4B, and histone H3 at lysine 9 (H3K9ac)54 was also found to be large; both marks are known to highlight active gene promoters55. For WM-BAG, 5' untranslated regions (UTR) used 0.54% of SNPs to explain 0.09±0.03 of SNP heritability (P-value=4.24x10^-3). The 5' UTR is a crucial region of a messenger RNA located upstream of the initiation codon. It is pivotal in regulating transcript translation, with varying mechanisms in viruses, prokaryotes, and eukaryotes.
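The enrichment figures above can be read as a ratio: the proportion of SNP heritability explained by a category divided by the proportion of SNPs it contains, which is the usual convention in stratified LD score regression. The short sketch below applies that ratio to the rounded values quoted in the text; the resulting fold-enrichments are therefore approximate and may differ from the values in the supplementary tables.

```python
def enrichment(prop_h2, prop_snps):
    """Fold-enrichment: share of heritability divided by share of SNPs."""
    return prop_h2 / prop_snps

# (proportion of heritability, proportion of SNPs), as quoted in the text.
categories = {
    "GM-BAG, conserved": (0.43, 0.0261),
    "GM-BAG, TSS": (0.16, 0.0182),
    "WM-BAG, 5' UTR": (0.09, 0.0054),
}

for name, (prop_h2, prop_snps) in categories.items():
    print(f"{name}: {enrichment(prop_h2, prop_snps):.1f}-fold")
```

With these rounded inputs the ratios come out at roughly 16.5, 8.8, and 16.7, respectively; a value of 1 would mean a category explains exactly as much heritability as its share of SNPs predicts.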

Additionally, we examined the heritability enrichment of multimodal BAG in three different cell types (Fig. 4C). WM-BAG (P-value=1.69x10^-3) exhibited significant heritability enrichment in oligodendrocytes, one type of neuroglial cells. FC-BAG (P-value=1.12x10^-2) showed such enrichment in astrocytes, the most prevalent glial cells in the brain. GM-BAG showed no enrichment in any of these cells. Our findings are consistent with the known molecular and biological characteristics of GM and WM. Oligodendrocytes are primarily responsible for forming the lipid-rich myelin structure, whereas astrocytes play a crucial role in various cerebral functions, such as brain development and homeostasis. Consistently, a prior GWAS14 on WM-IDP also identified considerable heritability enrichment in glial cells, especially oligodendrocytes. Detailed results for the 53 functional categories and cell-specific analyses are presented in Supplementary eTable 5.

Prediction ability of the polygenic risk score of the multimodal BAG

We derived the PRS for GM, WM, and FC-BAG using the conventional C+T (clumping plus P-value threshold) approach56 via PLINK and a Bayesian method via PRS-CS57 (Method 4H).

We found that the GM, WM, and FC-BAG-PRS derived from PRS-CS significantly predicted the phenotypic BAGs in the test data (split2 GWAS, 15,697<N<15,940), with an incremental R2 of 2.17%, 1.85%, and 0.19%, respectively (Fig. 4D). Compared to the PRS derived from PRS-CS, the PLINK approach achieved a lower incremental R2 of 0.81%, 0.45%, and 0.14% for GM, WM, and FC-BAG, respectively (Supplementary eFigure 9). Overall, the predictive capacity of PRS is moderate, in line with earlier findings involving raw imaging-derived phenotypes, as demonstrated in Zhao et al.13, where PRSs developed for seven selective brain regions were able to explain roughly 1.18% to 3.93% of the phenotypic variance associated with these traits.

Figure 4. A) Genetic correlation (gc) between GM, WM, and FC-BAG and 16 clinical traits. These traits include neurodegenerative diseases (e.g., AD) and their AI-derived subtypes (e.g., AD1 and AD2, ref. 4), neuropsychiatric disorders (e.g., ASD) and their subtypes (ASD1, 2, and 3, ref. 47), intelligence, and education. B) The proportion of heritability enrichment for the 53 functional categories51. We only show the functional categories that survived the correction for multiple comparisons using the FDR method. C) Cell type-specific partitioned heritability estimates. We included gene sets from Cahoy et al.58 for three main cell types (i.e., astrocyte, neuron, and oligodendrocyte). After adjusting for multiple comparisons using the FDR method, the * symbol denotes statistical significance (P-value<0.05). Error bars represent the standard error of the estimated parameters. D) The incremental R2 of the PRS derived by PRS-CS to predict the GM, WM, and FC-BAG in the target/test data (i.e., the split2 GWAS). The y-axis indicates the proportions of phenotypic variation (GM, WM, and FC-BAG) that the PRS can significantly and additionally explain. The x-axis lists the seven P-value thresholds considered. Abbreviation: AD: Alzheimer's disease; ADHD: attention-deficit/hyperactivity disorder; ASD: autism spectrum disorder; BIP: bipolar disorder; MDD: major depressive disorder; OCD: obsessive-compulsive disorder; SCZ: schizophrenia; CAD: coronary artery disease; CD: Crohn's disease; BMD: bone

To check the robustness of our GWAS results using European ancestry, we performed six sensitivity checks, including i) split-sample GWAS by randomly dividing the entire population into two sex and age-matched splits, ii) sex-stratified GWAS for males and females, iii) non-European GWAS, iv) fastGWA40 for a mixed linear model that accounts for cryptic population stratification, v) machine learning-specific GWAS, and vi) feature type-specific GWAS.

(B): Annotation of genomic loci and genes: The annotation of genomic loci and mapped genes was performed via FUMA101 (https://fuma.ctglab.nl/, version: v1.5.0). For the annotation of genomic loci, we first defined lead SNPs (correlation r2 ≤ 0.1, distance < 250 kilobases) and assigned them to a genomic locus (non-overlapping); the lead SNP with the lowest P-value (i.e., the top lead SNP) was used to represent the genomic locus. For gene mappings, three different strategies were considered. First, positional mapping assigns the SNP to its physically nearby genes (a 10 kb window by default). Second, eQTL mapping annotates SNPs to genes based on eQTL associations. Finally, chromatin interaction mapping annotates SNPs to genes when there is a significant chromatin interaction between the disease-associated regions and nearby or distant genes101. The definition of top lead SNP, lead SNP, independent significant SNP, and candidate SNP can be found in Supplementary eMethod 1.
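Of the three gene-mapping strategies, positional mapping is the simplest to illustrate. The sketch below assigns a SNP to every gene whose boundaries, extended by a 10 kb window on each side, contain the SNP position. It is not FUMA's implementation; the gene names and coordinates are hypothetical, and treating the window as symmetric around the gene body is an assumption.

```python
def positional_mapping(snp_chrom, snp_pos, genes, window_kb=10):
    """Return genes whose body, padded by window_kb on both sides, contains the SNP.

    `genes` maps a gene name to (chromosome, start, end) in base pairs.
    """
    window = window_kb * 1000
    return [
        name
        for name, (chrom, start, end) in genes.items()
        if chrom == snp_chrom and start - window <= snp_pos <= end + window
    ]

# Hypothetical annotation: the SNP lies 8 kb past GENE_A and 12 kb before GENE_B.
genes = {
    "GENE_A": ("17", 100_000, 150_000),
    "GENE_B": ("17", 170_000, 200_000),
    "GENE_C": ("1", 100_000, 150_000),
}
mapped = positional_mapping("17", 158_000, genes)
```

A SNP that falls outside every padded gene maps to nothing under this strategy, which is one way a locus can end up without positionally mapped genes.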

672 673 674 675 676 (C): Phenome-wide association query for genomic loci associated with other traits in the 691

literature: We queried the significant independent SNPs within each locus in the GWAS 692

Catalog (query date: January 10th, 2023 via FUMA version: v1.5.0) to determine their previously identified associations with other traits. We further mapped these associated traits into several high-level categories for visualization purposes (Fig. 2B).

(D): Genetic correlation: We used LDSC36 to estimate the pairwise genetic correlation (rg) between GM, WM, and FC-BAG and several pre-selected traits (Supplementary eTable 3) by using the precomputed LD scores from the 1000 Genomes of European ancestry. The following pre-selected traits were included: Alzheimer's disease (AD), autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), obsessive-compulsive disorder (OCD), major depressive disorder (MDD), bipolar disorder (BIP), schizophrenia (SCZ), education and intelligence, as well as the AI-derived subtypes for AD (AD1 and AD2 102), ASD (ASD1, ASD2, and ASD3 47), and SCZ (SCZ1 and SCZ2 103), serving as more robust endophenotypes than the disease diagnoses themselves. To ensure the suitability of the GWAS summary statistics, we first checked that the selected study's population was of European ancestry; we then required a moderate SNP-based heritability (h2) estimate and excluded the studies with spuriously low h2 (<0.05). Notably, LDSC corrects for sample overlap and provides an unbiased estimate of genetic correlation104. The h2 estimate from LDSC is, in general, lower than that of GCTA because LDSC uses GWAS summary statistics and pre-computed LD information and has slightly different model assumptions across different software105.

(E): Partitioned heritability estimate: Partitioned heritability analysis estimates the percentage of heritability enrichment explained by annotated genome regions51. First, the partitioned heritability was calculated for 53 main functional categories. The 53 functional categories are not specific to any cell type and include coding, UTR, promoter, and intronic regions. Details of the 53 categories are described elsewhere51 and are also presented in Supplementary eTable 5A. Subsequently, cell type-specific partitioned heritability was estimated using gene sets from Cahoy et al.58 for three main cell types (i.e., astrocyte, neuron, and oligodendrocyte) (Supplementary eTable 5B).

(F): Gene-drug-disease network construction: We curated data from the DrugBank database (v.5.1.9)106 and the Therapeutic Target Database (updated by September 29th, 2021) to construct a gene-drug-disease network. Specifically, we constrained the target to human organisms and included all drugs with active statuses (e.g., patented and approved) but excluded inactive ones (e.g., terminated or discontinued at any phase). To represent the disease, we mapped the identified drugs to the Anatomical Therapeutic Chemical (ATC) classification system for the DrugBank database and the International Classification of Diseases (ICD-11) for the Therapeutic Target Database.

(G): Two-sample Mendelian Randomization: We investigated whether the clinical traits previously associated with our genomic loci (Fig. 2B) were a cause or a consequence of GM, WM, and FC-BAG using a bidirectional, two-sample MR approach. GM, WM, and FC-BAG are the outcome/exposure variables in the forward/inverse MR, respectively. We applied five different MR methods using the TwoSampleMR R package59, including the inverse variance weighted (IVW), MR Egger107, weighted median108, simple mode, and weighted mode methods.

We reported the results of IVW in the main text and the four others in the Supplementary eFile 9. MR relies on a set of crucial assumptions to ensure the validity of its results. These assumptions include the requirement that the chosen genetic instrument exhibits a strong association with the exposure of interest while remaining free from direct associations with confounding factors that could influence the outcome. Additionally, the genetic variant used in MR should be independently allocated during conception and inheritance, guaranteeing its autonomy from potential confounders. Furthermore, this genetic instrument must affect the outcome solely through the exposure of interest without directly impacting alternative pathways that could influence the outcome (no horizontal pleiotropy). The five MR methods handle pleiotropy and instrument validity assumptions differently, offering various degrees of robustness to violations. For example, MR Egger provides a method to estimate and correct for pleiotropy, making it robust in the presence of horizontal pleiotropy. However, it assumes that directional pleiotropy is the only form of pleiotropy present.
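To make the contrast between the IVW and MR Egger estimators concrete, the following is a minimal Python sketch that operates on per-SNP summary statistics. It is an illustration under stated assumptions, not the TwoSampleMR implementation used in this study: the function names (`ivw_estimate`, `egger_estimate`, `leave_one_out`) and the toy effect sizes are hypothetical, the IVW estimator is the fixed-effect form, and standard errors for the Egger coefficients are omitted.

```python
import numpy as np

def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW: weighted regression of outcome effects on
    exposure effects, forced through the origin (no pleiotropy term)."""
    w = 1.0 / se_out**2
    denom = np.sum(w * beta_exp**2)
    estimate = np.sum(w * beta_exp * beta_out) / denom
    return estimate, np.sqrt(1.0 / denom)

def egger_estimate(beta_exp, beta_out, se_out):
    """MR Egger: the same weighted regression, but with an intercept that
    absorbs the average (directional) pleiotropic effect."""
    # Orient every SNP so that its exposure effect is positive.
    sign = np.where(beta_exp < 0, -1.0, 1.0)
    bx, by = beta_exp * sign, beta_out * sign
    w = 1.0 / se_out**2
    X = np.column_stack([np.ones_like(bx), bx])
    XtW = X.T * w  # weighted design matrix, shape (2, n_snps)
    intercept, slope = np.linalg.solve(XtW @ X, XtW @ by)
    return intercept, slope

def leave_one_out(beta_exp, beta_out, se_out):
    """Recompute the IVW estimate with each SNP removed in turn."""
    idx = np.arange(len(beta_exp))
    return np.array([
        ivw_estimate(beta_exp[idx != i], beta_out[idx != i], se_out[idx != i])[0]
        for i in idx
    ])

# Toy instruments (hypothetical values): true causal effect 0.5 plus a
# constant pleiotropic shift of 0.02 on the outcome.
beta_exp = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
beta_out = 0.5 * beta_exp + 0.02
se_out = np.full(5, 0.01)

print("IVW estimate:", ivw_estimate(beta_exp, beta_out, se_out)[0])
print("Egger intercept, slope:", egger_estimate(beta_exp, beta_out, se_out))
print("Leave-one-out IVW:", leave_one_out(beta_exp, beta_out, se_out))
```

In this toy setting the constant shift inflates the IVW estimate, whereas the Egger intercept captures the shift and the slope recovers the causal effect, which is the behavior the paragraph above describes.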


To ensure an unbiased selection of exposure variables, we followed a systematic procedure guided by the STROBE-MR Statement109. We pre-selected exposure variables across various categories based on our phenome-wide association query. These variables encompassed neurodegenerative diseases (e.g., AD), liver biomarkers (e.g., AST), cardiovascular diseases (e.g., the triglyceride-to-lipid ratio in VLDL), and lifestyle-related risk factors (e.g., BMI). Subsequently, we conducted an automated query for these traits in the IEU GWAS database110, which provides curated GWAS summary statistics suitable for MR, using the available_outcomes() function. We ensured the selected studies used European ancestry populations and shared the same genome build as our GWAS (HG19/GRCh37). Additionally, we manually examined the selected studies to exclude any GWAS summary statistics overlapping with UK Biobank populations to prevent bias stemming from sample overlap111. This process yielded a set of seven exposure variables, comprising AD, breast cancer, type 2 diabetes, renin level, triglyceride-to-lipid ratio, aspartate aminotransferase (AST), and BMI. The details of the selected studies for the instrumental variables (IVs) are provided in Supplementary eTable 6.

We performed several sensitivity analyses. First, a heterogeneity test was performed to check for violations of the IV assumptions. Horizontal pleiotropy was estimated to navigate the violation of the IV's exclusivity assumption64 using a funnel plot, single-SNP MR approaches, and the MR Egger estimator107. Moreover, the leave-one-out analysis excluded one instrument (SNP) at a time and assessed the sensitivity of the results to individual SNPs.

(H): PRS prediction: We calculated the PRS using the GWAS results from the split-sample analyses. The weights of the PRS were defined based on split1 data (training/base data), and the split2 GWAS summary statistics were used as the test/target data. The QC steps are as follows: i) removal of duplicated and ambiguous SNPs for the base data; ii) clumping the base GWAS data; iii) pruning to remove highly correlated SNPs in the target data; iv) removal of high heterozygosity samples in the target data; v) removal of duplicated, mismatching and ambiguous SNPs in the target data. After rigorous QC, we employed two methods to derive the three BAG-PRS in the split2 population: i) PLINK with the classic C+T method (clumping + thresholding) and ii) PRS-CS57 with a Bayesian approach.

To determine the "best-fit" PRS P-value threshold, we performed a linear regression using the PRS calculated at different P-value thresholds (0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5), controlling for age, sex, total intracranial volume, brain position during scanning (lateral, transverse, and longitudinal), and the first forty genetic PCs. A null model was established by including only the abovementioned covariates. The alternative model was then constructed by introducing each BAG-PRS as an extra independent variable.
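The threshold search and the null-versus-alternative model comparison can be sketched in Python as follows. This is a hedged illustration, not the PLINK or PRS-CS pipeline used in the study: the function names (`prs_ct`, `r_squared`, `incremental_r2`, `best_fit_threshold`) and the simulated data are hypothetical, clumping is assumed to have been done beforehand, and choosing the threshold by the largest gain in R² of the alternative over the null model is our assumption about how "best-fit" is scored.

```python
import numpy as np

THRESHOLDS = (0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)

def prs_ct(genotypes, betas, pvals, threshold):
    """Thresholding step of C+T: sum allele counts weighted by base-GWAS
    effect sizes over SNPs with P-value below the threshold."""
    keep = pvals < threshold
    return genotypes[:, keep] @ betas[keep]

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

def incremental_r2(prs, covariates, y):
    """R^2 gain of the alternative model (covariates + PRS) over the
    null model (covariates only)."""
    null = r_squared(covariates, y)
    alt = r_squared(np.column_stack([covariates, prs]), y)
    return alt - null

def best_fit_threshold(genotypes, betas, pvals, covariates, y,
                       thresholds=THRESHOLDS):
    """Score every threshold and return the one with the largest gain."""
    scores = {
        t: incremental_r2(prs_ct(genotypes, betas, pvals, t), covariates, y)
        for t in thresholds
    }
    return max(scores, key=scores.get), scores

# Simulated target data (hypothetical): 500 individuals, 50 clumped SNPs,
# and three covariates standing in for age, sex, and genetic PCs.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(500, 50)).astype(float)
betas = rng.normal(scale=0.1, size=50)
pvals = rng.uniform(0.0, 0.6, size=50)
covs = rng.normal(size=(500, 3))
bag = covs @ np.array([0.5, -0.2, 0.1]) + G @ betas + rng.normal(size=500)

best, scores = best_fit_threshold(G, betas, pvals, covs, bag)
print("best-fit threshold:", best)
print("incremental R^2 per threshold:", scores)
```

Scoring by incremental R² rather than the raw R² of the alternative model keeps the covariates from dominating the comparison across thresholds.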

Data Availability

The GWAS summary statistics corresponding to this study are publicly available on the MEDICINE knowledge portal (https://labs.loni.usc.edu/medicine). The software and resources used in this study are all publicly available:

MLNI: https://anbai106.github.io/mlni/, brain age prediction (V0.1.2)

MEDICINE: https://labs.loni.usc.edu/medicine, knowledge portal for dissemination and GWAS summary statistics sharing

MUSE: https://www.med.upenn.edu/sbia/muse.html, image preprocessing for GM-IDP

PLINK: https://www.cog-genomics.org/plink/, GWAS and PRS

FUMA: https://fuma.ctglab.nl/, gene mapping, genomic locus annotation

GCTA: https://yanglab.westlake.edu.cn/software/gcta/#Overview, heritability estimates, and fastGWA

LDSC: https://github.com/bulik/ldsc, genetic correlation, partitioned heritability

TwoSampleMR: https://mrcieu.github.io/TwoSampleMR/index.html, MR

PRS-CS: https://github.com/getian107/PRScs, PRS

Competing Interests

None

Authors' contributions

Dr. Wen has full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Wen

Acquisition, analysis, or interpretation of data: Wen

Drafting of the manuscript: Wen

Critical revision of the manuscript for important intellectual content: all authors

Statistical analysis: Wen

References

1. Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat Med 28, 31-38 (2022).
2. Hassabis, D., Kumaran, D., Summerfield, C. & Botvinick, M. Neuroscience-Inspired Artificial Intelligence. Neuron 95, 245-258 (2017).
3. Lee, J. et al. Deep learning-based brain age prediction in normal aging and dementia. Nat Aging 2, 412-424 (2022).
4. Wen, J. et al. Genetic, clinical underpinnings of subtle early brain change along
5. Hollon, T. et al. Artificial-intelligence-based molecular classification of diffuse gliomas using rapid, label-free optical imaging. Nat Med 1-5 (2023) doi:10.1038/s41591-023-02252-4.
6. Cole, J. H., Marioni, R. E., Harris, S. E. & Deary, I. J. Brain age and other bodily 'ages': implications for neuropsychiatry. Mol Psychiatry 24, 266-281 (2019).
7. Jones, D. T., Lee, J. & Topol, E. J. Digitising brain age. The Lancet 400, 988 (2022).
8. Tian, Y. E. et al. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med 1-11 (2023) doi:10.1038/s41591-023-02296-6.
9. Kaufmann, T. et al. Common brain disorders are associated with heritable patterns of apparent aging of the brain. Nat Neurosci 22, 1617-1623 (2019).
10. Shen, L. & Thompson, P. M. Brain Imaging Genomics: Integrated Analysis and Machine Learning. Proceedings of the IEEE 108, 125-162 (2020).
11. Elliott, L. T. et al. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562, 210-216 (2018).
12. Smith, S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nat Neurosci 24, 737-745 (2021).
13. Zhao, B. et al. Genome-wide association analysis of 19,629 individuals identifies variants influencing regional brain volumes and refines their genetic co-architecture with cognitive and mental health traits. Nat Genet 51, 1637-1644 (2019).
14. Zhao, B. et al. Common genetic variation influencing human white matter microstructure. Science 372, (2021).
15. Wen, J. et al. Novel genomic loci and pathways influence patterns of structural covariance
16. Grasby, K. L. et al. The genetic architecture of the human cerebral cortex. Science 367, eaay6690 (2020).
17. Brouwer, R. M. et al. Genetic variants associated with longitudinal changes in brain structure across the lifespan. Nat Neurosci 25, 421-432 (2022).
18. Zhao, B. et al. Common variants contribute to intrinsic human brain functional networks. Nat Genet 54, 508-517 (2022).
19. Smith, S. M. et al. Brain aging comprises many modes of structural and functional change with distinct genetic and biophysical associations. eLife 9, e52677.
20. Ning, K. et al. Improving brain age estimates with deep learning leads to identification of novel genetic factors associated with brain aging. Neurobiology of Aging 105, 199-204 (2021).
21. Leonardsen, E. H. et al. Genetic architecture of brain age and its causal relations with brain and mental disorders. Mol Psychiatry 1-10 (2023) doi:10.1038/s41380-023-02087-y.
22. Jonsson, B. A. et al. Brain age prediction using deep learning uncovers associated sequence variants. Nat Commun 10, 5409 (2019).
23. Kendler, K. & Neale, M. Endophenotype: a conceptual analysis. Mol Psychiatry 15, 789-797 (2010).
24. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203-209 (2018).
25. Emdin, C. A., Khera, A. V. & Kathiresan, S. Mendelian Randomization. JAMA 318, 1925-1926 (2017).
26. Varoquaux, G. et al. Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines. NeuroImage 145, 166-179 (2017).
27. Samper-González, J. et al. Reproducible evaluation of classification methods in Alzheimer's disease: Framework and application to MRI and PET data. NeuroImage 183, 504-521 (2018).
28. Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825-2830 (2011).
29. de Lange, A.-M. G. & Cole, J. H. Commentary: Correction procedures in brain-age prediction. Neuroimage Clin 26, 102229 (2020).
30. Peng, H., Gong, W., Beckmann, C. F., Vedaldi, A. & Smith, S. M. Accurate brain age prediction with lightweight deep neural networks. Medical Image Analysis 68, 101871 (2021).
31. Leonardsen, E. H. et al. Deep neural networks learn general and clinically relevant representations of the ageing brain. NeuroImage 256, 119210 (2022).
32. Vidal-Pineiro, D. et al. Individual variations in 'brain age' relate to early-life factors more than to longitudinal brain change. eLife 10, e69995 (2021).
33. More, S. et al. Brain-age prediction: A systematic comparison of machine learning workflows. NeuroImage 270, 119947 (2023).
34. Wood, D. A. et al. Accurate brain-age models for routine clinical MRI examinations. Neuroimage 249, 118871 (2022).
35. Bashyam, V. M. et al. MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain 143, 2312-2324 (2020).
36. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291-295 (2015).
37. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: A Tool for Genome-wide Complex Trait Analysis. Am J Hum Genet 88, 76-82 (2011).
38. Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 47, D1005-D1012
39. Cheverud, J. M. A COMPARISON OF GENETIC AND PHENOTYPIC CORRELATIONS. Evolution 42, 958-968 (1988).
40. Jiang, L. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet 51, 1749-1755 (2019).
41. Doshi, J. et al. MUSE: MUlti-atlas region Segmentation utilizing Ensembles of registration algorithms and parameters, and locally optimal atlas selection. Neuroimage 127, 186-195 (2016).
42. Davatzikos, C., Genc, A., Xu, D. & Resnick, S. M. Voxel-Based Morphometry Using the RAVENS Maps: Methods and Validation Using Simulated Longitudinal Atrophy.
43. Hopkins, A. L. & Groom, C. R. The druggable genome. Nat Rev Drug Discov 1, 727-730 (2002).
44. Antibody-Mediated Targeting of Tau In Vivo Does Not Require Effector Function and Microglial Engagement - PubMed. https://pubmed.ncbi.nlm.nih.gov/27475227/.
45. Wilcock, G. K. et al. Potential of Low Dose Leuco-Methylthioninium Bis(Hydromethanesulphonate) (LMTM) Monotherapy for Treatment of Mild Alzheimer's Disease: Cohort Analysis as Modified Primary Outcome in a Phase III Clinical Trial. J Alzheimers Dis 61, 435-457.
46. Wen, J. et al. Subtyping brain diseases from imaging data. Preprint at
47. Hwang, G. et al. Assessment of Neuroanatomical Endophenotypes of Autism Spectrum Disorder and Association With Characteristics of Individuals With Schizophrenia and the General Population. JAMA Psychiatry (2023) doi:10.1001/jamapsychiatry.2023.0409.
48. Chand, G. B. et al. Two distinct neuroanatomical subtypes of schizophrenia revealed using
49. International Obsessive Compulsive Disorder Foundation Genetics Collaborative (IOCDF-GC) and OCD Collaborative Genetics Association Studies (OCGAS). Revealing the complex genetic architecture of obsessive-compulsive disorder using meta-analysis. Mol Psychiatry 23, 1181-1188 (2018).
50. Rietveld, C. A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467-1471 (2013).
51. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 47, 1228-1235 (2015).
52. Hoffman, M. M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res 41, 827-841 (2013).
53. Haberle, V. & Stark, A. Eukaryotic core promoters and the functional basis of transcription initiation. Nat Rev Mol Cell Biol 19, 621-637 (2018).
54. Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat Genet 45, 124-130 (2013).
55. Barski, A. et al. High-Resolution Profiling of Histone Methylations in the Human Genome. Cell 129, 823-837 (2007).
56. Choi, S. W., Mak, T. S.-H. & O'Reilly, P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc 15, 2759-2772 (2020).
57. Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 10, 1776 (2019).
58. Cahoy, J. D. et al. A Transcriptome Database for Astrocytes, Neurons, and Oligodendrocytes: A New Resource for Understanding Brain Development and Function. J. Neurosci. 28, 264-278 (2008).
59. human phenome. eLife 7, e34408 (2018).
60. Borges, M. C. et al. Circulating Fatty Acids and Risk of Coronary Heart Disease and Stroke: Individual Participant Data Meta-Analysis in Up to 16 126 Participants. J Am Heart
61. Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet 44, 981-990 (2012).
62. K, M. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551,
63. Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat Genet 45, 1452-1458 (2013).
64. Bowden, J. et al. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat Med 36, 1783-1802 (2017).
65. Burgess, S. & Thompson, S. G. Interpreting findings from Mendelian randomization using the MR-Egger method. Eur J Epidemiol 32, 377-389 (2017).
66. Klarin, D. et al. Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program. Nat Genet 50, 1514-1523 (2018).
67. Nelson, C. P. et al. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat Genet 49, 1385-1391 (2017).
68. Horie, K. et al. CSF tau microtubule-binding region identifies pathological changes in primary tauopathies. Nat Med 28, 2547-2554 (2022).
69. Gusev, A. et al. Partitioning Heritability of Regulatory and Cell-Type-Specific Variants across 11 Common Diseases. The American Journal of Human Genetics 95, 535-552 (2014).
70. Stamatoyannopoulos, J. A. What does our genome encode? Genome Res 22, 1602-1611
71. Nordestgaard, B. G. & Varbo, A. Triglycerides and cardiovascular disease. Lancet 384, 626-635 (2014).
72. Antal, B. et al. Type 2 diabetes mellitus accelerates brain aging and cognitive decline: Complementary findings from UK Biobank and meta-analyses. Elife 11, e73138 (2022).
73. Wu, A. M. L. et al. Aging and CNS Myeloid Cell Depletion Attenuate Breast Cancer Brain Metastasis. Clinical Cancer Research 27, 4422-4434 (2021).
74. Gkatzionis, A. & Burgess, S. Contextualizing selection bias in Mendelian randomization: how bad is it likely to be? International Journal of Epidemiology 48, 691-701 (2019).
75. Sachdev, P. S., Zhuang, L., Braidy, N. & Wen, W. Is Alzheimer's a disease of the white matter? Curr Opin Psychiatry 26, 244-251 (2013).
76. Watanabe, K. et al. Genome-wide meta-analysis of insomnia prioritizes genes associated with metabolic and psychiatric pathways. Nat Genet 54, 1125-1132 (2022).
77. Cermakova, P. et al. Parental education, cognition and functional connectivity of the salience network. Sci Rep 13, 2761 (2023).
78. Cao, H., Zhou, H. & Cannon, T. D. Functional connectome-wide associations of schizophrenia polygenic risk. Mol Psychiatry 26, 2553-2561 (2021).
79. Yu, M., Sporns, O. & Saykin, A. J. The human connectome in Alzheimer disease - relationship to biomarkers and genetics. Nat Rev Neurol 17, 545-563 (2021).
80. Tost, H., Champagne, F. A. & Meyer-Lindenberg, A. Environmental influence in the brain, human welfare and mental health. Nat Neurosci 18, 1421-1431 (2015).
81. Petersen, R. C. et al. Alzheimer's Disease Neuroimaging Initiative (ADNI): clinical
82. Casey, B. J. et al. The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Developmental Cognitive Neuroscience 32, 43-54 (2018).
83. Di Biase, M. A. et al. Mapping human brain charts cross-sectionally and longitudinally. Proceedings of the National Academy of Sciences 120, e2216798120 (2023).
84. Wen, J. et al. Convolutional neural networks for classification of Alzheimer's disease: Overview and reproducible evaluation. Medical Image Analysis 63, 101694 (2020).
85. Smith, S. M. et al. Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. Neuroimage 31, 1487-1505 (2006).
86. Tustison, N. J. et al. N4ITK: improved N3 bias correction. IEEE Trans. Med. Imaging 29, 1310-1320 (2010).
87. Zhang, H., Schneider, T., Wheeler-Kingshott, C. A. & Alexander, D. C. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. NeuroImage 61, 1000-1016 (2012).
88. Smith, S. M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23 Suppl 1, S208-19 (2004).
89. Mori, S., Wakana, S., Nagae-Poetscher, L. & van Zijl, P. MRI Atlas of Human White Matter. (Elsevier, 2005).
90. Wakana, S. et al. Reproducibility of quantitative tractography methods applied to cerebral white matter. NeuroImage 36, 630-644 (2007).
91. Alfaro-Almagro, F. et al. Image processing and Quality Control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 166, (2018).
92. Miller, K. L. et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience 19, 1523-1536 (2016).
93. Beckmann, C. F. & Smith, S. M. Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging 23, 137-152 (2004).
94. Inference for the Generalization Error | SpringerLink.
95. Wen, J. et al. Characterizing Heterogeneity in Neuroimaging, Cognition, Clinical Symptoms, and Genetics Among Patients With Late-Life Depression. JAMA Psychiatry
96. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies.
97. Price, A. L., Zaitlen, N. A., Reich, D. & Patterson, N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet 11, 459-463 (2010).
98. Abraham, G., Qiu, Y. & Inouye, M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics 33, 2776-2778 (2017).
99. Wen, J. et al. The Genetic Architecture of Biological Age in Nine Human Organ Systems.
100. Purcell, S. et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet 81, 559-575 (2007).
101. Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun 8, 1826 (2017).
102. Wen, J. et al. Genetic, clinical underpinnings of subtle early brain change along
103. Chand, G. B. et al. Two distinct neuroanatomical subtypes of schizophrenia revealed using machine learning. Brain 143, 1027-1038 (2020).
104. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236-1241 (2015).
105. Zhang, Y. et al. Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics. Brief Bioinform 22, bbaa442 (2021).
106. Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46, D1074-D1082 (2018).
107. Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol 44, 512-525 (2015).
108. Bowden, J., Davey Smith, G., Haycock, P. C. & Burgess, S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genet Epidemiol 40, 304-314 (2016).
109. Skrivankova, V. W. et al. Strengthening the Reporting of Observational Studies in Epidemiology Using Mendelian Randomization: The STROBE-MR Statement. JAMA 326, 1614-1621 (2021).
110. Elsworth, B. et al. The MRC IEU OpenGWAS data infrastructure. 2020.08.10.244293
111. Burgess, S., Davies, N. M. & Thompson, S. G. Bias due to participant overlap in two-sample Mendelian randomization. Genet Epidemiol 40, 597-608 (2016).

Acknowledgments

We want to express our sincere gratitude to the UK Biobank team for their invaluable contribution to advancing clinical research in our field. The primary funding support for this present study is from the initial funding package provided by Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California for WJ. The iSTAGING consortium is a multi-institutional effort funded by NIA by RF1 AG054409 for DC. This research has been conducted using the UK Biobank Resource under Application Number 35148. We thank Caroline O'Driscoll for her work creating the MEDICINE web portal, which has been instrumental in showcasing and disseminating our scientific findings.