February Multi-gradient Permutation Survival Analysis Identifies Mitosis and Immune Signatures Steadily Associated with Cancer Patient Prognosis Xinlei Cai 0 1 3 4 5 6 7 Yi Ye 0 1 3 4 5 6 9 Xiaoping Liu 0 1 3 4 5 6 7 Zhaoyuan Fang 0 1 10 11 3 4 5 6 Luonan Chen 0 1 3 4 5 6 7 8 9 Fei Li 0 1 2 3 4 5 6 Hongbin 0 1 3 4 5 6 Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences , Shanghai Department of Pathology and Frontier Innovation Center, School of Basic Medical Sciences Fei Li, Department of Pathology and Frontier Innovation Center, School of Basic Medical Fudan University , Shanghai , China Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences Hongbin Ji, State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science School of Life Science and Technology, Shanghai Tech University , Shanghai 200120 , China State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology The Second Affiliated Hospital, Zhejiang University School of Medicine , Hangzhou Zhejiang University-University of Edinburgh Institute, Zhejiang University School of 2024 22 2024 57 98

Running title: Mitosis and immune signatures in predicting cancer patient prognosis

-

29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 li_fei@fudan.edu.cn. 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 The inconsistency of the association between genes and cancer prognosis is often attributed to many variables that contribute to patient survival. Whether there exist the Genes Steadily Associated with Prognosis (GEARs) and what their functions are remain largely elusive. We have developed a novel method called "Multi-gradient Permutation Survival Analysis " (MEMORY) to screen the GEARs using RNA-seq data from the TCGA database. Then we employed a network construction approach to identify hub genes from GEARs, and utilized them for cancer classification. In the case of LUAD, the GEARs were found to be related to mitosis. Our analysis suggested that LUAD cell lines carrying PIK3CA mutations exhibit increased drug resistance. For BRCA, the GEARs were related to immunity. The analysis revealed that CDH1 mutation might influence immune infiltration through the EMT process in BRCA. We further explored the prognostic relevance of mitosis and immunity through their respective scores. This study offers significant biological insights into GEARs and highlights their potential as robust prognostic indicators across diverse cancer types. Associated with Prognosis (GEAR), Hub Genes, Mitosis, Immune 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104

Introduction

Cancer is one of the leading causes of death worldwide, with over 200 types identified. Despite of huge advancements in the field, accurately predicting cancer patient prognosis still remains a significant challenge1,2. Previous studies have highlighted the complexity of this task, which is influenced by various factors including cancer type, clinical stage, therapeutic interventions, nursing care, unexpected comorbidities, and other non-cancer related illnesses that may interplay3-6. Furthermore, the gene expression plays a crucial role in predicting cancer patient prognosis, such as HER2, VEGF, Ki67, etc7-12. Nevertheless, there is still a need for further improvement in prognostic accuracy to better inform treatment decisions and patient outcomes.

Survival analysis is commonly used to assess the correlation between genes and cancer patient prognosis13. However, inconsistent findings are frequently observed, even within the same type of cancer. For instance, studies exploring the correlation between CCND1 and prognosis in NSCLC reported contrasting results, including positive correlation, inverse correlation, or negligible influence14-16. These discrepancies can be attributed to various influential factors, such as differences in sample size, cohort characteristics (including cancer subtypes and tumor staging) and variations in therapy approaches17. Such observations raise an important question:

whether there exist Genes Steadily Associated with Prognosis (GEARs) that consistently correlate with patient outcomes across different conditions, particularly considering the varying sample sizes used in different studies? Affirmative answers to this question would have significant implication not only for the development of more accurate prognostic models but also for enhancing our understanding of cancer biology. The Cancer Genome Atlas (TCGA) is a comprehensive cancer genomics project initiated in the United States. It consists of transcriptome data, genomic data, and clinical information pertaining to 33 different cancer types, making it the largest cancer clinical sample database currently available. In this study, we developed a novel method called "Multi-gradient Permutation Survival Analysis" (MEMORY) with the utilization of TCGA RNA-seq data, and accessed the potential existence of GEARs across 15 cancer types, each comprising a cohort 106 107 of over 200 patients. Furthermore, we also evaluated the prognostic predictive power of these GEARs and explored their potential biological functions in driving cancer progression. 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135

Results Multi-gradient permutation survival analysis identifies GEARs associated with mitosis and immune across multiple cancer types

GEARs are a group of genes consistently and significantly correlate with patient survival, independent of the sample size. To identify these GEARs, we developed a novel method called "Multi-gradient Permutation Survival Analysis " (MEMORY). This method allows us to assess the correlation between a specific gene and cancer patient prognosis using available transcriptomic dataset (Figure 1A, Table 1). Initially, we started with a sampling number at 10% of the cohort size initially, and gradually increased the sampling number with each 10% increment, which was analyzed with 1000 permutations. By calculating the statistical probability of each gene's association with patient survival, we can identify a group of

GEARs for further analyses.

In this study, we utilized the TCGA datasets for several reasons, including the diversity of cancer types, the availability of gene expression profiles, and patient prognosis information. We set a minimum cohort size of 200 and included 15 eligible cancer types for analysis. These cancer types include bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), cervical squamous cell carcinoma and endocervical Adenocarcinoma (CESC), colon Adenocarcinoma (COAD), kidney renal papillary cell carcinoma (KIRC), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell sarcinoma (LUSC), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), prostate adenocarcinoma (PRAD), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), and uterine corpus endometrial carcinoma (UCEC). As the number of samples increases, the survival probability of certain genes gradually approaches 1. Once this score reaches 0.8 and remained consistent with further sample gradient increase, the gene is considered a GEAR (Figure 1B). Remarkably, we successfully identified a set of GEARs across all 15 cancer types (Supplementary Figure 1-2,Table 2). The GEAR counts in most cancer types ranged from 100 to1000, with exceptions of CESC, KIRC, LGG, and PAAD with over 1000 GEARs, and THCA with only 22 GEARS (Table 2). 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164

In LUAD, the top 10 GEARs with the highest significance probabilities were TLE1, GNG7, ERO1A, ANLN, DKK1, TMEM125, S100A16, KNL1, STEAP1, and BEX4. Most of these genes are known to promote LUAD malignant progression except for S100A1618-26. In other cancer types, BEX4 was identified as a common GEAR in KIRC, LGG, PAAD, and STAD (Table 2). BEX4 is reported as a proto-oncogene promoting cancer onset and malignant progression in multiple cancers including LUAD, glioblastoma multiforme and oral squamous cell carcinoma 25,27,28. We also identified the most significant GEARs in individual cancer types, such as TLL1 (BLCA), PGK1 (BRCA), RFXANK (CESC), DPP7 (COAD), VWA8 (KIRC), SCMH1 (LGG), HILPDA (LIHC), TLE1 (LUAD), CD151 (LUSC), ANKRD13A (OV), SOCS2 (PAAD), DRG2 (PRAD), ADAMTS6 (STAD), PSMB8 (THCA), ASS1 (UCEC) (Table 3). Many of these genes were previously reported to associate with tumorigenesis18,2940. For example, TLE1 is known as a transcriptional repressor that promotes cell proliferation, migration, and inhibits apoptosis in LUAD 41,42. Additionally, PGK1, a key enzyme in the glycolytic process, has been shown to promote cell proliferation, migration, and invasion in multiple cancers43,44. These findings demonstrate the intricate link between the functionality of GEARs and the initiation and progression of cancer.

To gain deep insights into the biological functions of these identified GEARs, we conducted Gene Ontology Enrichment Analysis (GOEA) (Figure 1C). Interestingly, we found the mitosis-related pathways were enriched in LICH, LUAD, LGG, and PAAD, and the immunerelated pathways were enriched in BRCA and UCEC45. Additionally, other cancer types exhibited enrichment in various pathways, such as organic acid metabolism pathway in KIRC, oxidative phosphorylation pathway in CESC, organ development-related pathways in LUSC, neurogenesis-related pathways in BLCA, and sustenance metabolism pathways in THCA. Given the crucial role of mitosis in cancer progression and the significance of the immune system in cancer-host interactions, we specifically focused on the mitosis and immune-related GERAs, which accounted for approximately 40% of all the analyzed cancers.

Identification of hub genes in mitosis and immune-related cancers

To further identify crucial genes within the GEARs, we conducted a significant survival network (SSN) and extracted the higher-ranked edges to construct the core survival network 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 (CSN) (Figure 2A, Table 4). We created 6 CSNs with varying numbers of edges ( 250, 500, 1000, 2000, 4000, 8000). The top 10 GEARs with highest degree in these networks were defined as hub genes 46. We then classified the samples using the hub genes derived from these networks and evaluated their predictive power for patient survival. We determined that the optimal number of edges for the network ranged from 1000 to 2000, as it demonstrated the best clustering effect of hub genes (Figure 2B, Supplementary Figure 3A-O, Supplementary Figure 4A-O). Furthermore, we conducted GOEA on the hub genes selected from the network with 1000 edges and found that the results were consistent with the GEAR analysis (Table 5). Specifically, the hub genes in LIHC, LGG, and LUAD were enriched in mitosis-related pathways, whereas the hub genes in BRCA and UCEC were enriched in immune-related pathways (Figure 2C). For instance, the 9 hub genes in LUAD were associated with functions related to mitosis, whereas the 8 hub genes of BRCA were associated with immune-related functions (Figure 2D-E).

Furthermore, we conducted survival-dependent analyses on the hub genes of LGG, LIHC, and LUAD. These analyses revealed that mitosis-related hub genes are closely associated with cancer cell viability, especially those hub genes that are correlated with multiple cancer types such as CDC20, TOP2A, BIRC5 and TPX247-50 (Supplementary Figure 5A-C). The importance of these genes is well established. For instance, TPX2, a hub gene in all three cancers, is known to play a crucial role in normal spindle assembly during mitosis and is essential for cell proliferation51. The significance of these hub genes for the survival of cancer cells suggests that the expression levels of these hub genes can be used to screen for inhibitors of tumor cell growth. By integrating the Genomics of Drug Sensitivity in Cancer (GDSC) and Connectivity Map (cMAP) databases, we identified 76 individual compounds that are able to effectively suppress the expression of the 10 hub genes in LUAD cell lines (Figure 2F, Supplementary Figure 5D-E). These compounds also significantly inhibit LUAD cell growth and may serve as potential therapeutic agents for the treatment of LUAD. We then analyzed the LUAD dataset, which had a larger sample size compared to LGG and LIHC. Based on the expression of the hub genes, we classified the LUAD samples into three 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 subgroups: mitosis low (ML), mitosis medium (MM), and mitosis high (MH) (Figure 2G, terminal respiratory unit (TRU), the proximal-proliferative (PP), and the proximalinflammatory (PI) subgroups52. Our analysis revealed that the ML subgroup was primarily enriched with TRU subgroup and a small number of samples from the PP subgroup, but not the PI subgroup. The MH subgroup showed high expression of mitosis-related genes and predominantly encompassed PI and PP subgroups, whereas the MM group exhibited intermediate levels of mitosis-related gene expression (Figure 2G-H). This suggests that the new categorization method can help identify new factors that influence patient prognosis (Supplementary Figure 5F-G).

Distinct genetic mutation landscapes characterize different clusters of LUAD The expression of hub genes provided crucial indicators for distinguishing various subgroups of LUAD. However, the underlying mechanisms driving these variations in expression patterns remain unknown. In our subsequent research, we aimed to explore the genomic differences, particularly in terms of gene mutations, among different LUAD subgroups. Initially, we analyzed the variations of genetic mutation landscape of different LUAD subgroups by assessing the tumor mutation burden (TMB). Interestingly, we found significant differences in TMB among the various subgroups of LUAD (Figure 3A). Analysis of common driver genes further revealed distinct proportions of tumors with ALK and ROS1 fusions in the favorable prognosis ML and MM subgroups, compared to tumors with mutations in KRAS, EGFR, BRAF, and ERBB2 (Figure 3B). Furthermore, we observed unique mutation characteristics in each of these 3 subgroups (Figure 3C-E). For example, TP53 mutation, a prevalent tumor suppressor gene mutation, was frequently observed in the MH subgroup with a mutation rate of 73%, compared to mutation rates of 14% in the ML subgroup and 39% in the MM subgroup. Additionally, genes such as CSMD3, RP1L1, ZFHX4 exhibited similar trend in mutation frequency across the three subgroups. These findings indicate substantial genomic differences among these three LUAD subgroups based on the hub genes. 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249

We identified gene mutations that showed significant changes through gene dependence analysis (Figure 4A). To further the functional implications of these mutations, we enriched them using a pathway system called Nested Systems in Tumors (NeST)53. The results revealed notable differences in multiple functional pathways, including androgen receptor, cell cycle, TP53, EGFR, IL-6, and PIK3CA-related pathways (Figure 4B). To gain further insights into the functional implications of these different mutations, we assessed the gene dependency of cells with these mutations using the DepMap database. Importantly, we observed that the pathway differences between the MM and MH clusters were primarily associated with PIK3CA-related pathways. Previous studies have reported that PIK3CA mutation confers resistance in colorectal cancer, lung cancer, and breast cancer54-56. Therefore, we aimed to validate the role of PIK3CA mutations in drug resistance using public databases. We further analyzed A549 cells (a lung adenocarcinoma cell line with KRAS mutation) and SW1573 cells (a lung adenocarcinoma cell line with both KRAS and PIK3CA mutations) using their respective GDSC data and identified five potentially effective inhibitors (BMS345541, Dactinomycin, Epirubicin, Irinotecan, and Topotecan) from the previously screened set of 76 LUAD cell growth inhibitors. Strikingly, SW1573 cells exhibited increased resistance to all these 5 inhibitors when compared to A549 cells (Figure 4C). In line with this, clinical data also supports the notion that concurrent PIK3CA mutation is a poor prognostic factor for LUAD patients57. These findings suggest that PIK3CA mutation may contribute to drug resistance in LUAD.

Distinct clusters of BRCA exhibit different immune infiltration landscapes We further focused on the immune-related cancer types, with BRCA standing out as a representative immune-related cancer. This prompted us to conduct a more comprehensive analysis (Figure 1C). We hierarchically grouped the BRCA subgroups into three distinct clusters based on the expression of hub genes: Immune low (IL), Immune medium (IM), and Immune high (IH) (Figure 5A, Table 7). Interestingly, all the classifications based on the widely used PAM50 classification contained samples from the IL, IM, and IH subgroups, although there were minor differences in proportions58 (Figure 5B, Supplementary Figure 6A-B). In basal-type breast cancer, which generally has a poor prognosis, a set of samples 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 exhibited high immune infiltration and consequently better outcomes. Conversely, within the HER2 subgroups, known for a better prognosis, there were samples with poor outcomes. Integrating our method with traditional classification may enable a more detailed stratification of breast cancer samples. Mutation analysis revealed distinct patterns of gene mutations among the three subgroups, with significant differences in mutation frequencies of genes such as TP53, CDH1, and LRP2 (Figure 5C-E). We conducted a comparison of the proportions of immune cells across these three subgroups and observed a strong association between the overall immune cell proportion and the clustering results based on hub genes (Figure 5F, Supplementary Figure 6C-L). Specifically, the IH subgroup exhibited significantly higher proportions of CD8+ T cells and Treg cells, with mean percentages at 5.0% and 6.0%, respectively, compared to 1.4% and 3.2% in the IM subgroup, and 0.3% and 1.6% in the IL subgroup. These findings suggest the presence of distinct immune responses in these three subgroups, indicating a potential association between genetic mutation pattern and the immune microenvironment landscape.

Next, we explored the specific genomic factors influencing immune infiltration in BRCA. Mutation analysis revealed that CDH1 gene mutation ranked as the second most prevalent genetic alteration in BRCA samples, following TP53 mutation (Figure 5C-E). Previous study has indicated that CDH1 is involved in mechanisms regulating cell-cell adhesions, mobility, and proliferation of epithelial cells59. We found that CDH1-mutant samples exhibited significantly lower CDH1 expression compared to CDH1-wildtype samples, and CDH1 expression showed a close correlation with immune cell infiltration (Figure 5G, Supplementary Figure 6M). These results were consistent with clinical observations of CDH1 mutation and high immune infiltration in invasive lobular carcinoma of the breast 60. We further investigated how CDH1 mutation influenced immune infiltration. In BRCA, we observed a significant association between the expression of CDH1 and the expression of EMT marker genes VIM and TWIST2 (Supplementary Figure 6N-O)61,62. This suggests that the regulation of CDH1 is intricately linked with EMT process. Through calculating the EMT score, we observed a positive correlation between the EMT score and the proportion of immune infiltration (Figure 5H). Consequently, CDH1 might influence immune infiltration 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 through the EMT process.

Mitotic and immune signatures predict patient prognosis at pan-cancer level We further sought to explore the prognostic relevance of mitosis and immune by calculating their scores and assessing their correlation with patient outcomes at the pan-cancer level. The scores for these biological processes were computed using RNA-seq data from the TCGA database, and the median score was used as a cut-off to categorize patients into different groups. Out of the 33 cancer types analyzed, the prognosis of 17 cancer types showed a significant correlation with at least one of these two pathways (Figure 6). Specifically, 10 cancer types were exclusively associated with the mitosis score, 2 cancer types were exclusively associated with the immune score, and 5 cancer types exhibited a correlation with both mitosis and immune scores simultaneously. Overall, the identification of mitosis and immune-related biological processes as significant prognostic factors at the pan-cancer level suggests their potential utility as valuable biomarkers for predicting patient prognosis.

Discussion

Due to the multifaceted nature of variables affecting cancer patient prognosis, it remains uncertain whether there exist a set of genes steadily associated with cancer prognosis, regardless of sample size and other factors. Here, we utilized the MEMORY method to address this question and discovered that all the cancer types have GEARs. We observed significant variation in the number of GEARs among different cancer types, indicative of cancer type-specific patterns. The substantial heterogeneity in driver gene and mortality rates among various cancer types could potentially explain this phenomenon. For example, THCA, known for its low malignancy, displays the fewest genetic expression alterations among all the studied cancer types. This observation could be correlated with the relatively high fiveyear survival rate in THCA, which exceeds 50% even in advanced stage63. In contrast, PRAD, despite of a favorable prognosis similar to THCA, exhibits a significantly higher number of genetic expression alterations.

We hypothesize that the positive prognosis in PRAD cases

might be largely attributed to early diagnosis, which enables timely treatment for the majority of PRAD patients 64.

This discrepancy between the number of genetic alterations and 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 prognosis in THCA and PRAD highlights the intricate nature of cancer genetics and emphasizes the importance of personalized considerations in cancer treatment and prognosis evaluation. Furthermore, certain genes are commonly found in the GEARs of various cancer types, indicating their potential significance in cancer development. For instance, BEX4 is present in the GEARs of LUAD, KIRC, LGG, PAAD, and STAD, and known to play oncogenic role in inducing carcinogenic aneuploid transformation via modulating the acetylation of α-tubulin65,66. Aneuploidy is a hallmark characteristic of cancer and can lead to alterations in the dosage of oncogenes or tumor suppressor genes, thereby influencing tumor initiation and progression 67-70. The regulation of aneuploid transformation by BEX4 may represent a common mechanism through which this gene impacts prognosis across different cancer types. Subsequently, we discovered that GEARs across different cancers displayed distinct functional characteristics. Notably, a recurrent theme was the prominence of mitosisrelated and immune-related features. Specifically, the GEARs of LGG, LIHC, and LUAD were enriched in

mitosis-related pathways, whereas BRCA and UCEC showed the enrichment of immune-related pathways. Mitosis and immune processes have been widely recognized as having a significant impact on patient prognosis71-75. However, current clinical guidelines do not yet recommend the use of transcriptomic data to assess scores related to mitotic and immune pathways for predicting patient outcomes, despite the well-established association between these pathways and prognosis in cancer76-81. Cancers enriched with mitosis pathways often exhibit heterogeneous tumor growth kinetics across individuals, with tumor size being one of the crucial factors influencing patient survival 82-84. For instance, in the case of LGG, the growth rate of tumors directly impacts the patient's prognosis due to the primary location in the brain. Consequently, alterations in the expression levels of genes related to mitosis are often indicative of the prognosis of LGG 85. Cancer types that are enriched in immune-related pathways, such as BRCA and UCEC, are closely associated with deficiencies in DNA mismatch repair (MMR) 86,87. Cancers enriched in immune-related pathways often exhibit heterogeneous tumor growth kinetics across individuals, where tumor size is closely correlated with patient survival.

To explore the underlying differences between samples associated with distinct mitotic and 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 immune-related pathways, we employed GEAR analysis to identify hub genes for further molecular classification and mutation analysis. Specifically, we focused on LUAD and BRCA, two representative cancer types exhibiting enriched mitosis and immune signatures in GEARs. By classifying the hub gene, we divided LUAD into three subgroups: ML, MM and MH, which displayed significant difference in survival outcomes. Subsequently, we utilized NeST to analyze the mutations within these subgroups. Notably, there were significant differences observed in cell cycle and signal transduction-related pathways among ML, MM and MH subgroups. Of particular importance, the PIK3CA-related pathway emerged as a key differentiating pathway between the MH and MM subgroups. Our analyses suggest that LUAD cell line carrying PIK3CA mutation may exhibit increased drug resistance. This aligns with previous studies demonstrating that targeting the PI3K pathway can overcome drug resistance88,89. Of course, future research is warranted to elucidate the exact functional role of

PIK3CA mutations in LUAD.

To investigate the factors influencing immune infiltration in BRCA, we classified the samples based on hub genes and analyzed the differentially mutated genes. Interestingly, we found that CDH1 mutation occurred at a high frequency in the IH subgroups. This led us to speculate that CDH1 plays a crucial role in the regulation of immune infiltration in BRCA. Furthermore, previous studies have reported that CDH1 inactivation promotes immune infiltration in breast cancer60. CDH1 is a vital gene associated with EMT, and various studies have demonstrated the close relationship between EMT and the tumor immune microenvironment in different cancers90. Consistently, our results also supported the correlation between EMT score and immune infiltration in BRCA. These findings suggest that the mutations in PIK3CA and CDH1, identified through GEAR analysis, have significant impacts on cancer development and hold potential value in improving clinical therapies. This further emphasizes the importance of GEARs in understanding cancer biology and guiding treatment strategies.

Lastly, we investigated the prognostic predictive capabilities of the mitosis score and immune score at the pan-cancer level. Surprisingly, we found that approximately half of the cancer types exhibited significant correlations between these two scores and patient prognosis. 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391

Interestingly, even for cancers originating from the same primary location, their correlations with these scores could differ, indicating the potential diversity of mechanisms underlying cancer-related mortality. For instance, both LGG and GBM are brain tumors, but the primary risk factor for patient prognosis in LGG is closely linked to tumor diameter, whereas the main risk factor for GBM patients lies in its high invasiveness and challenge of surgical resection91,92.

Despite of the strong association between GEAR and patient prognosis, the edges of the CSN constructed based on GEARs was undirected. As a result, the hub genes identified from

GEAR in the CSN

may primarily serve as stable and effective biological markers.

Additionally, through multi-omics analysis, we obtained some functional mutations, but the therapeutic significance of these mutations remains to be elucidated. For example, further study is needed to understand the exact role of PIK3CA mutations in promoting tumor cell proliferation and drug resistance. Similarly, the association of CDH1 mutations with the infiltration of multiple immune cell types also warrants additional experimental investigation. In our future studies, we plan to utilize protein-protein interaction networks and pathway databases to construct a novel network based on the CSN. This approach will allow us to directly screen genes from GEARs that could potentially serve as therapeutic targets. Undoubtedly, future efforts are still required to utilize protein-protein interaction networks and pathway databases to construct a new network based on CSN.

In conclusion, our study utilized the MEMORY algorithm to identify GEARs in 15 cancer types, highlighting the significance of mitosis and immunity in cancer prognosis. Our findings demonstrate that GEARs possess substantial biological significance beyond their role as prognostic biomarkers. This study provides valuable guidance for establishing standards for survival analysis evaluation and holds potential for the development of novel therapeutic strategies.

Methods 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417

Datasets

The gene expression profiles and clinical information of 33 cancers were obtained from the TCGA database and downloaded by the GDC data website (https: //portal.gdc.cancer.gov/).

All gene expression data and survival data were integrated and normalized.
Multi-gradient permutation survival analysis

Gene expression data of cancer patients were obtained from the RNA-seq expression matrix of TCGA after TPM normalization. We randomly sampled the gene expression data according to the gradient. The sampling strategy was as follows: Ten gradient increases in sample size were pre-set, ranging from approximately 10% to about 100%, with intervals of 10%. Random samples were taken from total samples of each cancer 1000 times, based on each pre-set sample size. Multiple sampling matrix was obtained after the sampling strategy was performed in each gradient. The survival analysis was performed for every sampling matrix by “survival” and “survminer”. “survival” and “survminer” are R packages for survival analysis. The survival analysis method was as follows according to the median expression value of every gene, every sampling matrix was divided into a high and low expression group and performed survival analysis by “survival” and “survminer”. We used 1 for significant survival analysis results (P < 0.05) and 0 for non-significant results (P > 0.05).

The survival analysis matrices were obtained after these processes.
Construction of core survival network

The survival analysis matrices were integrated to a significant-probability matrix by a = ∑10=010 × × 1000 (1) where the Aij is the value from row i and column j in the significant-probability matrix. We defined the sampling size kj reached saturation when the max value of column j was equal to 1 in a significant-probability matrix. The least value of kj was selected, and the genes with their corresponding Aij greater than 0.8 were extracted as GEAR. 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444

We also defined survival analysis similarity (SAS) as the similarity of the effect on patient prognosis in two genes. The SAS computational formula is as follows: =

∩ ∪ −2( ∩ )+1 (2) Where A and B are results of 1000 survival analyses at the first sampling gradient, which are sequences of 0 and 1 of length 1000. ∩ means the number of events that are simultaneously 1 in both A and B. ∪ means the size of the union of A and B. The SAS of significant genes with total genes was calculated, and the significant survival network was constructed. Then some SAS values at the top of the rankings were extracted, and the SAS was visualized to a network by Cytoscape93. The network was named core survival network (CSN). The degree of each node was calculated, and the nodes with the top10 degrees were defined hub genes. The effect of molecular classification by hub genes is indicated that 1000 to 2000 was a range that the result of molecular classification was best. One thousand was chosen as the number of SAS. Therefore, one thousand was selected as the number of SAS values for constructed significant survival network.

Gene ontology enrichment analysis

Gene ontology has been used to classify genes based on functions. The gene functions were divided three types, includes molecular function (MF), biological process (BP), and cellular component (CC). ClusterProfiler is an R package for gene set enrichment analysis94. The gene ontology enrichment analysis (GOEA) was processed in significant gene sets and hub genes by ClusterProfiler.

Hub gene classification

Tumor samples were genotyped. First, the expression matrix of hub genes corresponding to cancer types was extracted from RNA-seq data. The matrix M is log-transformed after adding a pseudo count of 1: ′ =

2( + 1). Finally, ′ was grouped by hierarchical clustering.

Calculation of tumor mutation burden (TMB) and differential mutation The TMB and differential mutation gene analysis were carried out to explore the differences between different genotyped samples at the genomic level. The data are from the gene preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 mutation data of TCGA tumor samples, and the grouping information is from the hub gene hierarchical clustering results. Maftools is an R package for the analysis of somatic variant data, which can export results in form of charts and graph95. The calculation of TMB and difference mutation analysis were used by the maftools.

Quantifying the effect of gene mutations for tumor cell viability

Gene dependence refers to the extent to which genes are essential for cell proliferation and survival. The Cancer Dependency

Map (Depmap, https://depmap.org/portal/) database provides genome-wide gene dependence data for large number of tumor cell lines96. Gene mutation data of cell lines were obtained from the

CCLE database (https://sites.broadinstitute.org/ccle). The genes appearing in the gene mutation data of cell lines were extracted and sorted into mutation list. Each gene in the list was then analyzed for survival-dependent differences. For each gene in the mutation list, we compared the gene dependence score (S) between cell lines with and without the gene mutation. Specifically, cell lines from the CCLE mutation dataset harboring a mutation in a given gene were classified into one group, while cell lines without the mutation constituted the control group. S for the mutation (Sm) and wild-type (Swt) cell lines were then extracted from the DepMap database. A two-sample t-test was used to assess the statistical significance of differences between Sm and Swt. The test yielded a P-value, with an alpha level set at 0.05, to determine the significance of the difference in means. Dm was defined as gene mutation dependence, the computational formula was as follows: Subsequently, we conducted a standardization assessment of Dm, employing the following 2( ̅

− ̅ ) ̅ + ̅ (4) In this study, we categorized mutations identified in cell lines into three groups based on their impact on gene dependency. Functional mutations refer to the mutations that exhibit 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 significant changes in gene dependency scores, with a P-value < 0.05 and Sdm > 0.1. Nonfunctional mutations were those without significant changes in gene dependency scores compared to wildtype. Subtype-associated functional mutations were a subset of functional mutations that show statistically significant differences in occurrence frequency across different LUAD subtypes.

Identification of drugs downregulating hub gene expression

The IC50 data of 76 compounds of LUAD cell lines were obtained from Genomics of Drug Sensitivity in Cancer (GDSC) database97. The hub gene expression data of A549 cell lines after treatment with 76 compounds was obtained from

Connectivity MAP (cMAP,

https://clue.io/) database, and the data were grouped by hierarchical clustering98. The drugs which were presented only in the hub gene expression suppression group were considered to effective against the LUAD cell.

Immune infiltration analysis

Immune infiltration data was calculated by quantiseq method99. Immunedeconv was an R package for unified access to computational methods for estimating immune Cell fractions from bulk RNA-seq data100. The data were from the gene expression data of TCGA tumor samples after TPM was standardized. Then the immune cell fractions of tumor samples were calculated by quantiseq method invoking immunedeconv.

Biological function score calculation

GSVA is an R package for calculate biological function score based on a single sample101. The data was from TCGA in the calculation process, and the software package was GSVA. The gene set of the immune score was from other study (Table 8) 99. The gene sets that calculated the mitosis score were obtained from gene ontology (GO:0140014).

Acknowledgment

This work was supported by the National Key Research and Development Program of China (grants 2022YFA1103900 to H.J., F.L.; 2020YFA0803300 to H.J.); the National Natural Science Foundation of China (grants 82341002 to H.J., 32293192 to H.J., 82030083 to H.J.,

from T cell basic science to clinical practice. Nat Rev Immunol 20, 651-668. 10.1038/s41577020-0306-5.

Zhou, Z., Li, Y., Hao, H., Wang, Y., Zhou, Z., Wang, Z., and Chu, X. (2019). Screening Hub Genes as Prognostic Biomarkers of Hepatocellular Carcinoma by Bioinformatics Analysis. Cell Transplant 28, 76S-86S. 10.1177/0963689719893950.

Hwang, L.H., Lau, L.F., Smith, D.L., Mistrot, C.A., Hardwick, K.G., Hwang, E.S., Amon, A., and Murray, A.W. (1998). Budding yeast Cdc20: a target of the spindle checkpoint. Science 279, 1041-1044. 10.1126/science.279.5353.1041.

Uuskula-Reimand, L., and Wilson, M.D. (2022). Untangling the roles of TOP2A and TOP2B in transcription and cancer. Sci Adv 8, eadd4920. 10.1126/sciadv.add4920.

Lamers, F., van der Ploeg, I., Schild, L., Ebus, M.E., Koster, J., Hansen, B.R., Koch, T., Versteeg, R., Caron, H.N., and Molenaar, J.J. (2011). Knockdown of survivin (BIRC5) causes apoptosis in neuroblastoma via mitotic catastrophe. Endocr Relat Cancer 18, 657-668. 10.1530/ERC-11-0207.

Wittmann, T., Wilm, M., Karsenti, E., and Vernos, I. (2000). TPX2, A novel xenopus MAP involved in spindle pole organization. J Cell Biol 149, 1405-1418. 10.1083/jcb.149.7.1405. Kufer, T.A., Sillje, H.H., Korner, R., Gruss, O.J., Meraldi, P., and Nigg, E.A. (2002). Human TPX2 is required for targeting Aurora-A kinase to the spindle. J Cell Biol 158, 617-623. 10.1083/jcb.200204155.

Cancer Genome Atlas Research, N. (2014). Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543-550. 10.1038/nature13385.

Zheng, F., Kelly, M.R., Ramms, D.J., Heintschel, M.L., Tao, K., Tutuncuoglu, B., Lee, J.J., Ono, K., Foussard, H., Chen, M., et al. (2021). Interpretation of cancer mutations using a multiscale map of protein systems. Science 374, eabf3067. 10.1126/science.abf3067. Wang, Q., Shi, Y.L., Zhou, K., Wang, L.L., Yan, Z.X., Liu, Y.L., Xu, L.L., Zhao, S.W., Chu, H.L., Shi, T.T., et al. (2018). PIK3CA mutations confer resistance to first-line chemotherapy in colorectal cancer. Cell Death Dis 9, 739. 10.1038/s41419-018-0776-6.

Shibata, T., Kokubu, A., Tsuta, K., and Hirohashi, S. (2009). Oncogenic mutation of PIK3CA in small cell lung carcinoma: a potential therapeutic target pathway for chemotherapy-resistant lung cancer. Cancer Lett 283, 203-211. 10.1016/j.canlet.2009.03.038.

Hanker, A.B., Pfefferle, A.D., Balko, J.M., Kuba, M.G., Young, C.D., Sanchez, V., Sutton, C.R., Cheng, H., Perou, C.M., Zhao, J.J., et al. (2013). Mutant PIK3CA accelerates HER2driven transgenic mammary tumors and induces resistance to combinations of anti-HER2 therapies. Proc Natl Acad Sci U S A 110, 14372-14377. 10.1073/pnas.1303204110. Eng, J., Woo, K.M., Sima, C.S., Plodkowski, A., Hellmann, M.D., Chaft, J.E., Kris, M.G., Arcila, M.E., Ladanyi, M., and Drilon, A. (2015). Impact of Concurrent PIK3CA Mutations on Response to EGFR Tyrosine Kinase Inhibition in EGFR-Mutant Lung Cancers and on Prognosis in Oncogene-Driven Lung Adenocarcinomas. J Thorac Oncol 10, 1713-1719. 10.1097/JTO.0000000000000671.

Thorsson, V., Gibbs, D.L., Brown, S.D., Wolf, D., Bortone, D.S., Ou Yang, T.H., Porta-Pardo, E., Gao, G.F., Plaisier, C.L., Eddy, J.A., et al. (2018). The Immune Landscape of Cancer. Immunity 48, 812-830 e814. 10.1016/j.immuni.2018.03.023.

Meigs, T.E., Fedor-Chaiken, M., Kaplan, D.D., Brackenbury, R., and Casey, P.J. (2002).

Galpha12 and Galpha13 negatively regulate the adhesive functions of cadherin. J Biol Chem

10.1111/his.13236.

Byrne, A., Savas, P., Sant, S., Li, R., Virassamy, B., Luen, S.J., Beavis, P.A., Mackay, L.K., Neeson, P.J., and Loi, S. (2020). Tissue-resident memory T cells in breast cancer control and immunotherapy responses. Nat Rev Clin Oncol 17, 341-348. 10.1038/s41571-020-0333-y. Liu, W., Sun, L., Zhang, J., Song, W., Li, M., and Wang, H. (2021). The landscape and prognostic value of immune characteristics in uterine corpus endometrial cancer. Biosci Rep 41. 10.1042/BSR20202321.

Ducreux, M., Cuhna, A.S., Caramella, C., Hollebecque, A., Burtin, P., Goere, D., Seufferlein, T., Haustermans, K., Van Laethem, J.L., Conroy, T., et al. (2015). Cancer of the pancreas: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol 26 Suppl 5, v56-68. 10.1093/annonc/mdv295.

Cardoso, F., Kyriakides, S., Ohno, S., Penault-Llorca, F., Poortmans, P., Rubio, I.T., Zackrisson, S., Senkus, E., and clinicalguidelines@esmo.org, E.G.C.E.a. (2019). Early breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-updagger. Ann Oncol 30, 1194-1220. 10.1093/annonc/mdz173.

Oaknin, A., Bosse, T.J., Creutzberg, C.L., Giornelli, G., Harter, P., Joly, F., Lorusso, D., Marth, C., Makker, V., Mirza, M.R., et al. (2022). Endometrial cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Ann Oncol 33, 860-877. 10.1016/j.annonc.2022.05.009.

Vogel, A., Cervantes, A., Chau, I., Daniele, B., Llovet, J.M., Meyer, T., Nault, J.C., Neumann, U., Ricke, J., Sangro, B., et al. (2018). Hepatocellular carcinoma: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol 29, iv238-iv255. 10.1093/annonc/mdy308.

Stupp, R., Brada, M., van den Bent, M.J., Tonn, J.C., Pentheroudakis, G., and Group, E.G.W. (2014). High-grade glioma: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol 25 Suppl 3, iii93-101. 10.1093/annonc/mdu050.

Postmus, P.E., Kerr, K.M., Oudkerk, M., Senan, S., Waller, D.A., Vansteenkiste, J., Escriu, C., Peters, S., and Committee, E.G. (2017). Early and locally advanced non-small-cell lung cancer (NSCLC): ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol 28, iv1-iv21. 10.1093/annonc/mdx222.

Gui, C., Kosteniuk, S.E., Lau, J.C., and Megyesi, J.F. (2018). Tumor growth dynamics in serially-imaged low-grade glioma patients. J Neurooncol 139, 167-175. 10.1007/s11060-0182857-x.

Alvarez, M., Benhammou, J.N., Darci-Maher, N., French, S.W., Han, S.B., Sinsheimer, J.S., Agopian, V.G., Pisegna, J.R., and Pajukanta, P. (2022). Human liver single nucleus and single cell RNA sequencing identify a hepatocellular carcinoma-associated cell-type affecting survival. Genome Med 14, 50. 10.1186/s13073-022-01055-5.

Infante, M., Berghmans, T., Heuvelmans, M.A., Hillerdal, G., and Oudkerk, M. (2013). Slowgrowing lung cancer as an emerging entity: from screening to clinical management. Eur Respir J 42, 1706-1722. 10.1183/09031936.00186212.

Hoshino, T. (1984). A commentary on the biology and growth kinetics of low-grade and highgrade gliomas. J Neurosurg 61, 895-900. 10.3171/jns.1984.61.5.0895.

Sajjadi, E., Venetis, K., Piciotti, R., Invernizzi, M., Guerini-Rocco, E., Haricharan, S., and Fusco, N. (2021). Mismatch repair-deficient hormone receptor-positive breast cancers:

Biology and pathological characterization. Cancer Cell Int 21, 266. 10.1186/s12935-02101976-y.

Doghri, R., Houcine, Y., Boujelbene, N., Driss, M., Charfi, L., Abbes, I., Mrad, K., and Sellami, R. (2019). Mismatch Repair Deficiency in Endometrial Cancer: Immunohistochemistry Staining and Clinical Implications. Appl Immunohistochem Mol Morphol 27, 678-682. 10.1097/PAI.0000000000000641.

Zhang, Q., Sun, T.Y., Kang, P.M., Qian, K., Deng, B., Zhou, J.H., Wang, R.W., Jiang, B., Li, K., Liu, F., et al. (2016). Combined analysis of rearrangement of ALK, ROS1, somatic mutation of EGFR, KRAS, BRAF, PIK3CA, and mRNA expression of ERCC1, TYMS, RRM1, TUBB3, EGFR in patients with non-small cell lung cancer and their clinical significance. Cancer Chemoth Pharm 77, 583-593. 10.1007/s00280-016-2969-y.

Donev, I.S., Wang, W., Yamada, T., Li, Q., Takeuchi, S., Matsumoto, K., Yamori, T., Nishioka, Y., Sone, S., and Yano, S. (2011). Transient PI3K inhibition induces apoptosis and overcomes HGF-mediated resistance to EGFR-TKIs in EGFR mutant lung cancer. Clin Cancer Res 17, 2260-2269. 10.1158/1078-0432.CCR-10-1993.

Wang, G., Xu, D., Zhang, Z., Li, X., Shi, J., Sun, J., Liu, H.Z., Li, X., Zhou, M., and Zheng, T. (2021). The pan-cancer landscape of crosstalk between epithelial-mesenchymal transition and immune evasion relevant to prognosis and immunotherapy response. NPJ Precis Oncol 5, 56. 10.1038/s41698-021-00200-4.

Brown, T.J., Bota, D.A., van Den Bent, M.J., Brown, P.D., Maher, E., Aregawi, D., Liau, L.M., Buckner, J.C., Weller, M., Berger, M.S., and Glantz, M. (2019). Management of lowgrade glioma: a systematic review and meta-analysis. Neurooncol Pract 6, 249-258. 10.1093/nop/npy034.

Chen, L., Ma, J., Zou, Z., Liu, H., Liu, C., Gong, S., Gao, X., and Liang, G. (2022). Clinical characteristics and prognosis of patients with glioblastoma: A review of survival analysis of 1674 patients based on SEER database. Medicine (Baltimore) 101, e32042. 10.1097/MD.0000000000032042.

Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-2504. 10.1101/gr.1239303.

Yu, G., Wang, L.G., Han, Y., and He, Q.Y. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284-287. 10.1089/omi.2011.0118. Mayakonda, A., Lin, D.C., Assenov, Y., Plass, C., and Koeffler, H.P. (2018). Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res 28, 17471756. 10.1101/gr.239244.118.

Tsherniak, A., Vazquez, F., Montgomery, P.G., Weir, B.A., Kryukov, G., Cowley, G.S., Gill, S., Harrington, W.F., Pantel, S., Krill-Burger, J.M., et al. (2017). Defining a Cancer Dependency Map. Cell 170, 564-576 e516. 10.1016/j.cell.2017.06.010.

Yang, W., Soares, J., Greninger, P., Edelman, E.J., Lightfoot, H., Forbes, S., Bindal, N., Beare, D., Smith, J.A., Thompson, I.R., et al. (2013). Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res 41, D955-961. 10.1093/nar/gks1111.

Subramanian, A., Narayan, R., Corsello, S.M., Peck, D.D., Natoli, T.E., Lu, X., Gould, J., 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853

L1000 Platform and the First 1,000,000 Profiles. Cell 171, 1437-1452 e1417. 10.1016/j.cell.2017.10.049. 99.

Finotello, F., Mayer, C., Plattner, C., Laschober, G., Rieder, D., Hackl, H., Krogsdam, A., modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data.

Genome Med 11, 34. 10.1186/s13073-019-0638-6.

Sturm, G., Finotello, F., and List, M. (2020). Immunedeconv: An R Package for Unified Access to Computational Methods for Estimating Immune Cell Fractions from Bulk RNASequencing Data. Methods Mol Biol 2120, 223-232. 10.1007/978-1-0716-0327-7_16.

Hanzelmann, S., Castelo, R., and Guinney, J. (2013). GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7. 10.1186/1471-2105-14-7. 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881

Figure Legends cancers. (A) Sample sizes, ranging from 10% to 100% with 10% intervals (“gradient”), were used for 1000 permutations of survival analyses of all 15 cancer types Each matrix was divided into high and low expression groups based on median gene expression values, and survival analyses were performed. Log-rank test significance results were coded as 1 for significant (P < 0.05) and 0 for non-significant (P > 0.05) outcomes, forming survival analysis matrices and the summarized significant probability matrix, which allowed the identification of GEARs. (B) The maximum of significant probability for each gradient sample size of every cancer type. Sample number ratio refers to the percentage of samples in each sampling gradient compared to the total number of samples. (C) The pathways were enriched by GOEA based on the GEARs of each cancer type. The displayed pathways represent the top 5 most significant pathways for each cancer type. Mitosis-related pathways were marked in red whereas immune-related pathways were marked in blue. all GEARs was calculated, and CSN was constructed. (B) The P-value of survival analysis was determined based on hub genes grouped by edge numbers (250, 500, 1000, 2000, 4000 and 8000 edges) from six CSNs. (C) The left nodes represent five different cancers, while the right nodes detail the functional types of hub-gene pathways. The height of the edges in the middle represents the proportion of hub-gene pathways corresponding to a specified functional type. (D-E) The CSN of LUAD and BRCA. The enlarged section represents the hub gene. (F) LUAD clustering based on hub gene expression, compared with inhibitor-based classification52. (G) LUAD clustering based on hub gene expression in comparison with the classic classification. (A) TMB analysis in three groups of LUAD samples. ***P < 0.0005. (B) Proportion of different oncogenic drivers including KRAS, EGFR, BRAF, ERBB2 mutations, and ALK, ROS1 fusions in three LUAD subgroups. (C-E) Comparison of top gene mutations in three 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907

LUAD clusters, including ML vs. MM (C), MM vs. MH (D), ML vs MH (E). Cancer-related gene mutations were annotated based on gene dependency scores data from Depmap database. Functional mutations were indicated in green; functional (subtypeassociated) mutations were highlighted in red; non-functional mutations were indicated in gray. (B) The analysis of the NeST differential pathways across three groups including ML vs. MM, MM vs. MH, ML vs MH. (C) The heat map showed the hub gene expression of A549 after compounds treatment from cMAP database98. Blue annotations meant 76 compounds that can inhibit hub gene expression. (D) Comparison of the IC50 z-score of five tumor cell growth inhibitors for A549 (KRAS mutation) and SW1573 (KRAS and PIK3CA mutation). EMT and immune infiltration. (A) The hub gene expression heatmap containing the result of hub gene classification and PAM50 classification of BRCA58. (B) Comparison of molecular classification based on hub genes with PAM50 classification. (C-E) Comparison of gene mutations in three groups of BRCA samples, including IL vs. IM (C), IM vs. IH (D), IL vs. IH (E). (F) Comparison of the total immune cell rate in three BRCA clusters. ***P < 0.0005. (G) Comparison of CDH1 expression in BRCA samples with wildtype or mutant CDH1. ***P < 0.0005. (H) A schematic diagram illustrating the correlation between the EMT score and immune cell rate. The red line represents the fitting curve. The correlation analysis was performed using Pearson correlation coefficient. The mitosis and immune-related pathway scores of 33 cancer types were analyzed, and the median score was utilized as a threshold to categorize patients for survival analysis. Among the 33 cancer types examined, 10 cancer types were exclusively associated with the mitosis score, 2 cancer types were exclusively associated with the immune score, and 5 cancer types showed a correlation with both mitosis and immunity scores concurrently.

Sampling ×1000 times each gradient kt j{1,2,···,10} kj = j×k1 g1 g2 ... gn

S1

Sk1

Multiple Sampling Matrix ...

×1000 Sk2

×1000 ...

Sk10 ×1000 find min(j) to max(kj)=1 Aij > 0.80

GEARs

C

Gene Exprssion Matrix

S1 S2 ...

Sm g1 g2 ... gn g1 g2 ... gi ... gn

Aij Significant Probability Matrix B

k2 1 0 0 1 0 1 1 0 1 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 sample_number_ratio t1000 1 0 0 0 0 1 BLCA BRCA CESC COAD KIRC LGG LIHC LUAD LUSC OV PAAD PRAD STAD THCA UCEC

k10 GeneRatio p.adjust i g r e 1 ... ...

rank top 1000 rank > 1000 ... s R A E G ...

... rank top1000 s e n e g l a t o t F i g r e 2

Significant survival Networks Core Survival Networks

A E NKG7 TIGIT CD69 G CIRBP SCIMP

CD200R1 HELLS HASPIN

KIF20B IGKV2D-24

INCENP

DSCC1 1000 2000 edge number

IGKV6-21

NEK2

CDC6 CDC25C

IGKV3S-7PAG5 FOXM1

IGHV3-38 4000 NCAPG2 HROB

8000 WAS

ESPL1 CDCA5

EXO1 CEP55

BRMCAKASI16P7M SGO1 CENPE IGHV3OR16-1B3CL11B

PLK1 IRF8 TRAV8-2PRKCQGTSE1

KIF20ABUB1B

CD52 CD3D 2 Classic

PI PP

TRU Hub

ML MM MH BIRC5

CXCRP6KMYT1 CENPU

CDC45 IL7R H2AZ1 LY9 CENPW

SPIB

TRAV12-1

UBE2C

IL2RB

JAMLARHGAP15

PAZRAPPB70P TRBV19 CD19 TRBV14

MS4A1

H

AURKB

TROAP

CDC20 ORC1

TESPA1 IKZF1 FCRL5

ACAP1 SELL

BLK 1.00

FCRLA

HLA-DMA

mune Others 4 0 −2 −4 subtype

PI PP TRU ASF1B

SEPTIN1

GIMAP7

ARHG2EF6 TBC1D10C

IRAG2 FCER2

IL16

CDCA3

CDCA8 AUNIP TNFAIP8L2 MYBL2

KIF18B TACC3 TICRRLST1 TK1 0.75 SAAt2-SAA4 n e r0.50 c e P 0.25

ML MH ML

MM cluster MH

KRAS EGFR BRAF ERBB2 ALK ROS1

A D E 30 n e d r 2u0 b l a n o it a t u m r o 10 m u T 0 14% 19% 1% 0% 3% 0% 1% 1% 1% 5% 39% 16% 24% 3% 2% 33% 4% 4% 0% 0% 14% 19% 19% 3% 22% 5% 16% 8% 3% 0%

ML (N = 74) MM (N = 205) ML (N = 74)

TP53 TTN TENM1 ASTN2 FLG2 SLC8A1 CPS1 LAMA2 NLRP3 ANK2

*** 1.00 0.75 t n e rc0.50 e P 39% 39% 13% 10% 15% 9% 11% 11% 11% 19% 71% 33% 43% 13% 11% 51% 14% 14% 7% 7% 71% 56% 51% 25% 48% 25% 41% 29% 20% 14% i g r e

A C D E F 1% 0% 1% 1% 26% 8% 2% 13% 0% 0% 0% 0% 0% 0% 1% 1% 0% 1% 0% 0% 26% 1% 8% 1% 0% 0% 1% 2% 0% 0% 1.00 0.25 0.00

XIRP2 AMER1 LRP2 HERC1 TP53 CDH1 GNPTAB TTN ADAM21

CEP162 G 2500 2000 n o i s se1500 r p x e 1 H CD1000 500 IL (N = 382) IM (N = 399) Frame_Shift_Del Missense_Mutation

Frame_Shift_Ins Nonsense_Mutation

Splice_Site In_Frame_Del

Multi_Hit Nonstop_Mutation

In_Frame_Ins IM (N = 399)

IH (N = 135) IL (N = 382) IH (N = 135) MED12L EXO1 CLTC ZNF609 HTT DAB1 SGIP1 SDK1 CAPRIN2

NKAPL

IL

IM cluster

IH mut

wt CDH1 type 10 6 4 2 0 Classic

LumA LumB Her2

Basal Hub

IL IM IH

B t n rce0.50 e P

H i g r e 6 ) e u l a v _ p 4 ( 0 1 g o l − 0 mitotic_score

immune_score Survival analysis − Median grouping i g r e 6

Multi-gradient Permutation Survival Analysis Identifies Mitosis and Immune Signatures

Steadily Associated with Cancer Patient Prognosis

Xinlei Cai1, Yi Ye2, Xiaoping Liu1, Zhaoyuan Fang4,5, Luonan Chen1,2,6, Fei Li3*, Hongbin 1Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences,

Hangzhou, 310024, China.

200031, China. 2State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 3Department of Pathology and Frontier Innovation Center, School of Basic Medical Sciences,

Fudan University, Shanghai, China.

4Zhejiang University-University of Edinburgh Institute, Zhejiang University School of

Medicine, Haining,314400, China

310000, China 5The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 6School of Life Science and Technology, Shanghai Tech University, Shanghai 200120, China. Running title: Mitosis and immune signatures in predicting cancer patient prognosis Hongbin Ji, State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China, Phone: 86-21-54921108, Fax: 86

Sciences, Fudan University, 131 Dong An Road, Shanghai, 200032, China, E-mail address:

Supplementary Figure 1: (A) to (O) The number of genes significantly related to prognosis which were obtained through MEMORY was positively correlated with sample size. The vertical value of the dotted line is 0.8.

Supplementary Figure 2: (A) to (O) The significant probability of GEAR at each gradient.

The vertical value of the dotted line is 0.8.

Supplementary Figure 3: (A) to (O) The core survival networks of 15 cancer types. Red nodes are hub genes.

Supplementary Figure 4: (A) to (O) 15 cancer types were genotyped through hierarchical clustering based on the expression of hub genes, and the samples of each cancer were divided into three clusters

Supplementary Figure 5: (A-C) The gene dependency analysis of hub genes for LGG (A), LIHC (B), LUAD (C). (D) The number of compounds associated with A549 in the cMAP and GDSC databases. (E) Relative sensitivity of A549 cell line to all compounds from the GDSC database, calculated based on IC50 values. (F) Kaplan-Meyer overall survival curves for LUAD classic subgroups. (G) Kaplan-Meyer overall survival curves for LUAD hub gene

Supplementary Figure 6: (A) Kaplan-Meyer overall survival curves for BRCA classic subgroups. (B) Kaplan-Meyer overall survival curves for BRCA PAM50 subgroups. (C-L) Comparison of infiltration of different types of immune cells in three group BRCA samples. ***P < 0.0005, **P < 0.005 and *P < 0.05. (M) Comparison of the proportion of immune cells in the total cells in BRCA samples with high CDH1 expression and low CDH1 expression. ***P < 0.0005. (N) Schematic diagram of the correlation between CDH1 expression and VIM expression. The red line was a fitting curve. Pearson correlation coefficient was used for correlation analysis. (O) Schematic diagram of the correlation between CDH1 expression and TWIST2 expression. The red line was a fitting curve. Pearson

Davis , J.F. , Tubelli , A.A. , Asiedu , J.K. , et al. ( 2017 ). A Next Generation Connectivity Map: Loncova , Z. , Posch , W. , Wilflingseder , D. , et al. ( 2019 ). Molecular and pharmacological ML MM MH