Identification of a short, single site matriglycan that maintains neuromuscular function in the mouse

Tiandi Yang, Ishita Chandel, Miguel Gonzales, Hidehiko Okuma, Sally J. Prouty, Sanam Zarei, Soumya Joseph, Keith W. Garringer, Saul Ocampo Landa, Takahiro Yonekawa, Ameya S. Walimbe, David P. Venzke, Mary E. Anderson, Jeffery M. Hord, Kevin P. Campbell

Department of Immunology, Harvard Medical School, Boston, MA 02115, USA
Department of Neurology, University of Iowa Roy J. and Lucille A. Carver College of Medicine, Iowa City, USA
Howard Hughes Medical Institute, Department of Molecular Physiology and Biophysics

December 21, 2023

*Corresponding author

Matriglycan (-1,3-β-glucuronic acid-1,3-α-xylose-) is a polysaccharide that is synthesized on α-dystroglycan, where it functions as a high-affinity glycan receptor for extracellular proteins, such as laminin, perlecan and agrin, thus anchoring the plasma membrane to the extracellular matrix. This biological activity is closely associated with the size of matriglycan. Using high-resolution mass spectrometry and site-specific mutant mice, we show for the first time that the matriglycans on the T317/T319 and T379 sites of α-dystroglycan are not identical. T379-linked matriglycan is shorter than the previously characterized T317/T319-linked matriglycan, although it maintains its laminin binding capacity. Transgenic mice with only the shorter T379-linked matriglycan exhibited mild embryonic lethality, but those that survived were healthy. The shorter T379-linked matriglycan exists in multiple tissues and maintains neuromuscular function in adult mice. In addition, the genetic transfer of α-dystroglycan carrying just the short matriglycan restored grip strength and protected skeletal muscle from eccentric contraction-induced damage in muscle-specific dystroglycan knock-out mice. Given the effects that matriglycan imparts on the extracellular proteome and its ability to modulate cell-matrix interactions, our work suggests that differential regulation of matriglycan length in various tissues optimizes the extracellular environment for unique cell types.

INTRODUCTION

Glycosylation is a carbohydrate modification found on biological macromolecules including nucleotides (1), lipids (2, 3) and especially proteins (4, 5). Protein glycosylation has been the target of extensive research efforts because glycan modifications not only modulate the functions of the proteins but also act as receptors/ligands for their lectin counterparts. The glycan-lectin interactions are associated with numerous biological processes, such as quality control of protein folding (6), cellular growth and migration (7), host-pathogen/microbiome interaction (8), transmembrane signaling (9), and more. One of the best functionally characterized glycan receptors is a unique matriglycan found on dystroglycan (DG). DG is encoded by the DAG1 gene (10), which is widely expressed in different tissues and is conserved among various organisms. DAG1 is transcribed into a single mRNA, but the translated DG polypeptide is post-translationally cleaved into the DG N-terminus (DG-Nt) (11), α-DG and β-DG (10). The α subunit has a heavily O-glycosylated mucin-like domain that harbors the matriglycan, and a C-terminal domain (DG-Ct) that forms a complex with the transmembrane β-DG. Matriglycan serves as the carbohydrate receptor for extracellular matrix (ECM) proteins containing a laminin G-like (LG) domain (12, 13), such as laminin (10, 14), agrin (15), and perlecan (12), while α- and β-DG are subunits of a dystrophin-glycoprotein complex (DGC) (16) that anchors to cytoskeletal actin fibers (Fig. 1a). Therefore, matriglycan plays a central role in bridging the ECM to the plasma membrane, thus mediating cell-matrix interactions in different tissues including skeletal muscle and the central nervous system.

Matriglycan is a polymer of the [1,3-β-glucuronic acid (GlcA)-1,3-α-xylose (Xyl)-] repeating disaccharide (17) and its biological activities depend almost solely on its length: longer matriglycans exhibit drastically higher affinity to laminin than shorter ones (18). Matriglycan is synthesized on a β-GlcA-1,4-β-Xyl-1,2-ribitol-1-phosphate (RboP)-5-RboP adaptor (19–24) (Fig. 1b) by the bi-functional glycosyltransferases LARGE1 or LARGE2 (17, 25–27). The adaptor is attached to phosphorylated coreM3 (28–32) [coreM3(P)], forming the matriglycan precursor. Multiple enzymes are required for proper matriglycosylation, and an interruption of the process leads to a group of disorders in muscle and the central nervous system, which are called α-dystroglycanopathies (33). α-Dystroglycanopathy can range in phenotypic expression from Limb [...] severity hinges on the ability of matriglycan to bind ECM ligands, which in turn depends on the expression and length of matriglycan. A complete absence of matriglycan leads to mammalian embryonic lethality. Normally, dystroglycanopathy patients carry matriglycan, but it is drastically reduced and insufficient to maintain normal cell-matrix interactions (34, 35).

The mucin-like domain of α-DG that harbors the matriglycan contains roughly 50 potential O-glycosylation sites, but coreM3(P) has been found on only three threonine residues (T317, T319, and T379), and matriglycan had previously been found only on the T317/T319 motif (23, 24, 32, 36, 37). It was reported that T379 can receive LARGE-mediated matriglycosylation when LARGE is over-expressed (38). In the present work, we established a mass spectrometric method with ultra-high resolution and sensitivity to map matriglycosylation sites of α-DG. We confirmed T379 as a secondary matriglycosylation site that harbors a short matriglycan without LARGE over-expression and exists in multiple organs. Although the short matriglycan led to mild embryonic lethality in mice, it appeared to be completely functional in adult skeletal muscle. Simultaneous mutations of the two matriglycosylation sites led to muscular pathology, while neuromuscular junction (NMJ) integrity required modification at both matriglycan sites. In addition, cellular proteomics indicated that matriglycan can modulate the ECM proteome, possibly by altering its size or switching matriglycan sites.

RESULTS

Matriglycosylation site(s) on α-DG

Our research started with the question of whether there are more than one matriglycan [...] strongly suggests the T317A/T319A mutation fails to eliminate laminin binding (Fig. s1a). Because DG-Fc showed significantly lower electrophoretic migration, we performed laminin overlay guided in-gel digestion and mass spectrometry (MS) based proteomics to verify DG-Fc as the main protein in the 250 kDa bands (Tables s1 and s2).

These data indicated additional matriglycosylation site(s) without LARGE overexpression and drove us to establish a nano-flow liquid chromatography (nanoLC) MS method to map matriglycosylation sites. Matriglycosylated peptides are extraordinarily challenging targets for electrospray ionization (ESI) MS, and to date, there are no published ESI-MS data identifying [...] which mimics the extracellular region of human pro-dystroglycan with R311A/R312A mutations (Fig. 1b, amino acid 28-749, DG-Nt attached). O-glycomics showed the protein carried multiple mucin-type O-glycans and O-mannose glycans, indicating proper glycosylation and secretion by our cell model (Fig. s1b). We then used a glycosidase mixture (β-hexosaminidase, β-1,4-galactosidase, fucosidase, sialidase, and O-glycosidase, see Materials and Methods) to reduce the sample complexity caused by numerous glycoforms. Previous analytical efforts normally introduced a lysine residue for Lys-C digestion so that truncated matriglycopeptides could be enriched by ion exchange chromatography and characterized by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS (23). A similar strategy was used to collect high-resolution ESI-Orbitrap data on truncated DG glycopeptides carrying the precursor of matriglycan (24). Our approach keeps longer peptide backbones to generate comprehensive information on O-mannosylation and to facilitate electrospray ionization for ESI-Orbitrap to achieve ultra-high resolution and sensitivity. The addition of RboP and GlcA-Xyl residues was shown to delay glycopeptide LC elution (24). Therefore, we eluted the LC column with a more concentrated acetonitrile solution to compensate for the possible ion-pairing effect.

Our optimized MS strategy detected the canonical T317/T319 motif containing glycopeptide with an amino acid sequence of VAAQIHATPTPVTAIGPPTTAIQEPPSR (note the R to A mutations in Fig. s1c-f). This glycopeptide carries the matriglycan precursor, five mannoses, and one HexNAc, giving rise to an MS signal at m/z 1305.55⁴⁺. We were able to analyze a glycopeptide with a nearly native sequence, and our data are consistent with previous research identifying T317/T319 as a matriglycan motif. We were also able to detect matriglycan on the secondary T379/T381 site in a glycopeptide with a sequence of GAIIQTPTLGPIQPTR. This is supported by two MS signals at m/z 1124.46³⁺ and 1227.15³⁺, which correspond to the glycopeptides carrying two mannoses plus the matriglycan precursor and one matriglycan disaccharide elongation, respectively (Fig. s1g-m). This is the first time that a matriglycosylated peptide has been successfully analyzed by nanoLC-ESI-MS, and our data indicate all three previously mapped coreM3 sites can be elongated into matriglycans under normal physiological conditions.
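The spacing between the two triply charged T379/T381 signals can be checked against the mass of one GlcA-Xyl repeat. The following is a minimal sketch, not the authors' software: the residue and proton masses are standard monoisotopic values supplied here as assumptions, and the function name `neutral_mass` is hypothetical.

```python
# Standard monoisotopic masses in Da (assumed constants, not taken from the paper).
PROTON = 1.007276
GLCA = 176.0321   # glucuronic acid residue, C6H8O6
XYL = 132.0423    # xylose residue, C5H8O4


def neutral_mass(mz, charge):
    """Neutral mass of an [M + zH]z+ ion observed at the given m/z."""
    return charge * (mz - PROTON)


# m/z values and charge states as reported in the text.
precursor = neutral_mass(1124.46, 3)    # peptide + 2 Man + matriglycan precursor
one_repeat = neutral_mass(1227.15, 3)   # same peptide after one elongation

spacing = one_repeat - precursor
print(f"observed spacing: {spacing:.2f} Da")
print(f"GlcA-Xyl repeat:  {GLCA + XYL:.2f} Da")
```

Because the reported m/z values are rounded to two decimals, the spacing of a triply charged pair is only reliable to a few hundredths of a dalton.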

Yang et al.

Confirmation of matriglycosylation on the T317/T319 and T379/T381 motifs

As crystallographic research has revealed that matriglycans with two disaccharide repeats strongly bind to the laminin LG4 domain (13), we next sought to better characterize matriglycosylated peptides for both motifs. The excessive peptide and glycopeptide MS signals detected in the DAG1749 sample could have suppressed MS signals of matriglycosylated peptides, so we expressed a shorter His-tagged DG390 protein in HEK293F cells. DG390 is designed based on the rabbit DAG1 sequence (see Fig. 1b, native rbt DG), covering both matriglycan motifs identified on DAG1749. DG390 was also subjected to O-glycomic analysis, and the detected mucin-type and O-mannose O-glycans proved this much shorter construct was heavily glycosylated (Fig. s2a and b). In addition, MALDI-TOF also detected sodiated permethylated (GlcA-Xyl)n-Rbo glycan released by mild hydrofluoric acid (HF) hydrolysis from DG390 (Fig. s2c). The MS detected matriglycans with up to two repeats, and the MS peak intensities of the matriglycan decreased as the matriglycan got longer. Longer matriglycans are expected to show even lower intensities and might not be detected because sensitivity is reduced when extensive chemical reactions are carried out before MS analysis. Subsequent MALDI-TOF/TOF analysis of the glycans confirmed that they share a linear architecture (Fig. 2d-f).

The DG390 was enzymatically de-glycosylated (Fig. s2b), tryptically digested, and subjected to nanoLC-ESI-MS-based glycoproteomics. A representative MS peak at m/z 1415.04⁴⁺ (Fig. 2b) corresponds to a T317/T319 glycopeptide with a sequence of QIHATPTPVTAIGPPTTAIQEPPSR carrying two matriglycan repeats on the precursor, five hexoses, one HexNAc, and an extra phosphate or sulfate (Fig. 2a and Fig. s3). As DG390 does not bear any mutation, the peptide backbone is slightly shorter than its counterpart found in DAG1749. The glycan composition indicates all serine and threonine residues are modifiable by O-mannoses, and one mannose in the T317/T319 motif is further synthesized to mature matriglycans (Fig. 2a). Further MS/MS analysis confirmed our structural annotation (Fig. 2c). First, we found fragment ions at m/z 632.14⁺, 940.21⁺ and 1248.28⁺ showing the matriglycan assembled on the RboP-containing adaptor. Secondly, the molecular ion of the peptide backbone was detected at m/z 1282.18²⁺, and two peptide fragment ions were observed at m/z 456.26⁺ (y4) and 647.35²⁺ (y12). An addition of mannose-phosphate (242.02 Da) shifts the peptide molecular ion from 1282.18²⁺ to 1403.19²⁺, to which more mannose is added, forming glycopeptide ions with m/z of 1484.21²⁺, 1565.25²⁺, 1646.27²⁺, and 1727.29²⁺. This proves at least five out of six O-glycosylation sites in the peptides are O-mannosylated. No fragment ion corresponds to a HexNAc addition to the peptide backbone, indicating all six O-glycosylation sites are O-mannosylated.

The T379/T381 motif is in the GAIIQTPTLGPIQPTR glycopeptide (also see Fig. s4), which is the same as in DAG1749. In addition to the MS signals at m/z 1124.46³⁺ and 1227.15³⁺, we were able to find a representative signal at m/z 1356.50³⁺ (Fig. 2e) corresponding to the peptide with two matriglycan repeats on the precursor with two hexoses, one HexNAc and an extra phosphate or sulphate (Fig. 2d). The glycan composition again suggests that all three O-glycosylation sites are modified by O-mannoses and one of them was biosynthesized into the matriglycan. MS/MS analysis (Fig. 2f) detected matriglycan fragments at m/z 632.14⁺, 940.21⁺, and 1556.36⁺, and the molecular ion of the peptide backbone at m/z 831.98²⁺. The peptide is partially sequenced by fragment ions at m/z 242.15⁺ (b3), 355.23⁺ (b4), 483.29⁺ (b5), and 584.34⁺ (b6), and y ions at m/z 1079.62⁺ (y10) and 711.41²⁺ (y13). A combination of the presence of fragment ions at m/z 952.99²⁺ (addition of a ManP), 1034.02²⁺ (addition of Man, HexP), and 1115.04²⁺ (addition of 2Man, ManP) and the absence of a 933.52²⁺ ion further confirms all three potential O-glycosylation sites are O-mannosylated.
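The fragment ladder described above can be reproduced by adding residue masses to the doubly charged peptide backbone ion. This is an illustrative sketch only: the residue masses are standard monoisotopic values assumed here, and `shifted_mz` and the dictionary labels are hypothetical names rather than anything from the authors' workflow.

```python
# Standard monoisotopic residue masses in Da (assumed, not taken from the paper).
HEX = 162.0528      # hexose (mannose) residue, C6H10O5
HPO3 = 79.9663      # phosphate added to a hydroxyl group
HEXNAC = 203.0794   # N-acetylhexosamine residue, C8H13NO5


def shifted_mz(mz, charge, added_mass):
    """m/z of an ion after gaining a neutral mass at unchanged charge."""
    return mz + added_mass / charge


backbone = 831.98   # doubly charged peptide backbone ion reported in the text
z = 2

predicted = {
    "ManP": shifted_mz(backbone, z, HEX + HPO3),
    "ManP + Man": shifted_mz(backbone, z, 2 * HEX + HPO3),
    "ManP + 2 Man": shifted_mz(backbone, z, 3 * HEX + HPO3),
    # A HexNAc directly on the backbone would appear here; the text reports it absent.
    "HexNAc": shifted_mz(backbone, z, HEXNAC),
}
observed = {"ManP": 952.99, "ManP + Man": 1034.02, "ManP + 2 Man": 1115.04}

for label, mz in predicted.items():
    status = f"observed {observed[label]:.2f}" if label in observed else "reported absent"
    print(f"{label:13s} predicted {mz:.2f}  {status}")
```

The same arithmetic applies to the T317/T319 ladder, starting from the backbone ion at m/z 1282.18²⁺.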

To verify our MS observation that there are two matriglycan motifs among the roughly 50 O-glycosylation sites, we infected DG KO HEK293 cells with adenoviral vectors encoding WT DG and DG bearing T-to-A mutations in the MS-characterized matriglycan motifs (Fig. 2g). When both motifs were mutated (4X mut.), no matriglycan was detected by either immunoblots or laminin overlay despite over-expression of the dystroglycan backbone. This confirms there are only two matriglycan motifs on α-DG. α-DG with T317/T319 matriglycan exhibited lower electrophoretic migration in polyacrylamide gels than with T379/T381 matriglycan, indicating the former matriglycan is considerably longer than the latter when the protein is expressed in HEK293F cells.

T379 carries a matriglycan repeating for at least nine disaccharide units

Previous research has revealed both T317 and T319 can carry matriglycan (37). To pinpoint the exact site for matriglycosylation in the T379/T381 motif, we performed electron-transfer dissociation (ETD) on the T379/T381 glycopeptides (Fig. 3a-c). We found a series of c7 ions at m/z 1346.60⁺, 1560.62⁺, and 1774.63⁺, proving the T379 residue carries a coreM3(P) that can be synthesized towards matriglycan. These MS spectra determine T379 as a major carrier of the matriglycan in the motif.

We next measured the length of matriglycan on the T379/T381 motif. Note that we deliberately attached a His-tag to DG390 when designing it, and this plays a critical role in the nanoLC-MS analysis of longer matriglycans. The poly-histidine tag has two functions: first, it normalizes the nanoLC retention of matriglycosylated peptides (Fig. s5c and d) and second, it greatly facilitates positive ESI of matriglycosylated peptides (Fig. s5e and f). We found the T379/T381 motif carrying matriglycans of up to nine GlcA-Xyl repeats (Fig. 3d-e), and the biggest observed matriglycosylated peptide had a molecular weight of around 8 kDa. Structural annotations of the dominant MS signals were verified by MS/MS analysis (Fig. s5g-j), but some MS intensities for longer matriglycans get too low to analyze. [...] strongly suggested MS intensities of matriglycosylated peptides drop rapidly as the matriglycan length increases (Fig. s5k). This phenomenon is also observed with our matriglycan in vitro biosynthesis assay (Fig. 3f). The synthesized matriglycan can only get up to 10 repeats after 48 hours of incubation with excessive LARGE1, and such length is comparable with our DG390 [...] terminated matriglycans are considerably more abundant than xylose-terminated ones. In addition, MS intensities of the matriglycans decrease as they get longer, just like the T379 matriglycan synthesized in HEK293 cells.

Collectively, the data suggest that matriglycan synthesized on T379 in HEK293 cells is roughly nine to ten units long, and that longer matriglycans are less abundant. Matriglycan synthesis could be rate-limited by the xylosyltransferase activity. Previous MALDI-TOF analysis indicated that T317/T319 matriglycan repeats 17 times (23), which is considerably longer than the T379 matriglycan characterized in this research. This means the two matriglycan motifs are not necessarily functionally identical, as the matriglycan length is important for its receptor binding affinity.

Termination of matriglycan elongation can be achieved by sulphate and hexose capping

MS signals for both T317/T319 and T379/T381 glycopeptides imply an extra modification of roughly 79.96-79.97 Da (Fig. 2a-f), which matches the mass of a sulphate or phosphate group. This group can be found on all dominant polymeric matriglycans on the T379/T381 motif (Fig. 3d and e), although uncapped polymeric matriglycans were also observed at considerably lower intensities. We fortuitously found a single MS/MS spectrum showing the 79.97 Da modification directly linked to the matriglycan fragments (Fig. 3g and h). The extremely labile modification was most likely retained on the glycans because the long peptide backbone disperses the fragmentation energy during MS/MS. Our MS analysis cannot directly determine whether the group is a sulphate or a phosphate, but we attributed the extra mass to a sulphate cap for two reasons. First, previous research identified HNK-1 as a sulfotransferase that caps the matriglycan (38, 39). Secondly, MS/MS analysis of matriglycosylated peptides with the extra 79.97 Da did not detect fragment ions corresponding to glycopeptides carrying two mannose-phosphates (Fig. s5i), which were easily detected in non-matriglycosylated peptides (Fig. s5n-o). In addition to the presumable sulphate caps, we found a small fraction of the matriglycan precursors are capped by a hexose residue. We are not able to determine whether the hexose modification is a consequence of enzymatic catalysis or chemical ligation. However, we only found the matriglycan precursor capped by the hexose, indicating that, if the process is enzymatically catalyzed, the hexose cap can compete with LARGE to prevent matriglycan biosynthesis from the very beginning.
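To illustrate why the intact mass alone does not readily separate the two candidate groups, the sketch below compares the monoisotopic masses of SO3 and HPO3 and expresses their difference relative to a glycopeptide of the size discussed here. The element-derived masses are standard values assumed for this example; `neutral_mass` and `ppm_gap` are hypothetical helper names.

```python
# Standard monoisotopic masses in Da (assumed constants, not taken from the paper).
PROTON = 1.007276
SO3 = 79.95681    # sulphate added to a hydroxyl group
HPO3 = 79.96633   # phosphate added to a hydroxyl group


def neutral_mass(mz, charge):
    """Neutral mass of an [M + zH]z+ ion observed at the given m/z."""
    return charge * (mz - PROTON)


def ppm_gap(total_mass):
    """Sulphate/phosphate mass difference as ppm of the whole molecule."""
    return (HPO3 - SO3) / total_mass * 1e6


# The capped T379/T381 glycopeptide signal reported at m/z 1356.50 (3+).
glycopeptide = neutral_mass(1356.50, 3)

print(f"HPO3 - SO3 = {HPO3 - SO3:.4f} Da")
print(f"gap at {glycopeptide:.1f} Da: {ppm_gap(glycopeptide):.2f} ppm")
```

A gap of a few ppm on a multiply charged glycopeptide sits close to routine mass accuracy, which is in keeping with the statement above that the assignment rests on supporting evidence rather than on the measured mass.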

SLC35A1 serves as a RboP transporter

The characterization of the T379/T381 matriglycosylation provided us with a platform to explore the functions of all genes involved in matriglycan biosynthesis. Since the T379/T381 matriglycopeptides exist in far simpler glycoforms than the T317/T319 glycopeptides, we can use a series of representative MS signals at m/z 879.41³⁺, 950.76³⁺, 1022.10³⁺, 1124.46³⁺, and 1227.15³⁺ (their extracted ion chromatograms in Fig. s6a) to monitor glycan structures along the matriglycan biosynthesis in cell models. Therefore, we applied the MS strategy to characterize two novel pathological genes, SLC35A1 encoding a CMP-sialic acid transporter, and POMGnT1 encoding an O-mannose β-1,2-GlcNAc transferase. Neither activity is directly associated with the matriglycosylation pathway, but mutations in both genes have previously been reported to cause human muscular disorders resembling dystroglycanopathies (40–42).

We first tried to knockout (KO) SLC35A1 in our HEK293F bioreactor (Fig. s6b) with a CRISPR/Cas9 system. Although sialylation is significantly reduced, we were still able to glycomically observe sialylated N-glycans. Further analysis indicated nearly all sialylation is absent on the secreted DG390 (Fig. s6c), whereas a highly intense MS signal is present for the glycopeptide carrying the precursor of matriglycan (Fig. s6d). To eliminate the possible effects of residual activity of SLC35A1 and/or other similar-function transporters in the HEK cells, we infected HAP1 SLC35A1 KO cells (Fig. s6e) with an adenoviral vector expressing DG390 and were able to detect the matriglycosylation pathway halted before RboP (FKTN activity) by MS (Fig. 4a), as the relative intensity of one and two RboP containing glycopeptides reduced by about 70% and 97%, respectively. Only a trace amount of matriglycan precursor was observed and matriglycan was nearly undetectable (Fig. s6f and g). This is consistent with immuno- and laminin blots, which also support a sharp downregulation of matriglycan (Fig. 4b). The SLC35A1 and ISPD (encoding a CDP-ribitol synthase) gene transfer can rescue the phenotype independently (Fig. 4c), indicating SLC35A1 could be involved in CDP-Rbo transport. These observations provide MS evidence for recently published research showing SLC35A1 is involved in the translocation of RboP (43).

We next infected HAP1 POMGnT1 KO cells with the same viral vector and detected all the representative ions of matriglycosylation. The MS signal at m/z 1124.45³⁺ corresponding to the T379 glycopeptides carrying the precursor of matriglycosylation is very strong. The MS signal at m/z 1227.15³⁺ characteristic of matriglycosylation is low (Fig. s6g and h), but it unambiguously proves the existence of matriglycan. This is partially contrary to previously reported immunoblotting results (44), where no or very little matriglycan signal was observed.

Both matriglycans exist and are functional in vivo

After the comprehensive MS analysis of matriglycosylation based on cell models, we set out to create transgenic mice carrying T315A/T317A and T377A/T379A mutations to further interrogate the MS-determined matriglycan motifs in vivo. Note that murine DG is two amino acids shorter within its signal sequence than the human and rabbit versions (Fig. 1b), so the amino acid numbering is different. Although both DG T315A/T317A and T377A/T379A mice are viable, the former suffer mild embryonic lethality: of 220 liveborn mice only 22 were T315A/T317A homozygous, giving a birth rate of 10% that fails the χ2 Goodness of Fit Test (Tables s3 and s4), while the rate was 22% for T377A/T379A homozygous mice (53 out of 239). Considering Largemyd/Largemyd (myd) mice also show a reduced birth rate, our observation indicates the long T315/T317 matriglycan is important for mouse reproduction and/or embryonic development. We were unable to generate live mice carrying both T315A/T317A and T377A/T379A mutations (4X mut.). This is not surprising because the 4X mutation in theory eliminates the functions of both LARGE1 and LARGE2, and this observation supports the conclusion that there are only two matriglycan motifs in vivo.
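The birth-rate comparison can be illustrated with a one-degree-of-freedom χ2 goodness-of-fit calculation. The sketch below assumes a heterozygous intercross with an expected 25% homozygous mutant offspring and collapses the genotypes into two classes; the actual breeding scheme and test layout are given in Tables s3 and s4 and may differ. The function name `chi_square_gof` is hypothetical.

```python
def chi_square_gof(observed, expected_fractions):
    """Pearson chi-square statistic for observed counts against expected fractions."""
    total = sum(observed)
    expected = [total * fraction for fraction in expected_fractions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value of the chi-square distribution for 1 degree of freedom at p = 0.05.
CRITICAL_DF1_P05 = 3.841

# (homozygous mutants, total liveborn) as reported in the text.
litters = {"T315A/T317A": (22, 220), "T377A/T379A": (53, 239)}

# Assumption: Mendelian 1:3 ratio of homozygous mutants to all other genotypes.
mendelian = [0.25, 0.75]

results = {}
for line, (homozygous, total) in litters.items():
    stat = chi_square_gof([homozygous, total - homozygous], mendelian)
    results[line] = stat
    verdict = "deviates from" if stat > CRITICAL_DF1_P05 else "is consistent with"
    print(f"{line}: chi2 = {stat:.2f}, {verdict} the assumed 25%")
```

Under this assumption the T315A/T317A count falls well outside the expected ratio while the T377A/T379A count does not, which matches the direction of the result reported above.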

We first tested for the presence and function of matriglycans in skeletal muscle in our various mouse models. Consistent with the cell models, immunoblotting confirmed matriglycosylation in skeletal muscle of both T315A/T317A and T377A/T379A mice (Fig. 5c), and the former motif carries a longer matriglycan (Fig. 5a). Immunofluorescence and hematoxylin and eosin (H&E) staining further supported that matriglycans exist in both mouse models. Skeletal muscle from both mouse models shows normal morphology (Fig. 5b), but losing one matriglycan site seems to slightly reduce fiber size (Fig. 5c). When compared to WT controls, extensor digitorum longus (EDL) from both mouse models shows similar absolute and specific tetanic forces (Fig. 5d and f) and similar susceptibility to injuries induced by eccentric contractions (ECCs, Fig. 5g), which indicates a single matriglycan, even a relatively short one, is sufficient to maintain healthy muscle in adult mice. The only observed difference is a slightly altered cross-sectional area (CSA) in the T315A/T317A EDL muscle (Fig. 5e).

To further compare the two matriglycosylation motifs, we rescued muscle-specific DAG1 KO mice with gene transfer mediated by adeno-associated viruses (AAV) encoding T315A/T317A and T377A/T379A DG. Although the myf5-Cre system cannot eliminate dystroglycan in muscle stem cells, most matriglycan signals detected by immunoblotting are [...] showed a mildly higher absolute tetanic force (F0) and specific force (sF0) than AAV encoding [...] injuries. These data indicate longer matriglycan is superior in terms of rescuing dystroglycanopathy-related muscle pathology (see also Fig. s7a and b).

We then characterized the two matriglycan motifs in the brain, the kidney, and the heart (Fig. 5l). Matriglycan length varies in different tissues. For example, matriglycan in the heart and skeletal muscle is much longer than that in the brain. However, T315/T317 matriglycan is consistently longer than T377/T379 matriglycan. In addition, the length of the two matriglycans is individually regulated in different tissues. In skeletal muscle and brain, the T315/T317 matriglycan is far longer than the T377/T379 matriglycan, whereas the two matriglycans are similar in size when synthesized in the kidney, suggesting that there are tissue-specific factors that regulate site-specific matriglycan length.

Skeletal muscle pathophysiology emerges when both matriglycan motifs are mutated

Next, we wanted to know what happens when both matriglycan motifs are mutated. Since direct breeding of heterozygous mice did not generate any 4X mutant mice, we created heterozygous mice in which one allele encodes a floxed WT DAG1 and the other allele encodes a DAG1 4X mutant (Fig. 6a). Immunoblotting shows that there is no matriglycan present in skeletal muscle when one copy of WT DAG1 is removed by breeding the heterozygous 4x [...] that there are only two matriglycan sites on DG and enables us to pinpoint the neuromuscular functions of matriglycan in the mouse models. The skeletal muscles of 4X mutant mice showed fibers with central nuclei, adipose tissue infiltration, and fibrosis (Fig. 6c), which resembles the muscular phenotypes of the myd mice (45). AAV encoding 4X mutant DG lost its ability to rescue grip strength (Fig. 6d). The 4X mutant mice consistently show lower forelimb grip strength and F0 than WT mice (Fig. 6e-h, j-l). They are also much more susceptible to ECC-induced injuries (Fig. 6i and m). In addition, NMJs in tibialis anterior (TA), EDL, and soleus (SOL) muscles from Pax7-Cre 4X mutant mice are more fragmented than those of WT mice and mice with a single matriglycan (Fig. 6n and o). Interestingly, NMJs from mice possessing only one matriglycan show a low degree of irregularity, which indicates the existence of two matriglycans is critical for NMJ integrity. These data once again highlight the function of matriglycan as an ECM receptor and strongly support that there are at least two matriglycan motifs in skeletal muscle in vivo.


ECM proteome is modulated by the length of matriglycan

We initially expected the mice with only the T377/T379 matriglycan to show severe pathologic phenotypes as the matriglycan is short compared to the T315/T317 matriglycan. In the brain, the matriglycosylated T315A/T317A α-DG smear shows less apparent molecular [...] not only are the transgenic DG T315A/T317A mice generally healthy, but even the AAV [...] speculated that other ECM proteins might compensate for the effect of the reduced matriglycosylation. We sought to tackle the question with label-free, quantitative shotgun [...] cells to confirm the reproducibility of our proteomic workflow and to make sure we can identify LARGE1 over-expression (Fig. s8a and Table s6).

Considering that one of the major secretors of ECM proteins is the fibroblast, and that matriglycans are differentially downregulated in human patients suffering from different types of dystroglycanopathies, we decided to compare the proteomes of cultured primary fibroblasts from a WWS patient carrying an ISPD mutation and LGMD patients carrying FKRP mutations. The ISPD mutation can severely truncate the matriglycan, so the WWS dystroglycan was selected to resemble the glycan found in the DG T315A/T317A mice. Similarly, the downregulation of matriglycan caused by the FKRP mutations was selected to resemble the DG T377A/T379A mutation. We performed the same workflow on fibroblasts from three different WWS patients and three different LGMD patients and quantified 3582 (Fig. s8b and c, and Fig. 7a) out of 4235 detected proteins at a 1% false discovery rate (FDR) cut-off for both proteins and peptides. Principal component analysis (PCA) accurately separated WWS from LGMD samples (Fig. 7b), while unsupervised hierarchical clustering analysis (HCA) robustly segregated their proteomes (Fig. s8d). Although certain collagens are mildly upregulated, most detected fibroblast biomarkers, including many collagens, integrins, and vimentin, were not markedly differentially expressed (Fig. s8e), so they should not be used for separating the two groups. In contrast, WWS cells are characterized by ECM and membrane proteins such as THBS2, COL18A1, FBN2, and HLA-A, while LGMD cells are better characterized by cytosolic and nuclear proteins such as CUL7, RIF1, and KMT2A (Fig. 7c).

Further analysis identified 25 differentially expressed protein groups (Fig. 7d). While the proteins with higher expression in LGMD cells did not show any apparent features, half of [...] NAGLU, and ADAMTSL1). The 25 altered proteins were further manually scrutinized against the single-cell RNA sequencing database on the Human Protein Atlas website (https://www.proteinatlas.org/) to pinpoint proteins mainly expressed by fibroblasts in skeletal muscle (Fig. 7e). The proteins upregulated in LGMD cells were mostly filtered out; six of the seven remaining targets, which are mostly upregulated in WWS cells, belong to the ECM proteome, and the remaining one is an enzyme related to ECM degradation (46, 47). Meanwhile, HCA also leads to the clustering of the WWS samples characterized by upregulated ECM region proteins (Fig. s8f). Collectively, our data indicate the ECM proteome is directly modulated by the matriglycan and shorter matriglycan tends to upregulate ECM proteins. The modulation can be achieved by directly altering the secreted protein profile of fibroblasts.


molecular weight glycoconjugates

The nanoLC-ESI-MS technique has become the most important tool for protein PTM investigations due to its high throughput, ultrahigh sensitivity, and wide compatibility with all [...] phosphate and sulphate, are inherently challenging for existing glycoproteomic methods because both their high molecular weight carbohydrate moieties and the modification groups can hinder ionization. This severely limits the application of MS in characterizing glycosylation processes involving glycosaminoglycans, matriglycans, and sometimes sulphated mucin-type O-glycans. Here, we provide a nanoLC-ESI-MS-based solution for analyzing matriglycosylated peptides. Instead of inserting additional protease sites to digest big glycopeptides into shorter, more manageable parts, we chose to maintain the peptide backbone and added a hexa-histidine tag to aid ionization. The resulting His-tagged peptides are ionized into multiply charged ions that comfortably fit the mass range of modern MS instruments. More importantly, their large size stabilizes fragile groups and keeps protein modifications intact when collisional energy is imposed during MS/MS experiments. This is especially beneficial for characterizing sulphated and phosphorylated carbohydrates. In addition, even when we used an LC elution program with more concentrated acetonitrile, the sulphate modification still delays matriglycopeptides to very late retention times, which is compensated by adding His-tags to the glycopeptides (Fig. s5c and d). All these experimental considerations and designs facilitated routine detection of nearly intact matriglycans in a positive ion mode bottom-up shotgun glycoproteomic setup. Our MS strategy might contribute to further research on glycosaminoglycans and sulphated mucin-type O-glycans, which are attracting increasing biological research interest.

Elongation of matriglycans is potentially regulated by two mechanisms

Our MS methods and data provide new insights into matriglycan elongation, although they are limited by the DG over-expression model. The biosynthesis of matriglycan might not be very efficient, considering that only a very small portion of matriglycan precursor eventually gets matriglycosylated (Fig. s6a). When the matriglycan reaches about 10 disaccharide units during its again for another round of elongation but it can also be capped by sulphate or a hexose group by other enzymes. At least in our cell models, LARGE xylosyltransferase activity tends to be lower than its GlcA transferase activity, so it might be outcompeted by capping enzymes. As a result, the matriglycans tend to get capped and their elongation is terminated. More interestingly, capping reactions can occur on the adaptor of matriglycan (Fig. 3g-j), which might stop matriglycosylation at the very beginning. This partially explains why the amount of matriglycosylated peptides is so small. The sulphation process mediated by HNK-1 explains why matriglycans in the brain are much shorter, but it seems that a similar mechanism exists in other tissues, such as the kidney. As such, our model provides a possible mechanism for the postnatal regulation of matriglycan elongation in different tissues.
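To give a sense of scale for a chain of about 10 disaccharide units with or without a cap, the mass contributed by the repeats can be estimated from monoisotopic residue masses. This is a sketch under stated assumptions: it counts only the GlcA-Xyl repeats and an optional terminal cap, ignores the adaptor glycan and the peptide, and the function name `matriglycan_mass` is hypothetical.

```python
# Monoisotopic residue masses in Da (monosaccharide minus one water).
GLCA = 176.03209      # glucuronic acid residue, C6H8O6
XYL = 132.04226       # xylose residue, C5H8O4
HEX = 162.05282       # hexose residue, C6H10O5
SULPHATE = 79.95682   # SO3 added by sulphation

# Optional terminal caps mentioned in the text (sulphate or a hexose).
CAPS = {None: 0.0, "sulphate": SULPHATE, "hexose": HEX}


def matriglycan_mass(n_repeats: int, cap=None) -> float:
    """Mass added by n GlcA-Xyl repeats plus an optional terminal cap."""
    if n_repeats < 0:
        raise ValueError("n_repeats must be non-negative")
    return n_repeats * (GLCA + XYL) + CAPS[cap]


for cap in (None, "sulphate", "hexose"):
    print(cap, round(matriglycan_mass(10, cap), 4))
```

Each repeat adds about 308.07 Da, so ten repeats contribute roughly 3.08 kDa before any cap is considered.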

LARGE1 activity may be increased by molecular recognition of DG-Nt and the phosphate on coreM3, thereby explaining the existence of very long matriglycans (11, 48, 49). If the observed 80 Da modification corresponds to a phosphate on an O-mannose (see Fig. 2), it is likely that the extra mannose-phosphate can further boost LARGE activity. We speculate the T317/T319 matriglycan represents the boosted LARGE1 activity whereas the T379/T381 considerably higher electrophoretic migration than the WT. This indicates that both matriglycosylation sites can be simultaneously occupied, although the T-to-A mutation also affects mucin-type O-glycosylation. We have not elucidated why two matriglycans are required, but our limited evidence indicates that this is important for the integrity of the NMJ. In addition, we noticed that both matriglycan motifs from α-DG can be completely deglycosylated by the glycosidases that remove mucin-type O-glycans. This indicates that α-DG exists in more than one glycoform, carrying only the long, only the short, or both matriglycans. Two different matriglycan motifs might provide another straightforward mechanism to combinatorially regulate the length of matriglycan in a tissue-specific manner, during development and muscle regeneration.
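One general reason a nominal 80 Da shift is hard to assign is that phosphate (HPO3) and sulphate (SO3) additions differ by less than 0.01 Da. The sketch below computes that difference and expresses it in ppm of a precursor mass. It is illustrative only; the 5000 Da precursor is a hypothetical value, and nothing here is taken from the reported measurements.

```python
# Monoisotopic atomic masses in Da.
H = 1.00782503
O = 15.99491462
P = 30.97376200
S = 31.97207117

PHOSPHATE = H + P + 3 * O   # HPO3 added by phosphorylation
SULPHATE = S + 3 * O        # SO3 added by sulphation


def mass_difference() -> float:
    """Mass difference (Da) between a phosphate and a sulphate addition."""
    return PHOSPHATE - SULPHATE


def ppm_difference(neutral_mass: float) -> float:
    """The phosphate/sulphate difference in ppm of the given neutral mass."""
    return mass_difference() / neutral_mass * 1e6


print(f"phosphate: {PHOSPHATE:.5f} Da, sulphate: {SULPHATE:.5f} Da")
print(f"difference: {mass_difference():.5f} Da")
print(f"at a hypothetical 5000 Da precursor: {ppm_difference(5000.0):.2f} ppm")
```

For a glycopeptide of several kDa the two assignments are separated by only a few ppm, so precursor mass alone may not be sufficient and fragmentation evidence becomes important.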

Matriglycans modulate the ECM proteome

We found that mice with only the short matriglycan suffer mild embryonic lethality. This reveals a critical role for matriglycan elongation during mammalian embryonic development. However, our data strongly indicate that mice that survived the developmental stage grew into healthy adults. More surprisingly, the T377/T379 motif carrying the shorter matriglycan is more capable of restoring grip strength. This means that α-DG with only the shorter matriglycan maintains cell-matrix interactions well, at least in mice after development. The phenotypic comparison between the DG T315A/T317A and T377A/T379A mice, especially for fiber size (Fig. 5c) and NMJ morphology (Fig. 6o), needs further validation because the current data are limited by the small number of mice available for experiments.

We tried to address these physiological observations by performing label-free proteomic quantification on fibroblast cells that are known to secrete ECM proteins directly related to bioRxiv preprint doi: https://doi.org/10.1101/2023.12.20.572361; this version posted December 21, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Transmembrane anterior  available under aCC-BY 4.0 International license. posterior transformation  protein 1 homolog FLJ90013;TAPT1 2 2 2 11 11 11 52.104 0.0015444 2.3912 Syntaxin‐10 STX10 1 1 1 15.1 15.1 15.1 8.4503 0.0013598 2.4628 Codanin‐1 CDAN1 1 1 1 7.5 7.5 7.5 37.928 0 5.5072 SUZ domain‐containing protein  1 SZRD1 1 1 1 7.9 7.9 7.9 16.997 0.0035369 1.8728 Ral GTPase‐activating protein  subunit alpha‐2 RALGAPA2 3 3 3 2.5 2.5 2.5 210.77 0 4.7646 RNA polymerase II elongation  factor ELL MEN;ELL 4 4 4 15 15 15 62.123 0 9.9698 57468000 57415000 57352000 Putative phospholipase B‐like  2;Putative phospholipase B‐like  2 32 kDa form;Putative  phospholipase B‐like 2 45 kDa  form Sorting nexin‐15 Protein CASC4 RNMT‐activating mini protein DNA repair protein XRCC4 Sickle tail protein homolog E3 ubiquitin‐protein ligase  TRIM32

PLBD2 hCG_2044837;SNX15 CASC4 FAM103A1 XRCC4 KIAA1217 TRIM32

POLR1E Biotin‐‐protein ligase;Biotin‐‐ [methylmalonyl‐CoA‐ carboxytransferase]  ligase;Biotin‐‐[propionyl‐CoA‐ carboxylase [ATP‐hydrolyzing]]  ligase;Biotin‐‐[methylcrotonoyl‐ CoA‐carboxylase] ligase;Biotin‐‐ [acetyl‐CoA‐carboxylase] ligase HLCS

HEL‐176 Ubiquitin‐conjugating enzyme  E2 G1;Ubiquitin‐conjugating  enzyme E2 G1, N‐terminally  processed Kinetochore‐associated protein  1 Tetratricopeptide repeat  protein 19, mitochondrial Adenylyl cyclase‐associated  protein;Adenylyl cyclase‐ associated protein 2 52 kDa repressor of the  inhibitor of the protein kinase CSC1‐like protein 2 Membrane‐associated  phosphatidylinositol transfer  protein 1 Histone‐lysine N‐ methyltransferase 2B COX assembly mitochondrial  protein homolog Nuclear respiratory factor 1

UBE2G1 KNTC1 TTC19 PRKRIR TMEM63B PITPNM1 KMT2B CMC1 IKBKG NRF1

DKFZp686D0714;CAP2 Cytochrome c oxidase subunit 2 COX2;COII;cox2 Transforming acidic coiled‐coil‐ containing protein 2

TACC2

SCARB1 CNTN1 HSP90AB2P 3 12 2 4 2 1 3 1 54119000 SLC2A3;SLC2A14

RAVER2 Solute carrier family 2,  facilitated glucose transporter  member 3;Solute carrier family  2, facilitated glucose  transporter member 14 Ribonucleoprotein PTB‐binding  2 TraB domain‐containing protein TRABD Uncharacterized protein  C9orf40 Spastin Pre‐B‐cell leukemia  transcription factor‐interacting  protein 1 PBXIP1

C9orf40

SPAST Guanine nucleotide‐binding  protein subunit alpha‐ 11;Guanine nucleotide‐binding  protein subunit alpha‐14

GNA11;GNA14 Pre‐mRNA‐splicing factor SLU7 SLU7 Phosphatidylserine synthase 1 Regulatory factor X‐associated  protein

PTDSS1

RFXAP RING finger and SPRY domain‐ containing protein 1 Protein‐methionine sulfoxide  oxidase MICAL3 Transcription termination  factor 2

RSPRY1 MICAL3

TTF2 Heat shock 70 kDa protein 12A HSPA12A RNA‐binding protein 6 RBM6 MHC class II regulatory factor  RFX1 RFX1 NF‐kappa‐B inhibitor epsilon Interferon‐stimulated 20 kDa  exonuclease‐like 2 Paired amphipathic helix  protein Sin3b 4‐aminobutyrate  aminotransferase,  mitochondrial Formin‐binding protein 1 Zinc finger CCCH domain‐ containing protein 7A Semaphorin‐3C GPN‐loop GTPase 2 Elongator complex protein 4 Coiled‐coil domain‐containing  protein 85C CDK5 regulatory subunit‐ associated protein 1 22 4 11 1 1 4 2 3 0 0 4 2 1 5 0 1 4 7 5 6 1 6 3 5 1 0 6 7 4 7 12 1 4 5 21 6 1 6 3 17 14 5 7 Histone acetyltransferase p300 EP300 15 kDa selenoprotein Protein KHNYN INO80 complex subunit B E3 ubiquitin‐protein ligase  RNF13 Protein FAM134C Mismatch repair endonuclease  PMS2 PMS2 DNA polymerase subunit  gamma‐2, mitochondrial POLG2 Endoplasmic reticulum  aminopeptidase 2 Protein FAM122B Cyclin‐C Uncharacterized protein  C11orf84 Bone morphogenetic protein 2 BMP2

CENPB KRT1 LRRC16A EFR3B PIK3CB hCG_37731;RAD1 SNAP47 ZWINT PHRF1;KIAA1542

15‐Sep KIAA0323;KHNYN INO80B;INO80B‐WBP1 RNF13 FAM134C ERAP2 SPACIA2;FAM122B CCNC C11orf84 2 8 bioRxiv preprint doi: https://doi.org/10.1101/2023.12.20.572361; this version posted December 21, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made NKG2D ligand 2;Retinoic acid  available under aCC-BY 4.0 International license. early transcript 1L  protein;Retinoic acid early  RAET1L;RAET1G;RAET1H transcript 1G protein ;ULBP2 1 1 1 12.6 12.6 12.6 9.9203 0.0017308 2.3475 WD repeat‐containing protein  41 WDR41 2 2 2 8.3 8.3 8.3 51.727 0 12.105 Major centromere autoantigen  B Neuropilin‐1 E3 ubiquitin‐protein ligase  TRIM11 Uncharacterized protein  C17orf62 Syntaxin‐2 Centrosomal protein of 135  kDa

NRP1;DKFZp781F1414 TRIM11 C17orf62 STX2

CEP135 Cx9C motif‐containing protein 4 CMC4 FERM, RhoGEF and pleckstrin  domain‐containing protein 2 Metalloproteinase inhibitor 2 Diphthamide biosynthesis  protein 2 Transcriptional adapter 3 Transmembrane protein 245 Cytochrome P450 2S1 Ankyrin repeat domain‐ containing protein 40 Inactive serine/threonine‐ protein kinase VRK3 Acyl‐coenzyme A thioesterase 8 ACOT8;PTE1 6 0 4 2 0 6 4 48149000 47976000 47898000 47846000 47685000 47457000 47324000 47302000 47200000 47180000 46622000 46596000 46490000 46096000 45965000 45320000 45317000 45313000 45207000 45188000 45122000 44788000 44781000 44328000 43997000 43978000 43966000 43947000 43916000 43866000 43859000 43684000 43402000 43382000 4.3396 0 0 0 0 42958000 42913000 G2/mitotic‐specific cyclin‐B2 Zinc finger protein 219 Bifunctional lysine‐specific  demethylase and histidyl‐ hydroxylase MINA

CCNB2V;CCNB2 ZNF219 MINA

QARS Transmembrane protein 194A Protein misato homolog 1

TMEM194A

MSTO1 Activating molecule in BECN1‐ regulated autophagy protein 1 AMBRA1 Actin‐binding LIM protein 1 ABLIM1 Glucoside xylosyltransferase 1

GXYLT1 Myotubularin‐related protein 6 MTMR6 Neuromodulin GAP43 N‐alpha‐acetyltransferase 16,  NatA auxiliary subunit Vacuolar protein sorting‐ associated protein 33A Homeobox protein Hox‐D11 MAX gene‐associated protein Endothelin‐converting enzyme  1 Calcium signal‐modulating  cyclophilin ligand Hippocalcin‐like protein 1 Putative coiled‐coil domain‐ containing protein 144C;Coiled‐ coil domain‐containing protein  144A

NAA16;NARG1L VPS33A HOXD11 MGA ECE1 CAMLG HPCAL1

CCDC144A;CCDC144CP Guanine nucleotide‐binding  protein subunit  gamma;Guanine nucleotide‐ binding protein G(I)/G(S)/G(O)  hCG_1992840;hCG_1994 subunit gamma‐10 888;GNG10 PDZ and LIM domain protein 7 PDLIM7 Cytoplasmic dynein 2 light  intermediate chain 1 Uncharacterized protein  C6orf47 C6orf47 Single‐stranded DNA‐binding  protein 2;Single‐stranded DNA‐ binding protein 4;Single‐ stranded DNA‐binding protein  3 Ubiquitin‐protein ligase E3A PAS domain‐containing  serine/threonine‐protein  kinase PASK Dynamin‐1 DNM1

DYNC2LI1 General transcription factor IIH  subunit 2‐like protein;General  transcription factor IIH subunit  2 Beta‐1‐syntrophin SH3 and PX domain‐containing  protein 2B Mitochondria‐eating protein

SPATA18 GTF2H2C;GTF2H2 SNTB1 SH3PXD2B SSBP4;SSBP2;SSBP3 UBE3A 3 23 11 2 22 6 15 2 3 5 5 39712000 39695000 39665000 40115000 40007000 CAMK1D

GATA6 Calcium/calmodulin‐dependent  protein kinase type 1D Transcription factor GATA‐6 KDEL motif‐containing protein  1 PHD finger protein 21A Sentrin‐specific protease 8

KDELC1 PHF21A SENP8

KDM6A;UTX;DKFZp451J0 Lysine‐specific demethylase 6A 23

CAST Coiled‐coil domain‐containing  protein 50 Cyclin‐T2 AT‐rich interactive domain‐ containing protein 3B Histone‐lysine N‐ methyltransferase 2A;MLL  cleavage product N320;MLL  cleavage product C180 Centrosomal protein of 131  kDa Trophoblast glycoprotein Transcription elongation factor  A protein‐like 3;Transcription  elongation factor A protein‐like  6;Transcription elongation  factor A protein‐like 5 Signal transducer and activator  of transcription 2;Signal  transducer and activator of  transcription Baculoviral IAP repeat‐ containing protein 5 Mediator of RNA polymerase II  transcription subunit 12 Serine/threonine‐protein  kinase ULK3 OTU domain‐containing protein  7B Non‐canonical poly(A) RNA  polymerase PAPD5 Meteorin‐like protein G‐protein coupled receptor  family C group 5 member C Iron‐sulfur protein NUBPL Integral membrane protein  2B;BRI2, membrane form;BRI2  intracellular domain;BRI2C,  soluble form;Bri23 peptide

CCDC50 CCNT2 ARID3B KMT2A CEP131;AZI1 TPBG TCEAL6;TCEAL3;TCEAL5 TNRC11;MED12 ULK3;hCG_40815 STAT2 BIRC5 OTUD7B PAPD5 METRNL GPRC5C NUBPL

ITM2B Matrix metalloproteinase‐15 Protein LAP2

MMP15

ERBB2IP;HEL‐S‐78 Receptor protein‐tyrosine  EPHB2;EPHB2 variant  kinase;Ephrin type‐B receptor 2 protein Prolyl 3‐hydroxylase 3 LEPREL2 Breast cancer metastasis‐ suppressor 1‐like protein BRMS1L Coiled‐coil‐helix‐coiled‐coil‐ helix domain‐containing  protein 10, mitochondrial CHCHD10 6 20 25 4 3 2 14 10 5 0 3 6 1 3 37294000 TGF‐beta‐activated kinase 1  and MAP3K7‐binding protein 2 TAB2 Mitogen‐activated protein  kinase kinase kinase  kinase;Mitogen‐activated  protein kinase kinase kinase  kinase 5 LIM and calponin homology  domains‐containing protein 1 Paraneoplastic antigen Ma2 Lysosomal acid phosphatase MLN64 N‐terminal domain  homolog PDZ domain‐containing protein  GIPC3 Oxysterol‐binding protein‐ related protein 6 Protein lin‐37 homolog

MAP4K5 LIMCH1 PNMA2 ACP2 STARD3NL GIPC3 OSBPL6

LIN37 Receptor‐binding cancer  antigen expressed on SiSo cells EBAG9;RCAS1 Tetratricopeptide repeat  protein 33 TTC33 NEDD8‐conjugating enzyme  UBE2F Tyrosine‐protein  kinase;Tyrosine‐protein kinase  Fer;Non‐specific protein‐ tyrosine kinase Zinc finger protein 148 3‐5 exoribonuclease 1 Alpha‐aminoadipic  semialdehyde synthase,  mitochondrial;Lysine  ketoglutarate  reductase;Saccharopine  dehydrogenase AASS Cyclin‐D1‐binding protein 1 CCNDBP1 Protein ITFG3 ITFG3 FUN14 domain‐containing  protein 1 FUNDC1

UBE2F;UBE2F‐SCLY Pe1Fe3;FER;Pe1Fe6 ZNF148

ERI1;THEX1 Integrin beta;Integrin beta‐5 TBC1 domain family member  22A Protein kinase C zeta type Zinc finger protein 362 Centrosomal protein of 170  kDa protein B Solute carrier family 12  member 9 Transmembrane protein 177 Vesicle transport through  interaction with t‐SNAREs  homolog 1B CDKN2AIP N‐terminal‐like  protein Uncharacterized protein  C2orf47, mitochondrial Tight junction protein ZO‐1 Thymosin beta‐15B;Thymosin  beta‐15A

ITGB5 TBC1D22A PUM1 PRKCZ ZNF362 CEP170B SLC12A9;FLJ00059;FLJ00 009;FLJ00100;FLJ00010 MGC10993;TMEM177 VTI1B CDKN2AIPNL FLJ22555;C2orf47 GFRA1 SPG7 HOXA10 DKFZp686M05161;TJP1; DKFZp686A1195 TMSB15B;TMSB15A 1 1 3 2 2 1 2 3 2 1 2 2 4 3 1 1 1 1 2 2 1 6 2 2 3 1 2 1 1 1 1 2 1 2 7 2 4 2 9 3 2 2 2 5 6 4 0 1 4 2 2 2 4 3 2 9 0 1 3 4 0 5.2 37.713 6.9 88.234 3.2 42.414 0.0028027

0 0.0009821 2.4 142.57 bioRxiv preprint doi: https://doi.org/10.1101/2023.12.20.572361; this version posted December 21, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Ras‐related protein R‐Ras RRAS avai2lable unde1r aCC-BY14.0 Inte1r2n.8ational lic5e.5nse. 5.5 23.48 0.0005922 2.7411 34324000 Histone‐lysine N‐ methyltransferase 2D KMT2D 3 3 3 1 1 1 593.38 0 3.4239 34260000 ACSS2;DKFZp762G026 8.5 11.3 Multiple PDZ domain protein Methyltransferase‐like protein  13

MPDZ feat;METTL13 Ubiquitin‐associated domain‐ containing protein 2 Protein UXT Lysosomal amino acid  transporter 1 homolog Smith‐Magenis syndrome  chromosomal region candidate  gene 8 protein Hepatoma‐derived growth  factor‐related protein 3 ADP‐ribosylation factor‐like  protein 6 STAGA complex 65 subunit  gamma Coiled‐coil domain‐containing  protein 28A Homeobox protein Hox‐D13 Trafficking protein particle  complex subunit 10 Carboxyl‐terminal PDZ ligand of  neuronal nitric oxide synthase  protein NF‐X1‐type zinc finger protein  NFXL1 Major facilitator superfamily  domain‐containing protein 10 Sulfurtransferase Kelch‐like protein 26 BRCA1‐associated ATM  activator 1 Transforming growth factor‐ beta receptor‐associated  protein 1 C‐X‐C motif chemokine 16 Lon protease homolog 2,  peroxisomal Condensin‐2 complex subunit  G2 Ran‐binding protein 3 cAMP‐dependent protein  kinase catalytic subunit beta Hermansky‐Pudlak syndrome 6  protein RNA‐binding protein 47 Max‐like protein X Checkpoint protein HUS1 CDK2‐associated and cullin  domain‐containing protein 1 HAUS augmin‐like complex  subunit 3

PHGDHL1;UBAC2 UXT PQLC2 SMCR8 HDGFRP3 ARL6 SUPT7L CCDC28A HOXD13 TRAPPC10 TSC22D3 NOS1AP URCC5;NFXL1 MFSD10 KLHL26 BRAT1 TGFBRAP1 CXCL16 LONP2 NCAPG2 RANBP3 PRKACB HPS6 FLJ20273;RBM47 MLX HUS1 CACUL1 HAUS3 POGK SAV1

ZSCAN18;ZNF447 D‐2‐hydroxyglutarate  dehydrogenase, mitochondrial D2HGDH Acetyl‐coenzyme A  synthetase;Acetyl‐coenzyme A  synthetase, cytoplasmic 2 4 4 2 4 2 0 2 6 13 3 10 4 1 31009000 30861000 F‐BAR domain only protein 2 Ena/VASP‐like protein

FCHO2

EVL Receptor‐type tyrosine‐protein  phosphatase;Receptor‐type  tyrosine‐protein phosphatase  alpha Claudin;Claudin‐12 Transmembrane protein 68 Kinesin‐like protein;Kinesin‐like  protein KIF17

PTPRA CLDN12 TMEM68 SH3BP1

KIF17 B‐cell CLL/lymphoma 9 protein BCL9 Multidrug resistance‐ DKFZp781G125;ABCC1; associated protein 1 MRP Kinase suppressor of Ras 1 KSR1 Calmodulin‐regulated spectrin‐ associated protein 1 Filamin‐interacting protein  FAM101B Spindle assembly abnormal  protein 6 homolog Sorting nexin‐8 Pumilio domain‐containing  protein KIAA0020 Ribonuclease P protein subunit  p14

CAMSAP1 FAM101B SASS6 SNX8 KIAA0020

RPP14 Steroid hormone receptor ERR1 ESRRA Calpain‐7 CAPN7 Histone‐lysine N‐ methyltransferase SETDB1 Serine/arginine‐rich splicing  factor 4

SETDB1

SRSF4;SFRS4 Unhealthy ribosome biogenesis  protein 2 homolog Myb‐related protein B Homeobox protein PKNOX1 Dachshund homolog 1

URB2 MYBL2 PKNOX1

DACH1 RalBP1‐associated Eps domain‐ containing protein 2 MKL/myocardin‐like protein 1 E3 ubiquitin‐protein ligase Itchy  homolog Mucosa‐associated lymphoid  tissue lymphoma translocation  protein 1 E3 ubiquitin‐protein ligase  RNF123 Probable ATP‐dependent RNA  helicase DHX34 Stimulator of interferon genes  protein

REPS2;REPS1 SF1 MKL1;mkl1 ITCH MALT1 hCG_20123;RNF123 DHX34 TMEM173 5 9 7.4267 0.0024687 2.1873

27357000 3.9 128.12 1 1 2 1 30600000 30418000 30409000 30229000 30123000 30087000 30086000 30058000 29957000 29908000 29884000 29358000 29114000 29078000 28666000 28559000 28345000 28323000 28193000 28125000 28074000 27831000 27803000 27750000 27626000 27417000 14 2 0 11 6 19 2 16 2 2 3 2 4 0 5 4

EGF‐containing fibulin‐like  extracellular matrix protein 1 GMP reductase 1 Cellular tumor antigen p53 Activating signal cointegrator 1  complex subunit 1 Proline‐rich protein 11 Putative RNA polymerase II  subunit B1 CTD phosphatase  RPAP2 COMM domain‐containing  protein 4 Zinc finger protein 800 F‐box/LRR‐repeat protein 12 Overexpressed in colon  carcinoma 1 protein Leptin receptor gene‐related  protein

EFEMP1 GMPR TP53 ACTG1 ASCC1 PRR11 RPAP2 COMMD4 ZNF800 FBXL12 C12orf75;OCC1

LEPROT Dynein light chain 1, axonemal DNAL1 LIM domain‐containing protein  1 Cingulin Zinc finger protein 746

LIMD1 CGN

ZNF746 Beta‐galactosidase;Beta‐ galactosidase‐1‐like protein 2 TIMELESS‐interacting protein

TIPIN Phosphatidate  cytidylyltransferase;Phosphatid ate cytidylyltransferase 1

CDS1

LOC89944;GLB1L2 Costars family protein ABRACL ABRACL Kinesin‐like protein  KIF20B;Kinesin‐like protein Protein KRBA1 F‐box/LRR‐repeat protein 3 PDZ domain‐containing protein  8 PDZD8

KIF20B;MPHOSPH1 KRBA1;KIAA1862

FBXL3 Lysine‐specific demethylase 2A KDM2A Peregrin BRPF1 Septin‐10 SEPT10;DKFZp547B243 Cystathionine gamma‐lyase CTH tRNA‐dihydrouridine(47)  synthase [NAD(P)(+)];tRNA‐ dihydrouridine(47) synthase  [NAD(P)(+)]‐like DUS3L UPF0547 protein C16orf87 C16orf87 Pituitary homeobox 1 PITX1 DnaJ homolog subfamily C  member 1 DNAJC1 Lysine‐specific histone  demethylase 1B Fucose mutarotase CDK5 regulatory subunit‐ associated protein 2 Pikachurin

KDM1B FUOM CDK5RAP2;DKFZp686E22 42;DKFZp686D1070 EGFLAM

LSM14B NTF2‐related export protein 1 PAXIP1‐associated glutamate‐ rich protein 1

SUOX FOXO1 MCUR1 NXT1 PAGR1 2 2 6 10 0.0084884 1.5572 24135000 3 5 1 0 2 7 0 3 0 2 1 4 1 4 3 8 0 9 2 4 3 4 4 4 3 8 2 2 3 2 1 2 1 1

Required for meiotic nuclear  division protein 1 homolog

RMND1 TSC22 domain family protein 4 TSC22D4 Serine/threonine‐protein  kinase VRK2 VRK2 Short coiled‐coil protein SCOC Pro‐cathepsin H;Cathepsin H  mini chain;Cathepsin  H;Cathepsin H heavy  chain;Cathepsin H light chain Cell cycle checkpoint control  protein RAD9A Pleckstrin homology‐like  domain family B member 1 Cullin‐9 Ectodysplasin‐A receptor‐ associated adapter protein E3 ubiquitin‐protein ligase  RNF170 Thyroid adenoma‐associated  protein Heterogeneous nuclear  ribonucleoprotein C‐like  4;Heterogeneous nuclear  ribonucleoprotein C‐like  1;Heterogeneous nuclear  ribonucleoprotein C‐like 3 Protein phosphatase 1  regulatory subunit 11

CTSH RAD9A PHLDB1;KIAA0638 CUL9 EDARADD RNF170 GITA/3p fusion;GITA/7p  fusion;THADA HNRNPCL4;HNRNPCL1;H NRNPCL3;HNRPCL1 PPP1R11

PCBP2 E3 ubiquitin‐protein ligase XIAP XIAP RNA polymerase I‐specific  transcription initiation factor  RRN3

RRN3 Integrator complex subunit 6

INTS6 Nucleolysin TIA‐1 isoform p40

TIA1 Mediator of RNA polymerase II  transcription subunit 31 Serine/threonine‐protein  phosphatase 2A 55 kDa  regulatory subunit B delta  isoform;Serine/threonine‐ protein phosphatase 2A 55 kDa  regulatory subunit B Suppressor of fused homolog 20.581 22262000 22122000 0 0 0 0 0 0 0 0 0 0 0 0 2 4 2 4 2 3 0 6 4 5 6 3 2 2 1 0 5 4 1 6 5 bioRxiv preprint doi: https://doi.org/10.1101/2023.12.20.572361; this version posted December 21, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

FAM96A 1 1 1 6.9 6.9 6.9 18.355 0.0081848 1.6223 ODF2 2 2 2 4.6 4.6 4.6 62.761 0 5.3153 21601000 21556000 EH domain‐binding protein 1 Sphingosine‐1‐phosphate  phosphatase 1 TELO2‐interacting protein 2 Collagen alpha‐1(VI) chain Laminin subunit beta‐2 Protein Shroom2 MOB‐like protein phocein Non‐specific protein‐tyrosine  kinase;Tyrosine‐protein kinase  ABL1 Gamma‐tubulin complex  component 5 Interleukin‐1 receptor‐ associated kinase 1 6‐phosphofructo‐2‐ kinase/fructose‐2,6‐ bisphosphatase 2;6‐ phosphofructo‐2‐ kinase;Fructose‐2,6‐ bisphosphatase TBC1 domain family member  10A Small integral membrane  protein 1 Condensin‐2 complex subunit  H2 Multivesicular body subunit  12B Ribosome‐releasing factor 2,  mitochondrial Basic immunoglobulin‐like  variable motif‐containing  protein;DNA repair protein  complementing XP‐G cells C‐Jun‐amino‐terminal kinase‐ interacting protein 4 Disintegrin and  metalloproteinase domain‐ containing protein 10 Catechol O‐methyltransferase  domain‐containing protein 1 F‐BAR and double SH3 domains  protein 1 Nucleolar protein 8 Ras‐related protein Rab‐13 Serine/threonine‐protein  kinase 33 MAP kinase‐activating death  domain protein Spindle and centriole‐ associated protein 1 tRNA pseudouridine  synthase;tRNA pseudouridine  synthase‐like 1 Coiled‐coil domain‐containing  protein R3HCC1L Mitogen‐activated protein  kinase kinase kinase  kinase;Mitogen‐activated  protein kinase kinase kinase  kinase 3 Protein NRDE2 homolog

EHBP1 SGPP1 TTI2 COL6A1 LAMB2 SHROOM2;DKFZp781J07 4 PREI3;MOB4 ABL1;BCR/ABL fusion TUBGCP5 IRAK1 DKFZp781D2217;PFKFB2 TBC1D10A SMIM1 NCAPH2 MVB12B;C9orf28 GFM2 BIVM‐ERCC5;ERCC5‐ 201;BIVM;ERCC5 MAP4K3 NRDE2;C14orf102 SPAG9 ADAM10 COMTD1 FCHSD1 STK33 MADD SPICE1 PUSL1 NOL8;DKFZp686P12242 SNX7 hCG_1996054;hCG_2499 1;RAB13 2 14 2 20 5 20946000 20943000 2 4 4 10 0 0 23 12 2 3 2 0 0 3 0 4 4 2 1 0 4 4 0 2 1 2 2 3 bioRxiv preprint doi: https://doi.org/10.1101/2023.12.20.572361; this version posted December 21, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made cAMP‐responsive element  available under aCC-BY 4.0 International license. modulator CREM;CREB1 2 1 1 40 32.6 32.6 10.918 0 6.0939 Zinc finger ZZ‐type and EF‐hand  domain‐containing protein 1

ZZEF1 Retinoic acid receptor RXR‐beta RXRB SLAIN motif‐containing protein  1

SLAIN1 Synaptotagmin‐like protein 4

SYTL4 Pleckstrin homology domain‐ containing family A member 1 Cyclin‐dependent kinase 8 Dual specificity protein kinase  CLK1;Dual specificity protein  kinase CLK4 Selenocysteine lyase Uncharacterized protein  C18orf25 Protein tweety  homolog;Protein tweety  homolog 3 HAUS augmin‐like complex  subunit 2 Protein HEXIM2 Phytanoyl‐CoA dioxygenase  domain‐containing protein 1 Phosphoribosyltransferase  domain‐containing protein 1 Adenylyl cyclase‐associated  protein Stromelysin‐2 Uncharacterized protein  KIAA1522 DCC‐interacting protein 13‐ beta Protein cordon‐bleu Probable threonine‐‐tRNA  ligase 2, cytoplasmic Homeobox protein SIX4

PLEKHA1 CDK8 CLK1;CLK4 SCLY C18orf25 TTYH3 CEP27;HAUS2 HEXIM2 PHYHD1 PRTFDC1 MMP10 KIAA1522 APPL2 COBL TARSL2

SIX4 Peroxisomal biogenesis factor 3 PEX3 Sideroflexin‐2 SFXN2 TFIIH basal transcription factor  complex helicase XPB subunit Hamartin Protein Niban Zinc finger protein 609

ERCC3 TSC1 C1orf24;FAM129A

ZNF609 GDP‐D‐glucose phosphorylase 1 hCG_1991723;GDPGP1 Palmitoyltransferase;Palmitoylt ransferase ZDHHC5 ZDHHC5 Lysosomal thioesterase PPT2 Di‐N‐acetylchitobiase S‐phase kinase‐associated  protein 2 Rho GTPase‐activating protein  19 Guanine nucleotide‐binding  protein subunit beta‐4 Arf‐GAP with GTPase, ANK  repeat and PH domain‐ containing protein 1 hCG_1999928;PPT2 CTBS SKP2 ARHGAP19 GNB4 CENTG2;AGAP1

VNN2 DNA‐directed RNA polymerase  subunit;DNA‐directed RNA  polymerase I subunit RPA12

ZNRD1 16 2 1 1 2 1 18580000 18546000 17579000 17559000 17407000 17404000 17398000 17312000 17294000 17166000 17127000 17126000 16997000 16989000 15 1 2 1 2 1 0 6 2 2 3 2 2 3 4 8 6 4 5 4 3 4 3 4 1 0 0 0 1 2 4 1 Phospholipid scramblase 3 Inactive ubiquitin thioesterase  FAM105A Bis(5‐adenosyl)‐triphosphatase FHIT Serine‐protein kinase ATM ATM Transmembrane 7 superfamily  member 3 Forkhead box protein P4 Protein FAM208A E3 ubiquitin‐protein ligase  RNF169 WD repeat‐containing protein  62 ATR‐interacting protein Transcription factor HES‐1 Glutamate‐rich protein 1 Collagen alpha‐6(IV) chain BAG family molecular  chaperone regulator 4 NCK‐interacting protein with  SH3 domain SET domain‐containing protein  5

PSTPIP2 TMEM256‐ PLSCR3;hCG_1987383;PL SCR3 FAM105A TM7SF3 FOXP4 hCG_2043597 FAM208A;C3orf63 MLL/CALM fusion RNF169 WDR62;DKFZp434J046 TREX1;ATRIP HES1 ERICH1 COL4A6 BAG4 NCKIPSD

SETD5;DKFZp686A20205 RNA pseudouridylate synthase  domain‐containing protein 2

RPUSD2

COX1;coxI;COI;CO1;MT‐ Cytochrome c oxidase subunit 1 CO1 Serine/threonine‐protein  kinase greatwall MASTL RNA‐binding protein 7 RBM7 Protein phosphatase 1E PPM1E Transcription initiation factor  TFIID subunit 4 TAF4 Golgi reassembly‐stacking  protein 1 GORASP1 RAB6A‐GEF complex partner  protein 2 RGP1 Pannexin;Pannexin‐1 PANX1 RING finger and CHY zinc finger  domain‐containing protein 1 Probable lysosomal cobalamin  transporter

RCHY1

LMBRD1 E3 ISG15‐‐protein ligase HERC5 HERC5 Gap junction  protein;Intraflagellar transport  protein 140 homolog Coiled‐coil and C2 domain‐ containing protein 1A Actin‐related protein 5 ETS‐related transcription factor  Elf‐2 Thymosin beta‐10 NEDD4‐binding protein 2 TATA element modulatory  factor

IFT140 CC2D1A ACTR5 ELF2 TMSB10 N4BP2 TMF1 1 1 16750000 16738000 16722000 16571000 16426000 4 12 0 10 8 2 0 2 4 0 7 1 5 2 5 3 5 3 0 0 4 2 4 2 0 1 9 2 2 1 2 bioRxiv preprint doi: https://doi.org/10.1101/2023.12.20.572361; this version posted December 21, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made DDB1‐ and CUL4‐associated  available under aCC-BY 4.0 International license. factor 11 DCAF11 1 1 1 60 60 60 2.9471 0 16.708 Pyridoxal‐dependent  decarboxylase domain‐ containing protein 1 PDXDC1 10 1 1 20.2 3 3 88.761 0.0094829 1.4726 60S ribosome subunit  biogenesis protein NIP7  homolog NIP7 1 1 1 15.6 15.6 15.6 20.462 0 5.658 Tyrosine‐protein kinase  Fyn;Non‐specific protein‐ tyrosine kinase FYN 3 2 2 11.4 9.3 9.3 60.761 0.0005923 2.7422 Protein TANC1 TANC1 2 2 2 1.9 1.9 1.9 202.22 0.0005924 2.7427 IQ motif and SEC7 domain‐ containing protein 1 IQSEC1 2 2 2 4.2 4.2 4.2 91.997 0 3.8244 Coiled‐coil domain‐containing  protein 8 CCDC8 2 2 2 7.7 7.7 7.7 57.266 0 6.941 13902000 13864000 13817000 13804000 14052000 Caspase‐8;Caspase‐8 subunit  p18;Caspase‐8 subunit p10

CASP8;hCG_16983 Monoacylglycerol lipase ABHD6 ABHD6 Nuclear factor related to kappa‐ B‐binding protein NFRKB CKLF‐like MARVEL  transmembrane domain‐ containing protein 7 Armadillo repeat‐containing  protein 10 Phosphorylase b kinase  regulatory subunit alpha,  skeletal muscle isoform PHKA1 Large neutral amino acids  transporter small subunit 4 Rab‐3A‐interacting protein

SLC43A2 RAB3IP ARMC10

CMTM7;DKFZp434I2129 Lysine‐specific demethylase 4A KDM4A

PCBP2 Dual specificity protein  phosphatase;Dual specificity  protein phosphatase 9 Epithelial splicing regulatory  protein 2 Integrin‐alpha FG‐GAP repeat‐ containing protein 2 Zinc transporter ZIP10 Guanine nucleotide‐binding  protein G(o) subunit alpha RB1‐inducible coiled‐coil  protein 1 Zinc finger CCCH‐type with G  patch domain‐containing  protein Histone‐lysine N‐ methyltransferase EHMT2 Amyloid‐like protein 2 Bifunctional arginine  demethylase and lysyl‐ hydroxylase JMJD6 Ankyrin repeat domain‐ containing protein SOWAHC Fanconi anemia group A  protein Rab proteins  geranylgeranyltransferase  component A 1 Cep170‐like protein Teneurin‐3 Zinc finger CCHC domain‐ containing protein 10

DUSP9 CLASP2 ESRP2 ITFG2 SLC39A10 GNAO1 RB1CC1 ZGPAT EHMT2 APLP2 JMJD6 SOWAHC FANCA CHM CEP170;CEP170P1 TENM3

ZCCHC10 Membrane‐associated  guanylate kinase, WW and PDZ  domain‐containing protein 1

MAGI1

12 16.7 1.3 21.9 20 2 8 2 0 1 2 0 2 0 2 0 4 3 2 3 4 2 1 0 0 0 0 0 0 0 0 0 3.7871 5.1054 16.7 13.011

38.33 4.8 30.278 0.0099857 8.4348 7.2037 19.395 12863000 12853000 12472000 12.269 3.664 12448000 12403000 8.1 39.241 7.5 20.308 0.0013566 2 1 1 1 2 1 1 2 1 1 4 1 1 6 1 2 3 2 2 1 1 8 1 1 11439000 11306000 11253000 11197000 11151000 Tumor necrosis factor receptor  superfamily member 16 Testis‐specific Y‐encoded‐like  protein 1 Insulinoma‐associated protein  1 Katanin p60 ATPase‐containing  subunit A‐like 1 Induced myeloid leukemia cell  differentiation protein Mcl‐1 Transcriptional activator  GLI3;Transcriptional repressor  GLI3R Thioredoxin domain‐containing  protein 16 SWI/SNF‐related matrix‐ associated actin‐dependent  regulator of chromatin  subfamily E member 1‐related Sodium‐coupled neutral amino  acid transporter 2 GRB2‐associated‐binding  protein 1 DBH‐like monooxygenase  protein 1 Fos‐related antigen 2 Zinc finger protein with KRAB  and SCAN domains 1 E3 SUMO‐protein ligase NSE2 APC membrane recruitment  protein 1 Krueppel‐like factor 4 Regulator of G‐protein signaling  19

NGFR TSPYL1 INSM1 KATNAL1 MCL1 GLI3 TXNDC16 HMG20B SLC38A2 GAB1 MOXD1 FOSL2 ZKSCAN1 AMER1 KLF4 RGS19

NSMCE2;C8orf36 Myotubularin‐related protein 3 MTMR3 Pleckstrin homology domain‐ containing family G member 1 Sorting nexin‐24 Glycosyltransferase 8 domain‐ containing protein 1 N‐chimaerin Ragulator complex protein  LAMTOR4;Ragulator complex  protein LAMTOR4, N‐terminally  processed

PLEKHG1 SNX24 GLT8D1 CHN1

LAMTOR4 Transport and Golgi  organization protein 6 homolog TANGO6 Protein‐lysine 6‐oxidase LOX Calcineurin‐binding protein  cabin‐1

DKFZp762G2015;CABIN1 2 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 1 1 2 1 3.9024 3.5684 3.8759 3.9517 9.8 52.923 10.4 11.2 24.42 3.2 56.025 0.001364 2 64.473 0.0069533 6.5 69.652 8.9 22.254 0.0028116 0 10229000 10127000 0 0 0 0 0 0 0 0 0 0 6.8053 33.606 3.169 3 103.22 16.7 13.547 0.0001999 0.0081803 9524700 9467900 9419400 9397300 2 1 5 2 4 0 3 2 2 3 2 2 2 0 2 1 1 0 2 3 0

Asparagine synthetase domain‐ containing protein 1

ASNSD1 PR domain zinc finger protein 2 PRDM2 Paired mesoderm homeobox  protein 2B U6 snRNA‐associated Sm‐like  protein LSm5 LSM5

PHOX2B BMP‐2‐inducible protein kinase BMP2K Neurobeachin NBEA;DKFZp686I03263 BET1‐like protein BET1L DNA‐directed RNA polymerase II subunit RPB11‐b2;DNA‐directed RNA polymerase II subunit RPB11‐b1;DNA‐directed RNA polymerase II subunit RPB11‐a Coiled‐coil domain‐containing protein 14 Protein SGT1 Beta‐mannosidase Kelch domain‐containing protein 3 Bardet‐Biedl syndrome 1 protein Shugoshin‐like 2 SH3KBP1‐binding protein 1 WD repeat‐containing protein 81 Nucleolar protein 4‐like;Nucleolar protein 4 Sarcosine dehydrogenase, mitochondrial Cytochrome c oxidase assembly factor 5 Mitogen‐activated protein kinase kinase kinase 6 GEM‐interacting protein Myotubularin‐related protein 14 WD repeat‐containing protein 76 PHD finger protein 1 AMME syndrome candidate gene 1 protein WASH complex subunit FAM21A 1‐phosphatidylinositol 4,5‐bisphosphate phosphodiesterase beta‐1;Phosphoinositide phospholipase C

POLR2J3;POLR2J1;POLR2J;POLR2J2 CCDC14 HSGT1;ECD MANBA KLHDC3 BBS1;DKFZp313N0733 HNRNPDL SGOL2 SHKBP1 WDR81 NOL4;NOL4L SARDH COA5 MAP3K6 GMIP MTMR14 WDR76 PHF1 AMMECR1 FAM21A

PLCB1 Pleckstrin homology domain‐containing family M member 2 PLEKHM2 SEC14 domain and spectrin repeat‐containing protein 1 SESTD1 ATP synthase protein 8;Protein FAM65A FAM65A PNMA‐like protein 1 PNMAL1 Ankyrin repeat and BTB/POZ domain‐containing protein 2 Ubiquitin thioesterase OTU1 P2X purinoceptor;P2X purinoceptor 4 Multiple epidermal growth factor‐like domains protein 8

ABTB2 YOD1 P2RX4 MEGF8 1 1 1 2 1 4 1 1 1 1 1 1 12.6 2.5 1.9 108.6 8790300 8650400 8618800 7838600 7755900 0 16 2 2 3 3 2 5 2 RASAL2 1 1 1 4.9 4.9 4.9 41.743 0 4.2397 5147700 TPST1 C19orf57 RWDD1 TMEM175 DBNDD1;MGC3101 CIC;CIC/DUX4 fusion PHKB SLC4A7;SLC4A10;SLC4A8 METRN hCG_2039588;PBX2

VEZT Protein‐tyrosine sulfotransferase 1 Uncharacterized protein C19orf57 RWD domain‐containing protein 1 Transmembrane protein 175 Dysbindin domain‐containing protein 1 Protein capicua homolog LysM and putative peptidoglycan‐binding domain‐containing protein 3 Probable methyltransferase‐like protein 15 Zinc finger transcription factor Trps1 Serine/threonine‐protein phosphatase 6 regulatory ankyrin repeat subunit C Ataxin‐1 C2 calcium‐dependent domain‐containing protein 4C Carboxylic ester hydrolase;Carboxylesterase 3 Solute carrier family 25 member 40 NEDD4‐binding protein 1 Anion exchange protein;Electroneutral sodium bicarbonate exchanger 1;Sodium‐driven chloride bicarbonate exchanger;Sodium bicarbonate cotransporter 3 Meteorin Calcium/calmodulin‐dependent protein kinase type II subunit beta Protein‐methionine sulfoxide oxidase MICAL1 Transmembrane protein 101 Ankyrin repeat domain‐containing protein 54 Zinc finger and BTB domain‐containing protein 14 Pre‐B‐cell leukemia transcription factor 2 Vezatin Adenylosuccinate synthetase isozyme 1 RNA polymerase‐associated protein LEO1 Ubiquitin carboxyl‐terminal hydrolase 31 Regulator of nonsense transcripts 3A Hippocalcin‐like protein 4 Autophagy‐related protein 9A Centromere protein T Nucleolar MIF4G domain‐containing protein 1 E1A‐binding protein p400 cGMP‐dependent protein kinase 1

LYSMD3 METTL15 TRPS1 ANKRD52 ATXN1 C2CD4C CES3 SLC25A40 N4BP1 CAMK2B MICAL1 TMEM101 ANKRD54 ZBTB14 ADSSL1 LEO1 USP31 2 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 1 1 5 1 1 2 2 2 2 3 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 2 2 2 1 3 1 1 2 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 2 2 2 1 3 4555400 4341900 4334200 3501000 3451900 3111200 2779800 1467900 1288000 1250100 1198900 999690 932690 1 1 1 0 4 1 0 5 3 3 2 2 0 2 2 0 2 0 2 0 4 10 2 6.7 61.966 0.0081922 Mediator of RNA polymerase II  transcription subunit 13 THRAP1;MED13 Histone H2A;Histone H2A type  1‐C;Histone H2A type 3;Histone  HIST1H2AC;HIST1H2AB; H2A type 1‐B/E HIST3H2A Ras‐related protein Rab‐4B RAB4B Coiled‐coil domain‐containing  protein 68 CCDC68 THUMP domain‐containing  protein 3 THUMPD3 THO complex subunit 7  homolog NIF3L1BP1;THOC7 Ras‐related protein Rab‐11A RAB11A ELM2 and SANT domain‐ containing protein 1 ELMSAN1;C14orf43 Gamma‐interferon‐inducible  lysosomal thiol reductase High affinity copper uptake  protein 1 Small ubiquitin‐related  modifier;Small ubiquitin‐ related modifier 2;Small  ubiquitin‐related modifier 4 Fanconi anemia‐associated  protein of 100 kDa Microsomal glutathione S‐ transferase 3

IFI30 SLC31A1 SUMO2;SUMO4 C17orf70;FAAP100 MGST3

HNRPA1;HNRNPA1 SRC kinase signaling inhibitor 1 SRCIN1 RNA exonuclease 1 homolog B‐cell CLL/lymphoma 9‐like protein Hydroxypyruvate isomerase;Putative hydroxypyruvate isomerase Transforming growth factor‐beta‐induced protein ig‐h3 Dystonin ATP‐dependent RNA helicase DDX19B Thioredoxin reductase 3 Presenilin;Presenilin‐2;Presenilin‐2 NTF subunit;Presenilin‐2 CTF subunit Methylenetetrahydrofolate reductase Uncharacterized protein C15orf41 Protein AAR2 homolog Protein kinase C and casein kinase substrate in neurons protein 1 Engulfment and cell motility protein 1 Alpha‐1,3‐mannosyl‐glycoprotein 4‐beta‐N‐acetylglucosaminyltransferase‐like protein MGAT4D

REXO1 BCL9L HYI TGFBI DST TXNRD3 PSEN2 MTHFR HLA‐A C15orf41 AAR2 PACSIN1 ELMO1

MGAT4D Protein cornichon homolog 4

CNIH4 DDX19B;hCG_1998531 1 7 1 2 1 1 5 1 1 1 2 1 1 1 1 1 1 2 1 2 1 3 1 1 2 1 2 1 14 1 1 1

7.8246 0 0 0 0 0 0 0 0 0 3 2 2 0 2 0 0 0 1 0 0 2 1 0 0 0 3 0 0 0 0 0 2 0 0 0 3 2 1 0

Translocating chain‐associated membrane protein;Translocating chain‐associated membrane protein 1 TRAM1 Digestive organ expansion factor homolog DIEXF Transcription factor RelB RELB

MRPL52

MBOAT2 Limb region 1 protein homolog LMBR1 39S ribosomal protein L52,  mitochondrial Lysophospholipid  acyltransferase 2 Transmembrane protein  FAM155A Ras‐related protein Rab‐7L1 Ribosomal RNA processing  protein 36 homolog Homeobox protein Hox‐A11 E3 ubiquitin‐protein ligase  RNF14 RNF14 2‐aminoethanethiol  dioxygenase ADO Solute carrier family 35  member F1 Ceramide synthase 1 Poly(rC)‐binding protein 3

RRP36

HOXA11 Nuclear receptor coactivator 2 NCOA2 Translin‐associated factor X‐interacting protein 1

TSNAXIP1 SLC35F1 CERS1;LASS1 PCBP3 FAM155A

RAB7L1;RAB29 Ras‐related protein Rab‐1A Prothymosin alpha;Prothymosin alpha, N‐terminally processed;Thymosin alpha‐1

RAB1A NPSR1 PTMA;PTMAP7

DMKN Tubulin alpha‐8 chain

TUBA8;DKFZp686L04275 Neurogenic locus notch homolog protein 2;Notch 2 extracellular truncation;Notch 2 intracellular domain Claudin DNA topoisomerase 2 Myb/SANT‐like DNA‐binding domain‐containing protein 4 BCL2/adenovirus E1B 19 kDa protein‐interacting protein 2 Glycerophosphodiester phosphodiesterase domain‐containing protein 1 Histone deacetylase complex subunit SAP30 Tropomyosin alpha‐4 chain Histone H3.1 Lactoylglutathione lyase Heat shock protein HSP 90‐alpha A2 Sarcolemmal membrane‐associated protein

NOTCH2 CLDN5 TOP2B EPB41L2 MSANTD4 PLEC BNIP2 IST1 GDPD1 HNRNPM SAP30 HEL‐S‐108;TPM4 HIST1H3A GLO1 HSP90AA2P SLMAP 1 1 1 1 1 2 2 2 2 1 1 2 1 1 7 1 1 6 14 5 1 1 1 1 10 1 8 1 10 1 10 5 10 8 1 1 1 1 1 2 2 1 2 1 1 2 1 1 1 1 1 1 1 2 1 1 1 2 11.4 2.8 3.6 23.7

44 28.2 21.1 3.9 57.4 14.6 33.9 26.8 45.2 43.4 60.9 26.8 11.4 0 2 1 2 0 0 3 0 Splicing regulatory  glutamine/lysine‐rich protein 1 SREK1 Vesicle transport through  interaction with t‐SNAREs  homolog 1A Mastermind‐like protein 3 Uncharacterized protein  C20orf24 Eukaryotic translation initiation  factor 3 subunit F IFP38 Tumor necrosis factor receptor  superfamily member 10D Protocadherin alpha‐10

CDCP1 RPF1 KIF15 1 1 6.9 3.9 8.2 59.632 4.6 42.785 6.9 25.217 3.9 122.29 3.2 37.892 36.5 8.2789 8.3 40.11 1.3 160.16

0 0.0042545 0 0

High mobility group nucleosome‐binding domain‐containing protein 3 2.9 2.9 4.3 11.7 2.8 8.6 2.5988 4.3 59.38 2.5001 2.6924 2.6665