Research Article
Study of APOBEC3B focused breast cancer pathways and the clinical relevance
1 Department of Biochemistry, Faculty of Science, King Abdulaziz University, Jeddah, 21589, Saudi Arabia; [email protected] (H.C.); [email protected] (A.A.); [email protected] (W.H.A.).
2 Cancer and Mutagenesis Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, 22252, Saudi Arabia.
3 Centre for Artificial Intelligence in Precision Medicines, King Abdulaziz University, Jeddah, 80200, Saudi Arabia.
4 Department of Biomedical Laboratory Science, NTNU NO-7491 Trondheim, Norway.
* Correspondence: [email protected] (M.M.); [email protected] (H.C.)
Citation: Chouadhary H, Albukhari A, Mobashir M, and Abdulaal WH. Study of APOBEC3B focused breast cancer pathways and the clinical relevance. Jour. Bas. Sci. 2024, 2(1). 1-12.
Received: September 15, 2024
Revised: October 19, 2024
Accepted: October 25, 2024
Published: November 01, 2024
Abstract: APOBEC3B is considered an enzymatic source of mutation in breast cancer, and Human T-Cell Leukemia Virus Type 1 and bone leiomyosarcoma are also associated with it. The major functions controlled or affected by APOBEC3B are gene expression, mRNA editing such as C -> U conversion, and deoxycytidine deaminase activity. The main goal of this study was to perform a systematic analysis of APOBEC3B-associated genes and their functional impact in human breast cancer. For this purpose, datasets were taken from publicly available databases, namely GEO, OncoLnc, and TCGA, and different bioinformatics approaches were applied at different levels according to the data required. The co-regulated genes obtained from the co-expression network and the mutated genes were then subjected to pathway enrichment analysis, and clinical relevance was assessed by survival curve analysis using OncoLnc. We found that a number of critical pathways known to be directly associated with breast cancer are altered because of genes that are either overexpressed or among the top mutated and are associated with APOBEC3B. These pathways are cell cycle, p53 signaling, immune signaling pathways, progesterone-mediated oocyte maturation, apoptosis, critical metabolic pathways, and pathways in cancers. From the mutational and survival analysis data, we also observe a number of well-known cancer-associated signaling pathways, immune signaling pathways, critical metabolic pathways, cell cycle, ubiquitin-proteasomal signaling pathways, and p53 signaling.
Network-level study of the pathways and their components highlights CD40LG as a potential key gene: CD40LG directly affects 10 pathways, most of which are parts of the immune system and are known to control a number of leading human diseases, including cancers.
Keywords: APOBEC3B; breast cancer; expression and mutation analysis; co-expression; survival analysis; clinical relevance
1. Introduction
Genomic instability1,2 is considered a potential driver of major human diseases, mainly cancer, in which it rapidly promotes progression, leading to clonal selection in tumor cells3-8 and further to drug resistance and poor clinical outcomes. Mutations are among the potential genomic instability factors and play critical roles in cancer progression, clinical outcomes, and drug resistance. They are classified as driver and passenger mutations9-14 and can be induced by multiple internal or external factors. Furthermore, gene expression patterns also display a high level of aberration with direct functional impact15-22. Advances in technology, coupled with appreciable progress in interdisciplinary approaches, have enabled the direct characterization of significant functional mutations, genes, and pathways involved in various complex human diseases, including cancer23-25. Classically, damage due to radiation and chemicals was established as the main culprit in carcinogenic mutagenesis26-29. The DNA cytosine deaminase enzyme family has an important role in intrinsic DNA mutation. The human genome encodes 11 polynucleotide cytosine deaminase family enzymes, members of which are responsible for mutations in cancer: APOBEC1, activation-induced deaminase (AID), APOBEC2, the APOBEC3 proteins (known as A3A, A3B, A3C, A3D, A3F, A3G and A3H), and APOBEC4. APOBEC2 and APOBEC4 have not been reported to have mutational activity.
Tissue-specific expression has been shown for APOBEC1 and AID, in hepatocytes and B cells, respectively, with implications for cancers of those tissues9,30-34. These enzymes were identified independently in 2002 as DNA mutators and antiviral factors. Among the 11 family members, APOBEC3B is considered responsible for mutations in breast cancer and is found over-expressed in many breast cancer cohorts31,35-38. Numerous factors (such as changes or mutations in DNA, altered gene expression patterns, and epigenetic changes) may be associated with the origin of breast cancer39-42; APOBEC3B-mediated mutation is one of them and is known to play a pivotal role in breast cancer and in the possible clinical outcomes of patients. APOBEC3B increases the mutation load, generating clusters of closely spaced, single-strand-specific DNA substitutions with a characteristic hypermutation signature in breast cancer37. A plethora of reports supports genomic alterations by APOBEC3B in the human genome, and the mutation patterns have been studied in detail37. The factors regulating its expression have also been deduced, and Linda and co-workers have revealed the biochemical basis of its action. It is reported that p53 and NF-kB control the expression of APOBEC3B in an independent manner. Previous works have shown that the expression of these enzymes is controlled or inhibited by other genes and proteins. Analysis of APOBEC3B-associated networks and target genes (for mutations) is a promising source of insight for discovering therapeutic targets for APOBEC3B-mutated cancers. The associations between APOBEC3B hyper-expression, APOBEC3B interaction networks, and the survival curves of patients with mutations or overexpression of the related genes have not been widely explored. A systematic analysis of APOBEC3B-induced mutations and the related interaction networks in human cancers, especially breast cancer, is lacking.
There have been a number of studies of genomic alterations across cancer types based on TCGA data. However, none of them has systematically explored the genomic aberrations of APOBEC3B and its related interaction networks in breast cancer. Previous works have addressed the role of APOBEC3B in isolation, whereas our emphasis is on the combined study of gene expression and mutational datasets to evaluate the association and impact of APOBEC3B in breast cancer, including a network-level understanding of these enzymes and their associated molecular components. Further, we have sought to elucidate the clinical relevance of the APOBEC3B co-expressed genes depending on variation in their expression or mutation. Our second aim addresses the genes preferentially mutated in breast cancer overall and their clinical relevance. This study provides a detailed understanding of APOBEC3B, its associated genes and their inferred functions, mutational profiling and the associated biological functions, survival analysis for clinical relevance, and the networks of both the genes and the pathways. It leads to the conclusion that APOBEC3B is connected with a large number of genes in the human network database and that these genes are present in pathways critical for breast cancer in particular. These pathways are altered either by differential expression of the genes or by mutations. We reach the same conclusion from the mutated genes and the clinically significant genes.
2. Materials and Methods
In this study, we collected the datasets for expression and mutational profiling of breast cancer from publicly available databases (GEO, OncoLnc, and TCGA). The list of cancer name abbreviations used by OncoLnc is presented in Supplementary Table S1. The majority of the comparative analysis was performed between normal and tumor samples. The mutational datasets are as follows: dataset 1 (from the TCGA database) was breast cancer (METABRIC, Nature 2012 and Nature Communications 2016) and contains 2509 samples43,44, and dataset 2 was breast invasive carcinoma (TCGA, Firehose Legacy) and contains 1108 samples44. MATLAB was used for most of the computational work, from data processing and normalization to statistical analysis and figure plotting. To summarize our approach, we present a workflow in Figure 1a, which gives a quick overview of the steps from data collection to analysis; more details are described in the following paragraphs. After establishing the fundamental relevance of APOBEC3B and its association with other genes and pathways, we used the KEGG pathway database and a network database (FunCoup) to present a network of the pathways and their directly associated components, i.e., the genes directly associated with APOBEC3B. To fetch the network for a desired list of genes, one can either use the server directly or prepare the gene list and write one's own code to extract the connectivity among the listed genes; the latter was our preferred approach.
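The idea of extracting connectivity for a gene list from a network can be sketched as follows. This is only an illustration in Python, not the MATLAB or command-line code used in the study: the function names, the edge-list representation, and the placeholder partners (GENE_A to GENE_E) are assumptions, and the toy links are not real FunCoup interactions.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    adjacency = defaultdict(set)
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # ignore self-links
        adjacency[gene_a].add(gene_b)
        adjacency[gene_b].add(gene_a)
    return adjacency


def direct_partners(adjacency, seed):
    """Return the sorted genes directly linked to the seed gene."""
    return sorted(adjacency.get(seed, set()))


def induced_edges(adjacency, genes):
    """Return the sorted, de-duplicated links whose two ends are both in `genes`."""
    keep = set(genes)
    edges = set()
    for gene in keep:
        for partner in adjacency.get(gene, set()):
            if partner in keep:
                edges.add(tuple(sorted((gene, partner))))
    return sorted(edges)


# Toy edge list with placeholder partner names (hypothetical, not real data).
toy_edges = [
    ("APOBEC3B", "GENE_A"),
    ("APOBEC3B", "GENE_B"),
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]
network = build_adjacency(toy_edges)
partners = direct_partners(network, "APOBEC3B")
subnetwork = induced_edges(network, ["APOBEC3B"] + partners)
print(partners)
print(subnetwork)
```

In practice the edge list would be read from a downloaded network file, and the resulting links could then be loaded into a network drawing tool.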
Furthermore, we accessed the TCGA breast cancer datasets, mapped the genes co-expressed with APOBEC3B, and present the top 25 co-expressed genes based on their correlation values, all of which are positive (Figure 1c); we also checked the mutation of APOBEC3B. This was done by searching for the APOBEC3B gene in breast cancer in the TCGA database, selecting the co-expression option in the next step, and applying a threshold of +/- 0.5 to the Spearman correlation values to select the co-expressed genes. For the list of mutated genes obtained from the large TCGA breast cancer datasets44, pathway enrichment analysis was performed; the cutoffs and the steps implemented were similar to those of the DAVID database45-47. Cytoscape48-50 was used for network drawings, and a basic enrichment approach was used for the enriched pathways. For survival analysis and to predict the top-ranked genes in terms of survival in breast cancer, OncoLnc was used51. In the survival curve analysis, we present the overall list of genes with significant p-values. Furthermore, for the top 100 clinically significant genes, the PANTHER database52 was used for PANTHER protein classification. In short, the individual files of genes and pathways were prepared with Unix command-line code as needed for analyzing the enriched pathways, preparing the networks, or predicting PANTHER protein classes53-55.
3. Results
3.1. Understanding the role of APOBEC3B and the associated genes and pathways in breast cancer
We designed our study to achieve the goal described in the previous section. Large expression and mutation datasets were collected from GEO, OncoLnc, and TCGA and analyzed with an in-silico approach; the summarized layout is presented in Figure 1a. We mapped the Cox regression analysis of APOBEC3B in different types of cancers (Table 1), identified the genes directly associated with APOBEC3B and their respective pathways using a protein-protein interaction database, and present a combined network of all these genes and pathways in Figure 1b. Furthermore, a network of the genes co-expressed with APOBEC3B and the associated pathways is presented in Figure 1c. In Figures 1b and 1c, we observe p53 signaling, cell cycle, oocyte meiosis, major cancer signaling, ubiquitin-mediated proteolysis, TLR signaling, chemokine signaling, antigen processing and presentation, regulation of actin cytoskeleton, neurotrophin, MAPK, BCR signaling, a number of metabolism-associated signaling pathways, and calcium signaling as the pathways potentially affected by alterations in APOBEC3B expression, mutation, or both. From previous work, it is known that in a number of normal and malignant cells an increased GSH level is associated with a proliferative response and is essential for cell cycle progression; that the regulation of actin cytoskeleton pathway has a role in cancer cell migration and invasion in the ECM; and that the p53 signaling, MAPK, neurotrophin, chemokine, and calcium signaling pathways, when altered in cancer cells, are involved in tumor initiation, angiogenesis, progression, and metastasis.
These pathways are known to be very specific and to play a pivotal role in cancer cell migration and proliferation, and they were observed to be strongly associated with APOBEC3B. Extending our analysis, we classified the genes as positively or negatively co-expressed, applying a threshold on the correlation value of greater than +0.5 or less than -0.5; such positive and negative correlations in expression provide important information on the dependence of the genes' expression on each other.
Table 1. Cox regression analysis for APOBEC3B.
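The correlation threshold described in the text above can be illustrated with a short sketch. It is written in plain Python rather than the tools used in the study, the function names are assumptions, and the expression vectors are invented toy values with placeholder gene names, chosen only to show how genes fall above +0.5, below -0.5, or in between.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def classify_coexpression(target, candidates, threshold=0.5):
    """Split candidate genes into positively and negatively co-expressed sets."""
    positive, negative = {}, {}
    for name, values in candidates.items():
        rho = spearman(target, values)
        if rho > threshold:
            positive[name] = rho
        elif rho < -threshold:
            negative[name] = rho
    return positive, negative


# Toy expression values across five samples (hypothetical, not study data).
apobec3b_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_candidates = {
    "GENE_UP": [2.0, 4.1, 6.3, 8.0, 9.9],
    "GENE_DOWN": [9.0, 7.5, 6.0, 4.0, 1.0],
    "GENE_WEAK": [3.0, 1.0, 4.0, 2.0, 3.5],
}
positive, negative = classify_coexpression(apobec3b_expression, toy_candidates)
print(sorted(positive), sorted(negative))
```

Genes whose correlation lies between the two cutoffs, such as GENE_WEAK here, are left out of both sets.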
3.2. Mutational profiling and their functional impact in breast cancer
In the previous section, we explored the genes and pathways fundamentally associated with APOBEC3B. Here, mutational profiling of breast cancer genes was performed and the respective enriched pathways were determined for the two large clinical datasets from the TCGA database. Among the top-ranked mutated genes of the two datasets there is a large number of common genes (Figure 2a), and the number of common genes decreases further down the ranking. Similarly, the commonly enriched pathways are presented with their respective p-values (Figure 2b), and finally Venn diagrams are shown for both the genes and the enriched pathways (Figure 2c). PIK3CA, TP53, MUC16, TTN, AHNAK2, SYNE1, CDH1, KMT2C, GATA3, and more are among the top-ranked genes, and MAPK, calcium signaling, cAMP, PI3K-AKT, focal adhesion, adrenergic signaling, thyroid hormone, oxytocin, ErbB, ubiquitin, apelin, tight junction, GnRH, Ras, cGMP-PKG, cell cycle, and pluripotency of stem cells are among the commonly enriched pathways. Overall, 42 mutated genes and 18 enriched pathways were common to both datasets, 131 mutated genes and 41 enriched pathways were specific to dataset 1, and 188 mutated genes and two enriched pathways were specific to dataset 2. Among the enriched pathways for dataset 1, a number belong directly to the immune system: TCR, BCR, TLR, NK cell-mediated cytotoxicity, TNF, TGF, cytokine-cytokine receptor interaction, and leukocyte transendothelial migration. Ubiquitin-mediated proteolysis is common to both mutational datasets.
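The two operations behind this comparison, counting shared and dataset-specific genes and scoring pathway over-representation, can be sketched as follows. The plain hypergeometric upper-tail test used here is an assumption standing in for the "basic enrichment approach" mentioned in the methods; it is not DAVID's exact statistic, and all numbers and placeholder gene names are toy values.

```python
from math import comb


def venn_counts(list_a, list_b):
    """Return (only in A, shared, only in B) counts for two gene lists."""
    set_a, set_b = set(list_a), set(list_b)
    return len(set_a - set_b), len(set_a & set_b), len(set_b - set_a)


def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """Hypergeometric upper tail: P(overlap >= n_overlap) when n_selected
    genes are drawn from n_background genes, n_pathway of which belong
    to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy gene lists (GENE_X, GENE_Y, GENE_Z are hypothetical placeholders).
dataset1_genes = ["PIK3CA", "TP53", "GENE_X"]
dataset2_genes = ["TP53", "PIK3CA", "GENE_Y", "GENE_Z"]
only_1, shared, only_2 = venn_counts(dataset1_genes, dataset2_genes)

# Toy enrichment: 20 background genes, 5 in the pathway,
# 4 mutated genes selected, 3 of which fall in the pathway.
p_value = enrichment_p_value(20, 5, 4, 3)
print(only_1, shared, only_2, round(p_value, 4))
```

With real data, the background would be the set of all annotated genes, and the resulting p-values would normally be corrected for testing many pathways.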
In Figure 3c, we present the detailed analysis of the top 100 clinically significant genes and the associated biological functions, where we observe that CD40LG is directly associated with 10 pathways, most of which are parts of the immune system and are known to control a number of leading human diseases, including cancers. CD40LG (CD40 ligand) is a protein-coding gene. The diseases associated with CD40LG are mainly immunodeficiency with hyper-IgM, type 1, and toxoplasmosis. It mainly acts as a ligand for integrins, chiefly ITGA5:ITGB1 and ITGAV:ITGB3; both the integrins and the CD40 receptor are required for activation of CD40-CD40LG signalling, which has cell-type-dependent effects such as B-cell activation, NF-kB signaling, and anti-apoptotic signaling. Furthermore, we also analyzed the APOBEC3B signature (C->T) in breast cancer and observe that this specific mutation pattern is predominantly present in missense and exon variants, followed by upstream, synonymous, intron, 3’ UTR, and 5’ UTR (Supplementary data S2).
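A tally of C->T substitutions by variant class, of the kind summarized above, might look like the sketch below. The record layout (reference base, alternate base, variant class) and the function names are assumptions, the records are invented, and counting G->A as the same change on the opposite strand is a common convention that the study does not explicitly state.

```python
from collections import Counter


def is_c_to_t(ref, alt):
    """True for a C->T change, or G->A (the same change on the other strand)."""
    return (ref.upper(), alt.upper()) in {("C", "T"), ("G", "A")}


def tally_c_to_t(records):
    """Count C->T substitutions per variant class, most frequent first."""
    counts = Counter()
    for ref, alt, variant_class in records:
        if is_c_to_t(ref, alt):
            counts[variant_class] += 1
    return counts.most_common()


# Toy mutation records (hypothetical, not taken from Supplementary data S2).
toy_records = [
    ("C", "T", "missense"),
    ("G", "A", "missense"),
    ("C", "T", "synonymous"),
    ("C", "G", "missense"),
    ("A", "G", "intron"),
]
signature_counts = tally_c_to_t(toy_records)
print(signature_counts)
```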
Figure 2. Breast cancer mutation and its functional impact. (a) Top-ranked mutated genes in the clinical breast cancer samples (TCGA database), (b) commonly enriched pathways, and (c) Venn diagram of the top-ranked mutated genes and the enriched pathways shared between the TCGA datasets.
After analyzing the APOBEC3B-related genes and pathways, including the mutated genes and the functions altered by mutation, we analyzed the overall list of genes showing clinical significance in terms of overall patient survival (Figure 3). The top 100 genes are presented with their respective p-values (Figure 3a), followed by the PANTHER protein classes of these top-ranked genes (Figure 3b) and, finally, a network of the top-ranked genes and their associated pathways (Figure 3c). In terms of overall survival, MCTS1, OVOS2, MAPT-IT1, ATG4A, SLC16A2, SLC35A2, VDAC1, TBC1D24, RP11-214F16.8, and PDP1 show extremely high significance. According to the PANTHER protein classification, the majority of these top-ranked genes belong to the metabolite interconversion enzyme, transporter, protein modifying enzyme, defense/immunity protein, gene-specific transcriptional regulator, and membrane traffic protein classes; metabolite interconversion enzyme contains the highest number of genes, followed by the transporter and protein modifying enzyme classes (Figure 3b). From the network of these top-ranked genes and the associated functions, CD40LG appears to control 10 pathways, most of which belong to immune signaling. Among the functions associated with these top-ranked genes, the majority of pathways are well established as controlling major human diseases, including multiple types of cancer (among them breast cancer), neurodegenerative diseases, diabetes, and infectious diseases (Figure 3c). This leads to the conclusion that the immune system and its critical components are mainly affected as a result of breast cancer. The clinical details are presented in Supplementary data S3.
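How a survival p-value of this kind can arise is illustrated below with a two-group log-rank test written in plain Python. This is a stand-in chosen for brevity and is an assumption: the study obtained its survival results from OncoLnc rather than from this code, and the follow-up times are invented. An event flag of 1 marks a death and 0 marks a censored observation.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    group_a = list(zip(times_a, events_a))
    group_b = list(zip(times_b, events_b))
    event_times = sorted({t for t, e in group_a + group_b if e})
    observed_a = 0
    expected_a = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for time, _ in group_a if time >= t)
        at_risk_b = sum(1 for time, _ in group_b if time >= t)
        deaths_a = sum(1 for time, e in group_a if time == t and e)
        deaths_b = sum(1 for time, e in group_b if time == t and e)
        at_risk = at_risk_a + at_risk_b
        deaths = deaths_a + deaths_b
        observed_a += deaths_a
        # Expected deaths in group A if both groups shared one hazard.
        expected_a += deaths * at_risk_a / at_risk
        if at_risk > 1:
            variance += (
                deaths
                * (at_risk_a / at_risk)
                * (at_risk_b / at_risk)
                * (at_risk - deaths)
                / (at_risk - 1)
            )
    if variance == 0:
        raise ValueError("groups cannot be compared: zero variance")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value


# Toy follow-up times in arbitrary units (hypothetical, not patient data).
high_times, high_events = [2, 3, 4], [1, 1, 1]
low_times, low_events = [5, 6, 7], [1, 1, 1]
chi_square, p_value = logrank_test(high_times, high_events, low_times, low_events)
print(round(chi_square, 3), round(p_value, 4))
```

Patients would typically be split into the two groups by high versus low expression of the gene of interest before running such a test.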
Figure 3. Clinically significant genes and their functional classification. (a) Top 100 genes that are highly significant in terms of overall survival, presented with their respective p-values. (b) PANTHER classification of the genes. (c) Pathway-gene association network of the top 100 clinically significant genes.
4. Discussion
There is a large number of previous studies in which genomic aberrations and their impact have been studied, including genomic alterations by APOBEC3B in the human genome and mutation profiling35,37,56. APOBEC-induced mutations are reported in early replicating regions where chromatin is active; the single-stranded DNA present in these replicating regions has increased fragility, producing additional substrates for APOBEC3B activity. Alterations and distortions in early replicating and highly transcribed regions of cancer genomes, such as copy number variation, chromosome rearrangements, fragility, and loss of heterozygosity, arise from chromosome breakage, and our results appear similar57-60. The pathway enrichment of the top mutated genes in the two cohorts revealed that pathways involved in cellular signaling were the main mutational targets of APOBEC3B in breast cancer patients (Figure 2b). Enhanced DNA damage and repair increases the chance of mutation by APOBEC3B. Interestingly, APOBEC-induced mutations are abundantly found in cancer genomes, but no regulatory sequences have been reported yet. It is reported that a total of <6% of APOBEC-induced kataegic events lie in the vicinity of transcriptional start sites, compared to 82% for AID-induced events, while APOBECs favor early replicating regions of the genome, which in B cells are devoid of AID-induced DSBs61-63. A few factors affecting the expression of APOBEC3B directly or indirectly have also been screened previously. It is reported that p53 and NF-kB control the expression of APOBEC3B in an independent manner36,37,64.
So far, studies have mainly addressed the role of APOBEC3B in isolation. Here, we have emphasized the integrated study of APOBEC3B-associated genes, functions, expression datasets, and clinical relevance to evaluate the association and impact of APOBEC3B in breast cancer. We have sought to explore APOBEC3B and its associated genes and their functions in depth by applying a number of approaches in an integrative study, including a network-level understanding20,65-68 of the relevant genes and their associated functions. APOBEC3B is a gene-editing enzyme with cytidine deaminase activity, and high expression of its mRNA in breast tumors has been shown to be associated with progressive cases and poor prognosis. Previous work examined the relationship between APOBEC3B expression and the effect of neoadjuvant chemotherapy (NAC) using pretreatment biopsy tissue, asking whether APOBEC3B expression influenced chemotherapy efficacy; in terms of the correlation of APOBEC3B expression with clinicopathological features, APOBEC3B was shown to be a novel predictive factor for pathological complete response to neoadjuvant chemotherapy in breast cancer69-71. Moreover, functional assays of APOBEC3 have been performed in a number of previous studies72-74. These studies characterized the functional diversification of APOBEC gene family members associated with breast cancer mutagenesis with respect to estrogen receptor (ER) status and found that, among the APOBEC family, both APOBEC3B and APOBEC3C mRNA levels were significantly higher in the estrogen receptor negative (ER−) subtype than in the estrogen receptor positive (ER+) subtype69,71-73. The other APOBEC family members showed either extremely low mRNA levels or no obvious expression differences between ER+ and ER− breast cancers.
The expression levels of APOBEC family genes in 55 breast cancer tissues with unknown ER status were similar to those of the ER+ samples, which suggests that many of these 55 patients may belong to the ER+ subtype and supports ER status-dependent expression patterns of the APOBEC3B and APOBEC3C genes. In terms of DNA methylation, highly expressed genes appear to have lower DNA methylation levels in their proximal transcription start site (TSS) regions72. This study is an effort to elucidate the clinical relevance of APOBEC3B and breast cancer-associated genes by relating variation in their expression to the mutational profiling and clinical relevance of breast cancer-associated genes. The results section presents the APOBEC3B-associated genes and pathways, the breast cancer mutational profiles and enriched pathways, and the survival analysis of the breast cancer genes. This study leads to the conclusion that APOBEC3B may have a broad impact in breast cancer. The genes and the enriched pathways are reported to play a critical role in breast cancer75-83. The genes correlated with APOBEC3B, the genes mutated in breast cancer, and the clinically relevant genes belong to pathways known to be altered in cancer, especially breast cancer. Pathways responsible for cancer cell progression and migration were observed to be closely associated with APOBEC3B, suggesting that APOBEC3B contributes importantly, directly or indirectly, to the spread of breast cancer. The correlation trends generated from the expression datasets suggest that some genes might be suppressed by APOBEC3B. The co-expression analysis showed spatial patterns of gene expression indicating that the respective DNA regions are transcriptionally active. Transcription occurs on unfolded DNA, which gives APOBEC3B an opportunity to target these exposed regions and induce mutations.
Here, we have tried to establish a connection between the co-expression of genes with APOBEC3B and the instances of mutations occurring during this event33,35,61,62,84,85. It should also be noted that the study is mainly a computational analysis using a very simplified approach to present the APOBEC3B-associated genes and pathways, the mutational profiling and the pathways altered as a result of mutation, the survival analysis for the clinical samples, and finally a network-level understanding of the critical genes and potential pathways. The study not only addresses the above-mentioned goals but also gives quick and accessible insight into the potential genes and critical pathways, which may help for clinical and therapeutic purposes. In terms of future perspectives and applications, mathematical modeling and simulation of the pathway-network model could be explored further using the predictions from this study13,86-91 and integrated with experimental data92.
5. Conclusions
Based on this study, we conclude that the APOBEC3B-associated genes and pathways are very specific and play a pivotal role in cancer cell migration and proliferation; these pathways, which are very important in cancer progression, were observed to be strongly associated with APOBEC3B. PIK3CA, TP53, MUC16, TTN, AHNAK2, SYNE1, CDH1, KMT2C, GATA3, and more are among the top-ranked genes, and MAPK, calcium signaling, cAMP, PI3K-AKT, focal adhesion, adrenergic signaling, thyroid hormone, oxytocin, ErbB, ubiquitin, apelin, tight junction, GnRH, Ras, cGMP-PKG, cell cycle, and pluripotency of stem cells are among the commonly enriched pathways.
From the survival perspective, MCTS1, OVOS2, MAPT-IT1, ATG4A, SLC16A2, SLC35A2, VDAC1, TBC1D24, RP11-214F16.8, and PDP1 show extremely high significance, and according to the PANTHER protein classification the majority of these top-ranked genes belong to the metabolite interconversion enzyme, transporter, protein modifying enzyme, defense/immunity protein, gene-specific transcriptional regulator, and membrane traffic protein classes. In the network of the top-ranked genes from the overall survival analysis and their associated functions, CD40LG appears to control 10 pathways, most of which belong to immune signaling. Among the functions associated with these top-ranked genes, the majority of pathways are well established as controlling major human diseases, including multiple types of cancer (among them breast cancer), neurodegenerative diseases, diabetes, and infectious diseases.
Supplementary Materials: The following supporting information can be downloaded at: www.jbsciences.com/xxx/s1, Figure S1: title; Table S1: title; Video S1: title.
Author Contributions: Conceptualization, H.C.; methodology, H.C., A.A., M.M., and W.H.A.; software, H.C.; validation, H.C., A.A. and W.H.A.; formal analysis, H.C.; investigation, H.C.; resources, H.C.; data curation, H.C., A.A., M.M., and W.H.A.; writing - original draft preparation, H.C., A.A., M.M., and W.H.A.; writing - review and editing, H.C.; visualization, H.C., A.A., and W.H.A.; supervision, M.M. and H.C.; project administration, H.C.; funding acquisition, H.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University (KAU), Jeddah, grant number G: 1437-130-655. The APC was funded by G: 1437-130-655.
Acknowledgments: We are grateful to the Department of Biochemistry, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia and Cancer Metabolism and Epigenetic Unit, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia for providing us all the facilities to carry out the entire work.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: All the related data are supplied in this work or have been referenced properly.
References
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of JBS and/or the editor(s). JBS and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: © 2024 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).