Review
Impact of tumor cell fraction on gene expression and functional outcomes
Lina A. Baz 1*
1 Department of Biochemistry, College of Science, University of Jeddah, Jeddah-21589 Saudi Arabia.
* Correspondence: lbaz@uj.edu.sa (L.A.B)
Citation: Baz, L.A. Impact of tumor cell fraction on gene expression and functional outcomes. Glob. Jour. Bas. Sci. 2025, 1(8). 1-8.
Received: May 07, 2025
Revised: May 31, 2025
Accepted: June 09, 2025
Published: June 13, 2025
doi: 10.63454/jbs20000038
ISSN: 3049-3315
Volume 1; Issue 8
Abstract: Tumor tissue is a complex ecosystem composed of malignant cells, immune infiltrates, stromal components, and vascular structures. The proportion of tumor cells within this microenvironment—termed tumor cell fraction (TCF) or tumor purity—has emerged as a critical variable influencing genomic, transcriptomic, and functional analyses in cancer research. This review synthesizes current evidence on how TCF affects gene expression profiling, pathway activation signatures, immune microenvironment characterization, and functional prediction models. We highlight methodological approaches for TCF estimation, discuss its confounding effects on bulk RNA sequencing data, and examine implications for biomarker discovery, patient stratification, and therapeutic response prediction. Standardizing TCF assessment and developing correction methods are essential for improving the reproducibility and biological relevance of cancer genomics.
Keywords: tumor purity; tumor microenvironment; bulk RNA-seq; deconvolution; biomarker; cancer genomics; immune infiltration; bioinformatics
1. Introduction
Cancer is fundamentally a disease of pathological heterogeneity, manifesting across multiple intersecting scales. At the cellular level, a single tumor mass is a complex ecosystem, or “heterogeneous tissue,” comprising not only the malignant epithelial or mesenchymal cells of origin but also a diverse array of recruited and resident non-neoplastic elements. These include immune cells (lymphocytes, macrophages, myeloid-derived suppressor cells), stromal components (cancer-associated fibroblasts, pericytes), and vascular networks, all embedded within an altered extracellular matrix. At the genomic level, tumor evolution driven by mutational processes and selective pressures leads to profound genetic diversity. This results in a constellation of subclonal populations within the same lesion, each harboring distinct combinations of driver and passenger mutations, copy number alterations, and chromosomal rearrangements. This genomic heterogeneity exists spatially, between different regions of a tumor (intra-tumor heterogeneity) and between primary and metastatic sites (inter-tumor heterogeneity), and temporally, as clones expand or are eradicated by therapy [1]. Furthermore, at the microenvironmental level, the biochemical and physical milieu, characterized by gradients of oxygen, nutrients, and immune signals, imposes selective pressures that shape both the genomic landscape and the cellular composition of the tumor.
Historically, the technological limitations of early molecular biology necessitated a simplifying assumption: that a tumor biopsy was a relatively homogeneous collection of genetically identical cancer cells. This paradigm treated the surrounding microenvironment as passive background noise. However, the advent of high-throughput genomic technologies and single-cell analytics has definitively overturned this view. It is now an established oncological principle that every solid tumor specimen is an admixture, a mosaic of neoplastic and non-neoplastic elements whose relative proportions are highly variable and biologically informative. This recognition reframes the tumor not as a pure culture of cancer cells, but as an organ-like structure with its own complex histology and ecology.
It is within this context that the Tumor Cell Fraction (TCF) emerges as a critical quantitative descriptor. Defined precisely as the percentage of nucleated cells within an analyzed sample that are malignant, TCF serves as a direct measure of tumor cellularity or “purity.” This metric is not a fixed property but a dynamic variable exhibiting significant fluctuation. TCF varies widely across cancer types (e.g., high in leukemias, lower in desmoplastic carcinomas like pancreatic ductal adenocarcinoma), within individual tumor types based on histology and grade, within a single tumor from its invasive front to its necrotic core, and between biopsy sites in the same patient [2]. This variability is not random noise but a reflection of underlying tumor biology, such as stromal reactivity, immune infiltration, and patterns of invasion.
Consequently, the accurate assessment of TCF transcends being a mere technical QC step; it is a fundamental prerequisite for the biologically faithful interpretation of all bulk-tumor genomic data. In bulk sequencing assays—where DNA or RNA is extracted from the entire cellular mélange—every measurement is a composite signal, a weighted average of the contributions from each cellular constituent. The TCF acts as the primary weighting factor. It directly influences the apparent allele frequency of somatic mutations (diluting them in low-purity samples), the detectability of copy number alterations, the magnitude of gene expression values for tumor-specific transcripts, and the enrichment scores of molecular pathways in signature analyses [3]. To analyze such data without correcting for TCF is akin to performing a chemical assay without knowing the concentration of the analyte; the output is difficult to interpret and potentially meaningless.
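The weighted-average relationship can be written compactly. The following is a minimal two-compartment sketch, assuming the sample contains only tumor and non-tumor cells that contribute equal amounts of RNA per cell; the symbols are notation introduced here for illustration and are not taken from any specific tool.

```latex
% p        : tumor cell fraction, 0 <= p <= 1
% e_g^{T}  : mean expression of gene g in tumor cells
% e_g^{N}  : mean expression of gene g in non-tumor cells
% x_g      : expression of gene g measured in the bulk sample
x_g = p \, e_g^{T} + (1 - p) \, e_g^{N}
```

Under this sketch, a tumor-specific transcript (with e_g^{N} close to zero) scales linearly with p, so halving the purity halves the bulk signal even when per-cell expression is unchanged.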
The failure to explicitly account for this confounder carries significant scientific and clinical risk. It can lead to erroneous biological conclusions, such as misidentifying stromal gene expression as a tumor-intrinsic dysregulation. It promotes the misclassification of molecular subtypes, as low-purity samples are systematically pushed towards “normal-like” or “reactive” classifications regardless of their true oncogenic drivers. Most critically, it jeopardizes accurate prediction of clinical outcomes, as prognostic and predictive biomarkers developed from confounded data may capture signatures of cellular composition rather than therapeutic vulnerability, ultimately undermining the translation of genomic insights into effective patient care.
Therefore, this review has four principal aims: (1) to synthesize and evaluate the current methodologies for estimating TCF from genomic, transcriptomic, and histopathological data; (2) to systematically examine the multifaceted impact of TCF on downstream gene expression analysis, functional annotation, and immunological profiling; (3) to discuss the profound implications of TCF for the discovery, validation, and clinical application of cancer biomarkers; and (4) to propose a standardized framework and set of best practices for TCF-aware analysis, with the goal of enhancing the rigor, reproducibility, and clinical utility of cancer genomics research.
2. Methodologies for estimating tumor cell fraction
Accurately determining the tumor cell fraction (TCF) is a critical prerequisite for meaningful genomic analysis, and a diverse arsenal of methodologies has been developed, each with its own strengths, limitations, and data requirements. These approaches range from traditional histological examination to sophisticated computational deconvolution of high-throughput sequencing data (Figure 1).

Figure 1. Estimation of tumour cell fraction.
2.1. Histopathological assessment
Histopathological review of hematoxylin and eosin (H&E)-stained tissue sections remains the clinical and traditional gold standard for TCF estimation. A trained pathologist visually assesses the slide, manually estimating the percentage of area occupied by malignant nuclei versus stromal, immune, and necrotic components [4]. This approach offers the unique advantage of direct morphological correlation, allowing the pathologist to distinguish invasive carcinoma from in situ disease or benign mimics. To reduce subjectivity, digital pathology platforms employing image analysis algorithms can quantify cellular density and nuclear features to provide a more reproducible score. However, both manual and digital histopathological methods face significant challenges: they are labor-intensive, difficult to standardize across different observers and institutions, and can be confounded by tumor architecture, such as glands dispersed in stroma, which complicates area-based estimation. Furthermore, they provide no direct link between the purity estimate and the specific nucleic acid extract used for downstream molecular assays.
2.2. Genomic-based estimation
Genomic methods infer TCF by exploiting the unique genetic aberrations of tumor cells compared to the diploid genome of contaminating normal cells. Copy number alteration (CNA) approaches are among the most robust. Tools like ABSOLUTE and ASCAT analyze allele-specific copy number profiles from DNA sequencing data [5,6]. They model the observed sequencing read depth and allelic ratios to simultaneously solve for two key parameters: tumor purity (the fraction of cancer cells) and ploidy (the average number of chromosome copies per cancer cell). A separate but related strategy uses mutation allele frequency. In a pure tumor sample, a clonal heterozygous somatic mutation in a copy-neutral diploid region is expected in about 50% of reads. Deviations from this expected frequency are primarily due to dilution by normal DNA; tools like PureCN use the observed allele frequencies of multiple somatic mutations, corrected for local copy number and tumor ploidy, to back-calculate the purity [7]. Lastly, methylation-based deconvolution capitalizes on the cell-type-specific nature of DNA methylation patterns. Algorithms like InfiniumPurify use reference methylation profiles from purified cell types to deconvolve the bulk methylation signal from a tumor sample, yielding estimates for the proportion of cancerous and various non-cancerous cells [8].
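The allele-frequency reasoning can be illustrated with a short back-calculation. This is a simplified sketch, not the likelihood model used by PureCN or any other tool: it assumes clonal mutations, diploid normal cells, and known local tumor copy number, and the function names and example values are hypothetical.

```python
from statistics import median


def expected_vaf(purity, tumor_copies=2, mutated_copies=1):
    """Expected variant allele frequency of a clonal somatic mutation.

    Assumes normal cells carry 2 unmutated copies of the locus.
    """
    mutant_alleles = purity * mutated_copies
    all_alleles = purity * tumor_copies + 2 * (1 - purity)
    return mutant_alleles / all_alleles


def purity_from_vaf(vaf, tumor_copies=2, mutated_copies=1):
    """Invert expected_vaf() to obtain purity from one observed VAF."""
    if not 0 < vaf <= 1:
        raise ValueError("vaf must be in (0, 1]")
    denominator = mutated_copies - vaf * (tumor_copies - 2)
    if denominator <= 0:
        raise ValueError("vaf is incompatible with the given copy numbers")
    return 2 * vaf / denominator


# Hypothetical mutations: (observed VAF, tumor copy number, mutated copies)
mutations = [(0.15, 2, 1), (0.14, 2, 1), (0.17, 2, 1), (0.25, 3, 2)]
per_mutation = [purity_from_vaf(v, c, m) for v, c, m in mutations]
print("per-mutation purity:", [round(p, 3) for p in per_mutation])
print("median purity:", round(median(per_mutation), 3))
```

Taking the median over many mutations is one simple way to dampen the noise of individual allele frequencies; subclonal mutations would bias such an estimate downward.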
2.3. Transcriptomic-based estimation
For the many studies reliant on RNA sequencing data, transcriptomic deconvolution methods are indispensable. The most common approach employs reference-based deconvolution algorithms such as CIBERSORTx and quanTIseq [9,11]. These tools use predefined “signature matrices” containing gene expression profiles of pure cell types (e.g., various immune cells, fibroblasts, epithelial cells). By fitting the bulk tumor gene expression profile as a linear combination of these reference profiles, they estimate the proportional contribution of each cell type, with the cancer cell proportion derived as the remainder or from a tumor-specific signature. A more direct method leverages single-nucleotide variant (SNV) expression. If a somatic mutation is present, tools like ISOpure can analyze the RNA-seq reads to quantify the expression specifically from the mutant allele versus the wild-type allele. The proportion of RNA derived from the mutant allele provides a direct estimate of the tumor-specific RNA fraction [12]. Simpler, signature-based tools like ESTIMATE bypass full deconvolution by calculating aggregate “ImmuneScore” and “StromalScore” from gene sets; a combined score inversely correlates with tumor purity, providing a rapid, albeit less granular, estimate [10].
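The idea of fitting a bulk profile as a linear combination of reference profiles can be sketched for the simplest case of two compartments. This is an illustration only: published tools fit many cell types with more robust regression, and the reference profiles, gene count, and function name below are hypothetical.

```python
def estimate_tumor_fraction(bulk, tumor_ref, nontumor_ref):
    """Least-squares fit of bulk = f * tumor_ref + (1 - f) * nontumor_ref.

    Returns the tumor fraction f, clipped to the valid range [0, 1].
    """
    if not (len(bulk) == len(tumor_ref) == len(nontumor_ref)):
        raise ValueError("all profiles must cover the same genes")
    numerator = 0.0
    denominator = 0.0
    for b, t, n in zip(bulk, tumor_ref, nontumor_ref):
        diff = t - n                  # how much the gene separates the two compartments
        numerator += (b - n) * diff
        denominator += diff * diff
    if denominator == 0:
        raise ValueError("reference profiles are identical")
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical reference profiles for four genes (arbitrary expression units)
tumor_ref = [100.0, 5.0, 50.0, 80.0]
nontumor_ref = [10.0, 60.0, 50.0, 20.0]

# A bulk profile mixed at 70% tumor, 30% non-tumor
bulk = [0.7 * t + 0.3 * n for t, n in zip(tumor_ref, nontumor_ref)]
print(round(estimate_tumor_fraction(bulk, tumor_ref, nontumor_ref), 3))
```

Genes with identical expression in both references (the third gene here) contribute nothing to the fit, which is why signature matrices are built from cell-type-discriminating genes.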
2.4. Integrated approaches
Recognizing that no single method is flawless, integrated approaches combine evidence from multiple data types to generate more accurate and robust TCF estimates. For example, a consensus estimate can be derived by integrating the purity inferred from DNA-based copy number analysis with that from RNA-based deconvolution or mutation allele frequency [13]. This multi-platform strategy helps overcome the limitations of individual methods—such as the inability of transcriptomic methods to account for non-nucleated cells or the failure of genomic methods in tumors with few CNAs. Consequently, generating a consensus estimate from several complementary algorithms is increasingly considered a best practice, as it increases confidence and reduces the error inherent in any single technique.
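A consensus estimate of this kind can be as simple as a median across methods with a check for disagreement. The sketch below is one possible scheme, not a published algorithm; the method labels and the 0.2 discordance threshold are assumptions chosen for illustration.

```python
from statistics import median


def consensus_purity(estimates, max_spread=0.2):
    """Combine purity estimates from several methods.

    estimates  : dict mapping a method label to a purity in [0, 1],
                 or None when that method failed or was not run.
    max_spread : largest tolerated gap between the highest and lowest
                 estimate (an arbitrary, illustrative threshold).
    Returns (consensus, spread, concordant).
    """
    values = [v for v in estimates.values() if v is not None]
    if not values:
        raise ValueError("no usable purity estimates")
    spread = max(values) - min(values)
    return median(values), spread, spread <= max_spread


sample_estimates = {
    "copy_number": 0.62,
    "mutation_vaf": 0.58,
    "expression": 0.70,
    "histology": None,   # e.g. slide not available
}
consensus, spread, concordant = consensus_purity(sample_estimates)
print(consensus, round(spread, 2), concordant)
```

Flagging discordant samples for manual review, rather than silently averaging them, keeps method-specific failures (for example, a copy-number-quiet tumor) visible.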
3. Impact of tumor cell fraction on gene expression analysis
The proportion of tumor cells within a sample is not a passive quality metric but an active and pervasive confounding variable that systematically distorts nearly every aspect of bulk transcriptomic analysis. Failing to account for TCF can lead to biologically misleading conclusions and compromise the reproducibility of findings (Figure 2).

Figure 2. Impact of tumour cell fraction on cancer analyses.
3.1. Confounding of differential expression analysis
Differential expression (DE) analysis aims to identify genes whose transcription is differentially regulated between sample groups (e.g., tumor vs. normal, treated vs. untreated). However, TCF introduces a severe confound: genes that are highly and specifically expressed in tumor cells will naturally have higher observed expression in high-purity tumor samples compared to low-purity tumors or normal tissues, irrespective of their true regulatory state. Conversely, genes characteristic of the microenvironment (e.g., collagen from fibroblasts or CD3E from T-cells) will appear more highly expressed in low-purity samples [14]. If the compared sample groups have systematic differences in average purity—a common scenario when comparing late-stage tumors to early-stage ones or primary tumors to metastatic sites—the DE analysis will detect these purity-driven expression shifts as statistically significant. This results in a high false discovery rate, populating the resulting gene list with markers of sample composition rather than genuine disease biology.
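The confound can be made concrete with a toy calculation in which per-cell expression is identical in both groups and only purity differs. It assumes a simple two-compartment mixture with equal RNA yield per cell; all numbers and names are hypothetical.

```python
import math


def bulk_expression(purity, tumor_expr, nontumor_expr):
    """Bulk signal as a purity-weighted average of two compartments."""
    return purity * tumor_expr + (1 - purity) * nontumor_expr


def apparent_log2_fold_change(purities_a, purities_b, tumor_expr, nontumor_expr):
    """log2(mean bulk expression in group B / mean in group A)."""
    mean_a = sum(bulk_expression(p, tumor_expr, nontumor_expr)
                 for p in purities_a) / len(purities_a)
    mean_b = sum(bulk_expression(p, tumor_expr, nontumor_expr)
                 for p in purities_b) / len(purities_b)
    return math.log2(mean_b / mean_a)


# Same per-cell expression in both groups: no true regulation
tumor_specific = {"tumor_expr": 100.0, "nontumor_expr": 5.0}
stroma_specific = {"tumor_expr": 5.0, "nontumor_expr": 100.0}

group_a = [0.75, 0.80, 0.85]   # high-purity cohort
group_b = [0.35, 0.40, 0.45]   # low-purity cohort

print("tumor-specific gene:",
      round(apparent_log2_fold_change(group_a, group_b, **tumor_specific), 2))
print("stroma-specific gene:",
      round(apparent_log2_fold_change(group_a, group_b, **stroma_specific), 2))
```

Both genes show a sizeable apparent fold change in opposite directions, although nothing about their regulation differs between the groups.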
3.2. Distortion of expression-based subtyping
Many cancers are classified into molecular subtypes with prognostic and therapeutic implications based on gene expression patterns (e.g., basal-like, luminal A/B in breast cancer via PAM50). These classifiers are typically trained on bulk RNA-seq data. When applied to a new sample with low TCF, the strong expression signal from the surrounding stroma and immune infiltrate can dominate the profile. This causes low-purity samples from biologically distinct tumor subtypes to cluster together simply because they share a common “low-purity/high-stroma” expression signature, a phenomenon often observed in pan-cancer analyses [15]. Consequently, a low-purity basal-like tumor might be misclassified as a normal-like or reactive subtype, potentially leading to incorrect prognostic predictions and the withholding of appropriate subtype-specific therapies.
3.3. Alteration of pathway activation scores
Gene set enrichment analysis (GSEA) and single-sample scoring methods like GSVA or ssGSEA are used to quantify the activity of biological pathways (e.g., “MYC Targets,” “Epithelial-Mesenchymal Transition”). These scores are calculated from the expression levels of member genes. Since pathway activity is a cellular property, bulk scores represent a purity-weighted average. Proliferation pathways, whose genes are typically highly expressed in tumor cells, show a strong positive correlation with TCF. In contrast, pathways like “Inflammatory Response” or “Angiogenesis,” often driven by microenvironment cells, show a strong inverse correlation [16]. An observed association between a pathway score and a clinical variable like survival could therefore be entirely mediated by TCF. For instance, a high proliferation score might predict poor outcome not because proliferation is more oncogenic, but simply because high-purity tumors are larger or more advanced.
3.4. Effect on co-expression networks
The construction of gene co-expression networks, used to identify functionally related gene modules, is also vulnerable. In a pure cell population, a high correlation between two genes suggests co-regulation. In a bulk tumor, a strong correlation can arise artificially if both genes are expressed in the same cell type whose proportion varies across samples. For example, two fibroblast-specific genes will co-vary strongly with changes in stromal content, even if they are in different pathways [17]. This creates hybrid networks that conflate intra-cellular regulatory networks with inter-cellular population structure. Modules identified in such networks may not represent coherent biological programs within any single cell type but rather serve as proxies for the abundance of a particular cellular compartment, misleading downstream functional inferences.
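A small simulation shows how composition alone can generate co-expression. The sketch assumes two fibroblast-specific genes whose bulk signal is proportional to stromal fraction plus independent noise; the sample size, coefficients, and noise levels are arbitrary choices for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of the simple linear regression y ~ intercept + x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is repeatable
stromal_fraction = [rng.uniform(0.1, 0.9) for _ in range(50)]

# Two genes with no shared regulation: noise terms are independent
gene_a = [100 * f + rng.gauss(0, 2) for f in stromal_fraction]
gene_b = [50 * f + rng.gauss(0, 1) for f in stromal_fraction]

raw_r = pearson(gene_a, gene_b)
partial_r = pearson(residuals(gene_a, stromal_fraction),
                    residuals(gene_b, stromal_fraction))
print("raw correlation:", round(raw_r, 3))
print("correlation after removing stromal fraction:", round(partial_r, 3))
```

The raw correlation is driven almost entirely by the shared dependence on stromal content; once that is regressed out, only the independent noise remains.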
4. Impact on functional and clinical outcomes
The influence of Tumor Cell Fraction (TCF) extends far beyond a technical confounding variable; it fundamentally reshapes the interpretation of functional tumor biology and the validity of clinically actionable insights. Its pervasive effect calls into question established findings and necessitates a recalibration of analytical approaches in translational research (Figure 3).

Figure 3. Functional and clinical impact of tumour cell fraction.
4.1. Immune microenvironment characterization
Perhaps the most significant impact of TCF is on the evaluation of the tumor immune microenvironment, a critical determinant of prognosis and response to immunotherapy. Bulk transcriptomic methods for quantifying immune infiltration, such as deconvolution algorithms (e.g., CIBERSORTx) or signature scores (e.g., an IFN-γ score), produce estimates that are intrinsically and inversely correlated with TCF [18]. This creates a profound interpretative challenge: a sample with low TCF but a moderate number of immune cells can appear overwhelmingly “immune-hot” because the immune signal constitutes a large relative proportion of the total transcriptome. Conversely, a highly cellular tumor with substantial absolute immune infiltration might be misclassified as “immune-cold” if the immune signal is diluted by a dominant tumor cell expression profile. This TCF-driven distortion directly impacts immunotherapy research: predictive biomarkers such as tumor mutational burden (TMB) and immune gene signatures need to be adjusted for purity to accurately identify patients most likely to benefit from checkpoint inhibitors.
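The gap between relative and absolute immune content can be shown with simple arithmetic. The cell counts below are hypothetical and are taken to come from equally sized tissue areas; the function name is illustrative.

```python
def composition(tumor_cells, immune_cells, other_cells):
    """Return tumor cell fraction and relative immune fraction of a sample."""
    total = tumor_cells + immune_cells + other_cells
    return {
        "tcf": tumor_cells / total,
        "immune_fraction": immune_cells / total,
    }


# Low-purity sample: moderate number of immune cells, few tumor cells
low_purity = composition(tumor_cells=200, immune_cells=300, other_cells=100)

# High-purity sample: twice as many immune cells, but many more tumor cells
high_purity = composition(tumor_cells=2000, immune_cells=600, other_cells=400)

for name, sample in (("low purity", low_purity), ("high purity", high_purity)):
    print(name,
          "TCF = %.2f" % sample["tcf"],
          "relative immune fraction = %.2f" % sample["immune_fraction"])
```

A bulk immune score tracks the relative fraction, so the low-purity sample looks more inflamed even though the high-purity sample contains twice as many immune cells.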
4.2. Prognostic signature development
The development and validation of multi-gene prognostic signatures are acutely vulnerable to TCF confounding. Many established signatures associated with poor outcomes, such as proliferation metagenes, are heavily enriched for genes highly expressed in tumor cells. Consequently, their prognostic power may partially or wholly stem from their correlation with high tumor cellularity, a non-specific indicator of aggressive growth, rather than specific oncogenic biology [19]. Conversely, signatures derived from stromal or immune cell gene expression can show paradoxical prognostic associations across cancer types—predicting better survival in some contexts (e.g., via anti-tumor immunity) and worse in others (e.g., via pro-tumorigenic stroma). These conflicting results are often explicable by the underlying purity distribution of the training cohort. Signatures developed without correcting for TCF are essentially learning the sample composition, leading to poor generalizability when applied to independent datasets with different average purity or stromal content.
4.3. Drug response prediction
The accuracy of pharmacogenomic models, which aim to predict drug sensitivity from genomic or transcriptomic features, is compromised by uncorrected TCF. The expression of drug targets, efflux pumps, DNA repair enzymes, and metabolic pathway genes can differ radically between malignant and surrounding benign cells [20]. A predictive model trained on bulk RNA-seq data from cell lines or patient-derived xenografts (which have high purity) may identify a gene whose expression correlates with resistance. However, if this gene is primarily expressed in cancer-associated fibroblasts in patient samples, the model’s prediction will fail when applied to bulk tumor data of variable purity, as it conflates tumor-intrinsic resistance with microenvironmental contamination. Purity-aware modeling is therefore essential to isolate the true cellular origin of predictive signals and build robust translatable classifiers.
4.4. Biomarker discovery and validation
The quest for molecular biomarkers is fundamentally a quest for cellular specificity. Biomarkers identified from bulk-tissue analyses risk being misattributed. A canonical example is PD-L1, a critical immunotherapy biomarker. Bulk RNA-seq measures a composite PD-L1 signal from both tumor cells, where it may indicate adaptive immune resistance, and from infiltrating immune cells, where it reflects immune activation [21]. The therapeutic implication and predictive value may differ based on the source. TCF-aware analysis, often requiring integration with single-cell data or spatial methods, is crucial to deconvolve these signals. It ensures that a putative biomarker like “high VEGF expression” is correctly interpreted as endothelial-driven angiogenesis versus tumor-driven paracrine signaling, guiding the appropriate choice of anti-angiogenic therapy.
5. Correction methods and analytical best practices
Addressing the confounding effects of TCF requires a multi-faceted strategy encompassing statistical correction, refined bioinformatic pipelines, and thoughtful experimental design. Adopting these best practices is no longer optional for rigorous cancer genomics (Figure 4).

Figure 4. Correction methods for tumour cell fractions.
5.1. Statistical adjustment in differential expression
To isolate true tumor-specific transcriptional changes, several statistical approaches can mitigate TCF effects. The most straightforward method is to include an estimated TCF as a covariate in the linear or generalized linear models fitted by tools like DESeq2, limma, or edgeR [22-36]. This controls for the systematic variation in gene expression explained by purity. A more sophisticated approach is microenvironment cell-adjusted analysis, which involves using deconvolution estimates to computationally subtract the expression contribution of non-tumor cell types from the bulk sample before performing differential expression testing [23-36]. A simpler, albeit less comprehensive, tactic is selective gene filtering, where analysis is restricted to genes known to be predominantly expressed in tumor cells, though this risks missing important biology regulated in other compartments.
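The covariate approach can be illustrated on a single gene with ordinary least squares. This is a didactic sketch rather than the negative-binomial machinery of DESeq2 or edgeR; in those tools the analogous step is adding a purity column to the design formula. The data are constructed so that expression depends on purity only, and all names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def residualize(y, x):
    """Residuals of the simple linear regression y ~ intercept + x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def naive_group_effect(expr, group):
    """Difference in mean expression between group 1 and group 0."""
    g1 = [e for e, g in zip(expr, group) if g == 1]
    g0 = [e for e, g in zip(expr, group) if g == 0]
    return mean(g1) - mean(g0)


def purity_adjusted_group_effect(expr, group, purity):
    """Coefficient of group in expr ~ intercept + purity + group.

    Uses the Frisch-Waugh-Lovell result: regress both expression and
    group on purity, then regress the residuals on each other.
    """
    expr_res = residualize(expr, purity)
    group_res = residualize(group, purity)
    return (sum(a * b for a, b in zip(expr_res, group_res))
            / sum(b * b for b in group_res))


purity = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
group = [0, 0, 0, 1, 1, 1]               # group 1 happens to be purer
expr = [2 + 10 * p for p in purity]      # expression driven by purity alone

print("naive effect:", round(naive_group_effect(expr, group), 3))
print("purity-adjusted effect:",
      round(purity_adjusted_group_effect(expr, group, purity), 3))
```

The naive comparison reports a group difference that vanishes once purity is in the model. Adjustment of this kind works only when purity and group are not perfectly confounded, which is one reason balanced cohort design matters.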
5.2. Purity-aware subtype classification
Molecular subtyping pipelines must be redesigned to account for TCF. One method is to train classifiers on purity-corrected expression data, ensuring the subtype definitions are based on tumor-intrinsic patterns. Alternatively, classifiers can be developed to explicitly include TCF or microenvironment scores as input features, allowing the algorithm to learn and adjust for composition-based patterns [24]. Emerging tools now provide “microenvironment-adjusted” calls that recalibrate traditional subtype probabilities based on sample purity, preventing the misclassification of low-purity tumors into stroma-rich subtypes by default.
5.3. Normalization strategies
Standard RNA-seq normalization methods operate under the assumption that most genes are not differentially expressed, an assumption violated in tumors where thousands of genes show cell-type-specific expression. Methods like TMM (trimmed mean of M-values) can thus introduce bias. Alternative strategies include using exogenous spike-in controls or a carefully curated set of housekeeping genes expressed uniformly across all cell types within the tissue. Another approach is to perform quantile normalization within groups of samples binned by similar TCF, reducing purity-driven global expression shifts. The most robust, albeit computationally intensive, method is a two-step deconvolution and re-constitution process: first estimate cell-type proportions and expression profiles, then mathematically reconstitute the expression data to a standardized, high-purity reference, enabling fair comparisons [25].
5.4. Experimental design recommendations
Proactive design is the best defense against TCF confounding. During cohort assembly, researchers should stratify samples based on TCF to ensure balanced representation across experimental groups. Reporting TCF estimates as mandatory metadata alongside raw sequencing data in public repositories is essential for reproducibility and re-analysis. Crucially, key findings from bulk analyses should be validated using orthogonal, composition-aware methods such as single-cell RNA-seq, spatial transcriptomics, or on flow-sorted cell populations whenever feasible. Finally, recognizing intra-tumoral heterogeneity, multi-region sampling should be employed in studies aiming to characterize the tumor microenvironment or clonal evolution, as a single biopsy may grossly misrepresent the overall tumor’s cellular and genomic landscape.
6. Clinical implications and future directions
6.1. Diagnostic testing standards
The implications of tumor cell fraction (TCF) extend directly into the clinical diagnostic arena, where molecular tests guide critical treatment decisions. Assays for somatic mutation profiling, fusion detection, and gene expression signatures, whether based on next-generation sequencing or other platforms, often stipulate a minimum TCF requirement, typically ranging from 20% to 30% [26]. This threshold is necessary to achieve sufficient analytical sensitivity for detecting low-frequency variants and to ensure the test’s reliability. Failure to meet this threshold risks false-negative results, potentially depriving a patient of a targeted therapy. Therefore, standardized TCF assessment—leveraging both histopathological review and computational estimates from the assay data itself—must be formally integrated into clinical laboratory workflows and reporting frameworks. Pathology reports should explicitly state the estimated TCF alongside genomic findings, providing clinicians with essential context for interpreting the results, such as the potential for missed alterations in low-purity samples.
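The link between a minimum TCF and analytical sensitivity can be sketched with a binomial model of read sampling. The sketch assumes a clonal heterozygous mutation in a diploid region (expected allele frequency of half the TCF), ignores sequencing error, and uses an illustrative depth and read threshold that are not taken from any specific assay.

```python
from math import comb


def expected_vaf(tcf):
    """Expected allele frequency of a clonal heterozygous diploid mutation."""
    return tcf / 2


def detection_probability(tcf, depth, min_alt_reads):
    """Probability of observing at least min_alt_reads variant reads.

    Reads are modeled as independent draws with success probability
    equal to the expected allele frequency.
    """
    p = expected_vaf(tcf)
    prob_too_few = sum(
        comb(depth, k) * p ** k * (1 - p) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - prob_too_few


# Hypothetical assay: 100x coverage, variant called at >= 5 supporting reads
for tcf in (0.05, 0.10, 0.20, 0.30):
    prob = detection_probability(tcf, depth=100, min_alt_reads=5)
    print("TCF %.2f -> expected VAF %.3f, detection probability %.3f"
          % (tcf, expected_vaf(tcf), prob))
```

Under these assumptions the detection probability falls steeply once purity drops well below the stated threshold, which is the mechanism behind false negatives in low-purity samples.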
6.2. Clinical trial design
The influence of TCF necessitates a paradigm shift in the design of clinical trials, particularly in biomarker-driven and immunotherapy studies. First, patient stratification should account for TCF to avoid confounding. In trials testing agents targeting tumor-intrinsic pathways, patients with low-TCF tumors may be misclassified as biomarker-negative due to signal dilution, potentially excluding them from beneficial therapies. Stratifying by TCF or adjusting biomarker scores for purity can mitigate this bias. Second, response assessment criteria may require refinement. For example, in measuring minimal residual disease (MRD) via circulating tumor DNA, the dynamic change in TCF within a metastatic lesion could affect the quantity of shed DNA, independent of tumor cell kill. Furthermore, trials targeting the tumor microenvironment (e.g., anti-fibrotic agents) should consider multi-region sampling to capture intra-tumoral heterogeneity in stromal and immune cell distribution, as a single biopsy may not represent the overall microenvironmental context of the tumor.
6.3. Emerging technologies
Future progress hinges on technologies that either circumvent or more precisely quantify TCF. Single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics resolve cellular composition by design, providing a direct view of gene expression within each cell type and their spatial relationships [27]. While currently cost-prohibitive for routine large-scale use, they serve as the gold standard for validating bulk-derived inferences and discovering cell-type-specific signatures. Digital pathology powered by artificial intelligence (AI) offers a more immediately scalable solution [28]. AI models can be trained to estimate TCF and classify microenvironmental features directly from standard H&E slides, providing a rapid, reproducible, and cost-effective adjunct to genomic data. Finally, liquid biopsy through circulating tumor DNA (ctDNA) or RNA presents an alternative that is less affected by the sampling bias and heterogeneity of a single tumor site, capturing a more global genomic profile that is largely independent of tissue-based TCF constraints [29-36].
6.4. Open challenges
Despite advances, significant challenges remain. Foremost is the lack of standardization in TCF estimation methods across different genomic platforms and cancer types, hindering cross-study comparisons and meta-analyses. A related challenge is the integration of TCF with other dimensions of heterogeneity, such as clonal genetic diversity and spatial architecture, to form a unified model of the tumor ecosystem. Furthermore, our understanding of the dynamic evolution of TCF during disease progression and in response to therapy is limited; a tumor’s cellular composition at diagnosis may differ radically from that at relapse, with profound implications for subsequent treatment strategies. Finally, ensuring equitable access to these advanced analytical frameworks is crucial. Developing cost-effective, computationally efficient methods for TCF assessment is essential for their adoption in resource-limited settings, ensuring that the benefits of precision oncology can be realized globally.
7. Conclusion
Tumor cell fraction (TCF) has emerged not as a mere technical footnote in genomic analysis, but as a fundamental biological and methodological variable central to the accurate interpretation of cancer data. The proportion of neoplastic cells within a biopsy sample serves as a critical lens, profoundly shaping every downstream analysis derived from bulk sequencing technologies. It acts as a pervasive confounder, systematically skewing measurements of gene expression, where shifts in transcript levels may reflect changes in cellular composition rather than true regulatory events within the cancer cells themselves. This confounding extends to functional annotation, where the assessment of pathway activity, immune infiltration scores, and molecular subtype classification can be heavily influenced by stromal and immune cell admixture, potentially leading to misclassification and flawed biological inference. Most consequentially, TCF impacts clinical correlation studies, where prognostic and predictive biomarkers risk being surrogates for tumor cellularity or stromal reaction rather than genuine tumor-intrinsic biological drivers, threatening their validity and generalizability across diverse patient cohorts.
While the research community has made significant strides in developing methodologies to address this challenge, a notable implementation gap persists. Sophisticated computational tools for estimating TCF—from histopathological image analysis and copy number deconvolution to transcriptomic signature-based approaches—are now widely available. Similarly, statistical techniques to adjust differential expression analyses or correct subtype calls for purity are increasingly accessible. Despite these advancements, awareness and routine application of these corrections remain inconsistent in the published literature. Many studies utilizing public genomic datasets still present bulk-tissue analyses without acknowledging the confounding role of TCF, generating a body of evidence that may be partially obscured or misinterpreted, thereby hindering reproducible and translatable research.
To advance the field toward more rigorous and clinically actionable science, a concerted shift in standard practice is imperative. First, the adoption of standardized, evidence-based guidelines for TCF assessment across different assay types (e.g., WGS, RNA-seq, methylation arrays) is crucial to ensure consistency and comparability between studies. Second, TCF estimates must be elevated to the status of essential sample metadata, routinely reported alongside standard clinical and pathological variables in publications and public data repositories. This transparency is non-negotiable for meta-analyses and the independent validation of findings. Third, the development and adoption of next-generation analytical frameworks that explicitly model the tumor microenvironment’s composition are needed. This involves moving beyond simple post-hoc adjustment to integrated approaches that deconvolve or jointly analyze tumor and microenvironmental signals, providing a clearer, more nuanced view of the cancer ecosystem.
Ultimately, this rigorous, TCF-aware approach is not an academic exercise but a foundational requirement for the success of precision oncology. The goal of translating molecular insights into reliable clinical benefit for patients depends on the accuracy of the initial molecular portrait. By systematically accounting for and correcting the influence of tumor purity, researchers and clinicians can ensure that therapeutic decisions are informed by the true biology of the cancer cells, leading to more accurate diagnostics, robust biomarkers, and effective, personalized treatment strategies. Only through such methodological rigor can we fully realize the promise of cancer genomics.
Author Contributions: Conceptualisation, L.A.B.; software, L.A.B.; investigation, L.A.B.; writing—original draft preparation, L.A.B.; writing—review and editing, L.A.B.; visualisation, L.A.B.; supervision, L.A.B.; project administration, L.A.B. The author has read and agreed to the published version of the manuscript.
Funding: Not applicable.
Acknowledgments: The author is grateful to the Department of Biochemistry, College of Science, University of Jeddah, Jeddah-21589, Saudi Arabia, for providing the facilities to carry out this work.
Conflicts of Interest: The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: All the related data are supplied in this work or have been referenced properly.
References
- Yuan Y, Cai L, Zheng X, Dong H, Li X, Wang L. Tumor cell fraction estimation and correction in genomic profiling. Nat Biotechnol. 2018;36:1056–1065. doi:10.1038/nbt.4187
- Marusyk A, Polyak K. Tumor heterogeneity: Causes and consequences. Biochim Biophys Acta. 2010;1805(1):105–117. doi:10.1016/j.bbcan.2009.11.002
- Aran D, Sirota M, Butte AJ. Systematic pan-cancer analysis of tumour purity. Nat Commun. 2015;6:8971. doi:10.1038/ncomms9971
- Bankhead P, Loughrey MB, Fernández JA, et al. QuPath: Open source software for digital pathology image analysis. Sci Rep. 2017;7:16878. doi:10.1038/s41598-017-17204-5
- Carter SL, Cibulskis K, Helman E, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol. 2012;30(5):413–421. doi:10.1038/nbt.2203
- Van Loo P, Nordgard SH, Lingjærde OC, et al. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci USA. 2010;107(39):16910–16915. doi:10.1073/pnas.1009843107
- Riester M, Singh AP, Brannon AR, et al. PureCN: copy number calling and SNV classification using targeted short read sequencing. Source Code Biol Med. 2016;11:13. doi:10.1186/s13029-016-0060-z
- Zheng X, Zhang N, Wu HJ, Wu H. Estimating and accounting for tumor purity in the analysis of DNA methylation data from cancer studies. Genome Biol. 2017;18:17. doi:10.1186/s13059-016-1143-5
- Newman AM, Steen CB, Liu CL, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol. 2019;37:773–782. doi:10.1038/s41587-019-0114-2
- Yoshihara K, Shahmoradgoli M, Martínez E, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun. 2013;4:2612. doi:10.1038/ncomms3612
- Finotello F, Mayer C, Plattner C, et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Med. 2019;11:34. doi:10.1186/s13073-019-0638-6
- Quon G, Haider S, Deshwar AG, Cui A, Boutros PC, Morris Q. Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction. Genome Med. 2013;5(3):29. doi:10.1186/gm433
- Zheng X, Liu Y, Chen L, et al. Tumor purity as a critical factor in cancer system biology. Cell Syst. 2021;12(11):1024–1046. doi:10.1016/j.cels.2021.08.010
- Wang X, Li M, Hu L, et al. The impact of tumor purity on molecular pathological epidemiology. Cancer Res. 2019;79(21):5362–5373. doi:10.1158/0008-5472.CAN-19-1381
- Ali HR, Chlon L, Pharoah PDP, Markowetz F, Caldas C. Patterns of immune infiltration in breast cancer and their clinical implications: A gene-expression-based retrospective study. Genome Biol. 2016;17:218. doi:10.1186/s13059-016-1070-5
- Sturm G, Finotello F, List M. Immunedeconv: An R package for unified access to computational methods for estimating immune cell fractions from bulk RNA-sequencing data. eLife. 2019;8:e45312. doi:10.7554/eLife.45312
- Wang J, Ma A, Ma Y, et al. Systematic evaluation of tumor purity impact on cancer genomic analysis. Nucleic Acids Res. 2020;48(5):e47. doi:10.1093/nar/gkaa076
- Thorsson V, Gibbs DL, Brown SD, et al. The immune landscape of cancer. Immunity. 2018;48(4):812–830.e14. doi:10.1016/j.immuni.2018.03.023
- Venet D, Dumont JE, Detours V. Most random gene expression signatures are significantly associated with breast cancer outcome. PLoS Comput Biol. 2011;7(10):e1002240. doi:10.1371/journal.pcbi.1002240
- Jang IS, Neto EC, Guinney J, Friend SH, Margolin AA. Systematic assessment of analytical methods for drug sensitivity prediction from cancer cell line data. Nat Commun. 2017;8:15078. doi:10.1038/ncomms15078
- Patel SP, Kurzrock R. PD-L1 Expression as a Predictive Biomarker in Cancer Immunotherapy. Mol Cancer Ther. 2015;14(4):847–856. doi:10.1158/1535-7163.MCT-14-0983
- Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi:10.1186/s13059-014-0550-8
- Li B, Severson E, Pignon JC, et al. Comprehensive analyses of tumor immunity: implications for cancer immunotherapy. Nat Methods. 2016;13(9):921–925. doi:10.1038/nmeth.3960
- Aran D, Hu Z, Butte AJ. xCell: Digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18(1):220. doi:10.1186/s13059-017-1349-1
- Newman AM, Liu CL, Green MR, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12(5):453–457. doi:10.1038/nmeth.3337
- Jennings LJ, Arcila ME, Corless C, et al. Guidelines for Validation of Next-Generation Sequencing-Based Oncology Panels: A Joint Consensus Recommendation of the Association for Molecular Pathology and College of American Pathologists. Arch Pathol Lab Med. 2017;141(10):1404–1416. doi:10.5858/arpa.2016-0542-CP
- Navin NE. The first five years of single-cell cancer genomics and beyond. Annu Rev Genomics Hum Genet. 2014;15:443–459. doi:10.1146/annurev-genom-090413-025449
- Bera K, Schalper KA, Rimm DL, Velcheti V, Madabhushi A. Artificial intelligence in digital pathology – new tools for diagnosis and precision oncology. Nat Cancer. 2021;2(6):556–568. doi:10.1038/s43018-021-00214-6
- Wan JCM, Massie C, Garcia-Corbacho J, et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat Rev Cancer. 2017;17(4):223–238. doi:10.1038/nrc.2017.7
- McGranahan N, Swanton C. Clonal Heterogeneity and Tumor Evolution: Past, Present, and the Future. Cell. 2017;168(4):613–628. doi:10.1016/j.cell.2017.01.018
- Tirosh I, Izar B, Prakadan SM, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016;352(6282):189–196. doi:10.1126/science.aad0501
- Roerink SF, Sasaki N, Lee-Six H, et al. Intra-tumour diversification in colorectal cancer at the single-cell level. Nature. 2018;556(7702):457–462. doi:10.1038/s41586-018-0024-3
- Avraham E, Chozick C, Lee J, et al. Tumor purity-corrected somatic copy number alterations enhance the predictiveness of prognostic biomarkers. Cell Rep. 2020;31(13):107550. doi:10.1016/j.celrep.2020.107550
- Chakravarthy A, Furness A, Joshi K, et al. Pan-cancer deconvolution of tumour composition using DNA methylation. Cancer Cell. 2018;33(5):776–792.e3. doi:10.1016/j.ccell.2018.03.014
- Bao X, Shi R, Zhao T, Wang Y, Anastasov N, Rosemann M. Integrated analysis of single-cell RNA-seq and bulk RNA-seq unravels tumor heterogeneity plus M2-like tumor-associated macrophage infiltration and aggressiveness in TNBC. Nat Commun. 2021;12:5368. doi:10.1038/s41467-021-25682-5
- Tashkandi MA, Refai MY, Baz LA, Baeissa HM, Barqawi AA, Shamra PK. Altered gene expression pattern due to different tumor percentage affects functions. Glob Jour Bas Sci. 2025;1(6):1–9.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of Global Journal of Basic Science and/or the editor(s). Global Journal of Basic Science and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: © 2025 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).