Review
From genome sequences to epigenomes: Computational strategies shaping modern genomics
Heikham Russiachand Singh 1,2*
1 Department of Plant Science, McGill University, Raymond Building, 21111 Lakeshore Road, Ste. Anne de Bellevue, Quebec, Canada.
* Correspondence: heikham.singh@mail.mcgill.ca (H.R.S.)
Citation: Singh, H.R. From genome sequences to epigenomes: Computational strategies shaping modern genomics. Glob. Jour. Bas. Sci. 2025, 1(12), 1-7.
Received: July 29, 2025
Revised: October 03, 2025
Accepted: October 19, 2025
Published: October 20, 2025
doi: 10.63454/jbs20000063
ISSN: 3049-3315
Volume 1; Issue 12
Abstract: The rapid evolution of high-throughput sequencing technologies has fundamentally transformed genomics, shifting the field from simple genome sequence analysis to the integrative exploration of epigenomic regulation. Modern genomics now operates at the intersection of massive genomic datasets, epigenetic modifications, and advanced computational frameworks that enable meaningful biological interpretation. Computational strategies play a pivotal role in decoding DNA sequences, identifying regulatory elements, modeling chromatin organization, and integrating multi-omics layers such as transcriptomics, methylomics, and histone modifications. This review provides a comprehensive overview of computational approaches that bridge genome sequences and epigenomes, highlighting key algorithms, machine learning methods, data integration frameworks, and emerging challenges. We also discuss future directions in computational genomics, emphasizing scalability, interpretability, and clinical translation.
Keywords: Computational genomics; epigenomics; next-generation sequencing; machine learning; multi-omics integration
1. Introduction
The completion of the Human Genome Project in 2003 marked a seminal milestone in biological research, providing the first comprehensive reference blueprint of the human DNA sequence [1]. However, it soon became evident that this static, linear genome sequence alone could not fully explain the remarkable complexity of cellular identity, phenotypic diversity, or individual susceptibility to disease. A critical puzzle emerged from the observation that cells within a multicellular organism share an identical DNA sequence yet exhibit strikingly different morphologies, functions, and fates. This fundamental biological paradox is resolved by epigenetics—a suite of heritable, regulatory mechanisms that modulate gene expression patterns without altering the underlying nucleotide sequence [2]. Epigenetic machinery, including DNA methylation, histone modifications, chromatin accessibility, and non-coding RNA interactions, dynamically shapes the functional architecture of the genome, acting as a crucial interface between genotype and phenotype. The recognition of epigenetics as a central determinant of cellular state catalyzed a paradigm shift in molecular biology, moving beyond static genome cataloging toward dynamic, regulation-centric epigenomic research aimed at deciphering the “second code” that governs genomic function.
This shift has been propelled and empowered by the exponential growth of next-generation sequencing (NGS) technologies, which have enabled the systematic and comprehensive profiling of both genomes and epigenomes at unprecedented resolution and scale [3]. A powerful arsenal of high-throughput techniques—including whole-genome sequencing for genetic variation, RNA sequencing for transcriptomes, ChIP-seq for protein-DNA interactions (e.g., histone marks, transcription factors), ATAC-seq for chromatin accessibility, and bisulfite sequencing for DNA methylation—now routinely generates vast, multidimensional volumes of data. These complex datasets, often comprising billions of short reads, demand sophisticated computational and statistical tools for their accurate processing, normalization, integration, and biological interpretation [4]. Consequently, computational genomics has evolved from a supportive discipline into a cornerstone of modern biological and biomedical research. It provides the essential algorithms, computational frameworks, and analytical pipelines required to convert raw, high-dimensional sequencing data into actionable biological insights and testable hypotheses.
Against this backdrop, this review explores the advanced computational strategies that bridge the gap between the static genome sequence and the dynamic, regulatory epigenomic landscapes that define cellular function. We first contextualize the core data generation technologies that fuel this field. Subsequently, we delve into the computational pipelines for primary data analysis, followed by an examination of the statistical and machine learning approaches critical for pattern discovery and predictive modeling from epigenomic data. We then discuss the challenges and frameworks for the integrative analysis of multi-omics data, which is key to a holistic understanding of regulatory systems. Finally, we highlight real-world applications of these computational epigenomics approaches in elucidating the mechanisms of health, development, and disease, underscoring their transformative impact on biomedical discovery and precision medicine.
2. Genome sequencing technologies and data characteristics
Genome sequencing technologies form the foundational bedrock of modern genomics, with each successive generation offering transformative leaps in capability. Early Sanger sequencing, a capillary electrophoresis-based method, provided gold-standard accuracy for individual DNA fragments but was fundamentally limited in throughput and cost-effectiveness for whole-genome applications (Figure 1). The advent of next-generation sequencing (NGS), most notably platforms like Illumina’s reversible dye-terminator chemistry, revolutionized the field by enabling massively parallel, short-read sequencing at a drastically reduced cost per base, democratizing large-scale genomic studies [5]. This “short-read” era (typically 50-300 base pairs) excels at detecting single-nucleotide variants and quantifying transcript abundance. More recently, third-generation or long-read sequencing technologies, such as PacBio’s Single Molecule, Real-Time (SMRT) sequencing and Oxford Nanopore’s nanopore-based electronic sensing, have broken the short-read barrier. By sequencing individual DNA molecules that are tens to hundreds of kilobases long, these platforms facilitate de novo genome assembly with dramatically improved continuity, resolve complex structural variants, and directly detect base modifications, thereby providing a more complete picture of genomic architecture [6].

Figure 1. From genome sequences to epigenomes: computational strategies shaping modern genomics. Overview of computational strategies in modern genomics, integrating DNA sequence data, epigenetic marks, machine learning models, multi-omics data, and applications in health.
Each platform generates data with distinct, platform-specific characteristics that critically inform computational strategies. Read length, error profiles (e.g., Illumina’s low random substitution errors vs. Nanopore’s context-dependent indel errors), and coverage biases (e.g., in GC-rich or repetitive regions) necessitate tailored preprocessing. Essential computational steps include rigorous quality control (e.g., using FastQC [7]), adapter/artifact trimming, and platform-specific error correction or signal-level base calling. Subsequently, two primary computational tasks are performed: de novo assembly and read alignment. Assembly algorithms, such as those based on de Bruijn graphs (ideal for short, high-coverage reads) or Overlap-Layout-Consensus (OLC) methods (suited for long reads), computationally reconstruct the complete genome sequence from millions of overlapping fragments [8]. Alternatively, for organisms with a reference genome, alignment tools (e.g., BWA, Bowtie for short reads; Minimap2 for long reads) map reads to the reference, identifying matches and variations. These computational steps establish the precise genomic coordinate system—the essential framework upon which all subsequent epigenomic annotations and analyses are precisely overlaid and interpreted.
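The de Bruijn approach mentioned above can be illustrated with a deliberately minimal Python sketch: build a graph whose nodes are (k-1)-mers and whose edges are k-mers from the reads, then walk it to reconstruct a contig. The function names and the greedy traversal are illustrative simplifications; production assemblers additionally handle sequencing errors, repeats, and uneven coverage, which this toy example does not.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix node -> suffix node
    return graph

def greedy_walk(graph, start):
    """Follow unvisited edges greedily to reconstruct a single contig."""
    edges = {node: list(targets) for node, targets in graph.items()}
    contig, node = start, start
    while edges.get(node):
        nxt = edges[node].pop()
        contig += nxt[-1]  # each step extends the contig by one base
        node = nxt
    return contig

# Three overlapping error-free reads from the sequence ATGGCGTGCAAT
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
contig = greedy_walk(build_de_bruijn(reads, k=4), "ATG")
print(contig)  # → ATGGCGTGCAAT
```

On error-free, non-repetitive input like this, the greedy walk recovers the original sequence; real genomes require the far more sophisticated graph-cleaning strategies cited above.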
3. Epigenomic landscapes: biological foundations
Epigenomics encompasses a diverse and interconnected set of heritable molecular modifications that regulate chromatin architecture and gene activity, effectively writing a dynamic regulatory layer upon the static DNA sequence. Core mechanisms include: (1) DNA methylation, typically the addition of a methyl group to cytosine in a CpG dinucleotide, generally associated with transcriptional repression and genomic stability; (2) Histone modifications, a complex vocabulary of post-translational marks (e.g., acetylation, methylation, phosphorylation) on histone tails that alter chromatin compaction and recruit effector proteins; (3) Chromatin accessibility, the physical openness of chromatin dictating the binding accessibility for transcription factors and polymerases; and (4) Three-dimensional genome organization, encompassing loops, topologically associating domains (TADs), and nuclear compartmentalization that bring distal regulatory elements into physical proximity with target genes [9]. Crucially, these features are not static; they form a highly dynamic and context-dependent “epigenomic landscape” that varies profoundly across distinct cell types, developmental timepoints, and in response to environmental stimuli, thereby encoding cellular identity and plasticity [10].
High-throughput assays have been developed to map each layer genome-wide. Bisulfite sequencing (e.g., WGBS, RRBS) chemically converts unmethylated cytosines to uracils, allowing single-base-resolution profiling of DNA methylation [11]. Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) uses antibodies to immunoprecipitate DNA bound by specific histone modifications or transcription factors, identifying their genomic binding sites. Assays for Transposase-Accessible Chromatin using sequencing (ATAC-seq) and DNase-seq directly probe chromatin accessibility by leveraging enzymatic or chemical cleavage of open regions, revealing active promoters, enhancers, and other cis-regulatory elements [12]. Computational analysis is indispensable for interpreting these datasets, as the raw signals are often subtle, confounded by technical noise (e.g., antibody specificity in ChIP-seq, insertion bias in ATAC-seq), and exhibit complex correlations both with each other and with underlying genomic sequence features like nucleotide composition and repetitiveness.
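The core logic of methylation calling from bisulfite data can be sketched in a few lines: at each reference CpG cytosine, an aligned "C" implies the base was methylated (protected from conversion), while a "T" implies it was unmethylated. The minimal example below uses hypothetical function names and toy coordinates; real callers such as Bismark additionally handle strand, base quality, and mapping biases.

```python
def methylation_levels(reference, aligned_reads):
    """Estimate per-CpG methylation from bisulfite reads.

    aligned_reads: list of (alignment offset, read sequence) tuples.
    At each reference CpG cytosine, 'C' counts as methylated and
    'T' as unmethylated (bisulfite-converted).
    """
    cpg_sites = [i for i in range(len(reference) - 1)
                 if reference[i:i + 2] == "CG"]
    levels = {}
    for pos in cpg_sites:
        meth = unmeth = 0
        for start, seq in aligned_reads:
            offset = pos - start
            if 0 <= offset < len(seq):
                if seq[offset] == "C":
                    meth += 1
                elif seq[offset] == "T":
                    unmeth += 1
        total = meth + unmeth
        levels[pos] = meth / total if total else None
    return levels

ref = "ACGTTACGA"  # CpG cytosines at positions 1 and 6
reads = [(0, "ACGTTATGA"), (0, "ATGTTACGA"), (2, "GTTATGA")]
print(methylation_levels(ref, reads))  # → {1: 0.5, 6: 0.333...}
```

The output is the per-site fraction of methylated calls, the same beta-value-like quantity that WGBS pipelines report genome-wide.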
4. Computational pipelines for epigenomic data analysis
To ensure reproducibility and accuracy, the field relies on standardized computational pipelines (Figure 2) for processing raw epigenomic data into interpretable genomic features. These pipelines follow a series of modular steps: (1) Read Alignment, mapping sequenced reads to a reference genome using splice-aware aligners for RNA-seq or standard aligners for DNA-based assays; (2) Peak Calling, a critical step for enrichment-based assays (ChIP-seq, ATAC-seq) where statistical algorithms identify genomic regions with significantly higher read density than background noise. Widely used tools like MACS2 model the strand-specific shift of reads around binding events to estimate fragment size and pinpoint transcription factor binding sites or nucleosome-depleted regions with high precision [14]; (3) Normalization, which corrects for technical confounders such as sequencing depth (library size), GC bias, and batch effects to enable valid cross-sample comparisons [15]; and (4) Quality Assessment, evaluating metrics like signal-to-noise ratio, fragment size distribution, and reproducibility between replicates.

Figure 2. Computational pipelines for epigenomic data analysis and machine learning in computational genomics. Visualisation of computational pipelines for processing epigenomic data, including alignment and peak calling, and the role of machine learning models in interpretation and prediction.
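The statistical core of enrichment-based peak calling can be illustrated with a toy Poisson test: windows whose read counts are improbable under a genome-wide background rate are flagged as candidate peaks. This is a deliberately minimal sketch of the idea behind tools like MACS2, which additionally estimate a local background lambda and correct for multiple testing; the window counts and threshold here are invented for illustration.

```python
import math

def poisson_sf(k, lam):
    """Survival function P(X >= k) for a Poisson(lam) variable."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def call_peaks(window_counts, genome_lambda, alpha=1e-3):
    """Flag windows whose read count is improbably high under a
    Poisson background model -- the core scoring idea in peak callers."""
    return [i for i, k in enumerate(window_counts)
            if poisson_sf(k, genome_lambda) < alpha]

counts = [2, 3, 1, 40, 2, 35, 4]          # reads per genomic window
peaks = call_peaks(counts, genome_lambda=3.0)
print(peaks)  # → [3, 5]
```

Windows 3 and 5 (40 and 35 reads against a background mean of 3) are called; window 6 (4 reads) is not, because such counts arise routinely by chance under the background model.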
Downstream analysis involves annotating and interpreting these called features within a biological context. This includes classifying peaks relative to genomic annotations (promoters, enhancers, gene bodies, insulators) [16], comparing differential activity/occupancy between conditions, and visualizing the data. Interactive genome browsers, such as the UCSC Genome Browser and Integrative Genomics Viewer (IGV), are crucial for intuitive visual exploration, allowing researchers to overlay multiple epigenomic tracks with gene annotations and genomic variants [17]. The development of containerized, workflow-managed pipelines (e.g., using Nextflow, Snakemake, or nf-core) has been pivotal in scaling analyses for large international consortia like ENCODE and Roadmap Epigenomics, ensuring robust, version-controlled, and reproducible science.
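Classifying peaks relative to genomic annotations reduces, in its simplest form, to interval arithmetic against annotated transcription start sites (TSSs). The sketch below uses a hypothetical function name and an assumed 2 kb promoter window; real annotators work from full gene models and finer categories (gene body, insulator, etc.).

```python
def annotate_peaks(peaks, tss_positions, promoter_window=2000):
    """Label each peak 'promoter' if its center lies within
    promoter_window bp of the nearest TSS, else 'distal'."""
    annotations = []
    for start, end in peaks:
        center = (start + end) // 2
        nearest = min(tss_positions, key=lambda t: abs(t - center))
        label = ("promoter" if abs(nearest - center) <= promoter_window
                 else "distal")
        annotations.append((start, end, label, nearest))
    return annotations

tss = [5000, 50000]
peaks = [(4000, 4800), (20000, 20500), (49500, 51000)]
for record in annotate_peaks(peaks, tss):
    print(record)
```

The first and third peaks land in promoter windows; the second, roughly 15 kb from the nearest TSS, is labeled distal and would be a candidate enhancer in a real analysis.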
5. Machine learning in computational genomics
Machine learning (ML) has transitioned from a novel tool to a central component of computational genomics, providing powerful methods for pattern recognition, classification, and prediction within high-dimensional, complex biological data [18]. Supervised learning approaches (e.g., random forests, support vector machines) are extensively applied to tasks where labeled training data exists, such as classifying genomic regions as active promoters or enhancers based on histone modification patterns, predicting methylation states from sequence context, or discerning driver from passenger genetic mutations [19]. Unsupervised methods (e.g., clustering, principal component analysis, hidden Markov models) are vital for discovering latent structure without pre-defined labels, enabling the identification of novel regulatory modules, cell-type-specific chromatin states, or the segmentation of the genome into functional domains based on combinatorial epigenetic marks [20].
Recently, deep learning architectures have achieved state-of-the-art performance by automatically learning hierarchical feature representations from raw genomic data. Convolutional neural networks (CNNs) excel at scanning DNA sequence for predictive motifs and patterns, while recurrent neural networks (RNNs) can model dependencies in sequential data. These models have demonstrated remarkable ability in tasks such as predicting transcription factor binding, chromatin accessibility, and histone modifications directly from DNA sequence alone, thereby helping to decipher the “regulatory code” or “grammar” embedded in the genome [21, 22]. However, a significant challenge persists in translating the high predictive accuracy of these often opaque “black-box” models into mechanistically interpretable biological insights. The field is thus actively pursuing explainable AI (xAI) techniques to extract learned sequence motifs and regulatory rules, ensuring that model predictions can be grounded in testable biological hypotheses.
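The convolutional scanning step at the heart of these sequence models can be demonstrated without a deep learning framework: one-hot encode the DNA and slide a weight matrix along it. In the NumPy sketch below the filter weights are hand-set to reward a TATA-like motif purely for illustration, whereas a trained CNN would learn such filters from data.

```python
import numpy as np

def one_hot(seq):
    """One-hot encode a DNA string as a 4 x L matrix (rows A, C, G, T)."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    mat = np.zeros((4, len(seq)))
    for i, base in enumerate(seq):
        mat[idx[base], i] = 1.0
    return mat

def scan(seq, motif_filter):
    """Slide a convolutional filter (4 x w weight matrix) along the
    sequence; return the activation score at each position."""
    x = one_hot(seq)
    w = motif_filter.shape[1]
    return np.array([(x[:, i:i + w] * motif_filter).sum()
                     for i in range(len(seq) - w + 1)])

# Hand-set filter rewarding the motif "TATA": weight 1 on matching bases.
tata = np.zeros((4, 4))
for i, base in enumerate("TATA"):
    tata["ACGT".index(base), i] = 1.0

scores = scan("GGCTATAAGC", tata)
print(int(scores.argmax()))  # → 3 (the motif starts at index 3)
```

A perfect match scores 4 (one per matching position); a CNN stacks many such learned filters, applies nonlinearities, and pools their activations to make genome-wide predictions.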
6. Integrative multi-omics frameworks
A systems-level understanding of genome regulation necessitates moving beyond single data layers to integration across multiple omics modalities. Integrative multi-omics frameworks combine genomic, epigenomic, transcriptomic, and often proteomic data to construct unified, causal models of cellular state and function [23]. This integration is computationally challenging due to differences in data scale, type (discrete vs. continuous), noise, and dimensionality. Computational strategies to address this include: (1) Network-Based Models, which construct gene regulatory or co-expression networks where nodes represent genes and edges are inferred from correlations or probabilistic causal relationships across data types; (2) Matrix Factorization Techniques (e.g., joint non-negative matrix factorization, iNMF), which simultaneously decompose multiple omics matrices into shared and dataset-specific latent factors, revealing conserved patterns across modalities [24]; and (3) Statistical Correlation and Mediation Analysis, which help infer potential causal pathways, such as whether a genetic variant (QTL) influences a phenotype by altering DNA methylation (meQTL), which in turn modulates gene expression (eQTL) [25].
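The joint matrix factorization idea described above can be sketched with plain multiplicative updates: two omics matrices measured on the same samples are decomposed with a shared sample-factor matrix W and modality-specific loadings H1 and H2. This is an illustrative simplification of methods like iNMF, assuming nonnegative inputs and omitting regularization and dataset-specific factors.

```python
import numpy as np

def joint_nmf(X1, X2, k, n_iter=200, eps=1e-9):
    """Jointly factorize two (samples x features) matrices as
    X1 ~ W @ H1 and X2 ~ W @ H2 with a shared W, via multiplicative
    updates for the squared-error objective."""
    rng = np.random.default_rng(0)
    n = X1.shape[0]
    W = rng.random((n, k))
    H1 = rng.random((k, X1.shape[1]))
    H2 = rng.random((k, X2.shape[1]))
    for _ in range(n_iter):
        H1 *= (W.T @ X1) / (W.T @ W @ H1 + eps)
        H2 *= (W.T @ X2) / (W.T @ W @ H2 + eps)
        W *= (X1 @ H1.T + X2 @ H2.T) / (W @ (H1 @ H1.T + H2 @ H2.T) + eps)
    return W, H1, H2

# Toy nonnegative "methylation" and "expression" matrices, 6 shared samples.
X1 = np.random.default_rng(1).random((6, 10))
X2 = np.random.default_rng(2).random((6, 8))
W, H1, H2 = joint_nmf(X1, X2, k=3)
print(W.shape, H1.shape, H2.shape)
```

The rows of W give each sample's loading on the shared latent factors, which is exactly the "conserved pattern across modalities" that integrative factorization methods exploit.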
The advent of single-cell multi-omics technologies (e.g., scATAC-seq, scRNA-seq, or co-assays like CITE-seq and SHARE-seq) has elevated this challenge and opportunity by capturing multiple layers of information from individual cells [26]. This allows for the direct construction of regulatory networks within specific cell types and the dissection of heterogeneity within seemingly uniform populations. Computational strategies for single-cell multi-omics must therefore also contend with extreme sparsity, technical dropout, and the need for sophisticated integration and alignment of data across modalities at the cellular level. Effective integrative frameworks are ultimately essential for translating the descriptive catalogues of omics projects into predictive models of development, disease pathogenesis, and therapeutic response.
7. Three-dimensional genome organization
The spatial organization of the genome within the nucleus is a fundamental, non-linear regulatory layer that profoundly influences gene expression. It dictates which distal regulatory elements, like enhancers, physically interact with target promoters via looping, bringing them into close proximity despite genomic separation. Chromatin conformation capture technologies, such as Hi-C and its derivatives (micro-C, HiChIP), have been instrumental in mapping these interactions genome-wide, generating complex matrices of contact frequencies [27]. These maps reveal critical architectural features: Topologically Associating Domains (TADs), which are self-interacting, megabase-sized regions that largely insulate regulatory activity from neighboring domains, and long-range chromatin loops that connect specific regulatory sequences (Figure 3).

Figure 3. Three-dimensional genome organization. Illustration of 3D genome organization within the nucleus, as mapped by chromatin conformation capture technologies, highlighting regulatory loops and topologically associating domains (TADs).
A primary computational task involves reconstructing plausible three-dimensional genome structures from these two-dimensional interaction maps. Algorithms for this purpose range from constraint-based modeling, which treats the genome as a polymer chain and uses contact frequencies as spatial constraints, to more advanced probabilistic and optimization-based approaches [28]. Modeling 3D organization presents significant computational challenges due to the inherent sparsity of interaction data (especially at high resolution), the immense scale of the genome, and the dynamic, cell-to-cell variability of chromatin architecture. To interpret this complexity, researchers employ graph-based models, where genomic bins are nodes and interaction frequencies are weighted edges, facilitating the identification of communities (like TADs) and hubs. Complementarily, polymer physics–based models simulate chromatin as a fiber with specific physical properties (e.g., bead-spring models), offering mechanistic insights into the principles driving folding [29]. The true power of 3D genomics emerges from its integration with linear epigenomic profiles; by overlaying histone modifications, transcription factor binding, and accessibility data onto 3D structures, researchers can directly link spatial proximity to functional regulatory outcomes, moving from correlation to mechanistic understanding.
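One widely used heuristic for locating TAD boundaries in a contact matrix is the insulation score: sum the contacts crossing each bin within a sliding window, and take local minima as candidate boundaries (few contacts cross a boundary). The sketch below uses an invented two-block toy matrix and window size purely for illustration.

```python
import numpy as np

def insulation_score(contacts, window=2):
    """For each bin, sum the contacts that cross it within a sliding
    window; low scores mark candidate TAD boundaries."""
    n = contacts.shape[0]
    scores = np.full(n, np.nan)        # edges of the matrix are undefined
    for i in range(window, n - window):
        # contacts between bins upstream and downstream of bin i
        scores[i] = contacts[i - window:i, i + 1:i + 1 + window].sum()
    return scores

# Toy matrix: two dense 4-bin blocks (TADs) with weak inter-block contact.
block = np.full((4, 4), 10.0)
contacts = np.block([[block, np.full((4, 4), 1.0)],
                     [np.full((4, 4), 1.0), block]])
scores = insulation_score(contacts)
print(int(np.nanargmin(scores)))  # → 3, the bin at the block boundary
```

The minimum falls at the junction between the two blocks, recovering the boundary between the two toy TADs; real callers normalize the score and call boundaries from its local minima along the chromosome.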
8. Applications in health and disease
Computational genomics has fundamentally transformed biomedical research by providing the tools to decipher the non-coding genome and its role in disease. Genome-Wide Association Studies (GWAS) have identified thousands of genetic variants statistically linked to complex diseases, but the vast majority reside in non-coding regions, implicating dysregulation rather than protein-coding changes [31]. Integrative computational approaches are therefore essential to prioritize functional variants; by intersecting GWAS loci with epigenomic annotations (e.g., enhancer marks in relevant cell types), chromatin accessibility QTLs (caQTLs), and expression QTLs (eQTLs), researchers can pinpoint which variants are likely to alter transcription factor binding or chromatin state, thereby elucidating disease mechanisms [30, 32].
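At its simplest, the intersection step described above is interval overlap between variant coordinates and epigenomic annotations. The sketch below uses invented coordinates and a naive nested loop; real pipelines operate on indexed formats (BED with tabix, interval trees) for genome-scale data.

```python
def prioritize_variants(variants, enhancers):
    """Flag GWAS variants that fall inside enhancer intervals for a
    relevant cell type -- a first-pass filter for candidate
    regulatory variants."""
    hits = []
    for rsid, chrom, pos in variants:
        for e_chrom, start, end in enhancers:
            if chrom == e_chrom and start <= pos <= end:
                hits.append(rsid)
                break  # one overlapping annotation is enough to flag
    return hits

variants = [("rs1", "chr1", 1500), ("rs2", "chr1", 9000), ("rs3", "chr2", 300)]
enhancers = [("chr1", 1000, 2000), ("chr2", 250, 400)]
print(prioritize_variants(variants, enhancers))  # → ['rs1', 'rs3']
```

Variants overlapping tissue-relevant enhancer marks would then be carried forward to QTL colocalization and functional follow-up, as outlined above.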
In oncology, epigenomic alterations—including widespread DNA hypomethylation, localized promoter hypermethylation, and reshaped histone modification landscapes—are recognized as hallmarks of cancer, driving tumorigenesis, metastasis, and therapeutic resistance [33]. Computational analysis of large-scale projects like The Cancer Genome Atlas (TCGA) enables the identification of epigenomic subtypes of cancer, the discovery of non-invasive biomarkers (e.g., circulating tumor DNA methylation patterns), and the prediction of drug response, directly supporting precision medicine initiatives [34]. Beyond cancer, epigenomic profiling has illuminated the pathogenesis of neurodevelopmental disorders (e.g., mutations in chromatin remodelers like CHD8 in autism), neurological diseases (e.g., histone acetylation imbalances in neurodegenerative conditions), and complex cardiometabolic traits, revealing how environmental factors write a “molecular memory” onto the genome that influences disease risk [35].
9. Challenges and limitations
Despite remarkable progress, the field of computational genomics faces persistent and evolving challenges. Data heterogeneity—arising from diverse platforms, experimental protocols, and analytical pipelines—creates significant obstacles for integration and meta-analysis. Batch effects can be confounded with biological signals, while limited sample sizes for rare cell types or diseases constrain statistical power and the generalizability of machine learning models [36]. Scalability remains a pressing concern as datasets grow exponentially in size (with long-read and single-cell data) and complexity (multi-omics at spatial resolution), demanding novel algorithms and efficient computing infrastructure [37].
Equally critical are challenges in scientific practice. Ensuring reproducibility and transparency requires rigorous documentation of code, parameters, and workflows, with a growing push toward containerization and workflow management systems to guarantee consistent results [38]. Furthermore, the translation of genomic insights raises important ethical considerations, including patient data privacy, informed consent for data re-use, potential for genetic discrimination, and the equitable implementation of genomic medicine to avoid exacerbating health disparities [39]. Addressing this multifaceted set of challenges necessitates sustained methodological innovation, investment in computational resources, and deep interdisciplinary collaboration among biologists, computational scientists, clinicians, and ethicists.
10. Future perspectives
Future advances in computational genomics will be driven by convergence across several frontiers. Algorithmic development will focus on improved scalability and interpretability, particularly for deep learning models, where techniques from explainable AI (xAI) will be crucial to extract biologically meaningful insights from “black box” predictors [40]. A key paradigm shift will be the integration of artificial intelligence with mechanistic modeling; hybrid approaches that combine the pattern recognition power of machine learning with the causal, biophysical principles of systems biology may yield predictive frameworks capable of simulating regulatory responses to genetic or environmental perturbation [41-50].
Technologically, the field will be shaped by the explosion of single-cell and spatial multi-omics data. Computational strategies must evolve to integrate transcriptomic, epigenomic, and proteomic data within their native tissue architecture, unraveling cell-cell communication networks and the spatial regulation of gene expression [42]. Ultimately, the central goal remains translation into clinical applications. This will require moving beyond association to causation, developing robust diagnostic and prognostic models validated in diverse populations, and creating user-friendly computational tools that can be deployed in clinical settings. Robust, standardized computational strategies will be the linchpin for realizing the promise of truly personalized and precision medicine [43].
11. Conclusion
The transition from static genome sequences to dynamic, multi-layered epigenomes represents a defining evolution in our understanding of biological regulation. Computational strategies have served as the indispensable backbone of this transformation, providing the analytical frameworks to process, integrate, and interpret vast, heterogeneous datasets. By bridging the fundamental sequence of DNA with the complex, context-dependent information encoded in chromatin modifications, spatial architecture, and transcriptional output, computational genomics has furnished powerful new tools to deconstruct the logic of development, physiology, and disease pathogenesis. As data generation continues to accelerate in scale and resolution, continued innovation in computational methods—prioritizing robustness, interpretability, and integration—will undoubtedly shape the next decade of discovery, driving genomics from a descriptive science toward a predictive and ultimately therapeutic discipline.
Author Contributions: Conceptualisation, H.R.S.; software, H.R.S.; investigation, H.R.S.; writing—original draft preparation, H.R.S.; writing—review and editing, H.R.S.; visualisation, H.R.S.; supervision, H.R.S.; project administration, H.R.S. The author has read and agreed to the published version of the manuscript.
Funding: Not applicable.
Acknowledgments: The author is grateful to the Department of Plant Science, McGill University, Raymond Building, 21111 Lakeshore Road, Ste. Anne de Bellevue, Quebec, Canada, for providing the facilities to carry out this work.
Conflicts of Interest: The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Institutional Review Board Statement: Not applicable; this review involved no studies with humans or animals.
Informed Consent Statement: Not applicable.
Data Availability Statement: All the related data are supplied in this work or have been referenced properly.
References
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921.
- Bird A. Perceptions of epigenetics. Nature. 2007 May 24;447(7143):396-8.
- Metzker ML. Sequencing technologies – the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46.
- Mardis ER. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 2008;9:387-402.
- Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A. 1977 Dec;74(12):5463-7.
- Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nat Rev Genet. 2020 Oct;21(10):597-614.
- Andrews S. FastQC: a quality control tool for high throughput sequence data. Cambridge: Babraham Institute; 2010. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60.
- Allis CD, Jenuwein T. The molecular hallmarks of epigenetic control. Nat Rev Genet. 2016 Aug;17(8):487-500.
- Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015 Feb 19;518(7539):317-30.
- Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009 Nov 19;462(7271):315-22.
- Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013 Dec;10(12):1213-8.
- Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015 Feb 19;518(7539):317-30.
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.
- Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010 Jan 1;26(1):139-40.
- Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012 Mar;9(3):215-6.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002 Jun;12(6):996-1006.
- Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet. 2015 Jun;16(6):321-32.
- Lee D, Gorkin DU, Baker M, Strober BJ, Asoni AL, McCallion AS, et al. A method to predict the impact of regulatory variants from DNA sequence. Nat Genet. 2015 Aug;47(8):955-61.
- Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003 Jan;3:993-1022.
- Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol. 2015 Aug;33(8):831-8.
- Kelley DR, Snoek J, Rinn JL. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 2016 Jul;26(7):990-9.
- Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol. 2017 May 18;18(1):83.
- Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014 Mar;11(3):333-7.
- Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006 Mar 6;7 Suppl 1:S7.
- Stuart T, Satija R. Integrative single-cell analysis. Nat Rev Genet. 2019 May;20(5):257-72.
- Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009 Oct 9;326(5950):289-93.
- Dekker J, Mirny L. The 3D genome as moderator of chromosomal communication. Cell. 2016 Mar 10;164(6):1110-21.
- Di Pierro M, Cheng RR, Aiden EL, Wolynes PG, Onuchic JN. De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture. Proc Natl Acad Sci U S A. 2017 Nov 7;114(46):12126-31.
- Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012 Sep 7;337(6099):1190-5.
- Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015 Feb 19;518(7539):337-43.
- Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011 May 5;473(7345):43-9.
- Feinberg AP, Koldobskiy MA, Göndör A. Epigenetic modulators, modifiers and mediators in cancer aetiology and progression. Nat Rev Genet. 2016 May;17(5):284-99.
- Esteller M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet. 2007 Apr;8(4):286-98.
- Lister R, Mukamel EA, Nery JR, Urich M, Puddifoot CA, Johnson ND, et al. Global epigenomic reconfiguration during mammalian brain development. Science. 2013 Aug 9;341(6146):1237905.
- Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010 Oct;11(10):733-9.
- Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, et al. Big Data: Astronomical or Genomical? PLoS Biol. 2015 Jul 7;13(7):e1002195.
- Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten simple rules for reproducible computational research. PLoS Comput Biol. 2013 Oct 24;9(10):e1003285.
- Shabani M, Borry P. Challenges of web-based personal genomic data sharing. Life Sci Soc Policy. 2015;11:3.
- Angermueller C, Pärnamaa T, Parts L, Stegle O. Deep learning for computational biology. Mol Syst Biol. 2016 Jul 29;12(7):878.
- Karr JR, Sanghvi JC, Macklin DN, Gutschow MV, Jacobs JM, Bolival B Jr, et al. A whole-cell computational model predicts phenotype from genotype. Cell. 2012 Jul 20;150(2):389-401.
- Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, et al. Eleven grand challenges in single-cell data science. Genome Biol. 2020 Jan 31;21(1):31.
- Ashley EA. Towards precision medicine. Nat Rev Genet. 2016 Sep;17(9):507-22.
- Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015 Oct;12(10):931-4.
- Buenrostro JD, Corces MR, Lareau CA, Wu B, Schep AN, Aryee MJ, et al. Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation. Cell. 2018 May 17;173(6):1535-1548.e16.
- Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E, et al. The Human Cell Atlas. Elife. 2017 Dec 5;6:e27041.
- ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.
- Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015 Feb 19;518(7539):317-30.
- Kelsey G, Feil R. New insights into establishment and maintenance of DNA methylation imprints in mammals. Philos Trans R Soc Lond B Biol Sci. 2013 Jan 5;368(1609):20110336.
- Green ED, Gunter C, Biesecker LG, Di Francesco V, Easter CL, Feingold EA, et al. Strategic vision for improving human health at The Forefront of Genomics. Nature. 2020 Oct;586(7831):683-92.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of Global Journal of Basic Science and/or the editor(s). Global Journal of Basic Science and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: © 2025 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
