Global Journal of Basic Science

Review

Metagenomics: Concepts, technologies, and applications – A comprehensive review

Shiful Islam ¹ and Rifaquat Ahmed ¹,²*

¹ Department of Biotechnology, Faculty of Natural Science, Norwegian University of Science and Technology, Trondheim 7491, Norway.

² Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim 7491, Norway.

* Correspondence: saifparvez95@gmail.com (S.I.)

Citation: Islam, S. and Ahmed, R. Metagenomics: Concepts, technologies, and applications – A comprehensive review. Glob. Jour. Bas. Sci. 2025, 1(10). 1-7.

Received: June 22, 2025

Revised: July 29, 2025

Accepted: August 07, 2025

Published: August 08, 2025

doi: 10.63454/jbs20000062

ISSN: 3049-3315

Volume 1; Issue 12

Abstract: Metagenomics has emerged as a transformative approach in modern biological sciences, enabling the culture-independent analysis of genetic material recovered directly from environmental samples. By bypassing the limitations of traditional cultivation-based microbiology, metagenomics provides unprecedented insights into the taxonomic composition, functional potential, and ecological roles of complex microbial communities. Advances in high-throughput sequencing technologies, bioinformatics, and computational frameworks have accelerated metagenomic research across diverse domains, including environmental science, human health, agriculture, and biotechnology. This review comprehensively discusses the fundamental concepts of metagenomics, key sequencing and analytical technologies, data analysis pipelines, and major applications. Additionally, current challenges, ethical considerations, and future perspectives of metagenomic research are critically examined.

Keywords: Metagenomics; microbial diversity; shotgun sequencing; microbiome; bioinformatics analysis; functional annotation; environmental and clinical applications

1. Introduction

Microorganisms constitute the most abundant and phylogenetically diverse forms of life on Earth, serving as indispensable agents in global processes such as biogeochemical cycling, ecosystem stability, and the maintenance of host health. This immense diversity, however, has remained largely hidden from scientific inquiry. It is estimated that more than 99% of microbial species resist cultivation using standard laboratory techniques; the resulting discrepancy between the cells observed in a sample and those that grow on plates is known as the “great plate count anomaly” [1,2]. This cultivation bottleneck has severely constrained our understanding of the true scope of microbial diversity, their metabolic capabilities, and their complex functional roles in natural and host-associated environments.

Metagenomics emerged as a revolutionary paradigm to transcend this limitation. By enabling the direct extraction, sequencing, and analysis of total genomic DNA from environmental samples (such as soil, seawater, or the human gut), this approach bypasses the need for cultivation. It thereby provides a far less biased, holistic view of the genetic potential and taxonomic composition of entire microbial communities [3]. The term itself was coined in the late 1990s, capturing the essence of studying the collective genome (the metagenome) of a habitat.

Fueled by continuous breakthroughs in high-throughput next-generation sequencing (NGS) technologies, advanced computational algorithms, and sophisticated systems-level analytical frameworks, the field of metagenomics has evolved at a remarkable pace [4]. It has matured from a novel concept into a cornerstone of modern microbiome science. Today, metagenomic applications are reshaping foundational knowledge across disciplines, driving transformative discoveries in microbial ecology, evolutionary biology, and the intricate dynamics of host-microbe interactions, from symbiosis to pathogenesis [5].

2. Fundamental concepts of metagenomics

The field of metagenomics is built upon a foundational and powerful conceptual shift: rather than studying microorganisms through the lens of isolated cultures, it analyzes the collective genetic material (the metagenome) of an entire microbial community directly harvested from its natural environment. This paradigm is operationalized through a standardized yet sophisticated workflow designed to capture the full genetic essence of a microbiota. The process commences with the isolation of total community DNA from a complex sample matrix, be it environmental (e.g., a gram of soil, a liter of seawater) or host-associated (e.g., a stool sample, a buccal swab). This extracted DNA, representing a pooled genomic library from potentially thousands of co-existing species, is then subjected to high-throughput sequencing. The resulting millions to billions of short DNA sequences (reads) form the raw data for the final, critical stage: computational analysis and biological interpretation using advanced bioinformatics pipelines [6].

This holistic approach provides a critical and multidimensional advantage over targeted molecular methods, most notably 16S ribosomal RNA (rRNA) gene amplicon sequencing. Amplicon-based techniques, which PCR-amplify and sequence a single, conserved phylogenetic marker gene, are highly efficient and cost-effective for answering the primary ecological question of “who is there?” They yield robust profiles of taxonomic composition and community structure (alpha and beta diversity). However, their functional inference is indirect and limited, relying on extrapolation from the identified taxa to reference genomes, which often fails to capture strain-specific functions and entirely misses the vast reservoir of uncharacterized genes. In stark contrast, shotgun metagenomic sequencing randomly fragments and sequences all DNA in a sample, capturing both coding and non-coding regions from every domain of life present. This allows researchers to move decisively from cataloging taxonomy to interrogating “what they are doing and what they could do.” By mapping sequencing reads to functional databases, researchers can directly reconstruct metabolic pathways (e.g., for carbon fixation or methanogenesis), profile the full repertoire of antibiotic resistance genes (the resistome) and virulence factors, discover novel enzymes and biosynthetic gene clusters, and identify regulatory elements, thereby revealing the rules that govern community function and adaptation [7].
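
As an illustration of the alpha and beta diversity measures mentioned above, the following minimal sketch computes the Shannon index for one sample and the Bray-Curtis dissimilarity between two samples. The function names and the read counts are hypothetical and chosen only for illustration; in practice, counts are usually normalized or rarefied first, and dedicated packages would be used instead.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon index H' = -sum(p_i * ln p_i) over nonzero taxa."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Both samples must list counts for the same taxa in the same order.
    """
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical read counts for four taxa in two samples (illustrative only).
sample_a = [50, 30, 15, 5]
sample_b = [10, 10, 40, 40]

print(f"Shannon index, sample A: {shannon_index(sample_a):.3f}")
print(f"Shannon index, sample B: {shannon_index(sample_b):.3f}")
print(f"Bray-Curtis dissimilarity: {bray_curtis(sample_a, sample_b):.3f}")
```

A perfectly even community of four taxa gives the maximum Shannon value of ln 4, and two identical samples give a Bray-Curtis dissimilarity of zero.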

Within metagenomics, two primary methodological branches have solidified, each with distinct philosophical and technical approaches, costs, and analytical outcomes.

1. Amplicon-based metagenomics: This targeted approach focuses on PCR amplification and deep sequencing of conserved, phylogenetically informative marker genes. The 16S rRNA gene is the universal cornerstone for bacteria and archaea, while the Internal Transcribed Spacer (ITS) region serves a similar role for fungi. By comparing the sequences of these variable regions to extensive reference databases, researchers can achieve precise taxonomic classification, often down to the genus level, and generate detailed metrics of microbial diversity and community dynamics. Its primary strength is its cost-effectiveness and depth of coverage for answering compositional questions. Its fundamental limitation is its inherent lack of direct functional data; any functional insight is purely inferential, based on the presumed capabilities of the taxonomically identified organisms, which overlooks horizontal gene transfer and the functional novelty within uncultured lineages [8].

2. Shotgun metagenomics: This comprehensive, hypothesis-agnostic approach sequences the totality of DNA in a sample, generating a complex, fragmented mosaic of genomic material from all resident organisms: bacterial, archaeal, viral, and eukaryotic. The power of this method lies in its capacity for comprehensive functional profiling. Bioinformatic tools allow for the direct annotation of millions of short reads or assembled contigs against functional databases (e.g., KEGG, COG, CAZy), providing a quantitative snapshot of the community’s collective genetic capabilities. Furthermore, through sophisticated assembly algorithms, overlapping short reads can be stitched together into longer contiguous sequences (contigs) and binned into Metagenome-Assembled Genomes (MAGs). MAGs represent draft genomes of uncultured organisms, offering unparalleled insights into the metabolic potential, ecology, and evolution of microbial “dark matter.” While more expensive and computationally demanding than amplicon sequencing, shotgun metagenomics provides the only route to a truly systemic and gene-centric understanding of a microbiome’s functional potential, strain-level variation, and mobile genetic element content, painting a far more detailed and actionable picture of the community’s role in its environment or host [8].

3. Sampling and DNA extraction strategies

The biological and technical validity of any metagenomic study is critically dependent on its initial stages: representative sampling and unbiased nucleic acid extraction. Environmental heterogeneity (such as spatial gradients in soil or temporal fluctuations in marine settings), along with variability in microbial biomass and the presence of PCR inhibitors like humic acids or bile salts, can significantly skew downstream sequencing results and interpretations [9]. Consequently, meticulous experimental design that accounts for these factors is paramount.

The DNA extraction step itself introduces a major potential source of bias. Protocols must ensure broad and equivalent lysis efficiency across the diverse cell types present in a community. Gram-positive bacteria with thick peptidoglycan layers, spores, fungi with chitinous cell walls, and archaea often require more rigorous lysis conditions than Gram-negative bacteria. Inefficient lysis of these resistant cells leads to their underrepresentation in the final metagenomic library, distorting the perceived community structure [10]. While the development of standardized commercial extraction kits has improved reproducibility across laboratories, there is no universal “one-size-fits-all” solution. Sample-specific optimization and protocol customization remain essential, especially for particularly challenging and complex matrices like soil, sediments, and host-associated microbiomes, where co-extracted contaminants can interfere with sequencing chemistry [11].
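
The effect of uneven lysis on perceived community structure can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each cell contributes the same amount of DNA and that per-taxon lysis efficiencies are known; the taxon labels, cell numbers, and efficiency values are hypothetical.

```python
def observed_composition(true_cells, lysis_efficiency):
    """Relative abundance seen after extraction.

    true_cells: cells per taxon actually present in the sample.
    lysis_efficiency: fraction (0-1) of each taxon's cells that are lysed.
    Assumes equal DNA yield per lysed cell (a simplification).
    """
    recovered = {t: true_cells[t] * lysis_efficiency[t] for t in true_cells}
    total = sum(recovered.values())
    return {t: r / total for t, r in recovered.items()}


# Hypothetical community with equal numbers of two cell types.
true_cells = {"gram_negative": 500, "gram_positive": 500}
# Assumed efficiencies for a gentle lysis protocol (illustrative values only).
efficiency = {"gram_negative": 0.9, "gram_positive": 0.3}

observed = observed_composition(true_cells, efficiency)
for taxon, fraction in observed.items():
    print(f"{taxon}: true 0.50, observed {fraction:.2f}")
```

Under these assumed values, a community that is truly 50:50 appears as 75:25 in the sequencing library, which is the kind of distortion that harsher or mechanically assisted lysis aims to reduce.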

4. Sequencing technologies in metagenomics

The explosive growth of metagenomics has been inextricably linked to rapid advancements in DNA sequencing technology. Early pioneering studies in the field relied on Sanger sequencing, a method renowned for its long, accurate reads but ultimately limited for community analysis by its low throughput and prohibitively high cost, restricting studies to small clone libraries [12]. The advent of next-generation sequencing (NGS) platforms, including Illumina, Ion Torrent, and SOLiD, catalyzed a revolution. These technologies enabled massive parallel sequencing of millions to billions of DNA fragments simultaneously, drastically reducing the cost per base and making the deep sequencing of complex environmental samples feasible for the first time [13].

Currently, the field is being further transformed by the maturation of third-generation sequencing technologies, notably PacBio’s Single Molecule, Real-Time (SMRT) sequencing and Oxford Nanopore Technologies’ (ONT) nanopore sequencing. These platforms generate reads that are orders of magnitude longer than NGS short reads, ranging from thousands to millions of bases. These long reads are invaluable for metagenomics, as they dramatically improve the assembly of complete microbial genomes from complex mixtures, resolve repetitive genomic regions that confound short-read assemblers, and enable the detection of large structural variants and epigenetic modifications [14,15]. Recognizing the complementary strengths of each technology, hybrid sequencing strategies that integrate high-accuracy short reads (e.g., Illumina) with long-range scaffolding reads (e.g., PacBio or ONT) are becoming an increasingly powerful standard for generating high-quality, contiguous metagenome-assembled genomes [16].

5. Bioinformatics and data analysis pipelines

The transformation of raw sequencing data into meaningful biological insight requires the orchestration of sophisticated, multi-step computational pipelines. These workflows are designed to handle the immense complexity and volume of metagenomic data, where billions of short DNA reads from potentially thousands of organisms must be deconvoluted, characterized, and interpreted. The standard analytical journey involves a sequence of interdependent stages. It begins with quality control and preprocessing, where raw reads are filtered to remove low-quality bases, sequencing adapters, and contaminating host DNA (in clinical samples). This is followed by de novo sequence assembly, a computationally intensive process where overlapping reads are stitched together to reconstruct longer contiguous sequences (contigs), aiming to approximate the original genomes from the fragmented community DNA. The next stage, binning, employs algorithmic strategies, often based on sequence composition (k-mer frequency) and abundance profiles across multiple samples, to cluster these contigs into discrete groups that represent putative genomes of individual microbial populations, known as Metagenome-Assembled Genomes (MAGs). Concurrently or subsequently, taxonomic classification assigns phylogenetic identity to individual reads or assembled contigs, answering the fundamental question of “who is there?” Finally, functional annotation predicts the biological roles of identified genes, bridging sequence data to ecosystem or host phenotype [17].
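
The sequence-composition signal used in binning can be sketched as a tetranucleotide frequency profile computed per contig. The example below is a simplified illustration and not the method of any particular binning tool: the function names and the contig sequences are hypothetical, and real binners combine such profiles with coverage across samples and work on contigs that are thousands of bases long.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_profile(seq, k=4):
    """Normalized canonical k-mer frequencies for one contig.

    K-mers containing ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    # Fixed, sorted key order so profiles from different contigs are comparable.
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    return [counts[key] / total if total else 0.0 for key in keys]


def euclidean(u, v):
    """Euclidean distance between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# Hypothetical contigs (far shorter than real ones, for illustration only).
contig_1 = "ATGCGTACGTTAGCGGCCGCGATCGCGGCATGCGCGTACG"
contig_2 = "ATATTTAAATTTATATAAATTTTATATAAATATATTTAAA"

profile_1 = kmer_profile(contig_1)
profile_2 = kmer_profile(contig_2)
print(f"profile length: {len(profile_1)}")
print(f"distance between contigs: {euclidean(profile_1, profile_2):.3f}")
```

Using canonical k-mers makes the profile independent of which strand was assembled, so a contig and its reverse complement yield the same vector. Contigs with small profile distances and similar abundance patterns are candidates for the same bin.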

A robust and ever-evolving ecosystem of specialized bioinformatics tools has been developed to manage this complex workflow. For amplicon-based studies, integrated platforms like QIIME 2 and mothur provide end-to-end environments for processing 16S rRNA gene and ITS sequences, performing diversity analyses, and conducting statistical testing. For shotgun metagenomics, the toolkit is more modular. Tools like MetaPhlAn and mOTUs use clade-specific marker genes for highly efficient and accurate taxonomic profiling. Assemblers such as MEGAHIT and metaSPAdes are optimized for the uneven coverage and high strain diversity of metagenomic data. Classifiers like Kraken use exact k-mer matching against comprehensive genomic databases for ultra-fast taxonomic assignment of reads, and Bracken re-estimates taxon abundances from Kraken’s read assignments, while tools like GTDB-Tk facilitate the classification of MAGs against standardized taxonomic frameworks. The management of these pipelines is increasingly streamlined by workflow managers like Snakemake and Nextflow, which ensure reproducibility and scalability [18, 19].
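
The idea behind k-mer based read classification can be illustrated with a toy example. The sketch below is not Kraken's algorithm: it omits the lowest-common-ancestor mapping on a taxonomy tree, reverse-complement handling, and compact indexing, and simply votes with k-mers that are unique to one reference. The reference sequences, taxon labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(references, k):
    """Map each k-mer to the set of reference taxa whose sequence contains it."""
    index = defaultdict(set)
    for taxon, genome in references.items():
        for i in range(len(genome) - k + 1):
            index[genome[i:i + k]].add(taxon)
    return index


def classify(read, index, k):
    """Assign a read to the taxon with the most taxon-unique k-mer hits.

    Returns None when no k-mer matches or when the top two taxa are tied.
    """
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxa = index.get(read[i:i + k], ())
        if len(taxa) == 1:  # ignore k-mers shared between taxa
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical reference sequences (real genomes are millions of bases long).
references = {
    "taxon_A": "ATGGCGTACGTTAGCCTAGGATCCGATTACA",
    "taxon_B": "TTGACCGGTAAACCGGTTTGCACGTGAACTG",
}
index = build_index(references, k=8)

for read in ("GTACGTTAGCCTAGG", "ACCGGTAAACCGG", "CCCCCCCCCCCCCCC"):
    print(read, "->", classify(read, index, k=8))
```

Reads with no matching k-mer remain unclassified, which mirrors how incomplete reference databases leave a fraction of real metagenomic reads without an assignment.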

Functional annotation, a critical endpoint of shotgun metagenomics that unlocks understanding of “what the community can do,” relies heavily on comparisons against curated reference databases that map gene sequences to known biological functions. Comprehensive resources are pivotal: the Kyoto Encyclopedia of Genes and Genomes (KEGG) provides maps of metabolic and signaling pathways; Clusters of Orthologous Groups (COG) and its expanded version, eggNOG, offer functional categorization across a broad spectrum of genes; Pfam and TIGRFAMs define protein families and domains; and specialized databases like CAZy (Carbohydrate-Active Enzymes) and CARD (Comprehensive Antibiotic Resistance Database) address specific ecological or clinical questions. Annotation pipelines like Prokka, DRAM, and HUMAnN systematically query these databases to infer the metabolic potential, stress responses, virulence factors, and ecological roles encoded within a metagenome or MAG [20].

The complexity, high dimensionality, and scale of metagenomic data have necessitated the integration of advanced computational techniques beyond traditional bioinformatics. Machine learning (ML) and deep learning algorithms are now routinely applied to mine these datasets for patterns that are not discernible through conventional statistics. Supervised ML models are trained to identify complex microbial biomarkers (specific taxonomic or functional signatures) that can predict environmental conditions (e.g., pollution levels), agricultural outcomes (e.g., crop yield), or host disease states (e.g., colorectal cancer, IBD) with high accuracy. Unsupervised methods are used for dimensionality reduction and pattern discovery in complex microbiome datasets. Furthermore, network-based analysis has become a cornerstone of microbial ecology. By constructing co-occurrence or correlation networks (e.g., using SparCC or SPIEC-EASI), researchers can model the intricate web of potential microbial interactions, such as cooperation, competition, and niche partitioning. These networks move the field from descriptive cataloging of parts lists to predictive, systems-level modeling of community stability, assembly rules, and responses to perturbation, representing a shift towards a more mechanistic understanding of microbial ecosystems [21, 22].
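
A correlation network of the kind described above can be sketched with a plain Spearman rank correlation and a fixed threshold. This is a deliberately naive illustration: it is not SparCC or SPIEC-EASI, which were designed to handle the compositional nature of relative abundance data, and it applies no significance testing. The abundance values, taxon labels, threshold, and function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cooccurrence_edges(abundance, threshold=0.8):
    """Return (taxon, taxon, rho) for pairs with |rho| at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


# Hypothetical relative abundances of four taxa across six samples.
abundance = {
    "taxon_A": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
    "taxon_B": [0.12, 0.18, 0.35, 0.37, 0.55, 0.58],
    "taxon_C": [0.60, 0.50, 0.45, 0.30, 0.20, 0.10],
    "taxon_D": [0.30, 0.10, 0.40, 0.10, 0.50, 0.20],
}

edges = cooccurrence_edges(abundance)
for a, b, rho in edges:
    kind = "co-occurrence" if rho > 0 else "mutual exclusion"
    print(f"{a} -- {b}: rho = {rho} ({kind})")
```

Positive edges suggest taxa that rise and fall together and negative edges suggest mutual exclusion, but a correlation alone does not establish cooperation or competition; such edges are hypotheses for further testing.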

6. Applications of metagenomics

6.1 Environmental metagenomics
Metagenomics has fundamentally revolutionized environmental microbiology by serving as a powerful lens to reveal the staggering, and previously hidden, diversity of microbial life across the planet’s biomes. By analyzing the collective genomes of microbial communities in habitats ranging from deep-sea hydrothermal vents and polar ice caps to rainforest soils and acid mine drainage, researchers have discovered entirely new phyla and metabolic strategies. In marine environments, for instance, metagenomic studies have been pivotal in identifying novel phototrophic pathways, such as proteorhodopsins in uncultured bacteria, which contribute significantly to oceanic energy capture, and in elucidating complex carbon, nitrogen, and sulfur cycling mechanisms carried out by microbial consortia [23, 24]. Similarly, soil metagenomics has moved beyond simple diversity surveys to provide functional insights into the microbial drivers of nutrient cycling (e.g., nitrogen fixation, phosphorus solubilization), the intricate molecular dialogues underpinning plant-microbe symbioses and pathogen suppression, and the microbial basis of ecosystem resilience and response to environmental change or pollution [25].

6.2 Human microbiome and health
The application of metagenomics to the human microbiome has transformed our view of the human body as a holobiont, demonstrating that resident microbial communities in the gut, skin, oral cavity, and other sites are integral to host physiology. Large-scale projects like the Human Microbiome Project have utilized shotgun metagenomics to establish baseline maps of microbial genes and pathways associated with health. This work has robustly linked a state of microbial imbalance, or dysbiosis, to a wide spectrum of chronic diseases. For example, specific metagenomic signatures (such as altered microbial gene profiles for short-chain fatty acid synthesis or mucin degradation) have been associated with inflammatory bowel disease (IBD), while shifts in microbial energy harvest pathways are implicated in obesity and type 2 diabetes [26–28]. Critically, shotgun metagenomics provides strain-level resolution and direct functional profiling, enabling researchers to distinguish pathogenic strains from commensal ones within the same species and to identify the specific virulence or resistance genes they carry. This granularity is paving the way for precision microbiome medicine, including the development of next-generation probiotics, targeted prebiotics, and fecal microbiota transplantation guided by metagenomic diagnostics [29].

6.3 Clinical and diagnostic applications
Metagenomics is emerging as a disruptive technology in clinical microbiology and public health. Unlike culture-based or single-pathogen PCR tests, clinical metagenomic next-generation sequencing (mNGS) allows for the unbiased detection of all pathogens (viruses, bacteria, fungi, and parasites) directly from patient samples like cerebrospinal fluid, blood, or respiratory secretions. This culture-independent approach enables the rapid identification of novel, fastidious, or unculturable pathogens during outbreaks of unknown etiology, significantly improving infectious disease diagnosis and management [30]. Furthermore, metagenomic sequencing provides a comprehensive snapshot of the entire antimicrobial resistance (AMR) gene repertoire (the “resistome”) within a clinical sample or a hospital environment. This capability is crucial for infection control, tracking the transmission of resistant strains during outbreaks, and informing effective, tailored antibiotic stewardship programs [31].

6.4 Industrial and biotechnological applications
The vast genetic reservoir of uncultured microbes represents a treasure trove for biotechnology, and metagenomics is the key tool for tapping into this resource. Through functional metagenomics, environmental DNA is cloned and expressed in heterologous hosts (like E. coli), followed by high-throughput screening for novel enzymes with desirable properties. This approach has accelerated the discovery of industrially relevant biocatalysts, including thermostable cellulases for biofuel production, alkaline proteases for detergents, cold-active lipases for food processing, and novel polymer-degrading enzymes for bioremediation [32, 33]. By accessing the genetic potential of the “microbial dark matter,” metagenomics-driven bioprospecting continues to yield molecules with applications in pharmaceuticals, agriculture, and green chemistry.

6.5 Agricultural and food systems
In agriculture, metagenomics provides a systems-level view of the rhizosphere and bulk soil microbiomes, elucidating their roles in maintaining soil fertility, promoting plant growth, and suppressing diseases. This knowledge is driving the development of microbial consortia as next-generation biofertilizers and biopesticides for sustainable crop production [34]. In food science and safety, metagenomics is used to profile and monitor the complex microbial communities involved in fermentation processes (e.g., for cheese, wine, and fermented meats), ensuring quality and authenticity. It is also a powerful tool for comprehensive food safety monitoring, enabling the detection of spoilage organisms and foodborne pathogens without prior culturing, and for tracking microbial contaminants throughout the food production chain [35].

7. Challenges and limitations

Despite its transformative impact, metagenomics confronts significant technical and analytical hurdles. The sheer volume and complexity of data generated pose immense computational demands for storage, processing, and analysis. A primary limitation remains the incompleteness of reference genomic databases, which leads to a substantial fraction of sequencing reads remaining unclassified or annotated as “unknown,” hindering accurate taxonomic assignment and functional interpretation [36]. Technical artifacts, such as biases introduced during DNA extraction and sequencing, uneven genomic coverage across community members, and contamination from host DNA (especially in low-biomass clinical samples), can further distort biological interpretations and complicate downstream analysis [37].

Beyond technical challenges, the field is grappling with important ethical and legal considerations. Human-associated metagenomic data contains not only microbial information but also human genomic sequences, raising critical issues of patient privacy, informed consent, and data ownership. Questions about who has the right to access and benefit from microbial gene discoveries from unique populations or environments are also emerging [38]. To ensure scientific rigor and reproducibility, there is a pressing need for greater standardization in sampling protocols, DNA extraction methods, and bioinformatic pipelines, as well as the development of unified data-sharing frameworks and metadata reporting standards across the global research community [39].

8. Future perspectives

The future trajectory of metagenomics points toward multi-omics integration and higher-resolution technologies. Combining metagenomics with complementary approaches like metatranscriptomics (community gene expression), metaproteomics (protein expression), and metabolomics (metabolite profiling) will move the field from cataloging genetic potential to understanding dynamic microbial community function and host-microbe interactions at a true systems biology level [40]. Concurrently, advances in artificial intelligence and machine learning will enhance the power to identify complex patterns, predict ecosystem functions or clinical outcomes from microbiome data, and model microbial interactions. The rise of cloud computing and real-time sequencing platforms, like portable nanopore devices, promises to democratize access and enable field-based, rapid microbiological analysis [41].

At the technological frontier, single-cell genomics allows the recovery of draft genomes from individual microbial cells, bypassing the need to disentangle mixed-community assemblies and revealing population heterogeneity. When coupled with spatial metagenomics techniques, which map microbial identities and functions onto their physical locations within a habitat (e.g., a soil particle or a gut villus), these methods will bridge the critical gap between community-level profiles and the biology of individual organisms, providing unprecedented insight into the spatial ecology of microbiomes [42]. Together, these innovations will profoundly expand the translational applications of metagenomics in personalized medicine, environmental monitoring, and industrial biotechnology.

9. Conclusion

Metagenomics has fundamentally reshaped the life sciences by providing a culture-independent passport to explore the vast, unseen microbial universe that constitutes the majority of Earth’s genetic diversity. By directly interrogating the collective genome of complex communities, it has effectively circumvented the “great plate count anomaly,” illuminating the immense phylogenetic diversity, metabolic capacity, and ecological significance of microorganisms in habitats ranging from deep-sea hydrothermal vents and polar ice caps to the intricate ecosystems of the human gut and plant rhizosphere. This paradigm shift has transformed microorganisms from a collection of isolated laboratory strains into a dynamic, interconnected network of functions that underpin global biogeochemical cycles, host physiology, and ecosystem stability.

The remarkable trajectory of the field has been fueled by a synergistic cycle of technological and computational innovation. Continuous advancements in high-throughput sequencing, marked by precipitously dropping costs and rising throughput, have democratized access to deep metagenomic surveying. In parallel, the development of sophisticated bioinformatics pipelines (for quality control, de novo assembly, binning, taxonomic classification, and functional annotation) has provided the essential computational framework to distill meaning from terabytes of raw sequence data. Most recently, the move towards integrative multi-omics analytical approaches, which layer metatranscriptomics, metaproteomics, and metabolomics onto genomic blueprints, has enabled a transition from static genetic potential to a dynamic, systems-level understanding of community function and host-microbe interactions. These cumulative advancements have steadily broadened its applications, making metagenomics an indispensable discovery tool and diagnostic platform across disciplines as diverse as clinical medicine, microbial ecology, precision agriculture, and industrial biotechnology.

Despite its transformative impact, the field navigates a set of persistent challenges that define its current frontiers. The sheer data complexity and computational burden of analysis remain significant barriers, while incomplete reference databases limit the functional interpretation of a substantial fraction of sequenced genes, often termed “microbial dark matter.” Achieving true standardization in sampling protocols, experimental workflows, and bioinformatic pipelines is critical for improving reproducibility and enabling robust meta-analyses across studies. Furthermore, the rise of human-associated metagenomics brings ethical and legal considerations to the fore, including questions of privacy, data ownership, and the equitable use of genetic resources from indigenous or unique populations.

Nevertheless, the field continues to evolve with remarkable rapidity. Emerging technologies like long-read sequencing, single-cell genomics, and spatial metagenomics promise to deliver higher-resolution insights, bridging the gap between community-level profiles and the biology of individual cells in their physical context. The integration of artificial intelligence and machine learning is poised to unlock predictive modeling of microbial community dynamics and host phenotypes. As a powerful and ever-adapting scientific instrument, metagenomics holds immense, and still-growing, potential. Its continued development and thoughtful application are poised to address some of the most pressing global challenges, including the diagnosis of elusive infections, the management of antimicrobial resistance, the restoration of degraded ecosystems, the development of climate-smart agriculture, and the discovery of novel enzymes for the bioeconomy, securing its role as a cornerstone of 21st-century science.

Author Contributions: Conceptualization, S.I. and R.A.; methodology, R.A.; software, S.I. and R.A.; formal analysis, S.I. and R.A.; investigation, R.A.; resources, S.I. and R.A.; data curation, S.I. and R.A.; writing (original draft preparation), S.I. and R.A.; writing (review and editing), S.I. and R.A.; visualization, S.I. and R.A.; supervision, S.I.; project administration, S.I.; funding acquisition, S.I. The authors have read and agreed to the published version of the manuscript.

Funding: Not applicable.

Acknowledgments: We are grateful to the Department of Biotechnology, Faculty of Natural Science, Norwegian University of Science and Technology, Trondheim 7491, Norway, and the Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim 7491, Norway, for providing the facilities needed to carry out this work.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Institutional Review Board Statement: Details are given in the methods section.

Informed Consent Statement: Details are given in the methods section.

Data Availability Statement: All the related data are supplied in this work or have been referenced properly.

References

Amann RI, Ludwig W, Schleifer KH. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev. 1995;59(1):143–69.
Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chem Biol. 1998;5(10):R245–9.
Wooley JC, Godzik A, Friedberg I. A primer on metagenomics. PLoS Comput Biol. 2010;6(2):e1000667.
Thomas T, Gilbert J, Meyer F. Metagenomics – a guide from sampling to data analysis. Microb Inform Exp. 2012;2:3.
Schloss PD, Handelsman J. Metagenomics for studying unculturable microorganisms: cutting the Gordian knot. Genome Biol. 2005;6(8):229.
Sharpton TJ. An introduction to the analysis of shotgun metagenomic data. Front Plant Sci. 2014;5:209.
Jovel J, Patterson J, Wang W, Hotte N, O’Keefe S, Mitchel T, et al. Characterization of the gut microbiome using 16S or shotgun metagenomics. Front Microbiol. 2016;7:459.
Pollock J, Glendinning L, Wisedchanwet T, Watson M. The madness of microbiome: attempting to find consensus “best practice” for 16S microbiome studies. Appl Environ Microbiol. 2018;84(7):e02627-17.
Yuan S, Cohen DB, Ravel J, Abdo Z, Forney LJ. Evaluation of methods for the extraction and purification of DNA from the human microbiome. PLoS One. 2012;7(3):e33865.
Costea PI, Zeller G, Sunagawa S, Pelletier E, Alberti A, Levenez F, et al. Towards standards for human fecal sample processing in metagenomic studies. Nat Biotechnol. 2017;35(11):1069–76.
Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304(5667):66–74.
Metzker ML. Sequencing technologies – the next generation. Nat Rev Genet. 2010;11(1):31–46.
Rhoads A, Au KF. PacBio sequencing and its applications. Genomics Proteomics Bioinformatics. 2015;13(5):278–89.
Jain M, Koren S, Miga KH, Quick J, Rand AC, Sasani TA, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 2018;36(4):338–45.
Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol. 2012;30(7):693–700.
Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol. 2017;35(9):833–44.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7(5):335–6.
Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15(3):R46.
Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(D1):D457–62.
Knights D, Costello EK, Knight R. Supervised classification of human microbiota. FEMS Microbiol Rev. 2011;35(2):343–59.
Faust K, Raes J. Microbial interactions: from networks to models. Nat Rev Microbiol. 2012;10(8):538–50.
Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, et al. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature. 2017;551(7681):457–63.
DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, Frigaard NU, et al. Community genomics among stratified microbial assemblages in the ocean’s interior. Science. 2006;311(5760):496–503.
Fierer N, Lauber CL, Ramirez KS, Zaneveld J, Bradford MA, Knight R. Comparative metagenomic, phylogenetic and physiological analyses of soil microbial communities across nitrogen gradients. ISME J. 2012;6(5):1007–17.
Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The Human Microbiome Project. Nature. 2007;449(7164):804–10.
Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464(7285):59–65.
Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, et al. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486(7402):207–14.
Franzosa EA, McIver LJ, Rahnavard G, Thompson LR, Schirmer M, Weingart G, et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat Methods. 2018;15(11):962–8.
Chiu CY, Miller SA. Clinical metagenomics. Nat Rev Genet. 2019;20(6):341–55.
Wilson MR, Sample HA, Zorn KC, Arevalo S, Yu G, Neuhaus J, et al. Clinical metagenomic sequencing for diagnosis of meningitis and encephalitis. N Engl J Med. 2019;380(24):2327–40.
Simon C, Daniel R. Metagenomic analyses: past and future trends. Appl Environ Microbiol. 2011;77(4):1153–61.
Ferrer M, Beloqui A, Timmis KN, Golyshin PN. Metagenomics for mining new genetic resources of microbial communities. J Mol Microbiol Biotechnol. 2009;16(1-2):109–23.
Mendes R, Kruijt M, de Bruijn I, Dekkers E, van der Voort M, Schneider JH, et al. Deciphering the rhizosphere microbiome for disease-suppressive bacteria. Science. 2011;332(6033):1097–100.
Ercolini D. High-throughput sequencing and metagenomics: moving forward in the culture-independent analysis of food microbial ecology. Appl Environ Microbiol. 2013;79(10):3148–55.
Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, et al. Critical assessment of metagenome interpretation – a benchmark of metagenomics software. Nat Methods. 2017;14(11):1063–71.
Eisenhofer R, Minich JJ, Marotz C, Cooper A, Knight R, Weyrich LS. Contamination in low microbial biomass microbiome studies: issues and recommendations. Trends Microbiol. 2019;27(2):105–17.
Shabani M, Borry P. Rules for processing genetic data for research purposes in view of the new EU General Data Protection Regulation. Eur J Hum Genet. 2018;26(2):149–56.
Knight R, Vrbanac A, Taylor BC, Aksenov A, Callewaert C, Debelius J, et al. Best practices for analysing microbiomes. Nat Rev Microbiol. 2018;16(7):410–22.
Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol. 2017;18(1):83.
Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet. 2015;16(6):321–32.
Blainey PC. The future is now: single-cell genomics of bacteria and archaea. FEMS Microbiol Rev. 2013;37(3):407–27.
Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, et al. Metagenomic biomarker discovery and explanation. Genome Biol. 2011;12(6):R60.
Zhernakova A, Kurilshikov A, Bonder MJ, Tigchelaar EF, Schirmer M, Vatanen T, et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science. 2016;352(6285):565–9.
Mende DR, Waller AS, Sunagawa S, Järvelin AI, Chan MM, Arumugam M, et al. Assessment of metagenomic assembly using simulated next generation sequencing data. PLoS One. 2012;7(2):e31386.
Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol. 2013;31(6):533–8.
Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41(D1):D590–6.
Tett A, Pasolli E, Masetti G, Ercolini D, Segata N. Prevotella diversity, niches and interactions with the human host. Nat Rev Microbiol. 2021;19(9):585–99.
Nayfach S, Shi ZJ, Seshadri R, Pollard KS, Kyrpides NC. New insights from uncultivated genomes of the global human gut microbiome. Nature. 2019;568(7753):505–10.
Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, et al. Structure and function of the global ocean microbiome. Science. 2015;348(6237):1261359.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of Global Journal of Basic Science and/or the editor(s). Global Journal of Basic Science and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Copyright: © 2025 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
