Chromosome-level genome assembly and annotation of the Rhabdophis nuchalis (Hubei keelback)

Duan, Mingwen; Yang, Shijun; Li, Xiufeng; Tang, Xuemei; Cheng, Yuqi; Luo, Jingxue; Wang, Ji; Song, Huina; Wang, Qin; Zhu, Guang xiang

doi:10.1038/s41597-024-03708-z

Data Descriptor
Open access
Published: 08 August 2024

Volume 11, article number 850, (2024)

Scientific Data

Mingwen Duan¹^na1,
Shijun Yang¹^na1,
Xiufeng Li¹,
Xuemei Tang¹,
Yuqi Cheng²,
Jingxue Luo¹,
Ji Wang¹,
Huina Song¹,
Qin Wang¹ &
Guang xiang Zhu ORCID: orcid.org/0000-0002-8900-3659¹

Abstract

Rhabdophis nuchalis, a snake widely distributed in China, possesses a unique trait: glands beneath the skin on its neck and back, known as nucho-dorsal glands. These features make it a valuable subject for studying genetic diversity and the evolution of complex traits. In this study, we obtained a high-quality chromosome-level reference genome of R. nuchalis using MGI short-read sequencing, PacBio Revio long-read sequencing, and Hi-C sequencing techniques. The final assembly comprised 1.92 Gb of the R. nuchalis genome, anchored to 20 chromosomes (including 9 macrochromosomes and 11 microchromosomes), with a contig N50 of 104.79 Mb, a scaffold N50 of 204.96 Mb, and a BUSCO completeness of 97.50%. Additionally, we annotated a total of 1.09 Gb of repetitive sequences (which constitute 56.51% of the entire genome) and identified 22,057 protein-coding genes. This high-quality reference genome of R. nuchalis furnishes essential genomic data for comprehending the genetic diversity and evolutionary history of the species, as well as for facilitating species conservation efforts and comparative genomics studies.

Background & Summary

Recent studies indicate that snakes gradually evolved from lizards during the Early Cretaceous period, approximately 117.68 million years ago¹. According to the most recent entries in the Reptile Database (https://reptile-database.reptarium.cz/), there are over four thousand snake species distributed across all continents except Antarctica, occupying diverse ecological niches and demonstrating high species diversity². This broad distribution and adaptation to various habitats make snakes a vital component of Earth’s biodiversity³. Furthermore, certain snakes have developed distinct characteristics through evolution. For instance, Viperidae and Elapidae snakes exhibit high venom potency⁴. Snakes within the subfamily Hydrophiinae have adapted to marine life⁵, while those in the Typhlopidae are adapted to living in soil⁶. Consequently, snakes are valuable subjects for research on biodiversity and adaptive evolution. In recent years, high-quality chromosome-level genomes of several snake species have been published, offering valuable insights into unique traits and snake evolution¹,⁷,⁸,⁹,¹⁰. Despite these advances, reference genomes for snakes remain limited in both number and quality, which hinders further research in this field.

Malnate (1960) partitioned Natrix sensu lato based on several morphological characters and restored the genus Rhabdophis¹¹, which was established by Fitzinger in 1843 with R. subminiatus as the type species. Unlike other snakes, members of the genus Rhabdophis possess glands beneath the skin of the neck and back, referred to as nuchal and dorsal glands, respectively¹². In some species, these glands are confined to the neck (e.g., R. tigrinus)¹³. These glands contain potent cardiotonic steroids known as bufadienolides (BDs), which serve as defensive toxins against predators¹³,¹⁴. According to the latest records from the Reptile Database, there are currently 34 known species of the genus Rhabdophis worldwide. However, no reference genome is yet available for any species in the genus, which poses a challenge for genomic studies of these species.

In 1891, Boulenger described Tropidonotus nuchalis¹⁵ based on a specimen from Hubei, China. The species was subsequently classified as Natrix nuchalis¹⁶ by Parker in 1925 and revised by Malnate in 1960 as R. nuchalis¹¹. It is known by the common name Hubei keelback and is widely distributed in China¹⁷. Its diet primarily consists of earthworms and firefly larvae. Notably, R. nuchalis acquires BDs from firefly larvae and stores them in its dorsal neck glands, making it an ideal candidate for studying genetic diversity and complex trait evolution¹⁴. Current research on R. nuchalis focuses primarily on morphology¹⁸, phylogenetic relationships¹⁷, and biogeography¹⁹, and the absence of genomic data has hindered further exploration.

In this study, we successfully assembled and annotated the genome of R. nuchalis at the chromosome level by MGI short-read sequencing, PacBio Revio long-read sequencing, Hi-C²⁰ sequencing, and RNA sequencing (RNA-seq) techniques. We estimated genome size and heterozygosity from clean short reads, performed long-read sequencing using the PacBio Revio System, and combined it with Hi-C²⁰ reads to achieve chromosome-level assembly. Genome annotation was conducted using RNA-seq reads from five tissues (heart, spleen, lung, kidney, and muscle), published genomes of closely related species, and de novo prediction methods. Additionally, we assessed the quality of genome assembly using various metrics. Our efforts culminated in the first high-quality reference genome of the genus Rhabdophis, providing essential genetic data for studying adaptive evolution, genetic diversity, and resequencing analysis of R. nuchalis and the broader genus Rhabdophis.

Methods

Ethics statement

This study adhered to all pertinent ethical and legal guidelines and regulations. The collection of animals and extraction of tissues underwent thorough review and received approval from the Animal Ethics and Welfare Committee of Sichuan Agricultural University (Approval No. 20230121).

Sample collection

An adult female R. nuchalis (body length of 755 mm) was collected from the Shennongjia forest area (latitude: 31.683625, longitude: 110.418075) in Hubei Province, China, for genome sequencing and assembly. Six different tissues (heart, liver, spleen, lung, kidney, and muscle) were sequentially collected and rapidly frozen in liquid nitrogen upon collection, then stored at −80 °C. Liver tissue was utilized for MGI short-read sequencing, PacBio Revio HiFi long-read sequencing, and Hi-C sequencing, while the remaining five tissues were designated for RNA sequencing.

Library construction and sequencing

The collected tissues were sent to GrandOmics Biosciences Co., Ltd. (Wuhan, China), for DNA extraction, library construction, and sequencing. Genomic DNA (gDNA) was extracted from the liver following the manufacturer’s instructions and used for the construction of gDNA libraries. The integrity and purity of the gDNA samples were assessed using agarose gel electrophoresis.

For short-read sequencing, 1.5 μg gDNA was randomly fragmented by Covaris, following the guidelines specified in the device’s operating manual, and 300–400 bp fragments were selected with the Agencourt AMPure XP-Medium kit. The library was then constructed from the selected fragments using the AxyPrep Mag PCR clean-up Kit according to the manufacturer’s instructions. Finally, the qualified libraries were sequenced on the BGISEQ DNBSEQ-T7 platform. This yielded 108.82 Gb of raw reads, and 101.02 Gb of clean reads (with an average depth of coverage of 38.95×) were obtained after quality control using fastp v0.21.0²¹ (Table 1). These clean reads were utilized for genome size estimation and to evaluate the accuracy of genome assembly.

Table 1 Statistics of sequencing data used for R. nuchalis genome assembly in this study.

For PacBio HiFi long-read sequencing, 5 µg of gDNA was used to construct SMRTbell libraries following PacBio’s standard protocol (Pacific Biosciences, CA, USA). The process included shearing of gDNA using g-TUBEs (Covaris, USA) according to the expected fragment size for the library, DNA damage repair, end repair, and A-tailing, followed by ligation of hairpin adapters at both ends of the fragments using the SMRTbell Express Template Prep Kit 3.0 (Pacific Biosciences). After nuclease treatment of the SMRTbell library using the SMRTbell Enzyme Cleanup Kit, target fragments were size-selected using PippinHT (Sage Science, USA), and the prepared SMRTbell library was sequenced on the PacBio Revio platform with the Revio kit at GrandOmics. This resulted in 79.31 Gb of HiFi long reads (with an average coverage depth of 39.65×) for genome assembly (Table 1).

Hi-C libraries were constructed following the protocol²² to obtain the genome at the chromosome level. Key steps included fixation of liver samples using 2% formaldehyde, cleavage of sequences with the DpnII enzyme, end repair, biotin-14-dCTP labeling, ligation with T4 DNA ligase, and uncross-linking and interrupting the sequences. Subsequently, the libraries were sequenced on the BGISEQ DNBSEQ-T7 platform. This generated 209.72 Gb of raw reads, and 209.72 Gb of clean reads (with an average depth of coverage of 107.70×) were obtained after quality control using fastp v0.21.0²¹ (Table 1).

To improve the precision of genome annotation, RNA sequencing was conducted across five distinct tissues: heart, spleen, lungs, kidneys, and muscles. Each tissue underwent RNA extraction utilizing TRIzol reagent (Invitrogen, USA), followed by assessment of RNA purity and concentration using Nanodrop and Qubit, construction of RNA-seq libraries employing the MGIEasy RNA Sample Prep Kit (UW Genetics), and sequencing on the BGISEQ DNBSEQ-T7 platform. A minimum of 6 Gb of sequencing data was guaranteed for each tissue. In total, 40.26 Gb of raw reads were generated, with 40.14 Gb of clean reads obtained post quality control using fastp v0.21.0²¹ (Table 1). These clean reads were utilized for transcriptome annotation of the genome.

Predicting genome size and heterozygosity

The genome size and heterozygosity of R. nuchalis were predicted using KMC v3.2.1²³ and GenomeScope v1²⁴ software before assembly. Initially, the short reads, post-quality control, underwent analysis with KMC v3.2.1²³ (parameter k = 17) to generate the k-mer frequency distribution table. Subsequently, the obtained k-mer frequency distribution table was analyzed using GenomeScope v1²⁴ software to derive genome prediction information. Finally, the prediction results indicated a genome size of 1.57 Gb and a heterozygosity of 1.20% (Table 2).
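As a rough illustration of how a k-mer frequency table relates to genome size, the sketch below divides the total number of counted k-mers by the depth of the main peak. This is a simplified single-peak estimator, not the model that GenomeScope fits. The function name, the `min_depth` error cutoff, and the toy histogram are assumptions made for illustration and are not data from this study.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors (an assumption).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers is taken as the main peak.
    peak_depth = max(usable, key=lambda d: usable[d])
    # Total k-mer occurrences among the usable part of the histogram.
    total_kmers = sum(d * c for d, c in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical numbers): error k-mers at depth 1-2, main peak at 20.
toy_histogram = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} k-mers")
```

For a heterozygous genome the histogram usually shows more than one peak, so a model-based tool is generally preferable to this single-peak rule.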

Table 2 Statistical analysis of the size and heterozygosity prediction for the R. nuchalis genome (K-mer = 17).

De novo assembly of the R. nuchalis genome

De novo assembly of the R. nuchalis genome was conducted using the obtained HiFi long reads through hifiasm v0.16.0²⁵. We acquired the preliminary assembled genome, which underwent comparison with the NT (Nucleotide Sequence Database) library. Sequences longer than 1 Mb were subjected to 50 kb cuts, and contaminating reads (non-target macroclasses, mitochondria) were subsequently removed from the genome to yield the final assembly. The resulting genome size of R. nuchalis, post-contamination removal, was 1.93 Gb, with a contig N50 of 104.79 Mb (Table 3).
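For readers unfamiliar with the N50 statistic reported here, the following minimal sketch shows the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the example lengths are hypothetical and do not come from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total.
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mb.
print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```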

Table 3 Assembly statistics for R. nuchalis are presented in two parts: the first part comprises the assembly results prior to Hi-C integration, while the second part showcases the outcomes of the Hi-C-assisted assembly process.

To assess the quality of the genome assembly, we first employed BUSCO v4.0.5²⁶ (Benchmarking Universal Single-Copy Orthologs) to evaluate completeness. This involved analyzing single-copy homologous genes in the OrthoDB database vertebrata_odb10. The analysis revealed that 3,270 (97.50%) out of 3,354 BUSCO groups were identified as complete, including 3,232 complete and single-copy BUSCOs (96.36%), and 38 complete and duplicated BUSCOs (1.13%), indicating high completeness of the assembled genome (Table 4).
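The percentages above follow directly from the reported counts. The short sketch below recomputes them; the counts are taken from the text, while the function and dictionary key names are illustrative assumptions.

```python
def busco_percentages(counts, total_groups):
    """Express each BUSCO category count as a percentage of all groups searched."""
    return {name: f"{100 * n / total_groups:.2f}" for name, n in counts.items()}


# Counts reported in the text for the genome assembly (vertebrata_odb10, 3,354 groups).
assembly_counts = {"complete": 3270, "single_copy": 3232, "duplicated": 38}
percentages = busco_percentages(assembly_counts, 3354)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```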

Table 4 Statistics from the BUSCO assessment of the R. nuchalis genome assembly and annotation.

Furthermore, to evaluate the accuracy of the assembly, clean short reads and HiFi long reads were mapped to the R. nuchalis genome using BWA v0.7.15²⁷ and minimap2²⁸, respectively. The results indicated that at a coverage depth of 1×, the clean short reads and HiFi long reads achieved 98.24% and 99.97% coverage across the entire genome, respectively (Table 5). This demonstrates the high accuracy of the genome assembly.
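Coverage "at a depth of 1×" is the fraction of genome positions covered by at least one mapped read. A minimal sketch of that calculation is shown below, assuming per-base depths are already available as a list of integers (for example, exported from an alignment tool). The function name and the toy depth values are hypothetical.

```python
def breadth_of_coverage(depths, min_depth=1):
    """Fraction of positions whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("no positions given")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Toy per-base depths for a 10 bp region (hypothetical values).
toy_depths = [0, 1, 3, 0, 2, 5, 1, 1, 0, 4]
print(f"{100 * breadth_of_coverage(toy_depths):.2f}% of positions covered at >= 1x")
```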

Table 5 Alignment statistics for quality-controlled short reads and long reads mapped to the assembled R. nuchalis genome.

Hi-C assisted assembly

We employed a multi-step process to assemble the genome of R. nuchalis to the chromosome level using quality-filtered Hi-C reads. First, clean Hi-C reads were aligned to the assembly generated from HiFi long reads using bowtie2 v2.3.2²⁹ to obtain uniquely mapped paired-end reads. Next, HiC-Pro v2.8.1³⁰ was used to identify and retain valid interaction pairs from these uniquely mapped reads while filtering out invalid pairs such as dangling-end, self-circle, re-ligation, and dumped pairs.

The scaffolds were then clustered, ordered, and assigned to chromosomes using LACHESIS v1³¹, and the assembly was manually adjusted in Juicebox v1.11.08³² to derive the final pseudochromosomes. The chromosomes, GC content, gene density, abundance of repetitive sequences, and ncRNA distribution of the genome were visualized using the advanced Circos³³ function in TBtools II³⁴ (Fig. 1B). The analysis showed that R. nuchalis has 20 chromosomes, consisting of 9 macrochromosomes and 11 microchromosomes (using the 50 Mb threshold applied to squamates³⁵). Chromosome sizes ranged from 14.96 Mb to 411.07 Mb, giving a total genome size of 1.92 Gb (Tables 3, 6, and Fig. 1). The contig N50 was 104.79 Mb and the scaffold N50 was 204.96 Mb (Table 3). Together, these steps organized the assembly into chromosome-scale pseudochromosomes.
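The macro/micro split by a 50 Mb threshold can be sketched as follows. Only the smallest (14.96 Mb) and largest (411.07 Mb) sizes come from the text; the other lengths, the labels, and the choice to count a chromosome of exactly 50 Mb as a macrochromosome are assumptions for illustration.

```python
def classify_chromosomes(lengths_mb, threshold_mb=50.0):
    """Split chromosomes into macro- and microchromosomes by length in Mb."""
    macro = [name for name, size in lengths_mb.items() if size >= threshold_mb]
    micro = [name for name, size in lengths_mb.items() if size < threshold_mb]
    return macro, micro


# Hypothetical labels; only the extreme sizes are reported in the text.
example_lengths = {
    "largest": 411.07,       # from the text
    "hypothetical_a": 120.0,  # made-up value
    "hypothetical_b": 50.0,   # made-up value on the boundary
    "smallest": 14.96,       # from the text
}
macro, micro = classify_chromosomes(example_lengths)
print("macro:", macro)
print("micro:", micro)
```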

Table 6 Statistics of 20 chromosomes of R. nuchalis genome (1–9 are macrochromosomes and 10–20 are microchromosomes).

Repeat sequence annotation

Repeat sequences, comprising tandem repeats (TRs) and transposable elements (TEs), were annotated in the genome of R. nuchalis using a combination of software tools and databases. For TRs, we employed GMATA v2.2³⁶ and Tandem Repeats Finder (TRF v4.07b³⁷). GMATA v2.2³⁶ identified simple sequence repeats (SSRs), while TRF v4.07b³⁷ identified all tandem repeats in the genome. For TEs, a dual approach of de novo and homology-based annotation was adopted. First, transposable elements were annotated de novo using MITE-hunter³⁸ and RepeatModeler v1.0.11³⁹, together with LTR_FINDER⁴⁰, LTR_harvest⁴¹, and LTR_retriever⁴² for detection of LTR retrotransposons. The resulting libraries were then compared with the TEclass Repbase database, and each repeat family was classified using TEclass v2.1.3⁴³. Furthermore, RepeatMasker v1.331⁴⁴ was used to search for both known and novel TEs by locating sequences from the de novo repeat libraries and the Repbase repeat libraries in the genome. Overlapping transposable elements belonging to the same repeat class were sorted and merged.

The results indicated that a total of 1.09 Gb of repetitive sequences were annotated in the genome of R. nuchalis, constituting 56.51% of the entire genome. Among these, TRs and TEs accounted for 13.78 Mb and 885.68 Mb, representing 0.72% and 46.02% of the whole genome, respectively. Class I and Class II TEs comprised 628.50 Mb and 257.18 Mb, corresponding to 32.66% and 13.36% of the entire genome, respectively (Table 7).
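As a quick consistency check on these figures, the sketch below confirms that the Class I and Class II totals add up to the reported TE total and that the TE size and percentage imply a genome size close to the 1.92 Gb assembly. All numbers are taken from the text; the variable names are illustrative.

```python
# Sizes in Mb and genome fraction as reported in the text.
te_class_mb = {"Class I": 628.50, "Class II": 257.18}
te_total_mb = 885.68
te_fraction_of_genome = 0.4602

# The two TE classes should sum to the TE total.
class_sum_mb = sum(te_class_mb.values())
# Genome size implied by the TE size and its reported share of the genome.
implied_genome_gb = te_total_mb / te_fraction_of_genome / 1000

print(f"Class I + Class II = {class_sum_mb:.2f} Mb")
print(f"Implied genome size = {implied_genome_gb:.2f} Gb")
```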

Table 7 Statistical outcomes regarding repetitive sequences in the R. nuchalis genome (categorized by the type of repetitive sequences).

Gene structure annotation

In the structural annotation of the R. nuchalis genome, we initially applied RepeatMasker v1.331⁴⁴ to soft-mask the annotated repetitive sequences. Subsequently, gene structure prediction was conducted through three methods: homology prediction, transcriptome prediction, and de novo prediction, with integration of the results to derive the final gene structure annotation. For homology prediction, comparisons were made with the genomes of five closely related species (Ahaetulla prasina⁷, Calamaria septentrionalis¹, Pantherophis guttatus¹, Thamnophis elegans NCBI accession GCA_009769535.1, and Thermophis baileyi⁸) using GeMoMa v1.6.1⁴⁵ software. Transcriptome prediction involved mapping quality-controlled RNA-seq reads to the R. nuchalis genome using STAR v2.7.3a⁴⁶, followed by transcript assembly with Stringtie v1.3.4d⁴⁷ and prediction of open reading frames (ORFs) using PASA v2.3.3⁴⁸. De novo prediction entailed reassembly of RNA-seq reads using Stringtie v1.3.4d and analysis with PASA v2.3.3⁴⁸ to generate a training set, followed by de novo gene prediction using Augustus v3.3.1⁴⁹. Finally, the predictions were integrated using EVM v1.1.1⁴⁸ (EVidenceModeler).

The results indicated that homology prediction, transcriptome prediction, and de novo prediction annotated 48,439, 18,203, and 20,575 genes, respectively, with a final count of 22,057 protein-coding genes annotated after EVM v1.1.1⁴⁸ integration. The average gene length and CDS length were 34,853.45 bp and 1,617.01 bp, respectively. Each gene contained an average of 9.12 exons, while the average lengths of exons and introns were 177.32 bp and 4,093.52 bp, respectively (Table 8).

Table 8 Statistical outcomes of gene structure annotation in the R. nuchalis genome, obtained through three different methods and subsequently integrated.

Gene function annotation

We have successfully completed the functional gene annotation of the R. nuchalis genome by utilizing five key public databases: GO (Gene Ontology)⁵⁰, SwissProt⁵¹, NR (Non-Redundant protein Database), KEGG (Kyoto Encyclopedia of Genes and Genomes)⁵², and KOG (Eukaryotic Orthologous Groups of proteins)⁵³. In the case of the GO database, we employed the default parameters of the InterProScan v5.32⁵⁴ program for gene function annotation. For the remaining four databases, we utilized Blastp v2.7.1 to annotate gene functions. The results revealed that 13,451, 18,567, 19,655, 14,474, and 13,362 genes were annotated in GO⁵⁰, SwissProt⁵¹, NR, KEGG⁵², and KOG⁵³, respectively, accounting for 60.98%, 84.18%, 89.11%, 65.62%, and 60.58% of the total number of genes in R. nuchalis (Table 9). Notably, 9,343 genes were annotated across all five databases (Fig. 2). By integrating the annotation outcomes from these databases, we completed the functional annotation of 19,918 genes, representing 90.30% of the total gene count (Table 9, Fig. 2).
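The annotation rates quoted above can be reproduced from the gene counts. In the sketch below the counts are those given in the text, and the function name is a hypothetical choice for illustration.

```python
TOTAL_GENES = 22057  # protein-coding genes reported in the text


def annotation_rate(annotated, total=TOTAL_GENES):
    """Percentage of genes annotated, formatted to two decimal places."""
    return f"{100 * annotated / total:.2f}"


# Genes annotated per database, as reported in the text.
annotated_counts = {
    "GO": 13451,
    "SwissProt": 18567,
    "NR": 19655,
    "KEGG": 14474,
    "KOG": 13362,
    "any database": 19918,
}
for database, count in annotated_counts.items():
    print(f"{database}: {count} genes ({annotation_rate(count)}%)")
```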

Table 9 Statistical findings from the functional annotation of genes within the R. nuchalis genome, sourced from five distinct databases and subsequently consolidated.

Subsequently, we conducted an evaluation of the genome annotation results. Initially, the annotated genes were assessed using BUSCO v4.0.5²⁶ based on the OrthoDB database vertebrata_odb10. The evaluation revealed that 3,237 complete genes were identified within 3,354 BUSCO groups, accounting for 96.51% of the database, underscoring the high completeness of the annotated genome of R. nuchalis (Table 4). Furthermore, we compared the genome of R. nuchalis with the published genomes of five closely related species, which exhibited a total gene count ranging from 18,213 to 22,959 genes (Table 10). Remarkably, R. nuchalis possessed 22,057 genes, aligning well with the published species (Table 10). Additionally, in terms of gene length, average CDS length, exon length, the average number of exons per gene, intron length, and the distribution of intron number, R. nuchalis exhibited consistency with the five closely related species (Table 10, Fig. 3).

Table 10 Comparison of results of R. nuchalis genome annotation with closely related species.

Non-coding RNA (ncRNA) annotation

The annotation of ncRNAs in the R. nuchalis genome was accomplished through a combination of database searching and model prediction methods. Specifically, tRNAs were annotated using tRNAscan-SE v2.0⁵⁵, while microRNAs, rRNAs, small nucleolar RNAs, and small nuclear RNAs were identified by searching the Rfam database⁵⁶ using Infernal v1.1.2 cmscan⁵⁷. Additionally, RNAmmer v1.2⁵⁸ was used to annotate rRNAs and their subunits. In total, 3,599 ncRNAs were annotated in the R. nuchalis genome, including 397 rRNAs, 981 snRNAs, and 2,063 tRNAs (Table 11).

Table 11 Statistical results of Non-coding RNA annotation of the R. nuchalis genome (categorized by the type of Non-coding RNA).

Data Records

All raw sequencing data generated in this study have been deposited in the NCBI Sequence Read Archive (SRA) under accession number SRP500045⁵⁹. The assembled chromosome-level genome has been deposited in GenBank under accession number GCA_039707465.1⁶⁰. The genome annotation data have been uploaded to Figshare (https://doi.org/10.6084/m9.figshare.25559178.v1)⁶¹.

Technical Validation

To assess the accuracy and completeness of the assembled genome of R. nuchalis, we conducted a BUSCO v4.0.5²⁶ assessment, which identified 3,270 complete BUSCO genes out of 3,354, indicating 97.50% completeness (Table 4). Furthermore, mapping clean short reads and HiFi long reads to the genome showed that 98.24% and 99.97% of the genome, respectively, was covered at a depth of at least 1×, demonstrating high accuracy (Table 5). For the genome structure annotation, the BUSCO assessment yielded 3,237 complete genes out of 3,354 BUSCO groups, representing 96.51% completeness (Table 4). Comparison with five closely related species showed consistency in gene count and various gene parameters, supporting the reliability of the genome annotation (Table 10, Fig. 3).

Code availability

No specific code was used in this study. All analytical processes were executed according to the manuals and protocols of the corresponding bioinformatic tools. The software parameters used in this study are as follows:

fastp v0.21.0: -n 0 -f 5 -F 5 -t 5 -T 5 -q 20
KMC v3.2.1: -k17 -ci1 -cs1000000
GenomeScope v1: default
hifiasm v0.16.0: default
BUSCO v4.0.5: -l vertebrata_odb10 -g genome
BWA v0.7.15: default
minimap2: -x map-hifi
bowtie2 v2.3.2: --end-to-end --very-sensitive -L 30
HiC-Pro v2.8.1: -c config-hicpro.txt -i -o
LACHESIS v1: CLUSTER_MIN_RE_SITES = 100, CLUSTER_MAX_LINK_DENSITY = 2.5, CLUSTER_NONINFORMATIVE_RATIO = 1.4, ORDER_MIN_N_RES_IN_TRUNK = 60, ORDER_MIN_N_RES_IN_SHREDS = 60
Juicebox v1.11.08: default
TBtools II: default
GMATA v2.2: default
TRF v4.07b: 2 7 7 80 10 50 500 -f -d -h -r
MITE-hunter: -n 20 -P 0.2 -c 3
RepeatModeler v1.0.11: -engine wublast
LTR_FINDER: default
LTR_harvest: default
LTR_retriever: default
TEclass v2.1.3: default
RepeatMasker v1.331: -nolow -no_is -gff -norna -engine abblast -lib lib
GeMoMa v1.6.1: default
STAR v2.7.3a: --outWigType bedGraph --outSAMtype BAM SortedByCoordinate --outSAMstrandField intronMotif
Stringtie v1.3.4d: default
PASA v2.3.3: -c alignAssembly.config -C -R -g genome.fasta -T -u trans.fasta -t trans.clean.fasta -f fl.acc --CPU 10 --ALIGNERS gmap
Augustus v3.3.1: --gff3=on --hintsfile=hints.gff --extrinsicCfgFile=extrinsic.cfg --allow_hinted_splicesites=gcag,atac --min_intron_len=30 --softmasking=1
EVM v1.1.1: --segmentSize 1000000 --overlapSize 100000
InterProScan v5.32: default
Blastp v2.7.1: -e 1e-5
tRNAscan-SE v2.0: --thread 4 -E -I
Infernal v1.1.2: default
RNAmmer v1.2: -S euk -m lsu,ssu,tsu -gff
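To show how one of the listed parameter sets could be turned into a reproducible command line, the sketch below assembles a fastp call with the read-filtering options reported above. It is not the authors' script: the file names are hypothetical, paired-end input is an assumption, and the command is only built and printed, not executed.

```python
import shlex


def fastp_command(in1, in2, out1, out2):
    """Build a fastp argument list using the filtering options listed above."""
    return [
        "fastp",
        "-i", in1, "-I", in2,    # paired-end inputs (assumed)
        "-o", out1, "-O", out2,  # cleaned outputs
        "-n", "0",               # parameters as reported in the text
        "-f", "5", "-F", "5",
        "-t", "5", "-T", "5",
        "-q", "20",
    ]


# Hypothetical file names for illustration only.
cmd = fastp_command("reads_R1.fq.gz", "reads_R2.fq.gz",
                    "clean_R1.fq.gz", "clean_R2.fq.gz")
print(shlex.join(cmd))
```

Keeping the arguments as a list, rather than a single string, allows the same object to be passed to `subprocess.run` without shell quoting issues.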

References

Peng, C. et al. Large-scale snake genome analyses provide insights into vertebrate development. CELL 186, 2959 (2023).
Zug, G. R., Vitt, L. J. & Caldwell, J. P. Herpetology:An introductory biology of amphibians and reptiles. SYST. BIOL. 42, 592 (1993).
Pyron, R. A., Burbrink, F. T. & Wiens, J. J. A phylogeny and revised classification of Squamata, including 4161 species of lizards and snakes. BMC EVOL. BIOL. 13, 93 (2013).
Zhao, E. M. Snakes of China. (Anhui Science and Technology Publishing House, Hefei, Anhui., 2006).
Sanders, K. L., Lee, M. S. Y., Leys, R., Foster, R. & Keogh, J. S. Molecular phylogeny and divergence dates for Australasian elapids and sea snakes (Hydrophiinae): Evidence from seven genes for rapid evolutionary radiations. Journal of evolutionary biology 21, 682–695 (2008).
Beatriz, S. M. T. G. Intrauterine and post‐ovipositional embryonic development of Amerotyphlops brongersmianus (Vanzolini, 1976) (Serpentes: Typhlopidae) from northeastern Argentina. J. Morphol. 281, 523–535 (2020).
Tang, C. Y. et al. Genetic mapping and molecular mechanism behind color variation in the Asian vine snake. Genome Biol. 24, 46 (2023).
Yan, C. et al. Temperature acclimation in hot-spring snakes and the convergence of cold response. Innovation-Amsterdam 3, 100295 (2022).
Margres, M. J. et al. The Tiger Rattlesnake genome reveals a complex genotype underlying a simple venom phenotype. Proceedings of the National Academy of Sciences 118, e2014634118 (2021).
Li, A. et al. Two Reference-Quality Sea Snake Genomes Reveal Their Divergent Evolution of Adaptive Traits and Venom Systems. Mol. Biol. Evol. 38, 4867 (2021).
Malnate, E. V. Systematic division and evolution of the colubrid snake genus Natrix, with comments on the subfamily Natricinae. P. Acad. Nat. Sci. Phila. 112, 41 (1960).
Takeuchi, H. et al. Evolution of nuchal glands, unusual defensive organs of Asian natricine snakes (Serpentes: Colubridae), inferred from a molecular phylogeny. Ecol. Evol. 8, 10219 (2018).
Mori, A. et al. Nuchal glands: a novel defensive system in snakes. Chemoecology 22, 187 (2012).
Yoshida, T. et al. Dramatic dietary shift maintains sequestered toxins in chemically defended snakes. Proceedings of the National Academy of Sciences 117, 5964 (2020).
Boulenger, G. A. Descriptions of new oriental reptiles and batrachians. Annals and Magazine of Natural History 7, 279 (1891).
Parker & H., W. eds. XXVIII.— Variation of the Leopidosis of a snake from S.E. Asia. (1925).
Liu, Q., Lyu, B., Xie, X., Zeng, Y. & Guo, P. Genomic evidence sheds new light on phylogeny of Rhabdophis nuchalis (sensu lato) complex (Serpentes: Natricidae). MOL. Phylogenet. Evol. 189, 107893 (2023).
Mori, A. et al. Morphology of the nucho-dorsal glands and related defensive displays in three species of Asian natricine snakes. Journal of zoology 300, 18 (2016).
Zhu, G. et al. Cryptic diversity and phylogeography of the Rhabdophis nuchalis group (Squamata: Colubridae). Mol. Phylogenet. Evol. 166, 107325 (2022).
Belton, J. M. et al. Hi-C: A comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884 (2018).
Rao, S. S. P. et al. A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell 159, 1665–1680 (2014).
Sebastian et al. KMC 2: fast and resource-frugal k-mer counting. Bioinformatics (Oxford, England) 31, 1569–1576 (2015).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 31, 2202–2204 (2017).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 18, 1 (2021).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210 (2015).
Heng, L. & Richard, D. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103 (2016).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357 (2012).
Belaghzal, H., Dekker, J. & Gibcus, J. H. Hi-C 2.0: An optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation. Genome. Biol. 123, 56–65 (2017).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119 (2013).
Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 3, 99 (2016).
Chen, C., Wu, Y. & Xia, R. A painless way to customize Circos plot: From data preparation to visualization using TBtools. iMeta 1, 35 (2022).
Chen, C. et al. TBtools-II: A “one for all, all for one” bioinformatics platform for biological big-data mining. Mol. Plant 16, 1733 (2023).
Waters, P. D. et al. Microchromosomes are building blocks of bird, reptile, and mammal chromosomes. Proceedings of the National Academy of Sciences 118, e2112494118 (2021).
Wang, X. & Wang, L. GMATA: An Integrated Software Package for Genome-Scale SSR Mining. Marker Development and VIewing. Frontiers in plant science. 7, 1350 (2016).
Gary, B. Tandem repeats finder: a program to analyze DNA sequences. Nucleic. Acids. Res. 27, 573–580 (1999).
Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. P. Natl. Acad. Sci. USA. 117, 9451 (2020).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265 (2007).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
Ou, S. & Jiang, N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176, 1310 (2017).
Abrusán, G., Grundmann, N., DeMester, L. & Makalowski, W. TEclass–a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).
Bedell, J. A., Korf, I. & Gish, W. MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics 16, 1040–1041 (2000).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15 (2013).
Tung, L. H., Shao, M. & Kingsford, C. Quantifying the benefit offered by transcript assembly with Scallop-LR on single-molecule long reads. Genome Biol. 20, 287 (2019).
Haas, B. J., Salzberg, S. L., Zhu, W. & Pertea, M. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res. 27, 49–54 (1999).
Ogata, H. et al. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 27, 29–34 (1999).
Galperin, M. Y., Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43, D261 (2015).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236 (2014).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 49, 9077–9096 (2021).
Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124 (2005).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP500045 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_039707465.1 (2024).
Duan, M. Genome annotation of the Rhabdophis nuchalis. figshare. Dataset. https://doi.org/10.6084/m9.figshare.25559178.v1 (2024).

Acknowledgements

This study was supported by a grant from the National Natural Science Foundation of China (NSFC 32270477) and, in part, by a grant to Zhu GX (2016M592688) from the China Postdoctoral Foundation.

Author information

These authors contributed equally: Mingwen Duan, Shijun Yang.

Authors and Affiliations

College of Life Science, Sichuan Agricultural University, Ya’an, 625014, China
Mingwen Duan, Shijun Yang, Xiufeng Li, Xuemei Tang, Jingxue Luo, Ji Wang, Huina Song, Qin Wang & Guang xiang Zhu
Chengdu Zoo, Chengdu, Sichuan Province, 610081, China
Yuqi Cheng

Contributions

Mingwen Duan: Conceived and designed the experiments; data curation and analysis; writing (drafted the manuscript, review and editing). Shijun Yang: Conceptualization; data curation; investigation; writing (original draft, review and editing). Xiufeng Li, Xuemei Tang, Yuqi Cheng, Jingxue Luo, Ji Wang, Huina Song, and Qin Wang: Sample collection; investigation; methodology; writing (review and editing). Guang xiang Zhu: Conceived and designed the experiments; data curation; funding acquisition; resources; supervision; writing (drafted the manuscript, review and editing).

Corresponding author

Correspondence to Guang xiang Zhu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Duan, M., Yang, S., Li, X. et al. Chromosome-level genome assembly and annotation of the Rhabdophis nuchalis (Hubei keelback). Sci Data 11, 850 (2024). https://doi.org/10.1038/s41597-024-03708-z

Received: 23 May 2024
Accepted: 30 July 2024
Published: 08 August 2024
DOI: https://doi.org/10.1038/s41597-024-03708-z
Springer Nature Limited

Background & Summary