Abstract
Holothuria scabra, a commercially valuable yet ecologically vulnerable tropical holothuroid, has experienced a severe decline in its wild populations, especially in China. Genomic resources are crucial for the development of effective genomic breeding projects and stock conservation strategies to restore these natural populations. Until now, a high-quality, chromosome-level reference genome for H. scabra has not been available. Here, we employed Oxford Nanopore and Hi-C sequencing technologies to assemble and annotate a high-quality, chromosome-level reference genome of H. scabra. The final genome comprised 31 scaffolds with a total length of 1.19 Gb and a scaffold N50 length of 53.52 Mb. Remarkably, 1,191.67 Mb (99.95%) of the sequences were anchored to 23 pseudo-chromosomes, with the longest one spanning 79.75 Mb. A total of 34,418 protein-coding genes were annotated in the final genome, with BUSCO analysis revealing 98.01% coverage of metazoa_odb10 genes, marking a significant improvement compared to the previous report. These chromosome-level sequences and annotations will provide an essential genomic basis for further investigation into molecular breeding and conservation management of H. scabra.
Similar content being viewed by others
Background & Summary
Echinoderms are a group of deuterostome invertebrate animals1, comprising two other closely related phylogenetic groups: hemichordates and chordates2. Echinoderms are the largest marine phylum among deuterostome animals, with more than 7,000 living species3. They are found exclusively in marine environments and are known to inhabit diverse marine environments, including shallower oceans, the deep oceans, as well as intertidal zones4. Sea cucumbers, or holothuroids (Echinodermata: Holothuroidea), are a species-diverse echinoderm group with significant ecological and economic roles in marine ecosystems as important benthic invertebrates5. There are more than 1,700 confirmed holothuroid species in the world, with the greatest diversity located in the Asia-Pacific region6. Nevertheless, there are fewer than 100 holothuroid species are considered suitable for consumption and are currently being commercially harvested worldwide7. Among them, only a few species such as Apostichopus japonicus have undergone artificial breeding techniques and are being cultivated as a delicious seafood8. In recent years, increasing consumer demand in the Asian region has caused severe overfishing of numerous commercially valuable holothuroid species, leading to a significant depletion of their natural populations. The development of artificial breeding techniques and effective fishery management of sea cucumbers needs urgent investigation.
Holothuria scabra, or sandfish, is among the commercially valuable yet ecologically vulnerable tropical holothuroids9. Because of its rich flavor and great nutritional value, H. scabra is considered one of the most precious tonic foods with significant commercial value in the Asian market10. Since the 1970s, when commercial harvests increased dramatically, H. scabra has been intensively exploited in the Pacific and Indian oceans, including China, Indonesia, India, Philippines, and Australia11. In recent decades, a growing demand in the Asian market, coupled with insufficient fishery management of H. scabra, has led to the depletion of the natural stocks across its entire geographic range, particularly in China9. The wild populations of H. scabra have not been observed in the Beibu Gulf of China for more than a decade now, despite the fact that natural stocks of this species were once abundant in this region during the 1960s12. Owing to the dramatic decline in the wild stocks of H. scabra, the International Union for Conservation of Nature (IUCN) designated H. scabra as an endangered species in 201313. In order to recover H. scabra’s severely depleted natural populations, it is imperative to develop effective aquaculture techniques and population enhancement projects of H. scabra. However, thus far, artificial breeding techniques for H. scabra have been applied inefficiently in China, and the scale of aquaculture for H. scabra has been limited14.
In recent times, high-quality chromosome-level reference genomes of commercially valuable holothuroid species, such as A. japonicus8,15 and Holothuria leucospilota16, have been published and are now readily accessible to researchers. These genomic resources are essential for implementing more efficient genomic breeding projects and stock conservation strategies aimed at restoring natural populations. However, thus far, high-quality chromosome-level reference genome of H. scabra has not been published. Even though H. scabra’s genomic assembly and annotation were reported in 2022, the genome sequences and annotation data are still regrettably inaccessible to researchers. In this report, using multiple sequencing technologies, a high-quality chromosome-level reference genome of H. scabra was constructed and annotated. Approximately 1.19 Gb of genomic sequences were assembled into a chromosome-level genome, consisting of 31 scaffolds, with a scaffold N50 length of 53.52 Mb and a total of 528 gaps. Specifically, 1,191.67 Mb (99.95%) of the sequences were anchored to 23 pseudo-chromosomes, with the longest one spanning 79.75 Mb. In total, 34,418 protein-coding genes were annotated, and the BUSCO analysis demonstrated coverage of 98.01% of metazoa_odb10 genes, marking a significant improvement compared to the previous report. The availability of the first chromosome-level genome sequences and annotations for H. scabra represents a valuable genomic resource. It will play a pivotal role in enabling more efficient genomic breeding projects and stock conservation strategies, which are essential for restoring the severely depleted stocks of H. scabra in China.
Methods
Sample collection and sequencing
A healthy male sandfish with a body weight of 82.30 g was collected from Hainan Province, China, (coordinates: 19.25 N, 110.64 E) for genomic DNA sequencing, transcriptome sequencing, and chromosome conformation capture (Hi-C) sequencing. To perform genomic DNA sequencing, we extracted high molecular weight (HMW) genomic DNA from a muscle sample using the QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany). Subsequently, we utilized a combination of Nanopore and MGI platforms to obtain the genomic sequences of H. scabra. Approximately 46.10 Gb of long read sequences averaging 18,500 bp in length were obtained from the Nanopore 20 kb insert sequencing library. Meanwhile, the MGI 350 bp insert sequencing library yielded 113.56 Gb of paired-end sequences with 2 × 150 bp lengths and Q20 > 98.58%. To conduct transcriptome sequencing, total RNA was extracted from various tissue samples, including tentacles, respiratory tree, and intestine, using the RNAiso kit (TaKaRa, Tokyo, Japan). We utilized the MGISEQ-2000 platform to generate approximately 74.76 Gb of transcriptome data with a Q20 quality score of 98.64%. In order to perform Hi-C sequencing, a sample of freshly harvested muscle was first formaldehyde cross-linked and then digested using the DpnII restriction enzyme. By using the Illumina NovaSeq platform, approximately 168.60 Gb of clean paired-end reads with a Unique Mapped Ratio of 60.45% were generated from the Hi-C sequencing library (Table 1).
Genome survey and assembly
The haploid genome length and polymorphism information of H. scabra were assessed through k-mer analysis using the genomic paired-end reads from the MGI sequencing platform. Initially, k-mer frequencies with a length of 19 were calculated using Jellyfish (v.2.3.0)17. Subsequently, Genomescope (v.2.0)18 was employed to calculate genomic characteristics for H. scabra based on the frequency information. The results revealed a predicted haploid genome size of 1127.23 Mb, with repetitive rate and heterozygous rate of 41.73% and 1.24%, respectively (Fig. 1). In order to assemble the high heterozygosity regions of H. scabra genome, the long read data from Nanopore sequencing platform were used by NextDenovo (v.2.5.2) with a correct-then-assemble strategy. Subsequently, the high accuracy genomic paired-end reads data were used by NextPolish (v.1.4.1)19 to improve the base accuracy of contigs. Finally, the redundancy regions in corrected contigs were eliminated by Purge_Dups (v.1.2.6)20. Eventually, we assembled a contig-level genome for H. scabra, consisting of 505 contigs with a total size of 1192.13 Mb, a contig N50 length of 3.15 Mb, and the longest contig spanning 19.72 Mb. In order to anchor the contig sequences to chromosomes, a Hi-C scaffolding tool called YaHS (v.1.1)21 was applied for mapping all Hi-C paired-end reads to the contig-level genome of H. scabra with default parameters. For correction and refinement of the draft scaffold genome, Juicebox (v.3.1.4)22 was used for manual reviewing and Hi-C interaction map generation. The Hi-C interaction map of the final chromosomal-level genome showed a clear interaction signal of 23 super-scaffolds indicating that the number of pseudo-chromosomes in H. scabra’s genome was 23 (Fig. 2). The result of the genome-wide interaction signal of H. scabra was consistent with the previous investigation in the holothuria species16 and A. japonicus8. The final chromosomal-level genome had a size of 1.19 Gb and consisted of 31 scaffolds, with a scaffold N50 length of 53.52 Mb and gap length (N’s) of 52,800. Of the final genomic sequences, 1,191.67 Mb (99.95%) of the sequences were anchored to 23 pseudo-chromosomes, with lengths ranging from 79.75 Mb to 33.31 Mb (Table 2 & Fig. 3), and contained 525 gaps. To evaluate the functional completeness of the final chromosomal-level genome, we conducted a BUSCO (v.5.4.4)23 analysis in genome mode using the metazoa_odb10 dataset. The results indicated that it covered 97.90% of metazoa_odb10 genes, comprising a completion rate of 94.55% and a fragmentation rate of 3.35%, with only a missing rate of 2.10% (Table 2). Compared to the previous genome assembly report of H. scabra, our chromosomal-level assembled genome has shown significant improvements in terms of genomic contiguity and completeness (Table 2).
Transposable elements and noncoding RNAs (ncRNAs) annotation
To evaluate the presence of transposable elements (TE) in the genome of H. scabra, a combined approach utilizing ab initio and homology-based strategies was employed by EDTA (v.2.1.0)24 and RepeatMasker (v.4.1.2, www.repeatmasker.org), respectively. The TE of H. scabra’s genome was first predicted with ab initio strategy, and then based on the predicted TE library, homology-based strategy was used to find out the remaining TE sequences. The final TE prediction analysis revealed that 52.31% of the H. scabra genome, equivalent to a total length of 623.64 Mb, consisted of transposable elements, slightly higher proportion compared to H. leucospilota (50.41%). Specifically, the H. scabra genome exhibited the most abundant of terminal inverted repeats, accounting for 35.81% of the genomic sequences. Long terminal type and tandem type were the following abundant types, accounting for 8.84% and 4.43%, respectively (Table 3). In order to predict the ncRNAs in the genomic sequence, tRNAscan-SE (v.2.0.6)25 and RNAmmer (v.1.2)26 were first applied to search transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs). Pre-miRNAs and other remaining ncRNAs were searched by Infernal (v.1.1.2)27 based on the Rfam datasets. A set of 0.97 Mb genomic sequences was predicted to be ncRNAs including 5,836 tRNAs, 1,978 pre-miRNAs, 99 rRNAs, and 1699 snRNAs (Table 4).
Gene prediction and functional annotation
For the prediction of protein-coding genes, a combination of transcript-based, homology-based, and ab initio prediction methods was employed. Initially, transcriptome sequencing data from our study were employed to de novo assemble transcripts using Trinity (v.2.14)28 with default parameters. Subsequently, StringTie (v.2.2.1)29 was utilized to construct genome-guided transcripts. The predicted genes based on transcripts were then obtained by applying PASA (v.2.5.2)30 to map both the de novo and genome-guided transcripts. The validation of the homology-based method was conducted using GeMoMa (v.1.9)31 with default settings, relying on echinoderm protein data from GenBank, which included Lytechinus pictus32, Anneissia japonica33, H. leucospilota34, and A. japonicus35. For the ab initio approach, BRAKER (v.2.1.6)36 in combination mode was applied for predicting coding genes based on both the transcriptome sequencing data from this study and echinoderm protein data. Subsequently, the predictions from all three methods underwent evaluation using EvidenceModeler (v.2.1.0)37, followed by functional annotation with DIAMOND (v.2.1.3)38 and HMMER (v.3.4, hmmer.org). Default parameters were employed for searching the Swiss-Prot, UniProtKB-TremBL, Gene Ontology (GO), and KEGG databases, with an E-value limit of 1e-5 for homologous annotation. The completeness validation of the final predicted protein-coding genes was conducted through BUSCO (v.5.4.4) with the metazoa_odb10 datasets. The prediction of the H. scabra genome yielded a total of 34,418 protein-coding genes, with an average length of 16,859 bp (Table 2). According to BUSCO evaluation, these predicted genes covered 98.74% of metazoa_odb10 genes, with 98.01% complete genes (Table 5). Importantly, the completeness of gene predictions in the H. scabra genome exhibited a substantial improvement compared to the previous report (98.01% vs. 90.11%). Moreover, functional annotation was accomplished for 84.77% (Table 6) of H. scabra predicted genes across diverse protein databases, including UniProtKB-TremBL (83.30%), Swiss-Prot (55.17%), KEGG (60.13%), and GO (52.50%). Interestingly, the annotation ratios exhibited a notable similarity to those observed in H. leucospilota16, specifically in Swiss-Prot (55.80%), KEGG (66.11%), and GO (53.56%).
Data Records
The H. scabra genome assembly and annotation projects have been registered in the NCBI BioProject database under PRJNA1047316. The genomic sequencing data from both Nanopore and MGI platforms have been deposited in the NCBI Sequence Read Archive (SRA), with accession numbers SRR2701083839 and SRR2701003140, respectively. The transcriptome and Hi-C sequencing data are also stored in the NCBI SRA, with accession numbers SRR2702266941 and SRR27030181-SRR2703018342, respectively. The chromosomal-level genome assembly has been deposited in the NCBI GenBank with accession number GCA_037179385.143. The chromosomal-level genome assembly and annotation files can be accessed through the Figshare44.
Technical Validation
DNA and RNA quality validation
Quality validation of genomic DNA samples using both Nanodrop spectrophotometer (LabTech, USA) and pulse electrophoresis in agarose gel. DNA samples with slightly degraded were considered viable for sequencing library construction. For RNA samples, quality was verified using an Agilent 2100 bioanalyzer (Agilent Technologies), with samples having an RNA integrity number (RIN) greater than 9.50 being considered suitable for library construction.
Genome assembly and annotation quality evaluation
Quality validation of the genome assembly was initially conducted using QUAST (v.5.2 https://github.com/ablab/quast), which revealed a significant improvement in genome continuity for the final chromosome-level genome. In comparison with previous studies, the scaffold N50 of the H. scabra genome has increased markedly from 1.56 Mb to 53.51 Mb, and the length of the largest scaffold has grown from 11.12 Mb to 79.75 Mb (Fig. 3). Furthermore, the BUSCO completion score, evaluated using the metazoa_odb10 datasets, improved from 89.13% to 94.55% (Table 5). Merqury (v.1.3)45 was subsequently employed to evaluate the accuracy and completeness of the genome. The resulting consensus quality value (QV) of 53.37 and k-mer completeness of 91.31% suggest that the final chromosome-level genome assembly achieves a high degree of quality. Genomic sequencing data from Nanopore and MGI platforms were aligned using BWA (v.0.7.17)46 and minimap2 (v.2.28)47, respectively, to further validate the quality of the final genome assembly. The mapping rates for Nanopore and MGI sequencing data were 99.44% and 99.73%, respectively, while genome coverage rates were 99.99% and 99.65% (Table 8 & Fig. 3). These results indicate a high degree of quality in the final genome assembly. Lastly, the quality of the genome annotation was evaluated using the BUSCO (v5.4.4) software, based on the metazoa_odb10 datasets. This assessment revealed that the final genome annotation encompassed 98.74% of the metazoa_odb10 genes, demonstrating a high completeness rate in gene predictions. Additionally, we aligned transcriptome sequencing data using STAR (v.2.7.11a)48 software, set to spliced transcripts mode. This alignment process resulted in 91.01% of the sequencing reads being accurately mapped to the predicted genes, thereby confirming the high accuracy of our gene predictions.
Orthologous gene prediction and functional annotation evaluation
The investigation into the orthologous gene clusters of H. scabra was conducted through OrthoFinder (v.2.5.4)49, utilizing genome-wide protein data from 16 related species obtained from GenBank and Figshare (Supplementary Table 1). Within the protein-coding genes of H. scabra, a total of 29,090 genes (84.51%) were identified as orthologous gene clusters, including 1,264 genes (3.71%) classified as species-specific type (Table 7). Additionally, the entire Echinodermata phylum shares 5,784 orthogroups, inclusive of 242 single-copy orthogroups. Sequence alignment of these single-copy orthogroups was conducted using MUSCLE (v.3.8.31)50, followed by the construction of a phylogenetic tree with RAxML (v.8.2.9)51, based on the super protein sequences integrated from these alignments, using the LG4M model and 1,000 bootstrap replicates. The phylogenomics analysis reveals a close relationship between H. leucospilota and H. glaberrima within the Holothuriidae clade, with H. scabra diverging from their shared ancestor at a later stage. Compared to the Apodida, the Stichopodidae and Holothuriidae families show a closer evolutionary relationship in the Holothuroidea clade. The evolutionary histories of gene families in H. scabra and 16 related species were examined using recalibrated evolutionary times for Stichopus monotuberculatus and A. japonicus obtained from TimeTree (www.timetree.org), processed with r8s (v.1.71) software. This was followed by a likelihood analysis comparing H. scabra to other Echinodermata species using the CAFÉ (v.5.0)52 tool with standard parameters. The likelihood analysis revealed that within the H. scabra genome, 359 gene families are unique, 215 are significant expansion, and only 113 are significant contraction, as illustrated in Fig. 4. Additionally, TBtools-II (v.2.0.81)53 was utilized for functional enrichment analyses to explore the biological functions of expanded gene families. The analyses revealed that these evolutionary gene families predominantly participate in biological processes related to signaling molecules and interaction, cytochrome P450, glycan biosynthesis and metabolism, glycosyltransferases, and environmental adaptation (Figs. 5, 6).
Code availability
No custom scripts were utilized in this study. All commands and pipelines for data processing were carried out in compliance with the established protocols of the bioinformatics software, on a local high-performance server (PowerEdge T630, Dell Technologies).
References
Cameron, C. B., Garey, J. R. & Swalla, B. J. Proceedings of the National Academy of Sciences of the United States of America. PNAS. 97, 4469–4474, (2000).
Simakov, O. et al. Hemichordate genomes and deuterostome origins. Nature. 527, 459–465 (2015).
Sonet, G., Smitz, N., Vangestel, C. & Samyn, Y. DNA barcoding echinoderms from the East Coast of South Africa. The challenge to maintain DNA data connected with taxonomy. PLoS One. 17, e0270321 (2022).
Alvarado, J. J., Chacón-Monge, J. L., Azofeifa-Solano, J. C. & Cortés, J. Diversity of Deep-Sea Echinoderms From Costa Rica. Front. Mar. Sci. 9, 918878 (2022).
Zhong, S. et al. The draft genome of the tropical sea cucumber Stichopus monotuberculatus (Echinodermata, Stichopodidae) reveals critical genes in fucosylated chondroitin sulfates biosynthetic pathway. Front. Genet. 14, 1182002 (2023).
Gajdosechova, Z. et al. Arsenic speciation in sea cucumbers: Identification and quantitation of water-extractable species. Environ. Pollut. 266, 115190 (2020).
Uthicke, S., Byrne, M. & Conand, C. Genetic barcoding of commercial Bêche-de-mer species (Echinodermata: Holothuroidea). Mol. Ecol. Resour. 10, 634–646, (2010).
Sun, L., Jiang, C., Su, F., Cui, W. & Yang, H. Chromosome-level genome assembly of the sea cucumber Apostichopus japonicus. Sci. Data. 10, 454 (2023).
E, Z. et al. Applications of Environmental DNA (eDNA) in Monitoring the Endangered Status and Evaluating the Stock Enhancement Effect of Tropical Sea Cucumber Holothuria Scabra. Mar. Biotechnol. 25, 778–789 (2023).
Nobsathian, S. et al. An antioxidant activity of the whole body of Holothuria scabra. Chem. Biol. Technol. Agric. 4, 1–5 (2017).
Hamel, J. F., Conand, C., Pawson, D. L. & Mercier, A. The sea cucumber Holothuria scabra (Holothuroidea: Echinodermata): Its biology and exploitation as beche-de-mer. Adv. Mar. Biol. 41, 129–223 (2001).
Yang, X. et al. Annual change of gonadal development in Holothuria scabra from Hainan Island. J. South. Agric. 46, 1117–1122 (2015).
Hamel, J. F. et al. Holothuria scabra. The IUCN Red List of Threatened Species 2013: e.T180257A1606648., (2013).
Luo, H. et al. De novo genome assembly and annotation of Holothuria scabra (Jaeger, 1833) from nanopore sequencing reads. Genes Genomics. 44, 1487–1498 (2022).
Wang, Y., Yang, Y., Li, Y. & Chen, M. Identification of sex determination locus in sea cucumber Apostichopus japonicus using genome-wide association study. BMC Genomics. 23, 391 (2022).
Chen, T. et al. The Holothuria leucospilota genome elucidates sacrificial organ expulsion and bioadhesive trap enriched with amyloid-patterned proteins. PNAS. 120, e2213512120 (2023).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27, 764–770, (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics. 36, 2253–2255 (2020).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 36, 2896–2898 (2020).
Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 39, btac808 (2022).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019).
Lowe, T. M. & Chan, P. P. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 44, W54–W57 (2016).
Huang, Y., Gilna, P. & Li, W. Identification of ribosomal RNA genes in metagenomic fragments. Bioinformatics. 25, 1338–1340, (2009).
Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics. 25, 1335–1337 (2009).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Shumate, A., Wong, B., Pertea, G. & Pertea, M. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput. Biol. 18, e1009730 (2022).
Campbell, M. A., Haas, B. J., Hamilton, J. P., Mount, S. M. & Buell, C. R. Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. BMC Genomics. 7, 327 (2006).
Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S. O. & Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinf. 19, 189 (2018).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_037042905.1 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_011630105.1 (2020).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_029531755.1 (2023).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_037975245.1 (2024).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics Bioinf. 3, lqaa108 (2021).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods. 18, 366–368 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27010838 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27010031 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27022669 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP475262 (2023).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_037179385.1 (2024).
Zhong, S. Genome data of the tropical sea cucumber Holothuria scabra. figshare. https://doi.org/10.6084/m9.figshare.24771183.v2 (2023).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. bioinformatics. 25, 1754–1760, (2009).
Li, H. New strategies to improve minimap2 alignment accuracy. Bioinformatics. 37, 4572–4574 (2021).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29, 15–21 (2013).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797, (2004).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30, 1312–1313, (2014).
Mendes, F. K., Vanderpool, D., Fulton, B. & Hahn, M. W. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 36, 5516–5518 (2020).
Chen, C. et al. TBtools-II: A “one for all, all for one” bioinformatics platform for biological big-data mining. Mol. Plant. 16, 1733–1742 (2023).
Acknowledgements
This study was supported by China Postdoctoral Science Foundation (2021M701798), Guangxi University of Chinese Medicine “GuiPai Traditional Chinese Medicine inheritance and innovation team” Project (2022A007), Development Program of High-level Talent Team under Qihuang Project of Guangxi University of Chinese Medicine (2021004), the Open Fund of Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture (GXKEYLA2022-03), Fund of Hainan Provincial Key Laboratory of Tropical Maricultural Technologies (TMTOF202204) and National Natural Science Foundation of China (31960225).
Author information
Authors and Affiliations
Contributions
Shengping Zhong, Xujia Liu and Xiuli Chen conceived study. Shengping Zhong and Xiaowan Ma wrote manuscript. Shengping Zhong, Xiaowan Ma, Hongtao Liu, and Ying Qiao bioinformatics analysis. Yan Jiang, Mengqing Zeng and Longyan Zhao collected samples and quality validation. Lianghua Huang, Guoqiang Huang and Yongzhen Zhao investigation gene annotation and quality validation. All authors approved the final submission.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhong, S., Liu, X., Ma, X. et al. Chromosomal-level genome assembly and annotation of the tropical sea cucumber Holothuria scabra. Sci Data 11, 474 (2024). https://doi.org/10.1038/s41597-024-03340-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-024-03340-x
- Springer Nature Limited