Chromosome-level genome assembly of the caenogastropod snail Rapana venosa

Song, Hao; Li, Zhuoqing; Yang, Meijie; Shi, Pu; Yu, Zhenglin; Hu, Zhi; Zhou, Cong; Hu, Pengpeng; Zhang, Tao

doi:10.1038/s41597-023-02459-7

Scientific Data
Data Descriptor
Open access
Published: 16 August 2023
Volume 10, article number 539 (2023)

Hao Song^1,2,3,
Zhuoqing Li^1,3,
Meijie Yang^1,2,3,
Pu Shi^1,3,
Zhenglin Yu^4,
Zhi Hu^1,3,
Cong Zhou^1,3,
Pengpeng Hu^1,3 &
Tao Zhang^1,2,3

Abstract

The carnivorous gastropod Rapana venosa (Valenciennes, 1846) is one of the most notorious marine invaders worldwide. Here, we present the first high-quality chromosome-scale reference genome of R. venosa, obtained by combining PacBio long-read sequencing, Illumina paired-end sequencing, and high-throughput chromosome conformation capture (Hi-C) scaffolding. The assembled genome is 2.30 Gb in size, has a scaffold N50 of 64.63 Mb, and is anchored to 35 chromosomes. It contains 29,649 protein-coding genes, 77.22% of which were functionally annotated. Owing to its high heterozygosity (1.41%) and large proportion of repeat sequences (57.72%), this genome is among the most complex to assemble. This chromosome-level assembly is an important resource for understanding molluscan evolutionary adaptation and provides a genetic basis for controlling the biological invasion of this species.

Background & Summary

Caenogastropoda is an extraordinarily large and diverse group, containing thousands of described species and comprising ~60% of extant gastropod species¹. These snails are highly diverse in morphology, diet, and habitat, and occupy marine, freshwater, and terrestrial environments^2,3. To date, only two chromosome-level genomes from this clade have been published^4,5, which limits our understanding of its internal phylogeny and evolutionary adaptation.

Rapana venosa (Valenciennes, 1846) is a common carnivorous marine snail in the Caenogastropoda. It is native to the coasts of the Bohai Sea, Yellow Sea, and East China Sea in China, the northern Korean Peninsula, the Russian Far East, and northern Japan⁶, and is an economically important species in China⁷. Through global transport, R. venosa has been unintentionally introduced, and has become invasive, in the Rio de la Plata between Argentina and Uruguay, Chesapeake Bay (USA), Quiberon Bay (France), and the coastal waters of the Netherlands^8,9,10,11. Its successful establishment in these areas is attributed to its strong ecological fitness, including high fecundity, easy dispersal as planktonic larvae, rapid growth, early sexual maturity, and broad tolerance of hypoxia, salinity and temperature fluctuations, and water pollution¹². In the Chesapeake Bay region, R. venosa differs markedly from the native gastropod Urosalpinx cinerea in its prey and predation strategies, and therefore disrupts the local trophic structure and depletes native shellfish resources¹³. Because R. venosa feeds on economically valuable bivalves such as oysters, mussels, and clams, it has also caused severe economic losses in the Black Sea region¹⁴. The economic importance of this species in Asian countries and its global invasiveness have prompted extensive studies of its developmental mechanisms and the genetic basis of its environmental adaptation^15,16,17. However, such studies have been hampered by the lack of genomic resources.

In this study, we combined Illumina short reads, PacBio long reads, and high-throughput chromosome conformation capture (Hi-C) data to construct a high-quality, chromosome-level reference genome for R. venosa (Fig. 1). The genome was assembled into 17,949 contigs with a contig N50 of 434.10 kb and a total length of 2.30 Gb. Hi-C scaffolding yielded 5,242 sequences, the 35 largest of which correspond to chromosomes; these 35 chromosome-scale scaffolds total 2.25 Gb, or 97.88% of the total contig length. Using de novo, homology-based, and transcriptome-based strategies, we predicted 29,649 protein-coding genes, 77.22% of which were annotated in at least one of the NCBI RefSeq non-redundant protein, KEGG, TrEMBL, SwissProt, and InterPro databases. The R. venosa genome has high heterozygosity (1.41%) and a large proportion of repeat sequences (57.72%), which make it among the most complex genomes to assemble. Phylogenetic analysis indicated that R. venosa and Conus consors diverged from their common ancestor approximately 124.4 million years ago (mya; 78.3–177.5 mya).

Methods

Sample collection and sequencing

Living specimens of R. venosa were collected from Laizhou Bay, China. Genomic DNA was extracted from muscle tissue using a QIAGEN DNeasy Kit (QIAGEN, Shanghai, China) according to the manufacturer's instructions. DNA integrity was checked by electrophoresis on a 1% agarose gel, and DNA concentration was measured with a Qubit instrument to confirm that the samples met sequencing requirements, giving a concentration of 23.2 ng/µL. The genomic DNA was then purified and concentrated with AMPure PB magnetic beads and used to prepare a SMRTbell library with the SMRTbell Template Prep Kit 2.0 (Pacific Biosciences, Menlo Park, CA, USA)¹⁸. The library was sequenced on a Pacific Biosciences Sequel II system in continuous long-read (CLR) mode following the manufacturer's instructions. Three SMRT cells were sequenced, yielding a total of 256.49 Gb of PacBio reads with N50 and N90 lengths of 434.10 kb and 58.92 kb, respectively. An Illumina short-insert (350 bp) library was constructed according to the manufacturer's protocol and sequenced in paired-end mode on an Illumina NovaSeq 6000 platform (Illumina, Inc., San Diego, CA, USA), yielding a total of 153.00 Gb of reads. For Hi-C sequencing, fresh muscle was fixed in 1% formaldehyde, and fixation was quenched with 0.2 M glycine. The Hi-C library was prepared following a published protocol¹⁹ and sequenced on an Illumina NovaSeq 6000 platform.
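For orientation, the reported data volumes can be converted into approximate fold coverage by dividing total sequenced bases by genome size. The sketch below is only an illustration: it assumes the 2.30 Gb assembly size as the genome size, and the helper name `sequencing_depth` is hypothetical rather than part of any tool used in the study.

```python
def sequencing_depth(total_bases_gb: float, genome_size_gb: float) -> float:
    """Approximate fold coverage: total sequenced bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


# Assumption: the 2.30 Gb assembly size stands in for the true genome size.
GENOME_SIZE_GB = 2.30

# Data volumes reported in the Methods (Gb).
DATA_VOLUMES_GB = {
    "PacBio CLR": 256.49,
    "Illumina paired-end": 153.00,
}

if __name__ == "__main__":
    for platform, gigabases in DATA_VOLUMES_GB.items():
        depth = sequencing_depth(gigabases, GENOME_SIZE_GB)
        print(f"{platform}: ~{depth:.1f}x")
```

Under this assumption the PacBio and Illumina data correspond to roughly 111.5× and 66.5× coverage; using the 2.20 Gb k-mer estimate instead would raise both values slightly.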

Genome assembly

Assembling the R. venosa genome was challenging because of its high repeat content (57.72%) and heterozygosity (1.41%). We tested several assembly strategies and selected the one that gave the highest continuity and accuracy (Table 1). In total, 256.49 Gb of PacBio long reads were assembled de novo with wtdbg2 v. 2.4²⁰, yielding 17,949 contigs with a contig N50 of 434.10 kb. We then polished the assembly with Pilon v. 1.23²¹ using Illumina short reads from the same individual, and removed redundant haplotigs with Purge Haplotigs, producing a 2,293.82 Mb assembly (Table 2). The total gene space was 38.3 Mb, and the mean number of exons per mRNA was about six. In our previous genome survey, k-mer analysis estimated the R. venosa genome size at 2.20 Gb with 67.04% repeat sequences, close to the size of the present assembly²². The R. venosa genome is substantially larger than those of several other molluscs, such as Crassostrea gigas (557.74 Mb)²³, Biomphalaria glabrata (916.38 Mb)²⁴, Pomacea canaliculata (440.07 Mb)²⁵, and Achatina immaculata (1.65 Gb)²⁶; similar in size to those of Octopus bimaculoides (2.40 Gb)²⁷ and Conus consors (2.05 Gb)⁵; and smaller than that of Conus bullatus (3.43 Gb)⁴. Benchmarking Universal Single-Copy Orthologs (BUSCO) v. 5.4.6²⁸ was used to assess the completeness of the assembly against the metazoa_odb10 dataset. Of the 978 BUSCO orthologous groups searched, 886 (90.6%) were complete in the assembly (Table 3). This compares favorably with the recently published genome of another neogastropod, C. bullatus, which has a contig N50 of 171.48 kb and a BUSCO (v. 5.4.6) completeness of 89.8%⁴. The GC content of the R. venosa assembly is 42.38%.
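The read, contig, and scaffold N50 values quoted in this section all follow the same standard definition: sort sequences from longest to shortest and report the length at which the running total first reaches half of the assembly length (N90 uses 90%). A minimal sketch of that computation follows; `nx` is a hypothetical helper name and the contig lengths are toy values.

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nx length (N50 by default).

    Sequences are sorted from longest to shortest and their lengths summed;
    the Nx value is the length of the sequence at which the running sum
    first reaches `fraction` of the total assembly length.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]  # defensive fallback; not reached for valid input


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths
    print("N50 =", nx(toy_contigs))        # N50 = 70
    print("N90 =", nx(toy_contigs, 0.9))   # N90 = 30
```

Because N50 depends only on the length distribution, it measures continuity, not correctness; that is why the assembly is also assessed with BUSCO and, below, with Hi-C and read mapping.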

Table 1 Comparison of different genome assembly strategies.

Table 2 Assembly statistics of R. venosa genome.

Table 3 BUSCO assessment of the genome assembly.

Chromosome-level genome scaffolding with Hi-C data

In total, 4,991.96 million raw read pairs were generated by Hi-C sequencing. Quality control, sorting, and duplicate removal were performed with HiC-Pro v. 2.8.0²⁹. Using the Burrows-Wheeler Aligner (BWA v. 0.7.10-r789)³⁰, 63.86% of the clean reads were aligned to the draft assembly. After contigs were ordered and oriented with Juicer v1.5^31,32 and 3D-DNA v170123³³, 97.88% of the total contig length was anchored to 35 scaffolds (chromosomes) ranging from 35.91 Mb to 129.26 Mb (Fig. 1, Table 4). After Hi-C scaffolding, the final assembly had a size of 2,251.40 Mb and a scaffold N50 of 64.63 Mb (Table 2). The chromatin contact matrix was manually curated in Juicebox v1.5³⁴; the 35 scaffolds are clearly distinguishable in the heatmap (Fig. 2), with strong interaction signals along the diagonal.
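The diagonal-dominated heatmap reflects a general property of correctly scaffolded chromosomes: most Hi-C read pairs link positions on the same scaffold, and contact frequency decays with genomic distance. As a rough sketch of how this can be quantified, the code below assumes read pairs have already been reduced to (scaffold, position) tuples, for example from a pairs file; the function name and the 1 Mb "near-diagonal" threshold are illustrative assumptions, not values used by the authors.

```python
from typing import Dict, Iterable, Tuple

# (scaffold_1, position_1, scaffold_2, position_2) for one Hi-C read pair
Contact = Tuple[str, int, str, int]


def contact_summary(contacts: Iterable[Contact], near_bp: int = 1_000_000) -> Dict[str, float]:
    """Share of read pairs that are intra-scaffold (cis), and cis within near_bp."""
    total = cis = near = 0
    for scaf1, pos1, scaf2, pos2 in contacts:
        total += 1
        if scaf1 == scaf2:
            cis += 1
            if abs(pos1 - pos2) <= near_bp:
                near += 1
    if total == 0:
        raise ValueError("no contacts supplied")
    return {"cis_fraction": cis / total, "near_diagonal_fraction": near / total}


if __name__ == "__main__":
    toy_pairs = [  # hypothetical read pairs
        ("chr1", 100, "chr1", 5_000),
        ("chr1", 100, "chr1", 3_000_000),
        ("chr1", 10, "chr2", 20),
        ("chr2", 1, "chr2", 900_000),
    ]
    print(contact_summary(toy_pairs))
    # {'cis_fraction': 0.75, 'near_diagonal_fraction': 0.5}
```

Misjoined or misoriented scaffolds typically show up as off-diagonal blocks, which lowers these fractions relative to a correct assembly.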

Table 4 Statistics of R. venosa genome sequence length (chromosome level).

Repeat sequences and genome annotation

We combined de novo prediction and homology-based searches to annotate repetitive elements in the R. venosa genome. For de novo annotation, RepeatModeler v. 1.0.9³⁵, LTR_FINDER v. 1.0.7³⁶, and RepeatScout v. 1.0.7³⁷ were used to build a de novo repeat library, which RepeatMasker v. 4.0.7³⁸ then used to annotate repeats in the genome. Known repeat types were identified with RepeatMasker v. 4.0.7 and RepeatProteinMask v. 4.0.7 by searching Repbase (release 20181026)³⁹. In addition, Tandem Repeats Finder (TRF v. 4.09)⁴⁰ was used to annotate tandem repeats. In total, 1,327.65 Mb of repetitive sequence was identified, representing 57.72% of the assembled genome. This proportion is substantially higher than in other gastropods such as Lottia gigantea (10.39%)⁴¹, Aplysia californica (21.80%)⁴², P. canaliculata (11.27%)²⁵, and C. bullatus (38.56%)⁴. Among the repeat classes, long interspersed nuclear elements (LINEs) were dominant (911.70 Mb, 39.64% of the assembled genome), and short interspersed nuclear elements (SINEs) were the least abundant (6.09 Mb, 0.27%) (Table 5).
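Because the de novo library, the Repbase-based searches, and TRF can annotate the same bases, an overall repeat percentage is normally computed on the union of their intervals, not as the sum of per-tool totals. The sketch below shows that merging step under stated assumptions: intervals are already parsed into half-open (start, end) tuples per sequence, and the helper names are hypothetical. It is not the authors' pipeline.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one sequence


def merged_length(intervals: List[Interval]) -> int:
    """Total bases covered by possibly overlapping intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_fraction(repeats_by_seq: Dict[str, List[Interval]], genome_size: int) -> float:
    """Fraction of the genome covered by the union of all repeat annotations."""
    masked = sum(merged_length(ivs) for ivs in repeats_by_seq.values())
    return masked / genome_size


if __name__ == "__main__":
    # Hypothetical hits pooled from several tools; note the overlap on chr1.
    pooled = {"chr1": [(0, 100), (50, 150), (200, 250)], "chr2": [(10, 20)]}
    print(merged_length(pooled["chr1"]))   # 200 (a naive sum would give 250)
    print(repeat_fraction(pooled, 1_000))  # 0.21
```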

Table 5 Classification of repeat elements in the R. venosa genome.

Candidate non-coding RNAs were annotated as follows. Ribosomal RNAs and transfer RNAs were predicted with BLASTN v. 2.2.28⁴³ and tRNAscan-SE v. 1.4⁴⁴ (www.lowelab.ucsc.edu/tRNAscan-SE/), respectively, yielding 165 rRNA and 3,241 tRNA genes (e-value cutoff: 1e⁻¹⁰). We also searched the Rfam database with Infernal v. 1.1.2⁴⁵ (http://infernal.janelia.org/) and identified 76 microRNA and 103 small nuclear RNA genes.
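An e-value cutoff such as the one above can be applied after the fact to BLAST tabular output. The sketch assumes the default 12-column tabular format (`-outfmt 6`), in which the e-value is the 11th column; the article does not state which output format was used, and `filter_by_evalue` is a hypothetical helper.

```python
import csv
import io
from typing import Iterator, List, TextIO

EVALUE_COLUMN = 10  # 0-based index of the e-value in the default 12-column outfmt 6


def filter_by_evalue(handle: TextIO, max_evalue: float = 1e-10) -> Iterator[List[str]]:
    """Yield BLAST tabular (outfmt 6) rows whose e-value is <= max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(row[EVALUE_COLUMN]) <= max_evalue:
            yield row


if __name__ == "__main__":
    # Two hypothetical hits; only the first passes the 1e-10 cutoff.
    demo = io.StringIO(
        "scaf1\trRNA_18S\t98.5\t1800\t20\t2\t1\t1800\t1\t1800\t0.0\t3200\n"
        "scaf2\trRNA_5S\t80.0\t60\t12\t0\t5\t64\t1\t60\t1e-05\t45.2\n"
    )
    for hit in filter_by_evalue(demo):
        print(hit[0], hit[1], hit[EVALUE_COLUMN])
```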

We applied de novo, homology-based, and transcriptome-based strategies to annotate protein-coding genes in the R. venosa genome. For de novo prediction, Augustus v. 3.2.3⁴⁶ was trained on transcripts assembled from R. venosa RNA-seq data to obtain optimal parameters and then used to predict coding regions on the repeat-masked assembly. For homology-based prediction, we downloaded protein sequences of other molluscan species, including L. gigantea, C. consors, P. canaliculata, A. californica, A. immaculata, Elysia chlorotica, B. glabrata, C. gigas, Octopus vulgaris, and Haliotis rubra, from NCBI. These proteins were aligned against the genome assembly using BLAT v. 35⁴⁷ with an e-value threshold of 1e⁻⁵, and GeneWise v. 2.4.1⁴⁸ was then used to align the matched proteins to the corresponding genomic regions to obtain accurate splice structures. For transcriptome-based prediction, HISAT2 v. 2.0.4⁴⁹ and StringTie v. 1.2.3⁵⁰ were used for reference-guided transcript assembly, and TransDecoder v. 5.5.0 (https://github.com/TransDecoder/TransDecoder) was used to predict coding regions. Finally, all predictions were merged into a consensus gene set with GLEAN⁵¹, yielding 29,649 protein-coding genes. For functional annotation, the gene sequences were searched against public databases (SwissProt, InterPro, KEGG, and TrEMBL) using BLASTX v. 2.2.28⁴³ and BLASTN v. 2.2.28⁴³ with an e-value threshold of 1e⁻⁵; 22,894 genes (77.22%) were annotated in at least one database.
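The 77.22% figure counts a gene as annotated if it has a hit in at least one database, so it is the size of the union of the per-database hit sets, not a sum. A small sketch of that bookkeeping with hypothetical gene identifiers (g1 to g5) follows; `annotation_rates` is an illustrative name, not part of any annotation tool.

```python
from typing import Dict, Set


def annotation_rates(genes: Set[str], hits: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of genes with a hit in each database and in at least one."""
    if not genes:
        raise ValueError("gene set is empty")
    rates = {db: len(ids & genes) / len(genes) for db, ids in hits.items()}
    annotated_anywhere = set().union(*hits.values()) & genes
    rates["any"] = len(annotated_anywhere) / len(genes)
    return rates


if __name__ == "__main__":
    toy_genes = {"g1", "g2", "g3", "g4", "g5"}  # hypothetical gene IDs
    toy_hits = {
        "SwissProt": {"g1", "g2"},
        "KEGG": {"g2", "g3"},
        "InterPro": {"g1"},
    }
    print(annotation_rates(toy_genes, toy_hits)["any"])  # 0.6
    # The reported counts give the genome-wide rate:
    print(f"{100 * 22894 / 29649:.2f}%")  # 77.22%
```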

Data Records

The raw Illumina, PacBio, and Hi-C sequencing data have been deposited in the NCBI SRA database under accession numbers SRR22889214⁵², SRR23517974⁵³, SRR23501451⁵⁴, SRR23501452⁵⁵, SRR23501453⁵⁶, and SRR23501454⁵⁷. The genome assembly has been deposited in GenBank under accession number JAQIHA000000000⁵⁸. The genome annotations are available from the Figshare repository⁵⁹.

Technical Validation

Evaluating genome assembly and annotation completeness

The assembled R. venosa genome is 2.30 Gb with a scaffold N50 of 64.63 Mb (Fig. 1), close to the 2.20 Gb estimated in our previous genome survey²². To evaluate possible contamination of the contigs used for assembly, we generated a blobplot with BlobTools v. 1.1.1⁶⁰ (Fig. 3). In total, 87.26% of the contigs had BLAST hits to Mollusca. The remaining 12.74% were assigned to Cnidaria (8.03%), Chordata (2.49%), Arthropoda (0.17%), Echinodermata (0.07%), and Annelida (0.04%), and 1.97% did not match any taxonomic group. These results indicate that the contigs used for assembly were not contaminated with microorganisms. BUSCO analysis gave a completeness of 90.6% for the genome assembly and 89.1% for the predicted protein-coding gene set; given the high heterozygosity and repeat content of this genome, these values indicate a high-quality assembly. To further assess completeness, Illumina short reads were mapped to the assembly using BWA v. 0.7.10³⁰: 99.30% of the reads mapped, covering 78.51% of the assembled genome (Table 6). The Hi-C heatmap shows a well-organized interaction pattern within chromosomal regions (Fig. 2), and the assembly comprises 35 chromosome-level scaffolds, in line with previously published karyotyping⁴⁸. Together, these results support the accuracy of the chromosome scaffolding.
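The two mapping statistics measure different things: 99.30% is the share of reads that align, while 78.51% is the share of assembly positions those reads cover. A sketch of the second, position-centric metric is below. It assumes a per-position depth table in the three-column layout written by `samtools depth -a` (the `-a` flag also reports zero-depth positions, which the denominator needs); the depth threshold the authors used is not stated, so `min_depth=1` is an assumption.

```python
import io
from typing import TextIO


def coverage_breadth(depth_table: TextIO, min_depth: int = 1) -> float:
    """Fraction of positions with depth >= min_depth.

    Expects tab-separated 'sequence, position, depth' lines, as written by
    `samtools depth -a` for a single BAM file.
    """
    covered = total = 0
    for line in depth_table:
        if not line.strip():
            continue
        _seq, _pos, depth = line.rstrip("\n").split("\t")[:3]
        total += 1
        if int(depth) >= min_depth:
            covered += 1
    if total == 0:
        raise ValueError("depth table is empty")
    return covered / total


if __name__ == "__main__":
    toy = io.StringIO("chr1\t1\t0\nchr1\t2\t3\nchr1\t3\t5\nchr1\t4\t0\n")
    print(coverage_breadth(toy))  # 0.5
```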

Table 6 Statistical results of short read alignment.

Collinearity analysis and phylogenetic analysis

Collinearity between the chromosomes of R. venosa and those of another caenogastropod, Lautoconus ventricosus⁶¹, was analyzed with LASTZ v. 1.02.00⁶². As shown in Fig. 4, nearly all of the 35 chromosome-level scaffolds of R. venosa showed high homology with the corresponding chromosomes of L. ventricosus, which supports the quality of the sequencing and assembly and the reliability of the downstream phylogenetic analysis. For phylogenetic analysis, orthologous genes were predicted from pairwise sequence comparisons: the protein sequences of all species were compared using BLASTP v. 2.2.28 with an e-value cutoff of 1e⁻⁷, and all genes were then clustered with TreeFam v. 9⁶³. The species included in the gene family clustering were R. venosa, H. rubra, L. gigantea, C. consors, P. canaliculata, A. californica, A. immaculata, E. chlorotica, B. glabrata, C. gigas, and O. vulgaris.
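The phylogeny is built from single-copy orthologous families, i.e. clusters containing exactly one gene from every species. Assuming the clustering output has been reduced to lists of (species, gene) members per family (an input layout we assume for illustration, not the TreeFam output format), the filter can be sketched as follows; `single_copy_families` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Member = Tuple[str, str]  # (species, gene_id)


def single_copy_families(families: Dict[str, List[Member]], species: Set[str]) -> List[str]:
    """Return IDs of families with exactly one gene from every listed species."""
    keep = []
    for family_id, members in families.items():
        per_species = Counter(sp for sp, _gene in members)
        if set(per_species) == species and all(n == 1 for n in per_species.values()):
            keep.append(family_id)
    return sorted(keep)


if __name__ == "__main__":
    taxa = {"R_venosa", "C_consors", "C_gigas"}
    toy_families = {  # hypothetical clustering output
        "fam1": [("R_venosa", "g1"), ("C_consors", "c1"), ("C_gigas", "o1")],
        "fam2": [("R_venosa", "g2"), ("R_venosa", "g3"), ("C_consors", "c2"), ("C_gigas", "o2")],
        "fam3": [("R_venosa", "g4"), ("C_gigas", "o3")],
    }
    print(single_copy_families(toy_families, taxa))  # ['fam1']
```

Here fam2 is rejected as multi-copy in R. venosa and fam3 because it lacks a C. consors member.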

Phylogenetic trees were constructed from single-copy orthologous gene families. Orthologous protein sequences were aligned with MUSCLE v. 5.1⁶⁴, and the corresponding coding sequences were extracted according to these alignments. Fourfold degenerate synonymous sites were extracted from each alignment and concatenated into a supergene for each species. The supergene alignments were used to infer phylogenies by maximum likelihood (PhyML v. 2.4.4⁶⁵ and RAxML v. 8.2.12⁶⁶) and Bayesian inference (MrBayes v. 3.2.6), and the tree was visualized with FigTree (Fig. 4a). The tree places R. venosa and C. consors in one clade, and the positions of the other clades are consistent with previous findings²⁶. Divergence times were estimated with MCMCtree⁶⁷ in PAML v. 4.4b⁶⁸ using a correlated molecular clock and the HKY85 substitution model. Five calibration nodes were used: C. gigas–O. vulgaris (532–582 mya), H. rubra–P. canaliculata (401–507 mya), L. gigantea–A. californica (401–507 mya), R. venosa–P. canaliculata (155–508 mya), and E. chlorotica–C. consors (334–489 mya). Calibration ranges were retrieved from the TimeTree website (http://www.timetree.org/). The estimated divergence time between R. venosa and C. consors was approximately 124.4 mya (Fig. 5).
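A fourfold degenerate site is a third codon position where any nucleotide encodes the same amino acid, so substitutions there are (nearly) neutral. The sketch below extracts such sites from a codon alignment and concatenates them into a per-species supergene. It assumes the standard genetic code and keeps a site only when every species shares the same fourfold prefix and has an unambiguous third base; this conservative rule and the helper names are our assumptions, and the article does not specify its exact criteria.

```python
from typing import Dict, List

# Codon prefixes whose third position is fourfold degenerate in the standard
# genetic code: Leu (CTN), Val (GTN), Ser (TCN), Pro (CCN), Thr (ACN),
# Ala (GCN), Arg (CGN) and Gly (GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(codon_alignment: Dict[str, str]) -> Dict[str, str]:
    """Extract third-codon-position bases at conserved fourfold degenerate sites."""
    seqs = {sp: s.upper() for sp, s in codon_alignment.items()}
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must share one aligned length")
    (aln_len,) = lengths
    if aln_len % 3:
        raise ValueError("aligned length must be a multiple of 3")
    picked: Dict[str, List[str]] = {sp: [] for sp in seqs}
    for i in range(0, aln_len, 3):
        codons = {sp: s[i:i + 3] for sp, s in seqs.items()}
        prefixes = {c[:2] for c in codons.values()}
        # Same fourfold prefix in all species (conserved amino acid), no gaps/Ns.
        if (len(prefixes) == 1 and prefixes <= FOURFOLD_PREFIXES
                and all(c[2] in "ACGT" for c in codons.values())):
            for sp, codon in codons.items():
                picked[sp].append(codon[2])
    return {sp: "".join(bases) for sp, bases in picked.items()}


def concatenate(per_gene: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene site sets into one supergene per species.

    Assumes every gene contains the same species, as single-copy families do.
    """
    species = set(per_gene[0]) if per_gene else set()
    return {sp: "".join(gene[sp] for gene in per_gene) for sp in sorted(species)}


if __name__ == "__main__":
    gene1 = {"R_venosa": "CTGAAAGCT", "C_consors": "CTAAAGGCC"}  # toy alignment
    gene2 = {"R_venosa": "GGA---", "C_consors": "GGCTTA"}
    supergene = concatenate([fourfold_sites(gene1), fourfold_sites(gene2)])
    print(supergene)  # {'C_consors': 'ACC', 'R_venosa': 'GTA'}
```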

Code availability

No custom code was used in this study. The data analyses used standard bioinformatic tools specified in the methods.

References

1. Ponder, W. F. & Lindberg, D. R. Towards a phylogeny of gastropod molluscs: an analysis using morphological characters. Zool. J. Linn. Soc. 119, 83–265 (1997).
2. Colgan, D. J., Ponder, W. F., Beacham, E. & Macaranas, J. Molecular phylogenetics of Caenogastropoda (Gastropoda: Mollusca). Mol. Phylogenet. Evol. 42, 717–737 (2007).
3. Barco, A. et al. A molecular phylogenetic framework for the Muricidae, a diverse family of carnivorous gastropods. Mol. Phylogenet. Evol. 56, 1025–1039 (2010).
4. Peng, C. et al. The first Conus genome assembly reveals a primary genetic central dogma of conopeptides in C. betulinus. Cell Discov. 7, 11 (2021).
5. Brauer, A. et al. The mitochondrial genome of the venomous cone snail Conus consors. PLoS One 7, e51528 (2012).
6. Mann, R. & Harding, J. M. Salinity tolerance of larval Rapana venosa: implications for dispersal and establishment of an invading predatory gastropod on the North American Atlantic coast. Biol. Bull. 204, 96–103 (2003).
7. Yang, M.-J. et al. Expression and activity of critical digestive enzymes during early larval development of the veined rapa whelk, Rapana venosa (Valenciennes, 1846). Aquaculture 519, 734722 (2020).
8. Harding, J. M. & Mann, R. Observations on the biology of the Veined Rapa whelk, Rapana venosa (Valenciennes, 1846) in the Chesapeake Bay. J. Shellfish Res. 18, 9–17 (1999).
9. Pastorino, G., Penchaszadeh, P. E., Schejter, L. & Bremec, C. Rapana venosa (Valenciennes, 1846) (Mollusca: Muricidae): A new gastropod in South Atlantic waters. J. Shellfish Res. 19, 897–899 (2000).
10. Harding, J. M. & Mann, R. Veined rapa whelk (Rapana venosa) range extensions in the Virginia waters of Chesapeake Bay, USA. J. Shellfish Res. 24, 381–385 (2005).
11. Lanfranconi, A., Brugnoli, E. & Muniz, P. Preliminary estimates of consumption rates of Rapana venosa (Gastropoda, Muricidae); a new threat to mollusk biodiversity in the Rio de la Plata. Aquat. Invas. 8, 437–442 (2013).
12. Mann, R., Harding, J. M. & Westcott, E. Occurrence of imposex and seasonal patterns of gametogenesis in the invading veined rapa whelk Rapana venosa from Chesapeake Bay, USA. Mar. Ecol. Prog. Ser. 310, 129–138 (2006).
13. Harding, J. M., Kingsley-Smith, P., Savini, D. & Mann, R. Comparison of predation signatures left by Atlantic oyster drills (Urosalpinx cinerea Say, Muricidae) and veined rapa whelks (Rapana venosa Valenciennes, Muricidae) in bivalve prey. J. Exp. Mar. Biol. Ecol. 352, 1–11 (2007).
14. Savini, D., Castellazzi, M., Favruzzo, M. & Occhipinti-Ambrogi, A. The alien mollusc Rapana venosa (Valenciennes, 1846; Gastropoda, Muricidae) in the northern Adriatic Sea: population structure and shell morphology. Chem. Ecol. 20(sup1), 411–424 (2004).
15. Shi, P. et al. Molecular response and developmental speculations in metamorphosis of the veined rapa whelk, Rapana venosa. Integr. Zool. 18, 506–517 (2023).
16. Yang, M. J. et al. Symbiotic microbiome and metabolism profiles reveal the effects of induction by oysters on the metamorphosis of the carnivorous gastropod Rapana venosa. Comput. Struct. Biotechnol. J. 20, 1–14 (2022).
17. Yang, M. J. et al. Integrated mRNA and miRNA transcriptomic analysis reveals the response of Rapana venosa to the metamorphic inducer (juvenile oysters). Comput. Struct. Biotechnol. J. 21, 702–715 (2023).
18. Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).
19. Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
20. Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).
21. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
22. Song, H. et al. Genome survey on invasive veined rapa whelk (Rapana venosa) and development of microsatellite loci on large scale. J. Genet. 97, e79–e86 (2018).
23. Zhang, G. et al. The oyster genome reveals stress adaptation and complexity of shell formation. Nature 490, 49–54 (2012).
24. Bu, L. et al. Compatibility between snails and schistosomes: insights from new genetic resources, comparative genomics, and genetic mapping. Commun. Biol. 5, 940 (2022).
25. Liu, C. et al. The genome of the golden apple snail Pomacea canaliculata provides insight into stress tolerance and invasive adaptation. GigaScience 7, 9 (2018).
26. Liu, C. et al. Giant African snail genomes provide insights into molluscan whole-genome duplication and aquatic-terrestrial transition. Mol. Ecol. Resour. 21, 478–494 (2021).
27. Albertin, C. B. et al. Genome and transcriptome mechanisms driving cephalopod evolution. Nat. Commun. 13, 2427 (2022).
28. Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
29. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
30. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
31. Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310 (2015).
32. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
33. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
34. Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
35. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
36. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
37. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl. 1), i351–i358 (2005).
38. Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.10 (2004).
39. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
40. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
41. Simakov, O. et al. Insights into bilaterian evolution from three spiralian genomes. Nature 493, 526–531 (2013).
42. Knudsen, B., Kohn, A. B., Nahir, B., McFadden, C. S. & Moroz, L. L. Complete DNA sequence of the mitochondrial genome of the sea-slug, Aplysia californica: conservation of the gene order in Euthyneura. Mol. Phylogenet. Evol. 38, 459–469 (2006).
43. Chen, Y., Ye, W., Zhang, Y. & Xu, Y. High speed BLASTn: an accelerated MegaBLAST search tool. Nucleic Acids Res. 43, 7762–7768 (2015).
44. Lowe, T. M. & Chan, P. P. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 44, W54–W57 (2016).
45. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
46. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
47. Kent, W. J. BLAT – The BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
48. Doerks, T., Copley, R. R., Schultz, J., Ponting, C. P. & Bork, P. Systematic identification of novel protein domain families associated with nuclear functions. Genome Res. 12, 47–56 (2002).
49. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
50. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
51. Elsik, C. G. et al. Creating a honey bee consensus gene set. Genome Biol. 8, R13 (2007).
52. NCBI sequence read archive. https://identifiers.org/ncbi/insdc.sra:SRR22889214 (2022).
53. NCBI sequence read archive. https://identifiers.org/ncbi/insdc.sra:SRR23517974 (2022).
54. NCBI sequence read archive. https://identifiers.org/ncbi/insdc.sra:SRR23501451 (2022).
55. NCBI sequence read archive. https://identifiers.org/ncbi/insdc.sra:SRR23501452 (2022).
56. NCBI sequence read archive. https://identifiers.org/ncbi/insdc.sra:SRR23501453 (2022).
57. NCBI sequence read archive. https://identifiers.org/ncbi/insdc.sra:SRR23501454 (2022).
58. Yang, M., Song, H. & Zhang, T. Rapana venosa breed wild species isolate MY-2022, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JAQIHA000000000 (2023).
59. Song, H. Annotations of Rapana venosa genome. Figshare. https://doi.org/10.6084/m9.figshare.22362598.v1 (2023).
60. Laetsch, D. R. & Blaxter, M. L. BlobTools: Interrogation of genome assemblies. F1000Res. 6, 1287 (2017).
61. Pardos-Blas, J. R. et al. The genome of the venomous snail Lautoconus ventricosus sheds light on the origin of conotoxin diversity. GigaScience 10, giab037 (2021).
62. Harris, R. S. Improved Pairwise Alignment of Genomic DNA. Ph.D. dissertation, The Pennsylvania State University, Pennsylvania (2017).
63. Li, H. et al. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 34, D572–D580 (2006).
64. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
65. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
66. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
67. Huelsenbeck, J. P. & Ronquist, F. MrBayes: bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755 (2001).
68. Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 32002409, 42206086, 31972814, and 32002374), the China Agriculture Research System of MOF and MARA, and the Creative Team Project of the Laboratory for Marine Ecology and Environmental Science, Qingdao National Laboratory for Marine Science and Technology (Grant No. LMEESCTSP-2018). Hao Song was supported by the Young Elite Scientists Sponsorship Program of CAST (Grant No. 2021QNRC001) and the Youth Innovation Promotion Association of CAS. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank the Oceanographic Data Center, IOCAS, for support with data analysis.

Author information

These authors contributed equally: Hao Song, Zhuoqing Li, Meijie Yang, Pu Shi.

Authors and Affiliations

1. CAS Key Laboratory of Marine Ecology and Environmental Sciences, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, 266071, China
Hao Song, Zhuoqing Li, Meijie Yang, Pu Shi, Zhi Hu, Cong Zhou, Pengpeng Hu & Tao Zhang
2. Laboratory for Marine Ecology and Environmental Science, Qingdao National Laboratory for Marine Science and Technology, Qingdao, 266237, China
Hao Song, Meijie Yang & Tao Zhang
3. University of Chinese Academy of Sciences, Beijing, 100049, China
Hao Song, Zhuoqing Li, Meijie Yang, Pu Shi, Zhi Hu, Cong Zhou, Pengpeng Hu & Tao Zhang
4. Research and Development Center for Efficient Utilization of Coastal Bioresources, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai, 264003, China
Zhenglin Yu

Contributions

The authors’ contributions to specific working groups are indicated below. H. Song: steering committee, genome sequencing, genome assembly, genome annotation, data processing, statistical analysis, and manuscript writing. T. Zhang: steering committee. M. Yang: sampling, genome sequencing, genome assembly, genome annotation, data processing, statistical analysis, and manuscript writing. Z. Yu: sampling. Z. Hu: sampling. C. Zhou: sampling. P. Hu: sampling. P. Shi: genome sequencing, genome assembly, genome annotation, data processing, and statistical analysis. Z. Li: data processing, statistical analysis, and manuscript writing. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Tao Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Song, H., Li, Z., Yang, M. et al. Chromosome-level genome assembly of the caenogastropod snail Rapana venosa. Sci Data 10, 539 (2023). https://doi.org/10.1038/s41597-023-02459-7

Received: 31 March 2023
Accepted: 09 August 2023
Published: 16 August 2023
DOI: https://doi.org/10.1038/s41597-023-02459-7
Springer Nature Limited
