Abstract
Ellobium chinense is an airbreathing, pulmonate gastropod species that inhabits saltmarshes in estuaries of the northwestern Pacific. Due to a rapid population decline and their unique ecological niche in estuarine ecosystems, this species has attracted special attention regarding their conservation and the genomic basis of adaptation to frequently changing environments. Here we report a draft genome assembly of E. chinense with a total size of 949.470 Mb and a scaffold N50 of 1.465 Mb. Comparative genomic analysis revealed that the GO terms enriched among four gastropod species are related to signal transduction involved in maintaining electrochemical gradients across the cell membrane. Population genomic analysis using the MSMC model for 14 re-sequenced individuals revealed a drastic decline in Korean and Japanese populations during the last glacial period, while the southern Chinese population retained a much larger effective population size (Ne). These contrasting demographic changes might be attributed to multiple environmental factors during the glacial–interglacial cycles. This study provides valuable genomic resources for understanding adaptation and historical demographic responses to climate change.
Similar content being viewed by others
Background & Summary
Gastropods are one of the most diverse and specious molluscan classes, with some lineages having successfully radiated into diverse aquatic and terrestrial environments1. Recent comparative genomic analyses have provided significant insights into the adaptation of many molluscan species to different environments2,3, but the majority of genomic data are derived from marine or freshwater species and terrestrial/brackish water species are scarcely represented (76 marine, 24 freshwater, 1 brackish, and 5 terrestrial species in GenBank as of June 2023).
Ellobium chinense (Pfeiffer, 1854)4 is an airbreathing, pulmonate gastropod species that inhabits saltmarshes in estuaries of the northwestern Pacific, including Korea, Japan, and China5 (Fig. 1a,b). Due to a rapid population decline caused by habitat destruction from increased human activity, this species has attracted special attention regarding their conservation and is listed as Vulnerable (VU) in Korea and Japan6,7. Estuaries are transition zones between seas and rivers and constitute unique ecosystems, where seawater and freshwater draining from the land mix. In this respect, E. chinense provides an ideal model to study the genomic basis of adaptation acquired during its ecological transition (i.e., terrestrialization) from marine to nonmarine habitats8,9,10,11. In this study, we report the first genome sequences for this species, assembled into a draft genome of 949.470 Mb in size with a scaffold N50 of 1.465 Mb, and the results of a comparative genomic analysis of E. chinense with other gastropod species representing different habitat types (Aplysia californica [marine], Biomphalaria glabrata [freshwater], and Achatina fulica [terrestrial]). Comparative analysis of orthologous genes identified a total of 18,594 orthologous clusters, 8,947 of which were shared among four gastropod species in common and a total of 1,019 orthologous clusters were exclusively found in E. chinense (Fig. 2). Results from GO enrichment analysis for orthologous gene clusters revealed the top five GO terms uniquely enriched to E. chinense were DNA transposition (GO:0006313), DNA binding (GO:0003677), replication fork processing (GO:0031297), synaptic transmission (GO:0007271), and RNA-directed DNA polymerase activity (GO:0003964) (Fig. 2). Furthermore, the top five significantly enriched GO terms shared among four gastropod species were ubiquitin-dependent protein catabolic process (GO:0006511), sodium ion transport (GO:0006814), cell adhesion (GO:0007155), synaptic transmission (GO:0007271), and GTP binding (GO:0005525). Of these, GTP binding, synaptic transmission, and sodium ion transport are related to signal transduction that is involved in maintaining the electrochemical gradient across the cell membrane.
We also performed population genomic analysis on 14 re-sequenced individuals (sequenced to ~30 X coverage) sampled from three localities (China, Japan, and Korea) (Fig. 3a) covering their native range to examine their population genetic structure and historical demographic changes. The Japanese population was genetically differentiated from the Chinese (Fst = 0.028) and Korean populations (Fst = 0.027), while there was a much lower population differentiation between Chinese and Korean populations (Fst = 0.005). Similarly, in our principal component analysis (PCA) based on approximately 18 Mb of genome-wide single nucleotide polymorphism (SNP) data, PC1 first separates the Japanese individuals from the Korean/Chinese individuals and PC2 successively separates the Korean individuals from the Chinese ones (Fig. 3b). We also estimated the demographic history (i.e., the trajectory of effective population size, Ne) of E. chinense populations using the multiple sequentially Markovian coalescent (MSMC2 v2.11) model. Inferred Ne from different geographic origins showed similar demographic patterns across geographic isolates in their early stage of incremental growth until the Quaternary interglacial period of MIS 15 (Marine Isotope Stage), followed by a steep increase during the MIS 11, the longest and warmest interglacial interval, spanning between 424 kya and 374 kya (Fig. 3c). Separation of the Ne trajectories between populations suggests that these three regional populations split from each other after the MIS 11. Most notably, the Ne of the Chinese population stayed relatively high during the last glacial period, compared to the Japanese and Korean populations. The relatively high Ne of the Chinese population might be attributed to multiple factors, such as climatic factors, geological processes, and hydrological conditions during the glacial–interglacial cycles. The Chinese population is represented by individuals sampled from a mangrove forest in the Beibu Gulf, at the edge of the Indo-Pacific convergence region that is well known for its high biodiversity12,13. High temperature in this subtropical/tropical region might have played an important role in maintaining greater diversity and higher survival rates in intertidal species during glacial periods14,15. Since more solar radiation arrives in the tropics than at the poles, higher primary productivity may also have mediated processes that increased diversification. Furthermore, there are many subtropical–tropical islands in this region, and the extensive and diverse habitats of these peripheral islands might have provided southern Chinese populations with potential refugia during glacial periods, allowing for the maintenance of high genetic diversity16.
In summary, this study presents a reference genome assembly and population genomic data for Ellobium chinense, a pulmonated gastropod species inhabiting the saltmarshes of estuaries in the northwestern Pacific and a species of special interest for its conservation status. Comparative analysis of four gastropod draft genomes including that of E. chinense revealed that some commonly enriched GO terms are related to signal transduction that is involved in maintaining the electrochemical gradient across the cell membrane. A separate population genomic analysis using 14 re-sequenced individuals revealed contrasting demographic changes among studied populations (China, Japan, and Korea) during the last glacial period, that might be attributed to multiple environmental factors during the glacial–interglacial cycles. The draft genome sequence of E. chinense provides valuable genomic resources for understanding evolutionary adaptation, historical demographic responses to climate change, and for its future use in conservation genetics of endangered species. Nevertheless, the quality and continuity of the draft genome sequences are incomplete, thereby necessitating further investigation for its quality improvement using long-read sequencing strategy. High-quality of genome assembly from this further effort will provide a premise that can corroborate the main findings discussed in this study.
Methods
Sample collection and genome sequencing
For reference genome sequencing, live specimens of E. chinense were collected from estuarine saltmarshes in Korea (35°22'51.9“N, 126°24'47.6“E; Fig. 1a,b) under a governmental permit from the Yeongsan River Basin Environmental Office (Permit no. 2016–29). The collected samples were transferred alive to the laboratory and kept in the −80°C freezer after dissection. Total genomic DNA was extracted from foot tissue using a PCI (phenol:chloroform:isoamyl alcohol 25:24:1) solution. To construct a reference genome of E. chinense, we combined paired-end (180 bp, 400 bp inserts) and mate-pair (2 Kb, 5 Kb, and 8 Kb inserts) sequencing libraries on the Illumina platform (HiSeq 2000), generating a total of 118.94 Gb raw sequences accounting for approximately 125 X coverage of the final assembly (Table 1). For transcriptome sequencing, total RNA was extracted using TRIzol from the six tissues (albumen gland, digestive gland, foot, mantle, ovary, and stomach). Then, Illumina paired-end libraries with a 350 bp insert size were constructed using TruSeq RNA Sample Prep Kit v2 and sequenced on an Illumina Hiseq 4000 platform with a read length of 151 bp. Adaptor and low-quality sequences from the transcriptome data were trimmed using Trimmomatic-0.3617, and contaminated reads were filtered using the Kraken2 standard database18. The filtered transcriptome reads were then mapped to the assembled genome sequences using BWA v0.7.1719. The mapping rate of RNA sequence reads from six different tissue types ranged from 80.45% (stomach) to 95.41% (albumen gland) (see Supplementary Table 1 for their statistics).
Genome assembly
Raw data quality was assessed using FastQC v0.11.820. Adaptor and low-quality sequences were trimmed using Trimmomatic-0.3617 and mate-pair libraries were trimmed again with Trimgalore v0.4.221. Sequence errors in trimmed reads were corrected by a perl script, ErrorCorrectReads.pl in Allpaths-LG22. In all, approximately 1.04 Gb high-quality reads were generated (Table 1). A k-mer (k = 21) analysis using Jellyfish v2.3.023 and GenomeScope224 estimated the E. chinense genome size to be 822 Mb, with a heterozygosity of 2.15% which is relatively very high, compared with three other gastropod species (A. californica [0.962%], B. glabrata [1.42%], and A. fulica [0.138%]) (Fig. 4a and Supplementary Fig. 1). This significantly high heterozygosity level in the E. chinense genome sequences can lead to highly fragmented genome assembly25. De novo genome assembly of E. chinense was performed by Platanus (PLAT form for Assembling Nucleotide Sequences, v1.2.4)26. Contigs were constructed from the paired-end reads, then scaffolded and gap-closed using both paired-end and mate-pair sequences with SOAPdenovo227. To avoid potential contamination from bacterial DNA, the trimmed reads with high mapping rate against bacteria sequences were removed using a BLAST search against the NCBI bacterial genome database. In the end, the E. chinense assembled draft genome was 949.470 Mb in size with 10,059 scaffolds and an N50 of 1.465 Mb (Table 2).
Repetitive sequences, gene annotation, and comparative genomic analysis
A de novo repeat library was generated by RepeatModeler v2.0.228, and repetitive sequences were identified and masked using RepeatMasker v4.1.229. Approximately 37.05% (352 Mb) of the assembled sequences of E. chinense were identified as repetitive sequences. Excluding the unclassified repetitive sequences (25.62%) representing the largest component in repetitive sequences, DNA transposons were the most abundant (2.42%), followed by the LINEs (2.09%), the SINEs (1.33%), and the long terminal repeat (LTR) elements (0.90%) (Table 3). Repetitive sequence composition varied greatly among the four gastropod species compared, with LINEs (long-interspersed nuclear elements) being the most conspicuously variable repetitive elements, ranging from 2.09% (E. chinense) to 28.92% (A. fulica) (Fig. 4b).
After excluding repetitive sequences, gene models were predicted based on a combination of homology-based and ab initio gene prediction approaches. For homology-based prediction, the E. chinense assembled genome was compared to nine metazoan species, including three non-mollusk species, from NCBI (A. californica, B. glabrata, Crassostrea gigas, Lottia gigantea, Mytilus galloprovincialis, Octopus bimaculoides, Nematostella vectensis, Xenopus tropicalis, and Homo sapiens) using the TBLASTN search. Genewise v2.4.130 was used to infer gene structure based on the TBLASTN results. The transcriptome data was aligned to the assembled genome by Hisat231, and de novo assembled by Trinity v2.4.032 for ab initio gene model prediction. Hint files were generated by BLAT33 and PASA and incorporated into AUGUSTUS34 and GeneMark-ES35. EvidenceModeler combined gene prediction results and provided a consensus gene model36, identifying 37,866 genes in the assembled E. chinense genome (Table 4). Functional annotation of the predicted proteins was conducted against the NCBI NR database, the UniProtKB/Swiss-Prot database, Gene Ontology (GO), the KEGG pathway, and InterProscan. Of these identified genes, 77.40% (29,307) were assigned at least once to the databases (Table 4). For comparative genomic analysis, protein sequences from E. chinense and three other gastropod species inhabiting different habitats (A. californica [marine], B. glabrata [freshwater], A. fulica [terrestrial]) were compared. OrthoVenn237, a web-based tool, was used with default parameter settings to search orthologous gene clusters and GO term enrichment, except for ortholog clustering with an e-value cutoff set to 1e-5.
Population genomic analysis
To investigate the genetic diversity and genetic stratification of E. chinense populations, the whole genome was re-sequenced at ~30 X coverage for each of 14 individuals sampled from three countries covering their native range (Japan, China, and Korea). Re-sequenced reads (Supplementary Table 2) were aligned to the reference genome using BWA-mem v0.7.1719. The reads that mapped properly in pairs were retained using the option “−f 0 × 0003” and unmapped reads were filtered with “−F 0 × 0004” in samtools view (v1.9)38. PCR duplicates were removed using Picard MarkDuplicates v2.27.1, and low-quality reads (Q < 30) were filtered using samtools view. Variants were called and filtered using the Genome Analysis Toolkit (GATK) v3.8.1039. All sites for each individual were called by GATK HaplotypeCaller, and these per-individual gVCF files were combined into one by GATK CombineGVCFs. Then, variant sites were called by GATK GenotypeGVCFs. The biallelic SNPs with the Phred-scaled quality score ≥ 30 were kept (GATK SelectVariants), and low-quality SNPs were filtered out using GATK VariantFiltration with the following threshold; “DP < 136.0 || DP > 3400.0 || QD < 2.0 || SOR > 3.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < –12.5 || ReadPosRanksum < −8.0 || ExcessHet > 10.0”. For this curated set of biallelic SNPs, another round of quality control was performed to guarantee the quality of individual genotypes. Individual genotypes were assigned as missing if the ratio of the highest genotype likelihood value to the sum of three genotype likelihoods was less than 0.99. Next, SNPs that were missing at least once in any individual were filtered out, producing a total of 36,453,320 SNPs. For most population genetic analyses, variants specific to a single individual were excluded by removing variants with (i) a minor allele count of 1 or less and (ii) doubletons with one individual homozygous for the minor allele. In the end, 18,260,324 SNPs were obtained in this variant set (18 Mb SNPs dataset). The genome coverage was estimated using QualiMap v2.2140. The fixation index (Fst) was calculated by vcftools v0.1.1641 with the Weir & Cockerham estimator42. Principal component analysis (PCA) was performed on the 18 Mb SNPs dataset using smartPCA v18140 in the EIGENSOFT package v8.0.043.
To estimate the demographic history of E. chinense populations, we used the multiple sequentially Markovian coalescent model (MSMC2 v2.1.1)44 based on unphased data. Input multihetsep files were generated from scaffolds larger than 1 Mb, which account for about 64% of the reference genome sequences with default parameters, using the splitfa and gen_mask programs in the SNPable package (https://lh3lh3.users.sourceforge.net/snpable.shtml) and makemappabilityMask.py, bamCaller.py, and generate_multihetsep.py scripts from the MSMC-Tools package implemented in MSMC244. Then, MSMC2 was performed by pairing two haplotypes sampled from the same individual, with a default time segment parameter. To scale population parameters, we used a mutation rate estimated from Acanthodoris spp. (1.6 × 10−9 substitutions/site/generation)45 belonging to the Gastropoda. The generation time of E. chinense was set as 2 years, inferred from the life span of a closely related species, Melampus bidentatus46.
Data Records
All DNA and RNA sequenced datasets used for genome assembly and annotation have been deposited in the NCBI Sequence Read Archive with accession numbers SRR18670280–SRR1867028447,48,49,50,51, and SRR18693111–SRR1869311752,53,54,55,56,57,58 under BioProject PRJNA824186 (DNA) and PRJNA824985 (RNA), respectively. The re-sequenced Illumina datasets used for the population genomic analyses were also deposited in the NCBI Sequence Read Archive with accession numbers SRR25445169–SRR2544518259,60,61,62,63,64,65,66,67,68,69,70,71,72 under BioProject PRJNA999501. The assembled genome was deposited in the NCBI with GenBank accession number JAWQUT00000000073. The assembled genome, predicted genes, functional annotation for comparative genomic analysis, and the BAM files and SNP data file used for population genomic analysis are available in the figshare repository, respectively74,75.
Technical Validation
To assess the completeness of the E. chinense genome assembly, filtered Illumina reads were first mapped to the assembly using BWA v0.7.17. The mapping rate of the Illumina reads was calculated with samtools flagstat (samtools v1.11) to be 97.71%. Second, QUAST v5.0.276 was performed to check the assembly composition, and it was found that scaffolds longer than 50 Kb accounted for 95.4% of the total genome length (Fig. 5a). Third, genome completeness was assessed using Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis with BUSCO v4.1.477. The analysis was performed based on near-universal single-copy orthologs of Eukaryota, Metazoa, and Mollusca datasets (odb10) and identified 96.86% complete BUSCOs based on Eukaryota core genes, showing a high BUSCO completeness with a very low duplication rate (Fig. 5b). Finally, the assembled genome was validated by comparing it with the trimmed Illumina reads using KAT v2.4.278. The KAT completeness was 54.36%, and comparison plot of k-mer spectra copy number indicated a unique haplotype genome (Fig. 5c; in red) with very low levels of duplicates (Fig. 5c; in purple). These results indicate that the genome assembly successfully collapsed diploid genome sequences to haploid genome assembly.
Code availability
Default parameters were employed if no detailed parameters were mentioned below.
(1) Trimmomatic v0.36: phred33, LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:36
(2) Jellyfish v2.3.0: −C −m 21
(3) GenomeScope v2: k-mer length 21, ploidy 2
(4) Population genomic analyses: All bash command lines and scripts are available at the GitHub repository: https://github.com/CWJeongLab/Ellobium, which includes detailed parameters used for population genomic analyses.
References
Gomes-dos-Santos, A., Lopes-Lima, M., Castro, L. F. C. & Froufe, E. Molluscan genomics: the road so far and the way forward. Hydrobiologia 847, 1705–1726, https://doi.org/10.1007/s10750-019-04111-1 (2020).
Lan, Y. et al. Hologenome analysis reveals dual symbiosis in the deep-sea hydrothermal vent snail Gigantopelta aegis. Nat. Commun. 12, 1165, https://doi.org/10.1038/s41467-021-21450-7 (2021).
Sun, Y. et al. Genomic signatures supporting the symbiosis and formation of chitinous tube in the deep-sea tubeworm Paraescarpia echinospica. Mol. Biol. Evol. 38, 4116–4134, https://doi.org/10.1093/molbev/msab203 (2021).
Pfeiffer, L. Synopsis auriculaceorum. Malakozoologische Blatter 1, 145–156 (1854).
Walthew, G. The distribution of mangrove-associated gastropod snails in Hong Kong. Hydrobiologia 295, 335–342, https://doi.org/10.1007/BF00029140 (1995).
Lee S. P. Red Data Book of Endangered Mollusks in Korea. Vol. 6. Report No. 11-1480592-000409-01 (National Institute of Biological Resources, 2012).
Japanese Red List. Red Data Book and Red List 2020. (Japanese Ministry of the Environment, Government of Japan, 2020).
Croghan, P. C. Osmotic regulation and the evolution of brackish- and fresh-water faunas. J. Geol. Soc. 140, 39–46, https://doi.org/10.1144/gsjgs.140.1.0039 (1983).
Kameda, Y. & Kato, M. Terrestrial invasion of pomatiopsid gastropods in the heavy-snow region of the Japanese Archipelago. BMC Evol. Biol. 11, 118, https://doi.org/10.1186/1471-2148-11-118 (2011).
Whitfield, A. K., Elliott, M., Basset, A., Blaber, S. J. M. & West, R. J. Paradigms in estuarine ecology - A review of the Remane diagram with a suggested revised model for estuaries. Estuar. Coast. Shelf Sci. 97, 78–90, https://doi.org/10.1016/j.ecss.2011.11.026 (2012).
Kirchhoff, K. N., Hauffe, T., Stelbrink, B., Albrecht, C. & Wilke, T. Evolutionary bottlenecks in brackish water habitats drive the colonization of fresh water by stingrays. J. Evol. Biol. 30, 1576–1591, https://doi.org/10.1111/jeb.13128 (2017).
Roberts, C. M. et al. Marine biodiversity hotspots and conservation priorities for tropical reefs. Science 295, 1280–1284, https://doi.org/10.1126/science.1067728 (2002).
Renema, W. et al. Hopping hotspots: global shifts in marine biodiversity. Science 321, 654–657, https://doi.org/10.1126/science.1155674 (2008).
Williams, S. T. Origins and diversification of Indo-West Pacific marine fauna: evolutionary history and biogeography of turban shells (Gastropoda, Turbinidae). Biol. J. Linn. Soc. 92, 573–592, https://doi.org/10.1111/j.1095-8312.2007.00854.x (2007).
Sanciangco, J. C., Carpenter, K. E., Etnoyer, P. J. & Moretzsohn, F. Habitat availability and heterogeneity and the Indo-Pacific warm pool as predictors of marine species richness in the tropical Indo-Pacific. PLoS One 8, e56245, https://doi.org/10.1371/journal.pone.0056245 (2013).
Carpenter, K. E. et al. Comparative phylogeography of the coral triangle and implications for marine management. J. Mar. Biol. 2011, 1–14, https://doi.org/10.1155/2011/396982 (2011).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120, https://doi.org/10.1093/bioinformatics/btu170 (2014).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257, https://doi.org/10.1186/s13059-019-1891-0 (2019).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595, https://doi.org/10.1093/bioinformatics/btp698 (2010).
Andrews, S. FastQC: a quality control tool for high throughput sequence data. Babraham Bioinformatics http://www.bioinformatics.babraham.ac.uk/projects/fastqc (2010).
Krueger, F. TrimGalore: A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files. Babraham Bioinformatics. https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ (2015).
Butler, J. et al. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res 18, 810–820, https://doi.org/10.1101/gr.7337908 (2008).
Marçais, G. & Kingsford, C. A. fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1432, https://doi.org/10.1038/s41467-020-14998-3 (2020).
Asalone, K. C. et al. Regional sequence expansion or collapse in heterozygous genome assemblies. PLoS Comput Biol 16, https://doi.org/10.1371/journal.pcbi.1008104 (2020).
Kajitani, R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res 24, 1384–1395, https://doi.org/10.1101/gr.170720.113 (2014).
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1, 18, https://doi.org/10.1186/2047-217X-1-18 (2012).
Smit, A. F. & Hubley, R. RepeatModeler http://www.repeatmasker.org/RepeatModeler/ (2008–2015).
Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0 http://www.repeatmasker.org. RMDownload.html (2013).
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995, https://doi.org/10.1101/gr.1865504 (2004).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4 (2019).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652, https://doi.org/10.1038/nbt.1883 (2011).
Kent, W. J. BLAT —the BLAST-like alignment tool. Genome Res. 12, 656–664, https://doi.org/10.1101/gr.229202 (2002).
Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 33, W465–467, https://doi.org/10.1093/nar/gki458 (2005).
Besemer, J. & Borodovsky, M. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res. 33, W451–454, https://doi.org/10.1093/nar/gki487 (2005).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).
Xu, L. et al. OrthoVenn2: a web server for whole-genome comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res. 47, W52–W58, https://doi.org/10.1093/nar/gkz333 (2019).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079, https://doi.org/10.1093/bioinformatics/btp352 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303, https://doi.org/10.1101/gr.107524.110 (2010).
Okonechnikov, K., Conesa, A. & Garcia-Alcalde, F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32, 292–294, https://doi.org/10.1093/bioinformatics/btv566 (2016).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158, https://doi.org/10.1093/bioinformatics/btr330 (2011).
Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370, https://doi.org/10.1111/j.1558-5646.1984.tb05657.x (1984).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190, https://doi.org/10.1086/519795 (2006).
Schiffels, S. & Wang, K. MSMC and MSMC2: the multiple sequentially markovian coalescent. Methods Mol. Biol. 2090, 147–165, https://doi.org/10.1007/978-1-0716-0199-0_20 (2020).
Allio, R., Donega, S., Galtier, N. & Nabholz, B. Large variation in the ratio of mitochondrial to nuclear mutation rate across animals: implications for genetic diversity and the use of mitochondrial DNA as a molecular marker. Mol. Biol. Evol. 34, 2762–2772, https://doi.org/10.1093/molbev/msx197 (2017).
Apley, M. Field studies on life history, gonadal cycle and reproductive periodicity in Melampus bidentatus (Pulmonata: Ellobiidae). Malacologia 10, 381–397 (1970).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18670280 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18670281 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18670282 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18670283 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18670284 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18693111 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18693112 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18693113 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18693114 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18693115 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18693116 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18693117 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445169 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445170 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445171 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445172 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445173 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445174 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445175 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445176 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445177 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445178 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445179 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445180 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445181 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25445182 (2023).
NCBI GenBank https://identifiers.org/ncbi/insdc:JAWQUT000000000 (2023).
Kwak, H. et al. Ellobium chinense Genome assembly and annotation. figshare https://doi.org/10.6084/m9.figshare.23585247 (2023).
Kwak, H. et al. Population genomic analysis of Ellobium chinense. figshare https://doi.org/10.6084/m9.figshare.23771127 (2023).
Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D. & Gurevich, A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics 34, i142–i150, https://doi.org/10.1093/bioinformatics/bty266 (2018).
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654, https://doi.org/10.1093/molbev/msab199 (2021).
Mapleson, D., Garcia Accinelli, G., Kettleborough, G., Wright, J. & Clavijo, B. J. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 33, 574–576 (2017).
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2020R1A2C2005393) and the management of Marine Fishery Bio-resources Center (2024) funded by the National Marine Biodiversity Institute of Korea (MABIK).
Author information
Authors and Affiliations
Contributions
Conception and study design: J.K.P., Laboratory experiments: J.P., Y.K., Sample collection: J.P., T.N., Y.W.D., Data analysis and interpretation: H.K., D.L., Y.K., H.Y., D.K., C.J., J.K.P., Drafting the manuscript: J.K.P., H.K., D.L., H.Y., Y.W.D., C.J.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kwak, H., Lee, D., Kim, Y. et al. Genome assembly and population genomic data of a pulmonate snail Ellobium chinense. Sci Data 11, 31 (2024). https://doi.org/10.1038/s41597-023-02851-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-023-02851-3
- Springer Nature Limited