Chromosome-level genome assembly of the freshwater mussel Sinosolenaia oleivora (Heude, 1877)

Ma, Xueyan; Jin, Wu; Chen, Wanwen; Liu, Qian; Jiang, Haizhou; Zhou, Yanfeng; Xu, Pao; Wen, Haibo; Xu, Dongpo

doi:10.1038/s41597-024-03451-5


Data Descriptor
Open access
Published: 08 June 2024

Volume 11, article number 606, (2024)
Scientific Data


Xueyan Ma^1,2^na1,
Wu Jin^1,2,3^na1,
Wanwen Chen^1,2,
Qian Liu^1,3,
Haizhou Jiang^1,3,
Yanfeng Zhou^1,3,
Pao Xu ORCID: orcid.org/0000-0001-7007-8530^1,2,3,
Haibo Wen^1,2,3 &
Dongpo Xu^1,2,3


Abstract

Sinosolenaia oleivora (Bivalvia, Unionida, Unionidae) is a near-endangered edible mussel. In 2022, it was selected by the Ministry of Agriculture and Rural Affairs as a top-ten aquatic germplasm resource, with potential for industrial development. Using Illumina, PacBio, and Hi-C technologies, we assembled a high-quality chromosome-level genome of S. oleivora. The assembled genome spanned 2052.29 Mb, with a contig N50 of 20.36 Mb and a scaffold N50 of 103.57 Mb. The 302 contigs, accounting for 98.41% of the total assembled genome, were anchored onto 19 chromosomes by Hi-C scaffolding. A total of 1171.78 Mb of repeat sequences were annotated and 22,971 protein-coding genes were predicted. Compared with the nearest ancestor, 603 expanded and 1767 contracted gene families were found. This study provides important genomic resources for conservation, evolutionary research, and genetic improvement of economic traits such as growth performance.


Background & Summary

Freshwater mussels (Unionoida) represent the most diverse order of freshwater bivalves¹ and are found in all regions of the world except the Antarctic². They not only play an important role in the food web structure and material cycle of ecosystems^3,4 but also have high economic value, such as for food⁵, pearl cultivation⁶, and anti-tumor ingredients⁷. They also have been used as an indicator for biological monitoring and evaluation of heavy metal pollution⁸.

Freshwater mussels are benthic filter feeders⁹. Suitable substrate, water quality, and food are important factors for the survival and reproduction of mussels. In recent years, human activities such as river diversion, chemical pollution, and overfishing have caused serious damage to mussel habitats¹⁰. The life history of most mussels involves a parasitic larval stage (glochidia) that must attach to vertebrate hosts (primarily fish) to complete metamorphosis¹¹, which increases their vulnerability². The International Union for Conservation of Nature (IUCN) Red List reports that 173 species are extinct, endangered, or threatened, 99 are vulnerable or near threatened, and 84 are unclassified because data are deficient¹².

There are 57 endemic species in China¹³, and eight species are now listed as Grade II national protected animals¹⁴. The biodiversity and population size of freshwater mussels in large water bodies such as the Yangtze River¹⁵ and the Songhua River¹⁶ have declined significantly. S. oleivora is endemic to China. In 2022, S. oleivora was identified as one of the top ten characteristic aquatic germplasm resources by the Ministry of Agriculture and Rural Affairs. S. oleivora has fresh and tender meat, a delicious taste, and high nutrient content¹⁷. In Fuyang in Anhui Province, Tianmen in Hubei Province, and other places, S. oleivora is a famous delicacy with high economic value and is called the “abalone of the Huaihe River.” It once had an extensive distribution, occurring in five freshwater lakes and in the tributaries of the Yangtze and Huaihe Rivers¹⁸. Habitat fragmentation and other human activities (e.g., overfishing) have resulted in its endangerment¹⁹. Tianmen in Hubei Province and Fuyang in Anhui Province have established S. oleivora nature reserves to support this ecologically and economically vital resource.

Genomic data are considered fundamental for revealing biological characteristics, inferring evolutionary mechanisms, and promoting effective conservation²⁰. To date, only seven freshwater mussel species have had their genomes sequenced (Table S1, Supplementary File)^21,22,23,24,25,26,27,28, and only one of these is a Chinese species²⁷. A whole-genome sequence of S. oleivora has been lacking. We applied multiple sequencing technologies, including Illumina NovaSeq 6000 short-read sequencing, PacBio long-read sequencing, and high-throughput chromosome conformation capture (Hi-C), to complete genome sequencing and assembly. Three methods, namely de novo, homology-based, and RNA-Seq-based prediction, were used for genome annotation. In addition, a comparative genomic analysis of S. oleivora and 10 other distantly related species was performed. This study provides important genomic resources for conservation and evolutionary research and guides genetic trait improvements (e.g., growth).

Methods

Sample collection and sequencing

One female S. oleivora was sampled from the national-level protection zone of the aquatic germplasm resource of S. oleivora in the Fuyang Division of the Huaihe River (32.428725°N, 115.600287°E). Total DNA was extracted from the adductor muscle using the DNeasy Blood and Tissue Kit (Qiagen, Germany) for genome sequencing. For short-read sequencing, a Covaris M220 was used to shear DNA into 300–350 bp fragments. DNA library preparation comprised end repair, A-tailing, sequencing adapter ligation, DNA purification, and bridge PCR. These libraries were sequenced on the Illumina NovaSeq 6000 platform using a paired-end (PE) sequencing strategy. For long-read sequencing, a PacBio HiFi library was generated according to the PacBio standard protocol using the SMRTbell Template Prep Kit 2.0 (Pacific Biosciences, USA) and sequenced on the PacBio Sequel II platform. A Hi-C library was prepared following the Hi-C library protocol²⁹ and sequenced on the Illumina NovaSeq 6000 platform. Total RNA was extracted from the adductor muscle of S. oleivora using TRIzol reagent (Invitrogen, MA, USA) for transcriptome sequencing. The RNA-seq library was generated using the NEBNext® Ultra™ RNA Library Prep Kit (NEB, USA) for PE sequencing, and short reads were produced on the Illumina NovaSeq 6000 platform. A total of 192.1 Gb of Illumina data, 63.2 Gb of PacBio data, 191.8 Gb of Hi-C data, and 5.6 Gb of RNA-Seq data were obtained (Fig. 1, Table 1).

Table 1 Statistics for the sequencing data of the Sinosolenaia oleivora genome.


Estimation of genome size

A k-mer-based method³⁰ was applied to estimate the genome size, heterozygosity, and repeat content of S. oleivora. We performed a k-mer (k = 17) frequency distribution analysis using 192.1 Gb of Illumina clean data (Fig. 2). A total of 153,573,141,235 k-mers were obtained, with a k-mer depth of 73. The estimated genome size was 2,025 Mb, the heterozygosity ratio was 0.78%, and the repeat sequence ratio was 61.37%.
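The basic relation behind k-mer-based size estimation is that genome size is approximately the total number of k-mers divided by the depth of the main k-mer peak. The Python sketch below illustrates only that relation; the function name is hypothetical and this is not the authors' pipeline. With the figures reported above, the uncorrected ratio comes to roughly 2,104 Mb, somewhat above the reported 2,025 Mb. The difference presumably reflects corrections applied by the cited method (for example, excluding erroneous low-frequency k-mers), which is an assumption not stated in the text.

```python
def estimate_genome_size(total_kmers: int, peak_depth: float) -> float:
    """Return an uncorrected genome size estimate in base pairs.

    Uses the simple relation: size ~= total k-mers / main-peak k-mer depth.
    (Hypothetical helper for illustration; no error-k-mer correction.)
    """
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


# Values reported in the text for k = 17.
total_kmers = 153_573_141_235
peak_depth = 73

size_mb = estimate_genome_size(total_kmers, peak_depth) / 1e6
print(f"Uncorrected estimate: {size_mb:.1f} Mb")
```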

Genome assembly

PacBio HiFi reads were assembled using Hifiasm (v. 0.16.1-r375)³¹ with the default parameters. Redundant sequences were filtered out using Purge_Haplotigs (v1.0.4)³² with the cutoff parameters “-a 70 -j 80 -d 200”. The resulting PacBio-based assembly had a total length of 2090.51 Mb and comprised 302 contigs, with a contig N50 of 23.99 Mb, a maximum contig length of 88.20 Mb, and a GC content of 34.38% (Table 2).
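For readers unfamiliar with the N50 statistic quoted above: it is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal Python sketch follows; the function name and the toy contig lengths are illustrative assumptions, not values from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example: total = 290, half = 145; 80 + 70 = 150 >= 145, so N50 = 70.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

The same function applies to scaffold lengths when computing the scaffold N50.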

Table 2 Genome assembly results of Sinosolenaia oleivora.


Hi-C-assisted chromosome-level assembly

To assemble the chromosome-level genome, Hi-C sequencing data were mapped to the draft genome assembly and sorted with Juicer (v1.6)³³. The contigs were linked into 19 distinct chromosomes by 3D-DNA (v. 180922)³⁴. Based on chromosome interactions, contig orientation was corrected and suspicious fragments were removed from the contigs in Juicebox³⁵. The genome contigs were thereby anchored and oriented onto chromosomes by Hi-C scaffolding. The Hi-C library generated 191.8 Gb of clean data, with 55.56% valid pairs. A total of 302 contigs, accounting for 98.41% of the total assembled genome, were anchored onto 19 chromosomes. The 19 pseudo-chromosomes were clearly distinguished in the Hi-C heatmap, with strong intra-chromosomal interactions confirming a high-quality Hi-C assembly (Figs. 3, 4). This resulted in a high-quality genome of 2052.30 Mb, with a contig N50 of 20.36 Mb and a scaffold N50 of 103.57 Mb (Table 3).

Table 3 Statistics of Hi-C assembly results of Sinosolenaia oleivora.


Repeat annotation, gene prediction, and gene functional annotation

Repeat elements of the S. oleivora genome were annotated by combining homology-based and de novo prediction methods. For homology-based annotation, we used RepeatMasker (v4.1.2-p1)³⁶ and RepeatProteinMask (v4.1.0)³⁷ to annotate transposable elements (TEs) by comparing sequences to the Repbase database³⁸. For de novo prediction, Tandem Repeats Finder (TRF) (version 4.09)³⁹ was used to detect tandem repeats based on sequence features, and LTR_FINDER (v. 1.07)⁴⁰ and RepeatModeler (v. 2.0.3)³⁶ were used to construct a repeat library. The library was then used to detect repetitive sequences with RepeatMasker (v. 4.1.2-p1)³⁶. After eliminating redundancy, we obtained the final annotated repeat set. A total of 1171.79 Mb of repeat sequences were annotated, accounting for 56.05% of the total genome sequence (Table 4). The major repetitive elements were DNA transposons (15.74%), long interspersed nuclear elements (LINEs, 8.95%), and long terminal repeats (LTRs, 4.98%) (Table 5).
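A repeat fraction depends on which assembly length is used as the denominator. The short Python check below uses only figures quoted in this article; the helper name is hypothetical. The reported 56.05% matches the 2090.51 Mb PacBio-based assembly length, whereas dividing by the 2052.30 Mb Hi-C assembly would give about 57.10%. That the draft assembly was the denominator is an inference from the arithmetic, not something the text states.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


repeat_mb = 1171.79      # annotated repeat sequences (Mb)
draft_mb = 2090.51       # PacBio-based draft assembly length (Mb)
final_mb = 2052.30       # Hi-C chromosome-level assembly length (Mb)

print(f"vs draft assembly: {percent(repeat_mb, draft_mb):.2f}%")
print(f"vs Hi-C assembly:  {percent(repeat_mb, final_mb):.2f}%")
```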

Table 4 Statistics of repetitive sequences in the Sinosolenaia oleivora genome.


Table 5 Statistics of transposable elements for the Sinosolenaia oleivora genome.


The genome sequence was soft-masked based on repetitive element predictions and then used for protein-coding gene prediction. We employed three methods for gene prediction. For homology-based annotation, the protein sequences of Mizuhopecten yessoensis, Crassostrea gigas, Crassostrea virginica, and Mytilus galloprovincialis were downloaded from NCBI and aligned to the genome sequence using BLAST (E-value: 1e-5)⁴¹. Homologous sequences were then aligned to the corresponding matching proteins using GeneWise (v. wise2-4-1)⁴². For the RNA-seq-based annotation, transcriptomic data were assembled using Trinity (v2.11)⁴³, and the assembled transcripts were aligned to the genome using BLAST (E-value: 1e-5)⁴¹. For de novo prediction, Augustus (v3.4.0)⁴⁴ and Genscan (version 1.0)⁴⁵ were used to generate de novo-predicted gene sets. MAKER (v2.31.10)⁴⁶ was used to integrate the results from these methods to produce the final gene set. The genome sequence was also aligned to the homologous single-copy gene database of Benchmarking Universal Single-Copy Orthologs (BUSCO)⁴⁷. MAKER (version 2.31.10)⁴⁸ and HiCESAP (Wuhan Gooalgene Co., Ltd., https://www.gooalgene.com/) were employed to merge all the data and filter out redundancies. The combination of de novo and homology-based methods predicted 22,971 protein-coding genes (Table 6). The predicted genes were functionally annotated against external protein databases, including SwissProt, InterPro, TrEMBL, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO). A total of 19,229 genes, accounting for 87.52% of all predicted genes, were annotated using public databases (Table 7).
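The E-value threshold of 1e-5 used for the BLAST alignments can be applied as a simple post-filter on tabular output. The Python sketch below assumes BLAST's standard 12-column tabular format (`-outfmt 6`), in which the E-value is the 11th column; the article does not state which output format was used, and the function name and the two example records are hypothetical.

```python
import io


def filter_blast_hits(handle, max_evalue=1e-5):
    """Keep tabular BLAST hits whose E-value is at or below max_evalue.

    Assumes the standard 12-column tabular layout (E-value in column 11).
    """
    kept = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        if float(fields[10]) <= max_evalue:
            kept.append(fields)
    return kept


# Two made-up records: the first passes the threshold, the second does not.
example = io.StringIO(
    "prot_A\tchr1\t92.5\t200\t15\t0\t1\t200\t1000\t1599\t1e-50\t350\n"
    "prot_B\tchr2\t35.0\t60\t39\t0\t5\t64\t500\t679\t0.02\t40.1\n"
)
hits = filter_blast_hits(example)
print([h[0] for h in hits])
```

In practice the same threshold can be passed directly to BLAST with its `-evalue` option, so a post-filter like this is mainly useful when re-screening existing results.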

Table 6 Statistics of gene predictions in the Sinosolenaia oleivora genome.


Table 7 Functional annotations of predicted genes.


To predict noncoding RNAs (ncRNAs) in the S. oleivora genome, we used tRNAscan-SE (v1.3.1)⁵¹ to identify transfer RNAs (tRNAs), Infernal (v1.1.2)⁵² with the Rfam⁴⁹ and miRBase⁵⁰ databases to annotate other ncRNAs, including microRNAs (miRNAs) and small nuclear RNAs (snRNAs), and BLAST (E-value: 1e-5)⁴¹ to identify ribosomal RNAs (rRNAs). We annotated 119 miRNAs, 2643 tRNAs, 366 rRNAs, and 867 snRNAs, with average lengths of 98, 74, 254, and 168 bp, respectively (Table 8).

Table 8 Non-coding RNA annotation of the Sinosolenaia oleivora genome.


Comparative genomic analyses

To clarify the evolutionary position of S. oleivora, we retrieved the protein sequences of Mizuhopecten yessoensis, Biomphalaria glabrata, Crassostrea gigas, C. virginica, Lingula anatina, Lottia gigantea, Mercenaria mercenaria, Ostrea edulis, Pecten maximus, and Pomacea canaliculata, and used OrthoMCL (v2.0.9)⁵³ with the parameter “-l 1.5” to detect orthologous groups. Single-copy orthologous genes were aligned with MUSCLE (v5)⁵⁴. Based on these alignments, KaKs_Calculator (v2.0)⁵⁵ was used with default parameters to calculate the synonymous substitution rate (Ks). The S. oleivora genome shared 82,067 gene families and 17,699 single-copy genes with the ten other species. The S. oleivora genome contained 21,971 genes clustered into 18,312 gene families, including 2,273 unique families (Table 9). The phylogenetic tree was constructed from the multiple sequence alignment using RAxML (version 8.2.12)⁵⁶ with the parameters “-f a -N 100 -m GTRGAMMA”. Divergence times were estimated using the MCMCtree (v4.9) program in PAML (v4.9)⁵⁷ with the parameters clock = 3 and model = 0. The divergence times of L. anatina and C. gigas, 619.3 (582.0–689.2) Mya; B. glabrata and C. gigas, 544.1 (520.2–567.9) Mya; and P. canaliculata and B. glabrata, 444.6 (377.0–490.4) Mya, obtained from the TimeTree database⁵⁸ (http://www.timetree.org/), were used for calibration. Divergence time analysis showed that S. oleivora was closely related to M. mercenaria, with a divergence time of 516.7 (486.9–541.0) Mya (Fig. 5).
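The selection of single-copy orthologous genes described above can be sketched as follows: from a set of orthologous groups, keep only those that contain exactly one gene from every species in the analysis. This Python sketch is an illustration under assumed data structures; the group identifiers, species codes, and gene names are made up, and it does not reproduce the output format of the clustering software.

```python
from collections import Counter


def single_copy_groups(groups, species):
    """Return groups that have exactly one gene from each listed species.

    groups  : dict mapping group ID -> list of (species, gene) tuples
    species : iterable of all species codes expected in every group
    """
    wanted = set(species)
    result = {}
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _gene in members)
        # Require every species to be present, each exactly once.
        if set(counts) == wanted and all(c == 1 for c in counts.values()):
            result[group_id] = {sp: gene for sp, gene in members}
    return result


# Hypothetical toy input with three species codes.
toy_groups = {
    "OG1": [("sole", "s1"), ("cgig", "c1"), ("mmer", "m1")],                # single-copy
    "OG2": [("sole", "s2"), ("sole", "s3"), ("cgig", "c2"), ("mmer", "m2")],  # duplicated
    "OG3": [("sole", "s4"), ("cgig", "c3")],                                # species missing
}
single = single_copy_groups(toy_groups, ["sole", "cgig", "mmer"])
print(sorted(single))
```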

Table 9 Gene family clustering.


CAFE^59,60 was applied for gene family expansion and contraction analysis. Compared with the nearest ancestor, a total of 603 expanded and 1767 contracted gene families were found in S. oleivora (Fig. 6). Of these, 69 gene families (984 genes) were significantly expanded and 83 gene families (118 genes) were significantly contracted (p < 0.05). We then performed GO and KEGG enrichment analysis, and terms with adjusted enrichment p-values ≤ 0.05 were chosen for further analysis. The CODEML program of PAML (v4.9)⁵⁷ was used to identify positively selected genes (PSGs), which were also subjected to enrichment analysis. A total of 552 protein-coding genes were positively selected in S. oleivora (FDR < 0.05, Table 10). The positively selected genes were enriched in GO and KEGG terms including DNA binding, nucleolus, protein processing in the endoplasmic reticulum, ribosome, and the mTOR signaling pathway (Figs. 7, 8).
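The adjusted p-values and FDR thresholds mentioned above imply a multiple-testing correction. The article does not name the procedure, so the Python sketch below assumes the widely used Benjamini-Hochberg method purely for illustration; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for five enrichment terms.
raw = [0.001, 0.008, 0.039, 0.041, 0.2]
adj = benjamini_hochberg(raw)
significant = [p for p, q in zip(raw, adj) if q <= 0.05]
print(adj)
print(significant)
```

In this toy case only the two smallest p-values remain at or below 0.05 after adjustment, even though four of the raw values are below 0.05.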

Table 10 Protein-coding genes under positive selection in Sinosolenaia oleivora (FDR < 0.05).


Data Records

All sequencing data from three sequencing platforms have been uploaded to the NCBI SRA database (transcriptomic sequencing data: SRR28352171⁶¹, genomic Illumina sequencing data: SRR26551344⁶², genomic PacBio sequencing data: SRR28406055⁶³, Hi-C sequencing data: SRR28406264⁶⁴). The final chromosome-level assembled genome file has been uploaded to the GenBank database under the accession JBDPLI000000000⁶⁵. Genome annotation files have been uploaded to the Figshare database⁶⁶.

Technical Validation

Evaluating the quality of the DNA and RNA

The quality (OD260/280 and OD260/230) and concentration of the extracted DNA and RNA were assessed using a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, San Jose, CA, USA) and a Qubit 3.0 Fluorometer (Thermo Fisher Scientific, San Jose, CA, USA) before sequencing, and their integrity was further evaluated on a 1% agarose gel stained with ethidium bromide.

Evaluating the quality of the genome assembly

We evaluated the genome assembly quality through the following measures. (i) To confirm that the assembly belongs to the target species, it was compared against the NCBI nucleotide (NT) database using BLAST (E-value: 1e-5)⁴¹ (Tables S2 and S3, Supplementary File). (ii) Illumina short reads and PacBio reads were mapped onto the assembled genome using BWA (v. 0.7.17-r1188)⁶⁷ and Minimap2⁶⁸, respectively, to evaluate the completeness and accuracy of the genome. The read-mapping rates were 99.27% and 99.74%, and the genome coverage rates were 99.7% and 99.98%, for the Illumina and PacBio reads, respectively (Table 11), indicating high mapping efficiency and comprehensive coverage. (iii) BUSCO (v5.2.3)⁴⁷ analysis was conducted to evaluate the assembly quality based on the mollusca_odb10 database. Of the 5295 BUSCO groups searched, 88.6% were found as complete BUSCOs in the assembly, comprising 85.8% complete and single-copy BUSCOs and 2.8% complete and duplicated BUSCOs (Table 12).

Table 11 The alignment of Illumina and PacBio reads to Sinosolenaia oleivora.


Table 12 BUSCO analysis results of the Sinosolenaia oleivora genome.


Evaluating the quality of the genome annotation

BUSCO (v5.2.2)⁴⁷ was used to evaluate the completeness of the genome annotation. The reference BUSCO database was mollusca_odb10. Among the 5295 BUSCO groups searched, 4575 (86.4%) were detected as complete BUSCOs in the genome annotation (Table 12).
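BUSCO completeness is reported as the share of searched ortholog groups recovered as complete, with the complete class being the sum of the single-copy and duplicated classes. The short Python check below applies those two relations to the figures quoted in this section; the helper name is hypothetical.

```python
def busco_percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


total_groups = 5295          # BUSCO groups in mollusca_odb10, as stated in the text

# Annotation: 4575 complete BUSCOs out of 5295 searched.
annotation_complete = busco_percent(4575, total_groups)

# Assembly: complete = single-copy + duplicated (percentages from the text).
assembly_complete = round(85.8 + 2.8, 1)

print(annotation_complete, assembly_complete)
```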

Code availability

The manuscript did not use custom code to generate or process the data described.

References

Bieler, R., Carter, J. G. & Coan, E. V. Classification of bivalve families. Pp. 113–133, in: Bouchet, P. & Rocroi, J.P. (2010), Nomenclator of Bivalve Families. Malacologia. 52, 1–184 (2010).
Bogan, A. E. Global diversity of freshwater mussels (Mollusca, Bivalvia) in freshwater. Hydrobiologia. 595, 139–147 (2008).
Nedeau, E. J. et al. Freshwater Mussels of the Pacific Northwest (second edition) (The Xerces Society in Portland, 2009).
Aldridge, D. C. The morphology, growth and reproduction of Unionidae (Bivalvia) in a Fenland waterway. J Mollus Stud. 65, 47–60 (1999).
Wen, H. B. Study on basic biological characteristics and metamorphosis and development of hook larva of purple black winged mussel. Nanjing Agricultural University (2016).
Fukushima, E. et al. A xenograft mantle transplantation technique for producing a novel pearl in an akoya oyster host. Mar Biotechnol. 16, 10–16 (2014).
Liu, J. et al. Antitumor activities of liposome incorporated aqueous extracts of Anodonta woodiana (Lea, 1834). Eur Food Res Technol. 227, 919–924 (2008).
Yang, J., Harino, H., Liu, H. B. & Miyazaki, N. Monitoring the organotin contamination in the Taihu Lake of China by Bivalve mussel Anodonta woodiana. B Environ Contam Tox. 81, 164–168 (2008).
Fogelman, K. J., Stoeckel, J. A., Miller, J. M. & Helms, B. S. Feeding ecology of three freshwater mussel species (Family: Unionidae) in a North American lentic system. Hydrobiologia. 850, 385–397 (2022).
Lopes-Lima, M. et al. Conservation of freshwater bivalves at the global scale: diversity, threats and research needs. Hydrobiologia. 810, 1–14 (2018).
Barnhart, M. C., Haag, W. R. & Roston, W. N. Adaptations to host infection and larval parasitism in Unionoida. J N Am Benthol Soc. 27, 370–394 (2008).
International Union for Conservation of Nature (IUCN). The IUCN Red List of Threatened Species. Version 2023-1. https://www.iucnredlist.org.
Hu, Z. Q. Geographical distribution of endemic species of Chinese freshwater bivalves. Chin J Zool. 40, 80–83 (2005).
Ministry of Forestry and Ministry of Agriculture, China. List of National Key Wildlife Species. China (2021).
Shu, F. Y., Wang, H. J., Pan, B. Z., Liu, X. Q. & Wang, H. Z. Assessment of species status of Mollusca in the mid-lower Yangtze Lakes. Acta Hydrobiol Sinica. 33, 1051–1058 (2009).
Zhang, J. & Yu, H. X. Study on zoobenthos community structure and water quality assessment in Songhua River along Harbin city. J Aquacult. 22, 40–45 (2009).
Ma, X. Y. et al. Seasonal Variations of Nutrients and Mineral Elements in oleivora from Huaihe River. J Agron. 11, 90–94, 119. (2021).
Liu, Y.Y., Zhang, W.Z. & Wang, Y.X. Economic Fauna of China: Freshwater Mollusks (Beijing Science Press, 1979).
Wen, H. B. Study of Germplasm of Major Economic Freshwater Mollusks of China (Nanjing Agricultural University, 2009).
Han, Z. et al. Chromosome-level genome assembly of burbot (Lota lota) provides insights into the evolutionary adaptations in freshwater. Mol Ecol Resour. 21, 2022–2033 (2021).
Renaut, S. et al. Genome survey of the freshwater mussel Venustaconcha ellipsiformis (Bivalvia: Unionida) using a hybrid de novo assembly approach. Genome Biol Evol. 10, 1637–1646 (2018).
Gomes-dos-Santos, A. et al. The Crown Pearl: a draft genome assembly of the European freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758). DNA Res. 28, dsab002 (2021).
Gomes-dos-Santos, A. et al. The Crown Pearl V2: an improved genome assembly of the European freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758). GigaByte. 1–14 (2023a).
Gomes-dos-Santos, A. et al. PacBio Hi-Fi genome assembly of the Iberian dolphin freshwater mussel Unio delphinus Spengler, 1793. Sci Data. 10, 340 (2023b).
Rogers, R. L. et al. Gene family amplification facilitates adaptation in freshwater unionid bivalve Megalonaias nervosa. Mol Ecol. 30, 1155–1173 (2021).
Smith, C. H. A high-quality reference genome for a parasitic bivalve with doubly Uniparental Inheritance (Bivalvia: Unionida). Genome Biol Evol. 13, evab029 (2021).
Bai, Z. et al. Chromosome-level genome assembly of freshwater pearl mussel, Hyriopsis cumingii, provides insights into outstanding biomineralization ability. Authorea Preprints (2022).
Gomes-dos-Santos, A. et al. A PacBio Hi-Fi genome assembly of the Painter’s Mussel Unio pictorum (Linnaeus, 1758). Genome Biol Evol. 15, evad116 (2023c).
Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods. 58, 268–276 (2012).
Liu, B. H. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Quant Biol. 35, 62–67 (2013).
Cheng, H. Y., Concepcion, G. T., Feng, X. W., Zhang, H. W. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 18, 170–175 (2021).
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics. 19, 460 (2018).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016a).
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 356, 92–95 (2017).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016b).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. Chapter 4, 4.10.1–4.10.14 (2009).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
Bao, W., Kojima, K. K. & Kohany, O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 6, 11 (2015).
Benson, G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Xu, Z. & Wang, H. LTR_FINDER: An efficient tool for the prediction of full-length LTR retrotransposons. Nucl Acids Res. 35, 265–268 (2007).
Mount, D. W. Using the Basic Local Alignment Search Tool (BLAST). CSH Protoc. pdb.top17 (2007).
Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988–995 (2004).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 29, 644–652 (2011).
Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Res. 32, 309–312 (2004).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 268, 78–94 (1997).
Holt, C. & Yandell, M. MAKER2: An annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinform. 12, 491 (2011).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 38, 4647–4654 (2021).
Cantarel, B. L., Korf, I., Robb, S. M. C., Parra, G. & Ross, E. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Griffiths-Jones, S., Bateman, A., Marshall, M. & Khanna, A. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2002).
Kozomara, A. & Griffiths-Jones, S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 39, 152–157 (2010).
Chan, P. P. & Lowe, T. M. tRNAscan-SE: Searching for tRNA genes in genomic sequences. Methods Mol Biol. 1962, 1–14 (2019).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 29, 2933–2935 (2013).
Li, L., Stoeckert, C. J. & Roos, D. S. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Wang, D., Zhang, Y., Zhang, Z., Zhu, J. & Yu, J. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. GPB. 8, 77–80 (2010).
Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30, 1312–1313 (2014).
Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
DeBie, T., Cristianini, N., Demuth, J. P. & Hahn, M. W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 22, 1269–1271 (2006).
Mendes, F. K., Vanderpool, D., Fulton, B. & Hahn, M. W. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 36, 5516–5518 (2020).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28352171 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26551344 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28406055 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28406264 (2024).
Ma, X. Y. Solenia oleivora genome. GenBank https://identifiers.org/ncbi/insdc:JBDPLI000000000 (2024).
Ma, X. Y. Chromosomal-scale genome assembly and annotation of the Sinosolenaia oleivora. figshare https://doi.org/10.6084/m9.figshare.25458940 (2024).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25, 1754–1760 (2009).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34, 3094–3100 (2018).


Acknowledgements

This work was supported by the Central Public-interest Scientific Institution Basal Research Fund, CAFS [2023TD66; 2023TD65]; the Sinosolenaia oleivora Ecological Compensation Program (2023); the Central Public-interest Scientific Institution Basal Research Fund [2023JBFM12] and Anhui Fuyang City Seed Breeding, Propagation and Promotion Base Construction Project. The authors would like to thank Qi Liu in Wuhan OneMore-tech Co.,Ltd. for his assistance with bioinformatic analyses.

Author information

These authors contributed equally: Xueyan Ma, Wu Jin.

Authors and Affiliations

Key Laboratory of Integrated Rice-Fish Farming Ecology, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi, 214081, China
Xueyan Ma, Wu Jin, Wanwen Chen, Qian Liu, Haizhou Jiang, Yanfeng Zhou, Pao Xu, Haibo Wen & Dongpo Xu
Sino-US Cooperative Laboratory for Germplasm Conservation and Utilization of Freshwater Mollusks, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi, 214081, China
Xueyan Ma, Wu Jin, Wanwen Chen, Pao Xu, Haibo Wen & Dongpo Xu
Wuxi Fisheries College, Nanjing Agricultural University, Wuxi, 214081, China
Wu Jin, Qian Liu, Haizhou Jiang, Yanfeng Zhou, Pao Xu, Haibo Wen & Dongpo Xu


Contributions

X.Y.M. and H.B.W. designed the study. W.J., D.P.X., Q.L., W.W.C., H.Z.J., Y.F.Z. and P.X. collected the sequencing samples. X.Y.M. drafted the manuscript. D.P.X. and H.B.W. contributed to the revision of the manuscript. All authors read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Haibo Wen or Dongpo Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Ma, X., Jin, W., Chen, W. et al. Chromosome-level genome assembly of the freshwater mussel Sinosolenaia oleivora (Heude, 1877). Sci Data 11, 606 (2024). https://doi.org/10.1038/s41597-024-03451-5


Received: 10 January 2024
Accepted: 31 May 2024
Published: 08 June 2024
DOI: https://doi.org/10.1038/s41597-024-03451-5
Springer Nature Limited
