Chromosome-level genome assembly of the silver pomfret Pampus argenteus

Wei, Jiehong; Xiao, Yongshuang; Liu, Jing; Herrera-Ulloa, Angel; Loh, Kar-Hoe; Xu, Kuidong

doi:10.1038/s41597-024-03070-0

Chromosome-level genome assembly of the silver pomfret Pampus argenteus

Data Descriptor
Open access
Published: 23 February 2024

Volume 11, article number 234, (2024)
Cite this article

Download PDF

You have full access to this open access article

Scientific Data

Chromosome-level genome assembly of the silver pomfret Pampus argenteus

Download PDF

Jiehong Wei^1,2^na1,
Yongshuang Xiao¹^na1,
Jing Liu¹,
Angel Herrera-Ulloa³,
Kar-Hoe Loh⁴ &
…
Kuidong Xu¹

1226 Accesses
1 Altmetric
Explore all metrics

Abstract

Pampus argenteus (Euphrasen, 1788) is one of the major fishery species in coastal China. Pampus argenteus has a highly specialized morphology, and its declining fishery resources have encouraged massive research efforts on its aquacultural biology. In this study, we reported the first high-quality chromosome-level genome of P. argenteus obtained by integrating Illumina, PacBio HiFi, and Hi-C sequencing techniques. The final size of the genome was 518.06 Mb, with contig and scaffold N50 values of 20.47 and 22.86 Mb, respectively. The sequences were anchored and oriented onto 24 pseudochromosomes based on Hi-C data corresponding to the 24-chromatid karyotype of P. argenteus. A colinear relationship was observed between the P. argenteus genome and that of a closely related species (Scomber japonicus). A total of 24,696 protein-coding genes were identified from the genome, 98.9% of which were complete BUSCOs. This report represents the first case of high-quality chromosome-level genome assembly for P. argenteus and can provide valuable information for future evolutionary, conservation, and aquacultural research.

Chromosome-level genome assembly of golden pompano (Trachinotus ovatus) in the family Carangidae

Article Open access 22 October 2019

Chromosome-level assembly of Triplophysa yarkandensis genome based on the single molecule real-time sequencing

Article Open access 05 January 2024

A chromosome-level genome assembly and evolutionary analysis of Coregonus ussuriensis Berg

Article Open access 18 July 2024

Background & Summary

Pampus argenteus (Euphrasen, 1788; Fishbase ID: 491), also known as the silver pomfret, is a commercially important fish in the Northwest Pacific area that is widely distributed throughout the South China Sea to coastal Japan, Korea, and Russia^1,2. It belongs to the family Stromateidae of the suborder Stromateoidei³, which was identified in Scombriformes according to a recent phylogenetic study⁴. This species is one of the major fishery species in coastal China, with harvests exceeding three million tons in 2016⁵. Overfishing and environmental changes have resulted in a noticeable decline in P. argenteus fishery resources in recent years^6,7. The aquaculture of P. argenteus has made substantial progress, which in some ways compensates for the decline in fishery resources^8,9. However, the industry is still facing many restrictions and issues owing to the high sensitivity of P. argenteus to injury and pathogenic infection during aquaculture and transportation¹⁰. Due to the medusivorous habit of P. argenteus¹¹, its aquaculture greatly relies on fish bait composed of jellyfish and minced fish meat. Using fish bait leads to higher costs in water quality control and risking outbreaks of harmful pathogens, which have become one of the major bottlenecks in P. argenteus aquaculture, necessitating substitution with better formulated feeds¹². However, the digestive and immune systems of P. argenteus are considered specialized for the digestion of jellyfish and tolerance of medusocongestin^13,14. The inclusion of jellyfish in an artificial diet can significantly improve the growth performance and survival rate of P. argenteus larvae and juveniles¹⁵. The impact of changing fish bait to formulated feed on P. argenteus at different growth stages still requires further clarification. Clarifying the genetic basis of the physiological process of P. argenteus, particularly those related to the immune response¹⁶, intestinal enzyme activities¹⁴, stress responses¹⁷, etc., is becoming increasingly important for the future prospects of the aquaculture industry. However, the genome of P. argenteus, which represents the foundation of physiological responses¹⁸, has not yet been completely sequenced.

In addition to its fishery importance, P. argenteus is considered one of the most advanced species within Stromateoidei¹⁹. The dorsal and anal fin spines of P. argenteus are reduced into small blades, with a pelvic fin absent from its abdominal region. Stromateoidei is distinct from other Actinopterygii by having a unique pharyngeal sac immediately behind the last gill arch, which functions to fragmentize food¹⁹. The saccular structure of P. argenteus, which primarily feeds on small crustaceans and medusae, is smaller, more elongated, and densely covered within elongated tooth-like papillae; additionally, this species ably adapts to shredding rubbery tissue of jellyfish¹⁹. The pharyngeal sac is believed to be a key innovation for stromateiods, while the specialized shape of pharyngeal sac in the genus Pampus might bring further advantages and lead to its broad success in the Indo-Pacific region¹⁹. Clarifying the genetic basis for the formation of the pharyngeal sac is crucial to understanding the evolution of the genus Pampus.

In this study, a high-quality chromosome-level genome assembly of P. argenteus was generated by integrating multiple sequencing technologies, including Illumina sequencing, PacBio circular consensus sequencing (CCS), and Hi-C techniques (Fig. 1). The final assembly size for the P. argenteus genome was 518.06 Mb, with 97.30% of the contigs anchored to 24 chromosomes (Table 1 & Fig. 2). The contig and scaffold N50 lengths for the genome were 20.47 and 22.86 Mb, respectively. The genome consisted of 13.45% repeated sequences and 17.18% nod-coding genes. A total of 24,696 protein-coding genes were predicted, 93.38% of which were functionally annotated. Compared to the Pampus genome reported by AlMomin et al.²⁰, the genome of P. argenteus generated herein was assembled into fewer and longer contigs and scaffolds (Table 1). More genes and repetitive regions were identified from this genome, with an average protein-coding gene length 7.5-fold greater than that of the previous version²⁰. These results suggested that the genome developed in this study has a much higher integrity and quality. The chromosome-level genome assembly of P. argenteus will provide valuable information for establishing effective molecular markers for future conservation and aquaculture goals. The genome also represents the first case of high-quality chromosome-level genome assembly for stromateoids; this information could be an important reference for whole-genome sequencing of its close relatives, and, foreseeably, could become one of the foundations for exploring the genomic evolution and phylogeny of the Stromateoidei.

Table 1 Comparison of the Pampus genome assemblies in this study and the study of AlMomin et al.²⁰.

Full size table

Methods

Sample collection

In 2021, a female P. argenteus specimen was caught from the wild for whole-genome sequencing using a fishing boat in Shengsi, Zhejiang Province, China. The specimen was identified based on the morphological descriptions of P. argenteus in Liu et al.³, who designated the P. argenteus neotype. Eye, muscle, ovary, heart and liver samples for DNA and RNA sequencing were isolated from the specimens immediately after they were caught. The samples were washed three times with phosphate-buffered saline (PBS), frozen in liquid nitrogen for three hours, and subsequently stored at −80 °C until DNA extraction. All the experiments were conducted under the approval and regulations of the Institutional Animal Care and Use Committee of the Institute of Oceanology, Chinese Academy of Science.

Library construction, sequencing and data preparation

The Illumina, PacBio HiFi, and Hi-C data were obtained and used for generating a chromosome-level genome assembly of P. argenteus. For Illumina sequencing, total genomic DNA was isolated from muscle samples using the cetyltrimethylammonium bromide (CTAB) method²¹. The quality of the extracted DNA was assessed using a Qubit 2.0 (Thermo Fisher Scientific, USA) and a NanoDrop^® Series (Thermo Fisher Scientific, USA). For Illumina sequencing, a short-fragment library with an insert size of 300–500 base pairs (bp) was prepared using the NEBNext^®ΜLtra™ DNA Library Prep Kit (New England Biolabs, USA) following the manual instructions. The library was purified with AMPure XP Beads (Beckman Coulter, USA) and then subjected to sequencing on an Illumina NovaSeq 6000 platform (Illumina, USA) to generate 150-bp paired-end (PE150) reads. After filtering in Fastp (v0.20.0)²², a total of 75.52 Gb of clean Illumina PE150 data were obtained, with Q20 and Q30 being 97.2% and 92.5%, respectively (Table 2). For PacBio CCS, total genomic DNA total genomic was extracted from muscle samples using the sodium dodecyl sulfate (SDS) method²³. The high-molecular-weight gDNA was sheared to 8–10 kb using g-TUBEs (Covaris, USA). The HiFi library was then prepared using the SMRTbell prep kit 3.0 and sequenced in CCS mode on the PacBio Sequel II system (Pacific Biosciences, USA) following the manufacturer’s protocols. After the removal of low-quality reads and adaptors from the raw data, 63.80 Gb of clean HiFi data was retained, with Q20 and Q30 values of 99.9% and 54.78%, respectively (Table 2). Hi-C library preparation was performed with muscle tissue using a Frasergen Hi-C Kit (Frasergen, China) following the protocol instructions, which included crosslinking, lysis, fragmentation, repairing, biotin labeling, ligation, extraction, purification, and library construction. All the purification steps were performed using AMPure XP beads (Beckman Coulter, USA), while the biotin-labeled DNA was enriched with Pierce™ Streptavidin Magnetic Beads (Thermo Fisher Scientific, USA). The library was assessed with an Agilent 2100 Bioanalyzer (Agilent, USA) to determine a sufficient concentration and an insert size of 300–800 bp. The Hi-C library was subjected to sequencing on an Illumina HiSeq X Ten platform (Illumina, USA) to generate PE150 reads. After filtering in Fastp²², a total of 138.39 Gb of clean Hi-C data were obtained, for which the Q20 and Q30 were 96.57% and 90.54%, respectively (Table 2).

Table 2 Sequencing data used for the P. argenteus genome assembly.

Full size table

To assist in gene prediction, muscle, eye, ovary, heart, and liver tissues were pooled to obtain the transcriptome of P. argenteus. Total RNA was extracted from the pooled sample using a TRIzol reagent kit (Invitrogen, USA) following the manufacturer’s instructions. The quality and concentration of the extracted RNA were assessed using a NanoDrop^® Series (Thermo Scientific, USA) and an Agilent 2100 Bioanalyzer. For RNA-seq data, three cDNA libraries (i.e., Pa-op1, Pa-op2, and Pa-op3; Table 3) were prepared via total RNA extraction using the NEBNext® Ultra™ RNA Library Prep Kit (New England Biolabs, USA) and subsequently subjected to sequencing on an Illumina NovaSeq 6000 platform (Illumina, USA). After filtering via Fastp²², a total of 20.91 Gb of clean RNA-seq data were obtained from the five tissue samples (Table 3). For isoform data, a single cDNA library was reverse transcribed from the total RNA using the Clontech SMARTer PCR cDNA Synthesis Kit (Takara Bio, USA) following the manufacturer’s instructions. The PCR products were purified using AMPure XP Beads (Beckman Coulter, USA) and used for SMRTbell library construction via the SMRTbell Prep Kit 3.0. The library was sequenced and processed with the PacBio Sequel II system (Pacific Biosciences, USA). After filtering, a total of 34.96 Gb of isoform data were obtained (Table 3). The reference genome and protein-coding gene data of closely related species of P. argenteus [i.e., Dunckerocampus dactyliophorus (Bleeker, 1853)²⁴, Hippocampus zosterae Jordan & Gilbert, 1882²⁵, Scomber japonicus Houttuyn, 1782²⁶, Thunnus albacares (Bonnaterre, 1788)²⁷, and T. maccoyii (Castelnau, 1872)²⁸] were downloaded from GenBank and subsequently used for gene prediction and comparisons.

Table 3 The transcriptomic data of P. argenteus used for gene prediction.

Full size table

Genome survey

A survey of the P. argenteus genome was performed using the k-mer method. K-mer analysis was conducted using jellyfish (v2.2.6)²⁹ with 75.52 Gb of Illumina data and the best K value of 17. After the removal of abnormal k-mers, 60,502,700,002 k-mers were yielded with a k-mer peak at a depth of 126.64 (Table 4 & Fig. 3). The genome size, heterozygosity rate, repetition rate, and GC content estimated from GenomeScope (v1.0.0)³⁰ were 463.10 Mb, 1.55%, 29.89% and 39.45%, respectively (Table 4).

Table 4 Genome survey results.

Full size table

Chromosome-level genome assembly

The genome of P. argenteus was first assembled into 416 contigs with HiFi long-read data using the default parameters in Hifiasm (v0.16.1)³¹. The Illumina PE150 data were used to correct the contig assemblies in Pilon (v1.23)³². Finally, 74 non-redundant contigs with a total length of 518.04 Mb were obtained in Redundans (v14a)³³. The contig N50 and maximum length were 20.47 and 25.41 Mb, respectively (Table 1). The clean Hi-C data were aligned to the genome assembly using BWA (v0.7.12)³⁴. Reading depth and coverage were calculated in Picard (v1.129)³⁵ and BEDtools (v2.25.0)³⁶. To obtain the chromosome-level genome, clean Hi-C data were assembled with 74 contigs and adjusted using Juicer (v1.6)³⁷, 3D-DNA (v180114)³⁸ and JuiceBox³⁹. Finally, the assembled sequences were anchored and oriented to 24 pseudochromosomes, ranging in size from 9.80–27.76 Mb (Table 1), which is congruent with the 24-chromatid karyotype reported by Liu et al.⁴⁰. The total length of the chromosome-level assembly was 518.06 Mb, with a scaffold N50 of 22.86 Mb (Table 1 & Fig. 1). Therefore, the clean Illumina, PacBio HiFi, and Hi-C data had 145.77-, 123.15-, and 267.13-fold coverage of the P. argenteus genome, respectively (Table 2). A collinearity dot plot generated using MCScanX⁴¹ and SynVisio⁴² indicated clear genomic collinearity between P. argenteus and the scombriform species S. japonicus²⁶ but scattered collinearity with the syngnathiform H. zosterae²⁷, supporting the closer affinity of P. argenteus to the scombriform species (Fig. 4).

Repeated sequence annotations

Tandem repeats were predicted using Tandem Repeats Finder (v4.10.0) (TRF)⁴³. Transposable elements (TEs) were identified by a combination of signature-based, de novo, and homology-based approaches. De novo prediction was performed using RepLoc (v2021-3-12)⁴⁴, and TEs shorter than 30 bp were removed from the results. Long terminal repeat retrotransposons (LTR-RTs) were identified using both LTR_FINDER (v1.0.2)⁴⁵ and LTRharvest (v1.6.2)⁴⁶. Miniature inverted-repeat transposable element (MITE) sequences were found by MiteFinder (v1.0.006)⁴⁷. Helitron sequences were scanned by a HelitronScanner (v1.0)⁴⁸. TIRvish (v1.6.2)⁴⁹ was used to find terminal inverted repeat (TIR) sequences. All the predicted repetitive sequences were combined with the known repetitive sequence data in the Repbase database⁵⁰ to generate the non-redundant repeat sequence library for P. argenteus using AB-BLAST (v3.0)⁵¹, MCL (v14–137)⁵², MMseqs. 2 (v13.45111)⁵³ and CD-HIT⁵⁴. The final repeat sequences in the P. argenteus genome were identified and classified by homology searching against the library using RepeatMasker (v4.1.2-p1)⁵⁵ and RepeatProteinMask (v4.1.2-p1)⁵⁶. In brief, 13.57% of the P. argenteus genome was annotated as repetitive elements, with 9.39% 9.39% TEs (48.64 Mb) and 8.51% tandem repeats (44.10 Mb) (Table 5).

Table 5 Repeat sequences of the P. argenteus genome annotated using different methods.

Full size table

Non-coding RNA annotation

For non-coding RNA annotation, tRNA and rRNA were predicted by tRNAScan-SE (v2.0.9)⁵⁷ and barrnap (v0.9)⁵⁸, respectively, while snRNA and miRNA were identified by aligning to the Rfam database⁵⁹ with Infernal cmscan (v1.1.4)⁶⁰. A total of 10,650 tRNAs, 634 rRNAs, 1,514 snRNAs, and 675 miRNAs were identified, comprising 17.18% of the P. argenteus genome (Table 6).

Table 6 Information of different types of non-coding RNA genes identified in the P. argenteus genome.

Full size table

Protein-coding gene prediction and annotation

The protein-coding genes were predicted based on four different strategies, namely, RNA-seq-based, isoform-based, homology-based, and de novo predictions. The clean RNA-seq data were assembled into the P. argenteus genome using two different methods: (i) assembly with the reference genome using HISAT (v2.1.1)⁶¹, StringTie (v2.2.0)⁶², and Scallop2 (v1.1.1)⁶³; and (ii) de novo assembly using RNA-Bloom (v2.0.1)⁶⁴, Evigene⁶⁵, minimap2 (v2.26)⁶⁶ and miniprot (v0.12)⁶⁷. For isoform-based prediction, SMRT-link (PacBio, USA) was used to generate isoforms and ESTs, and the transcriptome and protein sequences were generated with the Evigene⁶⁵ platform; these sequences were subsequently mapped onto the P. argenteus genome using minimap2⁶⁶ and miniprot⁶⁷. Protein sequences from D. dactyliophorus²⁴, H. zosterae²⁵, S. japonicus²⁶, T. albacares²⁷, T. maccoyii²⁸ and the Swissprot protein database⁶⁸ were obtained from their genomes and aligned to the P. argenteus genome for homology-based gene prediction in miniprot⁶⁶. The predicted gene models of RNA-seq, isoform, and homology-based strategies were used as training datasets in AUGUSTUS (v3.4.0)⁶⁹ and SNAP (v2.0)⁷⁰ for de novo prediction. Finally, all the predicted gene models were integrated into a single, non-redundant, and complete gene set using EvidenceModeler (v1.1.1)⁷¹. The untranslated region (UTR) and alternative splicing of these genes were annotated in Mikado (v2.3.2)⁷². The gene statistics, including gene length, coding sequence (CDS), intron length, and exon length, were similar between the reference^{24,25,26,27,28} and P. argenteus genomes (Fig. 5). The predicted protein-coding genes were annotated by searching the GenBank Non-Redundant (Nr) (ftp://ftp.ncbi.nih.gov/pub/nrdb/), SwissProt⁶⁸, eukaryotic orthologous groups (KOG)⁷³ and Kyoto Encyclopedia of Genes and Genomes (KEGG)⁷⁴ protein databases using Diamond (v0.7.9)⁷⁵, with an E-value threshold of 1e⁻⁵. EggNOG-mapper (v2.0)⁷⁶ was used for gene ontology (GO) annotation in combination with the eggNOG database (v5.0)⁷⁷. A total of 24,696 genes were predicted in the P. argenteus genome (Table 7). Among these, 23,062 genes (93.38%) were annotated by at least one database, while 12,974 genes (52.53%) were supported by all five databases (Table 8 and Fig. 6).

Table 7 Genes predicted in the P. argenteus genome using different methods.

Full size table

Table 8 Gene function annotation statistics of the assembled genome for P. argenteus.

Full size table

Data Records

The Illumina (SRR27308594), PacBio HiFi (SRR27308592–SRR27308592), Hi-C (SRR27308591), RNA-seq (SRR27308587–SRR27308589) and isoform (SRR27308590) data used for the genome assembly of P. argenteus were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under sequence read project SRP479325⁷⁸. The chromosome-level assembly of the P. argenteus genome was deposited in the NCBI genome database under accession number GCA_036321115⁷⁹. The chromosome assembly of P. argenteus, genomic annotation results, and software settings can be found in the figshare database⁸⁰.

Technical Validation

Evaluation of the genome assembly and annotation

The quality of this chromosome-level genome assembly was assessed using the following three criteria: (i) the mapping rate of Illumina PE150 reads, (ii) the Core Eukaryotic Genes Mapping Approach (CEGMA)⁸¹, and (iii) the Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment⁸². In brief, 99.30% of the Illumina PE150 reads could be aligned to the P. argenteus genome using BWA (v0.7.12)³⁴, for a coverage rate of 99.95%, which indicates high mapping efficiency and sufficient coverage. A total of 230 (92.74%) of the 248 highly conserved core genes for eukaryotes provided in CEGMA could be completely aligned with their homologous genes in the P. argenteus genome. In BUSCO (v4.1.2)⁸², 98.90% of the complete BUSCOs were detected in the P. argenteus genome, whereas fragmented and missing BUSCOs only comprised 1.08% of the total orthologs. This evidence indicated the high integrity and quality of the obtained chromosome-level genome assembly of P. argenteus.

References

Liu, J., Li, C. & Li, X. Studies on Chinese pomfret fishes of the genus Pampus (Pisces: Stromateidae). Stud. Mar. Sin. 44, 240–252 (2002).
Google Scholar
Wei, J. et al. Species diversity and distribution of genus Pampus (Pelagiaria: Stromateidae) based on global mitochondrial data. Front. Mar. Sci. 9, 1050386, https://doi.org/10.3389/fmars.2022.1050386 (2022).
Article Google Scholar
Liu, j, Li, C. S. & Ning, P. Identity of silver pomfret Pampus argenteus (Euphrasen, 1788) based on specimens from its type locality, with a neotype designation (Teleostei, Stromateidae). Acta Zootaxonomica Sin. 38, 171–177 (2013).
CAS Google Scholar
Hughes, L. C. et al. Comprehensive phylogeny of ray-finned fishes (Actinopterygii) based on transcriptomic and genomic data. PNAS 115, 6249–6254, https://doi.org/10.1073/pnas.1719358115 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Wei, J. et al. Validity of Pampus liuorum Liu & Li, 2013, Revealed by the DNA Barcoding of Pampus Fishes (Perciformes, Stromateidae). Diversity 13, 618, https://doi.org/10.3390/d13120618 (2021).
Article CAS Google Scholar
Yang, W. T., Li, J. & Yue, G. H. Multiplex genotyping of novel microsatellites from silver pomfret (Pampus argenteus) and cross-amplification in other pomfret species. Mol. Ecol. Notes 6, 1073–1075, https://doi.org/10.1111/j.1471-8286.2006.01438.x (2006).
Article CAS Google Scholar
Zhao, F. et al. Genetic diversity of silver pomfret (Pampus argenteus) in the Southern Yellow and East China Seas. Biochem. Syst. Ecol. 39, 145–150, https://doi.org/10.1016/j.bse.2011.02.002 (2011).
Article CAS Google Scholar
Shi, Z. H., Zhao, F., Fu, R., Huang, X. & Wang, J. Study on artificial larva rearing techniques of silver pomfret (Pampus argenteus). Mar. Fish. 31, 53–57, https://doi.org/10.3969/j.issn.1004-2490.2009.01.008 (2009).
Article Google Scholar
Hu, J. B. et al. Advances in the artificially reproductive and breeding studies of silver pomfret, Pampus argenteus. J. Biol. 33, 87–117, https://doi.org/10.3969/j.issn.2095-1736.2016.04.087 (2016).
Article CAS Google Scholar
Yu, N. et al. Reduced stress responses by MS-222 in juvenile silver pomfret (Pampus argenteus). J. World Aquac. Soc. 51, 1192–1207, https://doi.org/10.1111/jwas.12725 (2020).
Article CAS Google Scholar
Liu, C. et al. Medusa consumption and prey selection of silver pomfret Pampus argenteus juveniles. Chin. J. Oceanol. Limn. 32, 71–80, https://doi.org/10.1007/s00343-014-3034-5 (2014).
Article ADS CAS Google Scholar
Wang, L., Ren, X. & Wang, Y. Feeding rates of juvenile silver pomfret in the East China Sea on different soft pellet diet. Fish. Sci. Tech. Info. 47, 1–5, https://doi.org/10.16446/j.cnki.1001-1994.2020.01.001 (2020).
Article Google Scholar
Wang, Q. et al. Alternations in the liver metabolome, skin and serum antioxidant function of silver pomfret (Pampus Argenteus) is induced by jellyfish feeding. 3 Biotech 11, 192, https://doi.org/10.1007/s13205-021-02702-1 (2021).
Article CAS PubMed PubMed Central Google Scholar
Wang, Y. et al. Dietary jellyfish affect digestive enzyme activities and gut microbiota of Pampus argenteus. Comp. Biochem. Physiol. - D: Genom. Proteom. 40, 100923, https://doi.org/10.1016/j.cbd.2021.100923 (2021).
Article CAS Google Scholar
Liu, C. et al. Potential of utilizing jellyfish as food in culturing Pampus argenteus juveniles. Hydrobiologia 754, 189–200, https://doi.org/10.1007/s10750-014-1869-6 (2015).
Article CAS Google Scholar
Zhang, Y. et al. Immune response of silver pomfret (Pampus argenteus) to Photobacterium damselae subsp. Damselae: Virulence factors might induce immune escape by damaging phagosome. Aquaculture 578, 740014, https://doi.org/10.1016/j.aquaculture.2023.740014 (2024).
Article CAS Google Scholar
Sun, P., Tang, B. & Yin, F. Gene expression during different periods of the handling-stress response in Pampus argenteus. J. Oceanol. Limnol. 36, 1349–1359, https://doi.org/10.1007/s00343-018-7012-1 (2018).
Article ADS CAS Google Scholar
Hilsdorf, A. W. S. et al. 49–74 (Academic Press, 2020).
Haedrich, R. L. The stromateoid fishes: systematics and a classification. Bull. Mus. Comp. Zool. 135, 31–139 (1967).
Google Scholar
AlMomin, S. et al. Draft genome sequence of the silver pomfret fish, Pampus argenteus. Genome 59, 51–58, https://doi.org/10.1139/gen-2015-0056 (2015).
Article CAS PubMed Google Scholar
Richards, E., Reichardt, M. & Rogers, S. Preparation of Genomic DNA from Plant Tissue. Curr. Protoc. Mol. Biol. 27, 2–3, https://doi.org/10.1002/0471142727.mb0203s27 (1994).
Article Google Scholar
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560 (2018).
Article CAS PubMed PubMed Central Google Scholar
Zhou, J., Bruns, M. A. & Tiedje, J. M. DNA recovery from soils of diverse composition. Appl. Environ. Microbiol. 62, 316–322, https://doi.org/10.57760/sciencedb.04022 (1996).
Article ADS CAS PubMed PubMed Central Google Scholar
NCBI GenBank assembly, https://identifiers.org/ncbi/insdc.gca:GCA_901007775.1 (2020).
NCBI GenBank assembly, https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_025434085.1/ (2022).
NCBI GenBank assembly https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_027409825.1/ (2023).
NCBI GenBank assembly https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_914725855.1/ (2021).
NCBI GenBank assembly https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_910596095.1/ (2021).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Article CAS PubMed PubMed Central Google Scholar
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204, https://doi.org/10.1093/bioinformatics/btx153 (2017).
Article CAS PubMed PubMed Central Google Scholar
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Article CAS PubMed PubMed Central Google Scholar
Walker, B. J. et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLOS ONE 9, e112963, https://doi.org/10.1371/journal.pone.0112963 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Pryszcz, L. P. & Gabaldón, T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 44, e113, https://doi.org/10.1093/nar/gkw294 (2016).
Article CAS PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, https://doi.org/10.1093/bioinformatics/btp324 (2009).
Article CAS PubMed PubMed Central Google Scholar
Broad Institute. Picard Toolkit. https://broadinstitute.github.io/picard/ (2019).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842, https://doi.org/10.1093/bioinformatics/btq033 (2010).
Article CAS PubMed PubMed Central Google Scholar
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 3, 95–98, https://doi.org/10.1016/j.cels.2016.07.002 (2016).
Article CAS PubMed PubMed Central Google Scholar
Robinson, J. T. et al. Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst. 6, 256–258, https://doi.org/10.1016/j.cels.2018.01.001 (2018).
Article CAS PubMed PubMed Central Google Scholar
Liu, K. et al. Chromosome Samples Preparation and Karyotype Analysis of Pomfret (Pampus argenteus). Progr. Fish. Sci. 38, 64–69, https://doi.org/10.11758/yykxjz.20161107001 (2017).
Article Google Scholar
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49, https://doi.org/10.1093/nar/gkr1293 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Bandi, V. et al. in Plant Bioinformatics: Methods and Protocols (ed. Edwards, D.). 285–308 (Springer US, 2022).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580, https://doi.org/10.1093/nar/27.2.573 (1999).
Article CAS PubMed PubMed Central Google Scholar
Feng, C., Dai, M., Liu, Y. & Chen, M. Sequence repetitiveness quantification and de novo repeat detection by weighted k-mer coverage. Brief. Bioinform. 22, bbaa086, https://doi.org/10.1093/bib/bbaa086 (2021).
Article CAS PubMed Google Scholar
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268, https://doi.org/10.1093/nar/gkm286 (2007).
Article PubMed PubMed Central Google Scholar
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 18, https://doi.org/10.1186/1471-2105-9-18 (2008).
Article CAS Google Scholar
Hu, J., Zheng, Y. & Shang, X. MiteFinderII: a novel tool to identify miniature inverted-repeat transposable elements hidden in eukaryotic genomes. BMC Med. Genom. 11, 101, https://doi.org/10.1186/s12920-018-0418-y (2018).
Article CAS Google Scholar
Xiong, W., He, L., Lai, J., Dooner, H. K. & Du, C. HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes. Proc. Natl. Acad. Sci. USA 111, 10263–10268, https://doi.org/10.1073/pnas.1410068111 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Gremme, G., Steinbiss, S. & Kurtz, S. GenomeTools: A Comprehensive Software Library for Efficient Processing of Structured Genome Annotations. IEEE/ACM Trans. Comput. Biol. Bioinform. 10, 645–656, https://doi.org/10.1109/TCBB.2013.68 (2013).
Article PubMed Google Scholar
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11, https://doi.org/10.1186/s13100-015-0041-9 (2015).
Article PubMed PubMed Central Google Scholar
Gish W. AB-BLAST, version 3.0 http://blast.advbiocomp.com/ (2009).
van Dongen, S. & Abreu-Goodger, C. in Bacterial Molecular Networks: Methods and Protocols (eds. Van Helden, J., Toussaint, A. & Thieffry, D.) 281–295 (Springer New York, 2012).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152, https://doi.org/10.1093/bioinformatics/bts565 (2012).
Article CAS PubMed PubMed Central Google Scholar
Steinegger, M. & Söding, J. MMseqs. 2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028, https://doi.org/10.1038/nbt.3988 (2017).
Article CAS PubMed Google Scholar
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinformatics 25, 4.10.11–4.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).
Article Google Scholar
Tempel, S. in Mobile Genetic Elements: Protocols and Genomic Applications (ed Bigot, Y.) 29–51 (Humana Press, 2012).
Chan, P. P. & Lowe, T. M. in Gene Prediction: Methods and Protocols (ed. Kollmar, M.) 1–14 (Springer New York, 2019).
Seemann, T. Barrnap 0.9: Rapid ribosomal RNA prediction. Available at: https://github.com/tseemann/barrnap.
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200, https://doi.org/10.1093/nar/gkaa1047 (2021).
Article CAS PubMed Google Scholar
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).
Article CAS PubMed PubMed Central Google Scholar
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360, https://doi.org/10.1038/nmeth.3317 (2015).
Article CAS PubMed PubMed Central Google Scholar
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295, https://doi.org/10.1038/nbt.3122 (2015).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Q., Shi, Q. & Shao, M. Accurate assembly of multi-end RNA-seq data with Scallop2. Nat. Comput. Sci. 2, 148–152, https://doi.org/10.1038/s43588-022-00216-1 (2022).
Article CAS PubMed PubMed Central Google Scholar
Nip, K. M. et al. RNA-Bloom enables reference-free and reference-guided sequence assembly for single-cell transcriptomes. Genome Res. 30, 1191–1200, https://doi.org/10.1101/gr.260174.119 (2020).
Article CAS PubMed PubMed Central Google Scholar
Gilbert, D. G. Genes of the pig, Sus scrofa, reconstructed with EvidentialGene. PeerJ 7, e6374, https://doi.org/10.7717/peerj.6374 (2019).
Article CAS PubMed PubMed Central Google Scholar
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100, https://doi.org/10.1093/bioinformatics/bty191 (2018).
Article CAS PubMed PubMed Central Google Scholar
Li, H. Protein-to-genome alignment with miniprot. Bioinformatics 39, btad014, https://doi.org/10.1093/bioinformatics/btad014 (2023).
Article CAS PubMed PubMed Central Google Scholar
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
Article CAS PubMed PubMed Central Google Scholar
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439, https://doi.org/10.1093/nar/gkl200 (2006).
Article CAS PubMed PubMed Central Google Scholar
Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59, https://doi.org/10.1186/1471-2105-5-59 (2004).
Article Google Scholar
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).
Article CAS PubMed PubMed Central Google Scholar
Venturini, L., Caim, S., Kaithakottil, G. G., Mapleson, D. L. & Swarbreck, D. Leveraging multiple transcriptome assembly methods for improved gene structure annotation. GigaScience 7, giy093, https://doi.org/10.1093/gigascience/giy093 (2018).
Article CAS PubMed PubMed Central Google Scholar
Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinform. 4, 41, https://doi.org/10.1186/1471-2105-4-41 (2003).
Article Google Scholar
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462, https://doi.org/10.1093/nar/gkv1070 (2016).
Article CAS PubMed Google Scholar
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60, https://doi.org/10.1038/nmeth.3176 (2015).
Article CAS PubMed Google Scholar
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol. Biol. Evol. 38, 5825–5829, https://doi.org/10.1093/molbev/msab293 (2021).
Article CAS PubMed PubMed Central Google Scholar
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314, https://doi.org/10.1093/nar/gky1085 (2019).
Article CAS PubMed Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP479325 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_036321115 (2024).
Wei, J. The first high-quality chromosome-level genome assembly of the silver pomfret (Pampus argenteus). figshare https://doi.org/10.6084/m9.figshare.24155052 (2023).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067, https://doi.org/10.1093/bioinformatics/btm071 (2007).
Article CAS PubMed Google Scholar
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654, https://doi.org/10.1093/molbev/msab199 (2021).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Nos. 31872195 and 32270472) and Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB42000000).

Author information

These authors contributed equally: Jiehong Wei, Yongshuang Xiao.

Authors and Affiliations

Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
Jiehong Wei, Yongshuang Xiao, Jing Liu & Kuidong Xu
University of Chinese Academy of Sciences, Beijing, China
Jiehong Wei
Escuela de Ciencias Biológicas, Universidad Nacional, San José, Costa Rica
Angel Herrera-Ulloa
Institute of Ocean and Earth Sciences, Universiti Malaya, Kuala Lumpur, Malaysia
Kar-Hoe Loh

Authors

Jiehong Wei
View author publications
You can also search for this author in PubMed Google Scholar
Yongshuang Xiao
View author publications
You can also search for this author in PubMed Google Scholar
Jing Liu
View author publications
You can also search for this author in PubMed Google Scholar
Angel Herrera-Ulloa
View author publications
You can also search for this author in PubMed Google Scholar
Kar-Hoe Loh
View author publications
You can also search for this author in PubMed Google Scholar
Kuidong Xu
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Y.X. and J.L. conceived the research project. Y.X. and J.L. collected the samples. J.W. and Y.X. performed the analyses. J.W., Y.X., J.L., K.X., A.H. and K.L. wrote and revised the manuscript. Correspondence and requests for materials should be addressed to J.L. and K.X.

Corresponding authors

Correspondence to Jing Liu or Kuidong Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wei, J., Xiao, Y., Liu, J. et al. Chromosome-level genome assembly of the silver pomfret Pampus argenteus. Sci Data 11, 234 (2024). https://doi.org/10.1038/s41597-024-03070-0

Download citation

Received: 25 September 2023
Accepted: 13 February 2024
Published: 23 February 2024
DOI: https://doi.org/10.1038/s41597-024-03070-0
Springer Nature Limited

Chromosome-level genome assembly of the silver pomfret Pampus argenteus

Abstract

Similar content being viewed by others

Chromosome-level genome assembly of golden pompano (Trachinotus ovatus) in the family Carangidae

Chromosome-level assembly of Triplophysa yarkandensis genome based on the single molecule real-time sequencing

A chromosome-level genome assembly and evolutionary analysis of Coregonus ussuriensis Berg

Background & Summary