Abstract
In East Asia, anguillid eels are commercially important. However, unlike many other farmed fish species, they have not been successfully cultured throughout their life cycle. Facing population declines due to overharvesting and environmental pressures, the industry is turning to alternative species, such as Anguilla bicolor pacifica (short-finned eel). However, genomic data for the short-finned eel are unavailable. Here, we present in-depth whole-genome sequencing results for the short-finned eel obtained using two sequencing platforms (PacBio Revio and Illumina). We achieved a highly contiguous genome assembly comprising 19 pseudochromosomes that encompass 99.76% of the 1.087 Gb genome sequence, with contig and scaffold N50 values of 16.88 Mb and 61.07 Mb, respectively. Transcripts from four different tissues led to the annotation of 23,095 protein-coding genes, 98.66% of which were functionally annotated. This high-quality genome assembly, together with the annotation data, provides a foundation for future functional genomic studies of the short-finned eel.
Background & Summary
Anguillid eels are commercially important fish in East Asia, with approximately 270,000 metric tons cultivated worldwide1,2,3. Despite persistent efforts over many years, full-life-cycle culture from egg to adult remains elusive for eels, unlike for many other fish species in which such complete cultivation has become feasible4. Consequently, the eel farming industry relies heavily on collecting wild glass eels that migrate towards estuaries or inland freshwater habitats5. Several factors, including impediments to migration, pollution, climate change, habitat loss, and overexploitation of juvenile glass eels, have significantly reduced eel populations6,7. Notably, the Japanese eel is categorized as “Endangered” and the European eel as “Critically Endangered” on the International Union for Conservation of Nature Red List (https://www.iucnredlist.org/search?query=anguilla&searchType=species). Consequently, alternative anguillid species, such as Anguilla marmorata and Anguilla bicolor, have attracted considerable interest8,9. Moreover, a recent study has shown that A. bicolor pacifica grows faster than A. marmorata, indicating its potential suitability for cage culture10. Because its taste and texture are comparable to those of A. japonica, A. bicolor pacifica is the second most preferred species after A. japonica, underscoring its economic importance in terms of market demand11.
Anguilla bicolor (short-finned eel) is widely distributed throughout the Indo-Pacific region, from East Africa to Papua New Guinea, including the Philippines and Indonesia12. Because of its allopatric distribution and slight morphological differences, A. bicolor has been divided into two subspecies: A. bicolor bicolor (inhabiting the Indian Ocean) and A. bicolor pacifica (inhabiting the western Pacific Ocean)13.
Anguillid eels, known for their catadromous life history, adapt to varying environments throughout their life cycle: they begin in marine environments as larvae, move into brackish or inshore waters as juveniles, and settle in fresh water as adults14,15. Their notable tolerance of a wide range of salinities provides an opportunity to investigate how they regulate osmotic pressure during migration, and understanding these osmoregulatory mechanisms will provide valuable insights into how anguillid eels maintain homeostasis. Currently, genome sequences are available for six eel species. Chromosome-level genome assemblies have been established for three of them: A. japonica16, A. anguilla17, and A. rostrata (GCA_018555375.3). The genomes of A. obscura, A. marmorata, and A. megastoma were assembled at the scaffold level18. However, genomic information on A. bicolor pacifica is lacking.
In summary, the genomic resources presented in this study are valuable for studying the molecular mechanisms that drive evolutionary adaptations in migratory euryhaline fish. Additionally, the chromosome-scale genome of short-finned eels will facilitate comparative genomic studies, which will shed light on the adaptive strategies employed by catadromous fish that enable them to survive and thrive across a diverse range of saline environments.
Methods
Ethics statement
The experimental protocols were approved by the Institutional Animal Care and Use Committee (IACUC) of Chonnam National University (CNU IACUC-YS-2023-9).
DNA sample collection, library construction, and DNA sequencing
Short-finned eels (Anguilla bicolor pacifica) collected from an aquafarm in Jeonnam, South Korea, were transported to the laboratory and kept in a 250 L aerated tank at a water temperature of 24 °C (Supplementary Table S1). Genomic DNA was extracted from the muscle tissue of a single male eel with a standard body length of 20 cm (Supplementary Figure S1), and two platforms were used for DNA sequencing. For Illumina sequencing, a short-fragment library with an insert size of 550 bp was generated using the TruSeq DNA Nano 550 bp kit, and this paired-end library was sequenced on an Illumina NovaSeq 6000 platform. For PacBio sequencing, 2–5 μg of DNA was loaded onto a single lane of a BluePippin 0.75% gel, and libraries of 9–13 kb and > 15 kb were collected by electrophoresis. These libraries were sequenced in circular consensus sequencing (CCS) mode on a PacBio Revio platform. In total, 45.3 Gb (50 × coverage) of Illumina reads and 83.24 Gb (92 × coverage) of HiFi reads were generated (Table 1).
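Sequencing depth is the total number of sequenced bases divided by the genome size. The text does not state which genome size was used for the coverage values; the short sketch below assumes the k-mer-based estimate of 899.9 Mb from the genome size estimation step, which is consistent with the reported depths (dividing by the 1.087 Gb assembly size would instead give roughly 42 × and 77 ×).

```python
# Mean sequencing depth = total sequenced bases / genome size.
# Assumption: the denominator is the GenomeScope estimate (899.9 Mb);
# the paper does not say which genome size was used.
GENOME_SIZE_BP = 899.9e6


def coverage(total_bases: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Return the mean sequencing depth (x) for a given yield in bases."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size


# Yields reported in Table 1 (bases).
yields = {"Illumina": 45.3e9, "HiFi": 83.24e9}
for platform, bases in yields.items():
    print(f"{platform}: {coverage(bases):.1f}x")
```

This gives about 50.3 × for Illumina and about 92.5 × for HiFi, matching the rounded values in the text.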
Genome size estimation, genome assembly, and quality assessment
We used Trim_galore (ver. 0.6.7)19 to remove adaptor sequences, low-quality bases (Phred score < 20), and unknown bases (Ns) from the Illumina reads, and discarded reads shorter than 120 bp. The trimmed reads were then used to count 21-mers with jellyfish (ver. 2.3.0)20. Based on the resulting 21-mer frequency histogram, GenomeScope (ver. 2.0)21 estimated the genome size of A. bicolor pacifica to be 899.9 Mb, with a heterozygosity rate of 1.25% (Supplementary Figure S2).
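GenomeScope fits a mixture model to the k-mer histogram that accounts for heterozygosity, sequencing errors, and repeats. The basic idea can be illustrated with a much simpler estimate: divide the total number of k-mers by the depth of the main (homozygous) peak. The sketch below is that naive method, not GenomeScope's model; it assumes the two-column "depth count" text produced by `jellyfish histo`, and `min_depth` is a hypothetical user-chosen cutoff for discarding low-depth error k-mers. In a heterozygous genome such as this one, a second peak appears at half depth and the naive estimate can be biased.

```python
from typing import Dict, Iterable


def read_histo(lines: Iterable[str]) -> Dict[int, int]:
    """Parse `jellyfish histo` output: one 'depth count' pair per line."""
    histo: Dict[int, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            histo[int(parts[0])] = int(parts[1])
    return histo


def estimate_genome_size(histo: Dict[int, int], min_depth: int) -> float:
    """Naive estimate: total k-mers at depth >= min_depth / depth of the tallest peak.

    Low-depth bins (mostly sequencing errors) are excluded via min_depth,
    a hypothetical cutoff that would normally be chosen at the histogram valley.
    """
    usable = {d: c for d, c in histo.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)  # depth bin with the most distinct k-mers
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth
```

For a toy histogram with error k-mers at depth 1 and a peak at depth 10, `estimate_genome_size(read_histo(["1 1000", "10 500"]), min_depth=2)` returns 500.0 (5,000 k-mer occurrences divided by depth 10).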
Hifiasm (ver. 0.16.1)22 was used to assemble the HiFi reads into highly contiguous draft contigs. This produced 405 contigs with a total length of 1.09 Gb and a contig N50 of 16.9 Mb (Table 2).
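N50 is the length L such that sequences of length L or longer together contain at least half of the total assembly length; it is the contiguity statistic reported for both the contigs (16.9 Mb) and the scaffolds (61.07 Mb). A minimal sketch of the calculation follows; in practice the sequence lengths could be taken from column 2 of a `samtools faidx` index (.fai) file.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    Sort lengths in decreasing order and accumulate them; the N50 is the
    length at which the running total first reaches half of the total.
    """
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("no sequences")
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    return sorted_lengths[-1]  # defensive fallback; not reached for non-negative lengths


print(n50([2, 3, 4, 5, 6]))  # total 20; 6 + 5 = 11 >= 10, so N50 = 5
```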
To improve contiguity, the contigs were scaffolded using RagTag (ver. 2.1.0)23, with the chromosomes of the A. japonica genome as a reference16. In this step, the contigs were reordered and oriented, and adjacent query sequences were joined by gaps, indicated by runs of “N” characters, which represent regions of unknown sequence. Consequently, the 405 contigs were integrated into 30 linear scaffolds. The final assembly comprised 19 pseudochromosome-level scaffolds (99.76%) and 11 unplaced scaffolds (0.24%), with a scaffold N50 of 61.07 Mb and a total length of 1.09 Gb (Fig. 1a, Tables 2, 3).
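The final joining step of reference-guided scaffolding can be sketched as follows: once the order and orientation of the contigs on each reference chromosome are known, contigs on the minus strand are reverse-complemented and consecutive contigs are concatenated with a run of Ns. This is an illustration under stated assumptions, not RagTag's implementation (RagTag also performs the alignment-based ordering and orientation, which is omitted here); `build_scaffold`, the layout format, and the fixed 100 bp gap length are hypothetical, since the paper does not report gap sizes.

```python
from typing import Dict, List, Tuple

# Complement table for DNA, preserving case and N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(
    contigs: Dict[str, str],
    layout: List[Tuple[str, str]],
    gap_len: int = 100,  # hypothetical fixed gap length (unknown sequence)
) -> str:
    """Join ordered, oriented contigs into one scaffold separated by N gaps.

    layout: (contig_id, strand) pairs in scaffold order; strand is '+' or '-'.
    """
    pieces = []
    for contig_id, strand in layout:
        seq = contigs[contig_id]
        if strand == "-":
            seq = reverse_complement(seq)
        elif strand != "+":
            raise ValueError(f"invalid strand {strand!r}")
        pieces.append(seq)
    return ("N" * gap_len).join(pieces)


print(build_scaffold({"c1": "AAGT", "c2": "CCA"}, [("c1", "+"), ("c2", "-")], gap_len=3))
```

With a 3 bp gap, this example joins AAGT and the reverse complement of CCA (TGG) into AAGTNNNTGG.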
To evaluate the completeness of the assembled short-finned eel genome, Benchmarking Universal Single-Copy Orthologs (BUSCO) (ver. 5.4.3)24 was used to search for the 3,640 orthologous genes in the actinopterygii_odb10 dataset. The GC contents and genome sizes of the seven anguillid species were found to be comparable (Table 4). Additionally, Bowtie2 (ver. 2.4.5)25 was used to align the Illumina DNA short reads to the assembly with the parameters --no-unal --very-sensitive-local, resulting in an overall alignment rate of 99.41%. Finally, genome quality was examined using QUAST (ver. 5.2.0)26.
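Because --no-unal suppresses SAM records for reads that fail to align, a mapping rate computed by counting records in the SAM output would be inflated towards 100%. The overall alignment rate therefore has to be taken from the summary that Bowtie2 writes to standard error, whose last line has the form "99.41% overall alignment rate". A minimal sketch that extracts this value from a captured log, assuming that summary format:

```python
import re
from typing import Optional

# Matches the final line of Bowtie2's alignment summary, e.g. "99.41% overall alignment rate".
OVERALL_RE = re.compile(r"([\d.]+)% overall alignment rate")


def overall_alignment_rate(log_text: str) -> Optional[float]:
    """Return the overall alignment rate (%) from a Bowtie2 stderr log, or None if absent."""
    match = OVERALL_RE.search(log_text)
    return float(match.group(1)) if match else None


print(overall_alignment_rate("99.41% overall alignment rate\n"))
```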
Transcriptome sequencing and assembly
Four tissue types (eye, heart, liver, and muscle) were collected from an individual eel. All samples were immediately preserved in RNAlater and stored at −80 °C until RNA extraction. Total RNA was extracted from each of the four samples, and sequencing libraries were prepared using a TruSeq Stranded mRNA sample preparation kit following the manufacturer’s protocol. The complementary DNA libraries were sequenced on the Illumina NovaSeq 6000 platform, generating 5.24–6.91 Gb of paired-end reads per sample. A total of 21.83 Gb of clean reads was obtained using Trim_galore and assembled using Trinity (ver. 2.15.1)27 with default parameters (Table 1).
Genome structure annotation
To identify and mask repetitive sequences in the short-finned eel genome, we combined homology-based and de novo prediction approaches using RepeatModeler (ver. 2.0.1) and RepeatMasker (ver. 4.1.2)28,29. A de novo repeat library for A. bicolor pacifica was constructed with RepeatModeler using RMBlast (ver. 2.9.0), a search engine based on National Center for Biotechnology Information (NCBI) BLAST. RepeatMasker was then run with this custom library together with the Actinopterygii and Anguilla repeat libraries from Dfam30. Repetitive sequences accounted for 28.39% of the A. bicolor pacifica genome (Table 5).
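When repeats are soft-masked (for example with RepeatMasker's -xsmall option), repetitive bases are written in lowercase, so the repeat content can be approximated directly from the masked FASTA. The sketch below assumes such a soft-masked file and excludes N gap characters from the denominator; whether the reported 28.39% counts gaps is not stated, so this convention is an assumption, and RepeatMasker's own summary table remains the authoritative figure.

```python
from typing import Iterable, Tuple


def masked_fraction(fasta_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (masked bases, counted bases, fraction masked) for a soft-masked FASTA.

    Lowercase bases are treated as repeats; header lines and N/n gap
    characters are skipped (assumed convention).
    """
    masked = 0
    counted = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        for base in line:
            if base in "Nn":
                continue  # scaffold gaps are not sequence
            counted += 1
            if base.islower():
                masked += 1
    fraction = masked / counted if counted else 0.0
    return masked, counted, fraction


print(masked_fraction([">chr1", "ACGTacgtNN", "aaCC"]))  # (6, 12, 0.5)
```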
The BRAKER pipeline (ver. 3.0.6)31 was used to predict gene models in the A. bicolor pacifica genome. The genome was first soft-masked using the repeats identified with RepeatModeler and RepeatMasker. GeneMark-ETP (ver. 1)32 was then used to generate hints from the RNA-Seq and protein data. For the protein data, we combined the Metazoa ortholog data from OrthoDB (v11)33 with the protein sets of six actinopterygian species (A. anguilla, GCF_013347855.1; Danio rerio, GCF_000002035.6; Pleuronectes platessa, GCF_947347685.1; Poecilia reticulata, GCF_000633615.1; Scleropages formosus, GCF_900964775.1; and Takifugu rubripes, GCF_901000725.2). AUGUSTUS (ver. 3.4.0)34 was trained and used for ab initio gene prediction, incorporating the hints provided by GeneMark-ETP. Finally, the GeneMark-ETP and AUGUSTUS predictions were integrated using TSEBRA35. By combining homology, de novo, and transcriptome evidence, 23,095 non-redundant protein-coding genes were annotated. BUSCO analysis identified 3,448 (94.7%) of the actinopterygian orthologous genes in the predicted gene set (Table 6).
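The BUSCO percentage is the number of complete orthologs recovered divided by the size of the lineage dataset (3,640 genes for actinopterygii_odb10). The sketch below reproduces the 94.7% figure and also parses the one-line BUSCO summary; it assumes the summary has the form "C:…%[S:…%,D:…%],F:…%,M:…%,n:…", and only the C and n fields are read.

```python
import re
from typing import Tuple

# Assumed BUSCO one-line summary format, e.g. "C:xx.x%[S:..,D:..],F:..,M:..,n:3640".
SUMMARY_RE = re.compile(r"C:([\d.]+)%.*?n:(\d+)")


def busco_completeness(complete: int, total: int) -> float:
    """Percentage of BUSCO genes recovered as complete."""
    if total <= 0 or not 0 <= complete <= total:
        raise ValueError("need 0 <= complete <= total and total > 0")
    return 100.0 * complete / total


def parse_short_summary(line: str) -> Tuple[float, int]:
    """Return (complete %, dataset size n) from a BUSCO one-line summary."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a BUSCO summary line")
    return float(match.group(1)), int(match.group(2))


print(f"{busco_completeness(3448, 3640):.1f}%")  # 94.7%
```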
Genome annotation
The functions of the integrated gene models were annotated using the SWISS-PROT protein database36 and the NCBI non-redundant protein database (https://www.ncbi.nlm.nih.gov/protein). DIAMOND (ver. 2.1.9.163)37 blastx was used with the following parameters: --dbsize 530000000 --max-target-seqs 1 --outfmt 6 --evalue 1e-5. Furthermore, we used eggNOG-mapper (ver. 2.1.8)38 with the parameters --evalue 1e-5 -m diamond to annotate gene functions against the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), euKaryotic Orthologous Groups (KOG), and protein families (Pfam) databases. In total, 98.66% of the gene models were functionally annotated using these publicly accessible databases (Table 6).
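The overall annotation rate counts a gene model as annotated if it received a hit in at least one database, so it is the size of the union of the per-database hit sets divided by the number of gene models. A minimal sketch, assuming DIAMOND's default tabular columns for --outfmt 6 (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that query IDs are already gene IDs; the ID set from eggNOG-mapper would have to be parsed separately, and the function names are hypothetical.

```python
from typing import Iterable, Set


def annotated_ids(outfmt6_lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Collect query IDs with at least one hit at or below max_evalue.

    Assumes the 12 default --outfmt 6 columns, with the e-value in column 11.
    """
    ids: Set[str] = set()
    for line in outfmt6_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            ids.add(fields[0])
    return ids


def annotation_rate(total_genes: int, *hit_sets: Set[str]) -> float:
    """Percentage of gene models with a hit in at least one database."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    annotated = set().union(*hit_sets)
    return 100.0 * len(annotated) / total_genes
```

Taking the union rather than summing per-database counts avoids counting a gene twice when it is annotated by, for example, both SWISS-PROT and the NCBI non-redundant database.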
Genome-wide collinearity analysis
Two anguillid genomes were used for comparative analysis. Macrosynteny between the short-finned and Japanese eel genomes was identified using the Python version of MCscan (https://github.com/tanghaibao/jcvi/wiki/MCscan) with default parameters39, and macrosynteny blocks were visualized using the Python scripts provided with MCscan. The 19 pseudochromosomes of A. bicolor pacifica showed highly conserved collinearity with the chromosomes of A. japonica (Fig. 1b).
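The idea behind a collinear (synteny) block can be illustrated with a toy chaining procedure: orthologous gene pairs ("anchors") on the same chromosome pair are sorted by gene order in one genome, and consecutive anchors are chained while the gaps in both genomes stay small. This greedy sketch is not MCscan's algorithm, which has its own homology search, filtering, and chaining; it also ignores block direction. The anchor format, `max_gap`, and `min_size` are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (query_chr, query_gene_index, ref_chr, ref_gene_index); hypothetical anchor format.
Anchor = Tuple[str, int, str, int]


def collinear_blocks(
    anchors: List[Anchor], max_gap: int = 20, min_size: int = 4
) -> List[List[Anchor]]:
    """Greedily chain anchors into blocks per chromosome pair (toy illustration)."""
    by_pair: Dict[Tuple[str, str], List[Anchor]] = defaultdict(list)
    for anchor in anchors:
        by_pair[(anchor[0], anchor[2])].append(anchor)

    blocks: List[List[Anchor]] = []
    for pair_anchors in by_pair.values():
        pair_anchors.sort(key=lambda a: a[1])  # order by query gene index
        current = [pair_anchors[0]]
        for anchor in pair_anchors[1:]:
            prev = current[-1]
            # Extend the block only if both genomes show a small gap.
            if anchor[1] - prev[1] <= max_gap and abs(anchor[3] - prev[3]) <= max_gap:
                current.append(anchor)
            else:
                if len(current) >= min_size:
                    blocks.append(current)
                current = [anchor]
        if len(current) >= min_size:
            blocks.append(current)
    return blocks
```

With default settings, four consecutive anchors such as (chrA, 1, chrB, 1) through (chrA, 4, chrB, 4) form one block, whereas an isolated anchor far away on the same chromosome pair is discarded because it falls below `min_size`.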
Data Records
The raw sequencing data for this study have been deposited in the NCBI under BioProject ID PRJNA1073276. The Illumina, transcriptome, and PacBio sequencing data are available under Sequence Read Archive accessions SRR27869073–SRR27869078 (ref. 40). The assembled genome has been deposited in the GenBank database under accession number JBDGNX020000000 (ref. 41). Additionally, the assembled genome and annotations can be downloaded from Figshare42 at https://doi.org/10.6084/m9.figshare.25139891. All data sets used in this study are available at http://eyunlab.cau.ac.kr/shortfinned_eel.
Technical Validation
Evaluation of genome assembly and annotation
Five approaches were used to evaluate the completeness, accuracy, and contiguity of the A. bicolor pacifica genome assembly: N50 statistics, BUSCO analysis, mapping of DNA short reads to the genome, comparison of synteny blocks between the A. bicolor pacifica and A. japonica genomes, and comparison of the total assembly size (1.09 Gb) with the k-mer-based estimate (899.9 Mb) obtained using jellyfish and GenomeScope. All assessments indicated that the genome assembly was contiguous and of high quality.
Code availability
The software used in this study is publicly available. Parameters for all commands are described in the Methods section; default parameters were applied wherever specific parameters are not mentioned. No custom scripts or code were used in this study.
References
Sugeha, H. Y. & Genisa, M. U. External and internal morphological characteristics of glass eels Anguilla bicolor bicolor from the Cibaliung River Estuary, Banten, Indonesia. OLDI 41, 37–48 (2015).
Marini, M. et al. Genetic diversity, population structure and demographic history of the tropical eel Anguilla bicolor pacifica in Southeast Asia using mitochondrial DNA control region sequences. GECCO 26, e01493 (2021).
Yuan, Y., Yuan, Y., Dai, Y., Gong, Y. & Yuan, Y. Development status and trends in the eel farming industry in Asia. N. Am. J. Aquacult. 84, 3–17 (2022).
Tanaka, H. Progression in artificial seedling production of Japanese eel Anguilla japonica. Fish. Sci. 81, 11–19 (2015).
Liao, I. C., Hsu, Y. K. & Lee, W. C. Technical innovations in eel culture systems. Rev. Fish. Sci. 10, 433–450 (2002).
Guhl, B., Stürenberg, F. J. & Santora, G. Contaminant levels in the European eel (Anguilla anguilla) in North Rhine-Westphalian rivers. Environ. Sci. Eur. 26, 26 (2014).
Belpaire, C. G. J. et al. Decreasing eel stocks: survival of the fattest? Ecol. Freshwat. Fish 18, 197–214 (2009).
Muthmainnah, D., Honda, S., Suryati, N. K. & Prisantoso, B. I. Understanding the current status of anguillid eel fisheries in Southeast Asia. Fish for the People 14, 19–25 (2016).
Cuvin-Aralar, M. L., Aya, F. A., Romana-Eguia, M. R. R. & Logronio, D. J. Nursery culture of tropical anguillid eels in the Philippines (Aquaculture Department, Southeast Asian Fisheries Development Center, 2019).
Aya, F. A. & Garcia, L. M. B. Cage culture of tropical eels, Anguilla bicolor pacifica and A. marmorata juveniles: Comparison of growth, feed utilization, biochemical composition and blood chemistry. Aquacult. Res. 53, 6283–6291 (2022).
Arai, T. Do we protect freshwater eels or do we drive them to extinction? Springerplus 3, 534 (2014).
Ege, V. A revision of the genus Anguilla Shaw. Vol. 16, 8–256 (Brill, 1939).
Watanabe, S., Miller, M. J., Aoyama, J. & Tsukamoto, K. Evaluation of the population structure of Anguilla bicolor and A. bengalensis using total number of vertebrae and consideration of the subspecies concept for the genus Anguilla. Ecol. Freshwat. Fish 23, 77–85 (2014).
Arai, T. Ecology and evolution of migration in the freshwater eels of the genus Anguilla Schrank, 1798. Heliyon 6, e05176 (2020).
Wright, R. M. et al. First direct evidence of adult European eels migrating to their breeding place in the Sargasso Sea. Sci. Rep. 12, 15362 (2022).
Wang, H. et al. A chromosome-level assembly of the Japanese eel genome, insights into gene duplication and chromosomal reorganization. GigaScience 11, giac120 (2022).
Parey, E. et al. Genome structures resolve the early diversification of teleost fishes. Science 379, 572–575 (2023).
Barth, J. M. I. et al. Stable species boundaries despite ten million years of hybridization in tropical eels. Nat. Commun. 11, 1433 (2020).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet j. 17, 10–12 (2011).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 23, 258 (2022).
Simão, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Smit, A. F. A. & Hubley, R. RepeatModeler Open-1.0, http://www.repeatmasker.org (2008–2015).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0 http://www.repeatmasker.org (2013–2015).
Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 44, D81–89 (2016).
Gabriel, L. et al. BRAKER3: fully automated genome annotation using RNA-Seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. bioRxiv (2023).
Bruna, T., Lomsadze, A. & Borodovsky, M. GeneMark-ETP: automatic gene finding in eukaryotic genomes in consistency with extrinsic data. bioRxiv (2023).
Kuznetsov, D. et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res. 51, D445–D451 (2022).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
Gabriel, L., Hoff, K. J., Brůna, T., Borodovsky, M. & Stanke, M. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics 22, 566 (2021).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: Functional annotation, orthology assignments and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Tang, H. et al. Synteny and collinearity in plant genomes. Science 320, 486–488 (2008).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP488076 (2024).
Choi, H., Nam, J., Yang, S. & Eyun, S. Anguilla bicolor pacifica, whole genome sequencing project. GenBank. https://identifiers.org/ncbi/insdc:JBDGNX020000000 (2024).
Choi, H., Nam, J., Yang, S. & Eyun, S. Chromosome-level genome assembly and gene annotation of short-finned eel (Anguilla bicolor pacifica). figshare. https://doi.org/10.6084/m9.figshare.25139891.v5 (2024).
Acknowledgements
This work was supported by the National Research Foundation of Korea (2022R1A2C4002058) and the Korea Institute of Marine Science & Technology Promotion (RS-2022-KS221676) funded by the Ministry of Oceans and Fisheries.
Author information
Contributions
S.E. supervised and conceived the project. H.C. and J.N. collected the sample. J.N. performed the experiments. H.C. performed bioinformatics analysis. H.C., S.Y. and S.E. wrote the article. All authors contributed to discussions and interpretations of the results and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Choi, H., Nam, J., Yang, S. et al. Highly contiguous genome assembly and gene annotation of the short-finned eel (Anguilla bicolor pacifica). Sci Data 11, 952 (2024). https://doi.org/10.1038/s41597-024-03817-9
DOI: https://doi.org/10.1038/s41597-024-03817-9
- Springer Nature Limited