Abstract
Dryocosmus kuriphilus, commonly known as the chestnut gall wasp, belongs to the family Cynipidae and is native to China. It is a highly invasive insect that causes serious damage to chestnut trees and has rapidly spread to several continents, including Europe, North America, and Oceania. D. kuriphilus has become one of the most important pests of chestnut worldwide and is listed as a quarantine pest by the European and Mediterranean Plant Protection Organization (EPPO). In this study, we used PacBio long reads, Illumina short reads, and Hi-C sequencing data to construct a chromosome-level assembly of the D. kuriphilus genome. The assembled genome includes 14,729 contigs with a total length of 2.28 Gb and a contig N50 of 0.8 Mb. With Hi-C technology, 2.17 Gb (95.02%) of contigs were anchored and oriented into 10 pseudochromosomes, with a scaffold N50 of 198.8 Mb and a scaffold N90 of 158.8 Mb. In total, 24,086 protein-coding genes were predicted in the assembled D. kuriphilus genome as the reference gene set. A total of 1.82 Gb of repeats (79.7% of the genome), including 1.42 Gb of transposable elements and 0.40 Gb of tandem repeats, were identified in the D. kuriphilus genome. BUSCO analysis based on the Insecta database (OrthoDB version 10) indicated 98.1% completeness for the assembled genome sequences. The high-quality genome assembly of D. kuriphilus will not only provide a valuable reference for studying its evolutionary history and genetic structure but also facilitate research on host-pest interactions and invasiveness. Moreover, this genome assembly will promote the development of effective management strategies to mitigate the economic and ecological impacts of this invasive pest on chestnut trees and ecosystems.
Background & Summary
The Oriental chestnut gall wasp, Dryocosmus kuriphilus (Hymenoptera: Cynipidae), is native to China and naturally distributed in Shanxi, Hebei, Shandong, Hunan, Hubei, Anhui, Henan, Fujian, Zhejiang, Jiangxi and Jiangsu provinces1. It is one of a long list of invasive alien hymenopterans that have established themselves outside their native range. Its capacity for rapid expansion, coupled with the associated ecological and economic damage, makes it one of the most important pests of chestnuts (Castanea spp.) worldwide. As an invasive species, D. kuriphilus has invaded Japan, South Korea, Nepal, Italy, France, Slovakia, Hungary, and the United States2,3,4. Chestnut gall wasps can attack almost all species of the genus Castanea, such as Castanea mollissima, Castanea henryi, Castanea seguinii, Castanea crenata, Castanea sativa and Castanea dentata, causing serious damage to chestnut production5,6,7. The species has become an important horticultural pest and is listed as a quarantine pest by the European and Mediterranean Plant Protection Organization (EPPO). The larva lives inside a completely sealed gall and efficiently absorbs host nutrients, which makes management and control difficult8. The pest is a parthenogenetic species among the 48 known species of the genus Dryocosmus and has a strong reproductive capacity, with a single adult female producing about 300 eggs. Gall formation and parthenogenesis are considered the key reasons for its rapid spread and population growth9,10. D. kuriphilus reduces chestnut production by decreasing the formation of female flowers11, and yield can be reduced by up to 80%12. Infestations also indirectly reduce leaf area, leading to earlier leaf mortality and abscission, lower leaf biomass, and a reduced ability to produce winter buds3,13. Massive attacks and lasting damage gradually reduce tree vitality.
In this study, a chromosome-level assembly of the D. kuriphilus genome was constructed using a combination of PacBio long reads, Illumina short reads, and chromosome conformation capture (Hi-C) sequencing technologies. The gene models were predicted by EVidenceModeler, embedded in a pipeline that integrates evidence from ab initio predictions, homology-based searches, and full-length transcriptome and RNA-seq alignments. This high-quality reference genome provides not only a valuable resource for understanding the genetics, ecology, and evolution of D. kuriphilus, but also a basis for explaining the evolutionary mechanisms underlying its environmental adaptation and invasion.
Methods
Sampling, sequencing, and genome size estimation
The galls of Dryocosmus kuriphilus were collected from chestnut trees in a local paddy field in Changsha, Hunan province, China, from May to June 2019. The collected galls were placed in a breeding cage (length × width × height = 30 cm × 20 cm × 30 cm) at 25 °C, and the emerging adults were frozen in liquid nitrogen and stored at −80 °C for subsequent sequencing. Genomic DNA was extracted from 20 D. kuriphilus adults to construct polymerase chain reaction (PCR)-free Illumina libraries with 300–500 bp inserts and a PacBio library with 20 kb inserts, which were sequenced on the Illumina HiSeq 2500 and PacBio Sequel platforms, respectively. A total of 227.2 Gb of Illumina clean reads and 336.5 Gb of PacBio long reads were generated in this study (Table 1). Quality assessment showed that the paired-end Illumina data are of high quality, with a single-base error rate of less than 0.001 for over 91.5% of sequences (corresponding to Phred quality scores above 30; Fig. 1A,B). The PacBio long reads had an average length of 19.5 kb, an N50 length of 28.5 kb, and an N90 length of 10.8 kb.
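The relation between the error rate of 0.001 and a quality score of 30 follows from the standard Phred definition, Q = −10·log10(p). The snippet below is a minimal illustration of that conversion; the function names are ours and are not part of the sequencing pipeline.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred quality score Q into a per-base error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Convert a per-base error probability p into a Phred quality score."""
    if p <= 0:
        raise ValueError("error probability must be positive")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # An error probability of 0.001 corresponds to Q30.
    print(error_to_phred(0.001))
    print(phred_to_error(30))
```

Under this definition, any base with an error probability below 0.001 has a quality score above 30.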
The genome size of D. kuriphilus was estimated using a k-mer-based method. The k-mer distribution of the Illumina reads was counted using Jellyfish v2.3.0 (ref. 14) (k = 21; parameters: count -m 21 -t 10 -s 1G). The genome size and the heterozygosity rate were estimated to be ~2752.38 Mb and 0.43%, respectively, by the online version of GenomeScope (http://qb.cshl.edu/genomescope/) using the k-mer count distribution file (Fig. 2).
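As a rough illustration of the principle behind k-mer-based size estimation, the sketch below divides the total number of k-mers by the depth of the main coverage peak. This is a simplified peak-based calculation, not the mixture model that GenomeScope fits. It assumes a two-column histogram (depth, count) such as the one written by `jellyfish histo`; the `min_depth` cutoff used to discard low-frequency error k-mers is a hypothetical choice, and in a heterozygous genome the tallest peak is not guaranteed to be the homozygous one.

```python
def parse_histo(text):
    """Parse a two-column k-mer histogram (depth, count) into a list of int pairs."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            depth, count = line.split()
            pairs.append((int(depth), int(count)))
    return pairs


def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    k-mers with depth below `min_depth` are treated as sequencing errors
    and ignored (an assumed cutoff for this sketch).
    Returns a tuple (estimated_size, peak_depth).
    """
    usable = [(d, c) for d, c in histogram if d >= min_depth]
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the largest number of distinct k-mers = main coverage peak.
    peak_depth = max(usable, key=lambda dc: dc[1])[0]
    total_kmers = sum(d * c for d, c in usable)
    return total_kmers / peak_depth, peak_depth


if __name__ == "__main__":
    toy = parse_histo("1 1000\n2 300\n9 10\n10 50\n11 10\n")
    print(estimate_genome_size(toy))
```

The toy histogram is invented purely to exercise the function and does not represent the D. kuriphilus data.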
Hi-C library preparation and sequencing
Crosslinking was stopped by adding glycine and additional vacuum infiltration. Fixed tissue was then ground to powder and re-suspended in nuclei isolation buffer to obtain a suspension of nuclei. The purified nuclei were digested with 100 units of DpnII and marked by incubation with biotin-14-dATP. Biotin-14-dATP on non-ligated DNA ends was removed by the exonuclease activity of T4 DNA polymerase. The ligated DNA was sheared into 300–600 bp fragments, blunt-end repaired and A-tailed, and then purified through biotin-streptavidin-mediated pull-down. Finally, the Hi-C libraries were quantified and sequenced using the Illumina MGI-2000 platform. A total of 357.2 Gb of clean data was generated in this study (Table 1).
Transcriptome sequencing
Illumina transcriptome sequencing was performed for three developmental stages of D. kuriphilus: larva, pupa, and adult. RNA libraries were prepared using the TruSeq RNA Sample Prep Kit (Illumina, USA) according to the manufacturer’s instructions, and PE150 sequencing was conducted on an Illumina NovaSeq 6000 platform at Novogene Biotech Co., Ltd. (Beijing, China). A total of 35.5 Gb of clean data was generated (Table 1).
For PacBio full-length transcriptome sequencing, 300 ng of total RNA pooled in equal volumes from the three stages (larva, pupa, and adult) was reverse transcribed into cDNA and amplified using the NEBNext® Single Cell/Low Input cDNA Synthesis & Amplification Module and the Iso-Seq Express Oligo Kit. cDNAs were purified with ProNex Beads and used to construct the library with the SMRTbell Express Template Prep Kit 2.0. The SMRTbell template was annealed to the sequencing primer, bound to polymerase, and sequenced on the PacBio Sequel II platform. A total of 22.2 Gb of full-length transcriptome data was generated.
Genome assembly
Wtdbg2 (ref. 15) (parameters: -t 32 -g 2.6g -x sq -l 4096 -L 10000) was used to assemble the D. kuriphilus genome. The draft assembly was polished with three rounds of Racon v1.4.3 (https://github.com/isovic/racon) using the PacBio subreads, followed by three rounds of Pilon v1.23 (ref. 16) (parameters: --fix all --changes) using the Illumina paired-end reads. Finally, the total length of the draft genome was 2.17 Gb, comprising 9,372 contigs with a contig N50 of 0.8 Mb (Table 2).
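For readers unfamiliar with the contiguity statistics reported here, the following sketch shows one common way to compute N50 and N90 from a list of contig or scaffold lengths. It is an illustration only; the function name `nx` is ours, and assembly-statistics tools may differ slightly in how they handle ties.

```python
def nx(lengths, x=50):
    """Return the Nx value of a set of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    cover at least x% of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = sum(lengths) * x / 100
    running = 0
    # Walk from the longest sequence downwards until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


if __name__ == "__main__":
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # invented lengths, total = 32
    print(nx(toy_lengths, 50), nx(toy_lengths, 90))
```

Because N90 requires covering a larger share of the assembly, it is never larger than N50 for the same set of sequences.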
The Hi-C reads were used to anchor the contigs onto chromosomes through grouping, ordering, and orientation. The 357.2 Gb of Hi-C paired-end data were used to assign the contigs to chromosomes with ALLHiC17. We then divided the assembled chromosomes into equally sized bins (500 kb) and constructed an interaction heatmap based on the number of valid paired-end reads supporting interactions between each pair of bins. The assembly was manually corrected using Juicebox v2.1.10 (ref. 18) based on the intensity of chromosome interactions (Fig. 3). Finally, the chromosome-level genome was generated with a scaffold N50 of 198.8 Mb and a scaffold N90 of 158.8 Mb (Table 2).
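The binning step behind the interaction heatmap can be sketched as follows. This is a minimal illustration under the assumption that valid read pairs are already available as pairs of (chromosome, 0-based position) coordinates; that input format and the function name `bin_contacts` are hypothetical and do not reflect the file formats used by ALLHiC or Juicebox.

```python
from collections import Counter

BIN_SIZE = 500_000  # 500 kb bins, as described in the text


def bin_contacts(pairs, bin_size=BIN_SIZE):
    """Count valid read pairs supporting each pair of genomic bins.

    pairs: iterable of ((chrom1, pos1), (chrom2, pos2)), 0-based positions.
    Returns a Counter keyed by an ordered pair of (chrom, bin_index) tuples,
    so that contacts A-B and B-A are accumulated in the same cell.
    """
    counts = Counter()
    for (chrom1, pos1), (chrom2, pos2) in pairs:
        bin_a = (chrom1, pos1 // bin_size)
        bin_b = (chrom2, pos2 // bin_size)
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts


if __name__ == "__main__":
    toy_pairs = [
        (("chr1", 100), ("chr1", 600_000)),
        (("chr1", 700_000), ("chr1", 200)),
        (("chr1", 10), ("chr2", 10)),
    ]
    for key, value in sorted(bin_contacts(toy_pairs).items()):
        print(key, value)
```

In a correctly ordered assembly, counts are expected to be highest near the diagonal (bins that are close together on the same chromosome), which is the signal used for visual correction.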
Genome annotation
A de novo repeat library for D. kuriphilus was constructed with RepeatModeler v1.0.4 (http://www.repeatmasker.org/RepeatModeler.html). Transposable elements (TEs) in the D. kuriphilus genome were then identified with RepeatMasker v4.0.6 (http://www.repeatmasker.org/) using both the Repbase library and the de novo library. A total of 1.8 Gb of repeat sequences, occupying 79.7% of the D. kuriphilus genome, were identified in this study, comprising TEs (62.3% of the genome) and tandem repeats (17.4% of the genome) (Table 3). We masked the TEs in the D. kuriphilus genome before gene prediction.
For gene prediction in the D. kuriphilus genome, we used a strategy integrating ab initio prediction, homology searching, and transcriptome-based approaches. A total of 122,822 genes were predicted in the D. kuriphilus genome by Augustus (v3.5.3)19. For homology-based annotation, we queried the D. kuriphilus genome sequences against a database containing non-redundant protein sequences from three species (Apis mellifera (GCA_003254395.2), Nasonia vitripennis (GCA_009193385.2) and Tribolium castaneum (GCA_000002335.3)) using genBlastA20 (parameters: -e 1e-2 -g T -f F -a 0.5 -d 100000 -r 10 -c 0.5 -s 0), followed by GeneWise21 annotation. A total of 20,395, 29,337, and 24,050 genes were predicted from the Apis mellifera, Nasonia vitripennis, and Tribolium castaneum gene sets, respectively. For RNA-seq-based annotation, the Illumina paired-end and PacBio full-length transcript data were mapped to the assembled genome of D. kuriphilus, followed by gene prediction using Cufflinks v2.2.1 and PASA v2.3.3 (refs. 22,23). The gene sets were generated by combining all the predictions using the EVidenceModeler program (EVM-1.1.1)24. To maintain confidence in the predicted genes, we retained only gene models supported by at least one line of evidence: homologous proteins from closely related species, an InterProScan domain, or RNA-seq data. Finally, a total of 24,086 protein-coding gene models were predicted in the D. kuriphilus genome (Table 4).
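The final filtering rule, keeping a gene model only if it has at least one line of supporting evidence, can be expressed as a simple predicate. The sketch below is an illustration under our reading of that rule; the `GeneModel` record and its field names are hypothetical and do not correspond to the output format of EVidenceModeler or any other tool named above.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical record describing the evidence attached to one gene model."""
    gene_id: str
    has_homolog: bool = False          # homologous protein from a related species
    has_interpro_domain: bool = False  # InterProScan domain hit
    has_rnaseq: bool = False           # RNA-seq / full-length transcript support


def keep_supported(models):
    """Return the gene models with at least one line of supporting evidence."""
    return [
        m for m in models
        if m.has_homolog or m.has_interpro_domain or m.has_rnaseq
    ]


if __name__ == "__main__":
    toy_models = [
        GeneModel("g1", has_homolog=True),
        GeneModel("g2"),  # ab initio only: dropped
        GeneModel("g3", has_interpro_domain=True, has_rnaseq=True),
    ]
    print([m.gene_id for m in keep_supported(toy_models)])
```

Models predicted only ab initio, with no homology, domain, or expression support, are the ones removed by this criterion.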
For functional annotation, we searched our predicted protein-coding genes against the non-redundant (NR) database using BLASTP v2.9.03325, as well as the Pfam, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and eggNOG databases. A total of 89.1% (21,460 of 24,086) of the protein-coding genes were annotated in this study (Table 5).
Data Records
The D. kuriphilus genome project was deposited at NCBI under BioProject No. PRJNA1092378 (ref. 26). Genomic PacBio sequencing data were deposited in the Sequence Read Archive at NCBI under accession number SRR28467127 (ref. 27). Genomic Illumina sequencing data were deposited in the Sequence Read Archive at NCBI under accession number SRR28646934 (ref. 28). Hi-C sequencing data were deposited in the Sequence Read Archive at NCBI under accession number SRR28679635 (ref. 29). Illumina RNA-seq data were deposited in the Sequence Read Archive at NCBI under accession numbers SRR28520759–SRR28520766 (refs. 30–37), and PacBio RNA-seq data were deposited in the Sequence Read Archive at NCBI under accession number SRR28508811 (ref. 38). The final chromosome assembly was deposited in GenBank at NCBI under accession number JBBWUJ000000000 (ref. 39). The gene set of D. kuriphilus is available in Figshare at https://doi.org/10.6084/m9.figshare.25800868.v1 (ref. 40).
Technical Validation
Three methods were used to evaluate the completeness of the genome assembly. First, 98.1% of the Insecta core genes from OrthoDB (insecta_odb10) were identified as complete in the assembled genome by BUSCO v5.3.2 (ref. 41) (Table 6). Second, we used another evaluation tool, compleasm v0.2.5 (ref. 42), with the insecta_odb10 database to assess the completeness of the D. kuriphilus genome. The results showed that 98.32% of the evaluated genes were identified as complete (single-copy: 95.39%; duplicated: 2.93%) (Table 6). We also evaluated the completeness of the predicted gene set, and 93.8% of the core genes were identified as complete. Third, we aligned the Illumina short reads and PacBio long reads to the D. kuriphilus reference genome using BWA-MEM version 0.7.17 (ref. 21) (https://github.com/lh3/bwa). The analysis revealed that 98.65% of the short reads and 95.55% of the long reads were successfully mapped to the D. kuriphilus genome.
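The completeness percentages above follow the usual BUSCO-style bookkeeping, in which the complete fraction is the sum of the single-copy and duplicated fractions (here 95.39% + 2.93% = 98.32%). The sketch below illustrates that arithmetic from raw category counts; the function name and the toy counts are ours and are not taken from Table 6.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Convert BUSCO-style category counts into percentages.

    'complete' is defined as single-copy plus duplicated groups.
    """
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("no groups counted")

    def pct(n):
        return round(100 * n / total, 2)

    return {
        "complete": pct(single + duplicated),
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


if __name__ == "__main__":
    # Invented counts, used only to exercise the function.
    print(busco_summary(single=90, duplicated=5, fragmented=3, missing=2))
```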
Code availability
No specific script was utilized in this study. For all analyses, the version and parameters of the software have been included in the Methods section.
References
Zhu, D. H., He, Y. Y., Fan, Y. S., Ma, M. Y. & Peng, D. L. Negative evidence of parthenogenesis induction by Wolbachia in a gallwasp species, Dryocosmus kuriphilus. Entomologia Experimentalis et Applicata 124, 279–284, https://doi.org/10.1111/j.1570-7458.2007.00578.x (2007).
Yasumatsu, K. A new Dryocosmus injurious to chestnut trees in Japan (Hymenoptera, Cynipidae). Mushi 22, 89–92 (1951).
Kato, K. & Hijii, N. Effects of gall formation by Dryocosmus kuriphilus Yasumatsu (Hym., Cynipidae) on the growth of chestnut trees. Journal of Applied Entomology 121, 9–15, https://doi.org/10.1111/j.1439-0418.1997.tb01363.x (1997).
Bosio, G., Gerbaudo, C. & Piazza, E. Dryocosmus kuriphilus Yasumatsu: an outline seven years after the first report in Piedmont (Italy). Acta Horticulturae 866, 341–348, https://doi.org/10.17660/ActaHortic.2010.866.43 (2010).
Aebi, A. et al. Native and introduced parasitoids attacking the invasive chestnut gall wasp Dryocosmus kuriphilus. Bulletin OEPP/EPPO Bulletin 37, 166–171, https://doi.org/10.1111/j.1365-2338.2007.01099.x (2007).
Abe, Y. & Miura, K. Doses Wolbachia induce unisexuality in oak gall wasps? (Hymenoptera: Cynipidae). Annals of the Entomological Society of America 95, 583–586, https://doi.org/10.1603/0013-8746(2002)095[0583:DWIUIO]2.0.CO;2 (2002).
Quacchia, A., Moriya, S., Bosio, G., Scapin, I. & Alma, A. Rearing, release and the prospect of establishment of Torymus sinensis, biological control agent of the chestnut gall wasp Dryocosmus kuriphilus, in Italy. BioControl 53, 829–839, https://doi.org/10.1007/s10526-007-9139-4 (2008).
Otake, A. Chestnut gall wasp, Dryocosmus kuriphilus Yasumatsu (Hymenoptera: Cynipidae): analyses of records on cell content inside galls and on emergence of wasp and parasitoids outside galls. Applied Entomology and Zoology 24, 193–201, https://doi.org/10.1303/aez.24.193 (1989).
Avtzis, D. & Matošević, D. Taking Europe by storm: a first insight in the introduction and expansion of Dryocosmus kuriphilus in central Europe by mtDNA. Sumarski List 8, 387–394 (2013).
Graziosi, I. & Rieske, L. K. Potential fecundity of a highly invasive gall maker Dryocosmus kuriphilus (Hymenoptera, Cynipidae). Environmental Entomology 43, 1053–105, https://doi.org/10.1603/EN14047 (2014).
Conedera, M. & Gehring, E. Danni da cinipide e miele di castagno. L’ape Rivista Svizzera di Apicoltura 98, 6–8 (2015).
Battisti, A., Benvegnù, I., Colombari, F. & Haack, R. A. Invasion by the chestnut gall wasp in Italy causes significant yield loss in Castanea sativa nut production. Agricultural and Forest Entomology 16, 75–79, https://doi.org/10.1111/afe.12036 (2014).
Sartor, C. et al. Impact of the Asian wasp Dryocosmus kuriphilus (Yasumatsu) on cultivated chestnut: Yield loss and cultivar susceptibility. Scientia Horticulturae 197, 454–460, https://doi.org/10.1016/j.scienta.2015.10.004 (2015).
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770, https://doi.org/10.1093/bioinformatics/btr011 (2011).
Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nature methods 17, 155–158, https://doi.org/10.1038/s41592-019-0669-3 (2020).
Hu, J. et al. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255, https://doi.org/10.1093/bioinformatics/btz891 (2020).
Zhang, X. T., Zhang, S. C., Zhao, Q., Ming, R. & Tang, H. B. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nature Plants 5, 833–845, https://doi.org/10.1038/s41477-019-0487-8 (2019).
Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Systems 6, 256–258, https://doi.org/10.1016/j.cels.2018.01.001 (2018).
Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, 215–225, https://doi.org/10.1093/bioinformatics/btg1080 (2003).
She, R. et al. GenBlastA: enabling BLAST to identify homologous gene sequences. Genome Research 19, 143–149, https://doi.org/10.1101/gr.082081.108 (2009).
Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Research 14, 988–995, https://doi.org/10.1101/gr.1865504 (2004).
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7, 562–578, https://doi.org/10.1038/nprot.2012.016 (2012).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31, 5654–5666, https://doi.org/10.1093/nar/gkg770 (2003).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, 1–22, https://doi.org/10.1186/gb-2008-9-1-r7 (2008).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402, https://doi.org/10.1093/nar/25.17.3389 (1997).
NCBI BioProject. https://identifiers.org/ncbi/bioproject:PRJNA1092378 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28467127 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28520607 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28679635 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28520759 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28520760 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28520761 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28520762 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28520763 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28520764 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28520765 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28520766 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR28508811 (2024).
GenBank. https://identifiers.org/ncbi/insdc:JBBWUJ000000000 (2023).
Ren, Y. S. Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus). figshare. https://doi.org/10.6084/m9.figshare.25800868.v1 (2024).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution 38, 4647–4654, https://doi.org/10.1093/molbev/msab199 (2021).
Huang, N. & Li, H. compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics 39, btad595, https://doi.org/10.1093/bioinformatics/btad595 (2023).
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2018YFE0127100 and 2016YFE0128200), the Shenzhen Science and Technology Program (KQTD20180411143628272), and the Agricultural Science and Technology Innovation Program.
Author information
Contributions
D.H.Z. and B.L. conceived the idea. B.L. and Y.S.R. analyzed the data and drafted the first version of the manuscript. C.Y.S. and Y.Z. prepared the materials. X.D.W. analyzed gene structure and completeness of the genome. D.H.Z. and B.L. supervised the project. All authors have read and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, B., Ren, Ys., Su, Cy. et al. Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus). Sci Data 11, 963 (2024). https://doi.org/10.1038/s41597-024-03827-7