A chromosome-level genome assembly of the parasitoid wasp Eretmocerus hayati

Zhong, Yu-Wei; Fan, Yun-Yun; Zuo, Zhang-Qi; Shu, Run-Guo; Liu, Yin-Quan; Luan, Jun-Bo; Li, Fei; Liu, Shu-Sheng

doi:10.1038/s41597-023-02450-2

A chromosome-level genome assembly of the parasitoid wasp Eretmocerus hayati

Data Descriptor
Open access
Published: 06 September 2023

Volume 10, article number 585, (2023)
Cite this article

Download PDF

You have full access to this open access article

Scientific Data

A chromosome-level genome assembly of the parasitoid wasp Eretmocerus hayati

Download PDF

1971 Accesses
2 Citations
1 Altmetric
Explore all metrics

Abstract

Hymenoptera is an order accounting for a large proportion of species in Insecta, among which Chalcidoidea contains many parasitoid species of biocontrol significance. Currently, some species genomes in Chalcidoidea have been assembled, but the chromosome-level genomes of Aphelinidae are not yet available. Using Illumina, PacBio HiFi and Hi-C technologies, we assembled a genome assembly of Eretmocerus hayati (Aphelinidae, Hymenoptera), a worldwide biocontrol agent of whiteflies, at the chromosome level. The assembled genome size is 692.1 Mb with a contig N50 of 7.96 Mb. After Hi-C scaffolding, the contigs was assembled onto four chromosomes with a mapping rate of > 98%. The scaffold N50 length is 192.5 Mb, and Benchmarking Universal Single-Copy Orthologues (BUSCO) value is 95.9%. The genome contains 370.8 Mb repeat sequences and total of 24471 protein coding genes. P450 gene families were identified and analyzed. In conclusion, our chromosome-level genome assembly provides valuable support for future research on the evolution of parasitoid wasps and the interaction between hosts and parasitoid wasps.

Chromosome-level genome assembly of Microplitis manilae Ashmead, 1904 (Hymenoptera: Braconidae)

Article Open access 10 May 2023

Chromosome-level genome assembly of the bethylid ectoparasitoid wasp Sclerodermus sp. ‘alternatusi’

Article Open access 02 May 2024

Chromosome-level genome assembly of predatory Arma chinensis

Article Open access 04 September 2024

Background & Summary

Hymenoptera is a mega-diverse insect order with >169,000 described species, and a large proportion of species in this order are parasitoid wasps, such as the species of Chalcidoidea, that play an important role in the biological control of insect pests. Derived from the same ancestor that lived in the Permian or Triassic, parasitoid wasps have evolved numerous branches with unique characteristics to adapt to different living environments¹.

Within Hymenoptera, over 30 species genomes in Chalcidoidea have been assembled hitherto, and some of them were assembled at the chromosome level, including Aphelinus atriplicis (Aphelinidae), Aphelinus certus (Aphelinidae)², Muscidifurax raptorellus (Pteromalidae)³, Nasonia vitripennis (Pteromalidae)^4,5, Pteromalus puparum (Pteromalidae)⁶, Theocolax elegans (Pteromalidae)⁷, Valisia javana (Agaonidae)⁸, Anastatus japonicus (Eupelmidae), Anastatus fulloi (Eupelmidae)⁹, Megastigmus duclouxiana (Megastigmidae) and Megastigmus sabinae (Megastigmidae)¹⁰. Among these, the chromosome-level assemblies of A. atriplicis and A. certus were generated according to a genetic map of hybrids between each other².

Eretmocerus hayati (Aphelinidae, Hymenoptera) is an obligate parasitoid of the global prominent pest whitefly Bemisia tabaci (Aleyrodidae, Hemiptera)^11,12,13. The release of this parasitoid wasp can effectively control the outbreak of whitefly^14,15,16. The parasitoid wasps of the genus Eretmocerus have unique parasitic habits. With a curved tip on their ovipositor, they can lay eggs between the host nymphs and plant leaves instead of directly into host body¹⁷. The hatched first instar larva breaks through the abdomen of host, but still lives outside the host body, and does not burrow into the host body to complete the subsequent development until the host has developed to the last nymphal stage^18,19. The first and second instar larvae are wrapped by the capsule structure produced by the host, and will not fully contact the host tissue until the third instar¹⁸.

Here, we report a chromosome-level genome assembly of E. hayati using combined Illumina, PacBio and Hi-C sequencing technology. The assembly is 692.1 Mb in length with a scaffold N50 of 192.5 Mb and BUSCO completeness 97.4%. 53.58% repeat sequences and 24471 protein coding sequences were identified. We also identified 74 P450 genes and discovered the expression patterns at different development stages and sex. This assembly provides a valuable resource for evolutionary and host-parasitoid interaction studies in parasitoid wasps as well as biological control application of E. hayati.

Methods

Sampling and genome sequencing

A strain of E. hayati was initially imported by the Chinese Academy of Agricultural Sciences in 2008 and had been reared on Gossypium hirsutum plants carrying whitefly nymphs in our laboratory (26 ± 1°C, 14 L:10D, 70 ± 10%RH) over seven years. To collect the newly emerged parasitoid wasps, the pupae of E. hayati were first collected in a petri dish, then a triangle funnel of appropriate size was inversely placed on the petri dish with a 1.5 ml plastic sampling tube on the outlet of the funnel neck (Fig. 1a,b). We anesthetized the emerged parasitoid wasps on ice and collected approximately 3000 newly emerged males for DNA extraction using QIAamp DNA Mini Kit (QIAGEN) (Fig. 1c,d). After extraction, the DNA purity, concentration and integrity were detected using NanoDrop 2000&8000, Qubit fluorescence photometer and Agilent 4200 Bioanalyzer, respectively.

A short-read sequencing library with an insert size of 350 bp was generated using Universal DNA Fragmentase kit V2.0 (Annoroad) and Universal DNA Library Prep Kit V2.0 (Annoroad) with 0.5 ug DNA as input material. Hiseq x ten platform was used to sequence the library and generate 47.65 Gb paired-end clean reads after removing adapters and low-quality reads.

For long-read library, the SMRT bell express template prep kit 2.0 (Pacific Biosciences) was used to construct a SMRT bell library using 8 ug sheared DNA. After size-selection, primer annealing and binding the SMRT bell templates to polymerases, the library was sequenced on the Pacific Bioscience Sequel II platform in Annoroad Gene Technology company (TianJin). Single SMRT cells were processed, generating 481.17 Gb of subreads data.

Estimation of genome size

Flow cytometry was performed as described below to estimate the genome size. The heads of E. hayati were cut off with a dissecting needle after parasitoid wasps were anesthetized on ice. Next, we collected 300 heads as one sample in a 1.5 mL centrifuge tube and ground them with an abrasive rod and 200 μL Galbraith’s nuclear dissociation solution²⁰. Then the fully ground tissue solution was filtered with a 38 μm pore size nylon mesh to obtain the nuclear suspension. Then, the cells were centrifuged at 1000 r/min for 5 min at 4 °C.

Next the precipitated nuclei were resuspended with 400 μL 1 × PBS buffer by lightly scratching at the bottom of the tube. Finally, we added PI solution (final concentration was 50 μg/mL) and RNase A solution (final concentration was 20 μg/mL) to the nuclei, and stained the samples at 4 °C in dark for 5–20 min. Heads of Drosophila melanogaster (genome size is 176.4 Mb) were collected as a reference to calculate the size of the genome.

Flow cytometry results show that the DNA content in the nucleus of the female E. hayati was four times that of the female Drosophila melanogaster, indicating that the estimated genome size was about 756.227 Mb (Fig. 2a, Table 1).

Table 1 Summary of flow cytometry results.

Full size table

We also used k-mer distribution analysis²¹ to estimate the genome size and characteristics of E. hayati. For k-mer analyze, 47.65 Gb clean reads from the short-read library were used with K-mer size of 21. The genome size of E. hayati was estimated to be 694.3 Mb by 21 k-mer analysis, with a heterozygosity of about 0.01% and a repeat sequence ratio of 40.94% (Table 2).

Table 2 The K-mer analysis result of the Eretmocerus hayati genome.

Full size table

Genome assembly

We used HiFiasm v0.15.1 to preliminarily assemble the E. hayati genome, which could resolve near-identical repeats and segmental duplications to generate better haplotype assemblies²². The HiFiasm outputs a primary assembly after performing all-versus-all read overlap alignment and correcting sequencing errors.

Purge Haplotigs software was used to complete genome de-redundancy after initial assembly and error corrected, and the redundant heterozygous contigs were identified and removed according to reads depth distribution and sequence similarity²³.

The hybrid set with 47.65 Gb Illumina short read sequencing (67-fold coverage) and 481.17 Gb PacBio sequencing reads (693-fold coverage) (Table 3) was used to obtain a de novo genome assembly for E. hayati with total length of 763.42 Mb and contig N50 length of 7.96 Mb (Table 4).

Table 3 Summary of sequencing data for genome assembly.

Full size table

Table 4 Summary of assembly results of de novo and after Hi-C scaffolding for Eretmocerus hayati.

Full size table

Hi-C scaffolding

To obtain the genome at the chromosomal level, Hi-C technology was applied²⁴. We commissioned Annoroad (TianJin) to construct a Hi-C sequencing library using more than 1500 adult male wasps. Firstly, DNA containing biotin was captured under the adsorption of avidin magnetic beads and terminal repair of DNA fragments was performed. Then poly A was added to the end of the DNA fragment, the joint was connected, the number of PCR amplification cycles was evaluated, and the final purification library was built. The library was sequenced on the Illumina platform and generated 70 Gb sequencing data (98-fold coverage), which was used to construct the genome at the chromosome level.

We used bowtie 2 (v2.2.3)²⁵ to map the paired-end reads to the preliminary assembly. Then, HiC-Pro (v2.7.8)²⁶ was used to detect the ligation site of unmapped reads, which were mainly composed of the chimeric regions spanning across the ligation junction. The 5’ fractions of unmapped reads were aligned back onto the genome. Next, in order to generate a high quality alignment file, we discarded the low mapping quality, multiple hitting reads and singletons according to the mapping results. To obtain the chromosome-scale scaffolds from the primary contig-level assembly, we used 3d-DNA²⁷ and LACHESIS²⁸ to treat the valid interaction pairs and generate the interaction matrices. According to the results of the karyotype study of Chalcidoidea, the subfamily Aphelininae harbored parasitoids with n = 4–5²⁹. We consulted the literature and the output results of 3D-DNA to assess the number of chromosomes, which appears to be four. To ascertain the precision of the outcomes, we built the interaction graph with Juicer³⁰ and performed manual visual error correction using JuiceBox (v2.13.07) (https://github.com/aidenlab/Juicebox). After that, we cut the pseudochromosomes into equal 100 kb length bins and used Hic-Pro to constructe the heatmap showing the significant linkage clusters on diagonal.

The 244 contigs were divided, anchored, sorted, oriented, and merged into 4 chromosomes using LACHESIS and corrected by JuiceBox (Fig. 2b), with 13 sequences unanchored. The chromosomal heat map showed good collinearity on the diagonal (Fig. 2c), which confirms the high quality of scaffolding. The final genome assembly was 692.1 Mb with a scaffold N50 of 192.5 Mb (Table 4, Fig. 2d).

Genome completeness and quality assessment

We used three methods to assess the completeness and quality of the assembly. First, the genome sequence was split into 1000 bp fragments, and then these fragments were BLASTN against the NCBI nucleotide database (NT library). Second, we used BWA (v0.7.17)³¹ and Minimap2 (v2.24)³² to compare the second and third generation sequencing data with the genome, and calculated the comparison rate and sequence coverage. Third, BUSCO (v5.3.2)³³ was used to estimate the gene space of chromosome-level genome assembly by searching the 1367 BUSCO genes in insecta_odb10.

The best five hits of BLASTN again NCBI nt database were from Nasonia, Gossypium, Eretmocerus, Ceratosolen and Copidosoma (Table 5). 96.09% of the Illumina short reads and 100% of the PacBio HiFi long reads were mapped to the genome assembly successfully. Moreover, we compared the insect_odb10 database using BUSCO. The assessment showed 95.9% of BUSCO genes were successfully detected, of which 94.6% were single copy and 1.3% were duplicated (Table 6). The results of these evaluations indicate that the genome assembly has a high level of completeness and accuracy.

Table 5 Blast search of interrupted contig sequences in NCBI NT database.

Full size table

Table 6 BUSCO assessment of the Eretmocerus hayati genome assembly.

Full size table

In addition, the alignment results of second generation sequencing data were used to identify SNP and INDELS using GATK (v4.4.0.0)³⁴ with hard filtering parameters “–filter-expression “QD < 2.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0””. VCFtools (0.1.16)³⁵ was used to calculate the density of SNP and INDELS on the assembly, and window size was 50,000 bp.

RNA sequencing and analysis

A total of 17 RNA-seq libraries were constructed, including four different development stages and various tissues of E. hayati. We extracted total RNA from egg (<post-deposition, 2000 individuals, single sample), 1^st instar larva (1-2 d after hatching and 5–6 d after the eggs were first deposited, still attached to the leaf, 1000 individuals, three replicates), later instar larva (7–8 d after hatching and 10–11 d after the eggs were first deposited, mostly 3^rd-4^th instar within the body of whitefly nymphs and sampled by dissecting whitefly nymphs, 100 individuals, three replicates), female adult (<12 h post emergence and without exposure to male wasps, 150 individuals, three replicates), male adult (<12 h post emergence and without exposure to female wasps, 150 individuals, three replicates), female head (anaesthetized and dissected on ice with the method described above, 600 individuals, single sample), and male head (anaesthetized and dissected on ice with the method described above, 600 individuals, single sample). Total RNA was used for library preparations, which were sequenced on an Illumina Novaseq platform. All 150 bp paired-end raw reads were processed and a total of 98.5 Gb (approximately 14 Gb per sample) clean reads data were obtained. Then the clean reads were used to annotate the genome.

Repeat sequence annotation

We identified repeat sequences and transposable elements (TEs) using the methods of de novo assembly and homologous prediction. First, we used RepeatModeler (v2.0.3) (https://github.com/Dfam-consortium/RepeatModeler) to predict the repeat sequence with default parameters. Then RepBase database³⁶ and RepeatMasker (v4.1.0) (https://github.com/rmhubley/RepeatMasker) were used to annotate the sequence homologous.

The results showed that 370.8 Mb are repeat sequences, accounting for 53.58% of the E. hayati genome. Among these repeat sequences, most (32.39%) are unclassified elements, followed by 8% of long terminal repeats (LTRs), 7.27% of DNA elements, 5.63% of long interspersed nuclear elements (LINEs) and only 0.29% of short interspersed nuclear elements (SINEs) (Table 7).

Table 7 Summary of repeat elements annotation for Eretmocerus hayati.

Full size table

Gene prediction and function assignment

We annotated protein coding genes in the E. hayati genome using a pipeline that combines de novo prediction, homology searching and transcriptome evidence³⁷. The de novo prediction was implemented with BRAKER (v2)³⁸. For details, we randomly selected 10000 intact E. hayati genes from the transcriptomes to generate prediction parameters using AUGUSTUS (v3.1)³⁹, GeneMark-ET Suite (v4.21)⁴⁰, and SNAP (v2006-07-28)⁴¹. After that, the newly trained parameters were applied to the repeat sequence masked genome to predict genes following the BRAKER pipeline. For homology searching, the protein sequences of Nasonia vitripennis⁵ and Encarsia Formosa were TBLASTN (v2.8.1) (E-value < 1e-5) against the E. hayati genome⁴². Then, the gene models were defined by aligning the genome sequences to the matched protein using GeneWise (v2.4.1)⁴³. For transcriptome evidence, we aligned the RNA-seq reads to the E. hayati genome by Hisat2 (v2.2.1)⁴⁴ to identify gene information such as candidate exon regions, donor, and acceptor sites. StringTie⁴⁵ was used to assemble the alignments into transcripts. Finally, the above three lines of gene prediction evidence were integrated with EVidenceModeler (EVM)⁴⁶ to generate a consensus gene set. Specifically, the weight for different methods were “4”, “5” and “10” for de novo prediction, homology searching, and transcriptome evidence, respectively.

We used several convention methods to annotate the function of predicted protein coding genes. First, gene ontology (GO) annotation was generated by using both PANNZER2⁴⁷ web server and BLAST2GO (v5)⁴⁸ with default parameters. Second, online service of BLASTKOALA (v2.2)⁴⁹ was used to map the protein sequences of E. hayati to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways with database parameter “family_eukaryotes”. InterProScan (v5.57-90.0)⁵⁰ was used to obtain motifs and domains information of protein sequences by searching the default available database (e.g. CDD, Gene3D, PANTHER, Pfam and SMART). To obtain function prediction of protein sequences, we combined the results of eggNOG-mapper (v2)⁵¹ and blastp against SwissProt and Trembl database (E-value < 1e-5).

A total of 24,471 protein coding genes were annotated following the pipeline combined with above-mentioned three methods. In total, 7,803 genes could be assigned with GO terms and 7,159 genes with KEGG ID. In addition, 16808 genes have BLASTN hits in NCBI non-redundant database, 11,556 genes contain Pfam domains (Table 8). Finally, a total number of 23974 genes were annotated by at least one database. Gene density was generated by TBtools (v1.120)⁵².

Table 8 Summary of gene annotation results for Eretmocerus hayati.

Full size table

Cytochrome P450 genes

P450 genes are key genes closely related to insect stress resistance and environmental adaptation. E. hayati shows many unique and interesting features in association with host. For example, its eggs and first instar larvae survive in vitro, and the subsequent stages survive inside the host body. We hope to infer the mechanism of its unique parasitic habits through the identification and expression pattern analysis of P450 genes. To identify the P450 genes in the E. hayati genome, we first downloaded P450 protein sequences from UniPortKB, GeneBank and FlyBase to build a reference database. All predicted protein sequences of E. hayati were used as query to BLASTP against the P450 reference sequences. The Blast hits were retained for further domain analysis using InterProScan. Finally, genes with P450 conservative domain pf00047 were retained and classified according to the BLASTP analysis results.

To reconstruct the phylogeny tree of P450 genes from E. hayati, E. Formosa and N. vitripennis, we used MAFFT (v7.471)⁵³ with the parameter “-auto” to align P450 sequences. The alignment was filtered by TRIMAL (v1.4.rev22)⁵⁴ with the parameter “-automated1”,and IQ-tree (v2.1.2)⁵⁵ with parameters “-m MFP -bb 1000” was used to construct the maximum likelihood (ML) phylogenetic tree with the best model (JTT + F + R8) estimated by ModelFinder⁵⁶. RSEM (v1.3.3)⁵⁷ was used to map the transcriptome clean data to the P450 sequences to estimate expression level of P450 genes in various tissues.

In total, we annotated 74 P450 genes in E. hayati, among which 6 genes belong to CYP2 clan, 42 belong to CYP3 clan, 19 belong to CYP4 clan, and 7 belong to Mito clan (Table 9). Chromosome mapping results show that the P450 genes are distributed on all four chromosomes, with more on chromosome 1, 2 and 4, and fewer on chromosome 3. There are six gene clusters containing three or more P450 genes on four chromosomes (Fig. 3a). The results of the phylogenetic tree show that the P450 genes from E. hayati, E. formosa and N. vitripennis expanded in the CYP3 and CYP4 branches (Fig. 3b).

Table 9 P450 genes identified in Eretmocerus hayati.

Full size table

The expression patterns of P450 genes differ significantly among various tissues. Some CYP3 clan genes are expressed in all stages, but the expression levels are significantly higher in adult male wasps than in other stages (e.g. Ehay097930.1, Ehay252140.1, and Ehay264040.1). Some CYP3 clan genes are expressed only in the adult stage and are expressed at higher levels in male than in females (e.g. Ehay163240.1, Ehay288700.1, and Ehay270900.1). Some P450 genes show relatively higher expression levels in the 1^st instar larvae (e.g. Ehay260430.1 and Ehay170850.1), and others show higher expression levels in later instar larvae (e.g. Ehay202870.1 and Ehay269780.1) (Fig. 3c).

P450 genes are highly related to responses to external stimuli and stress, and whether the different expression levels of P450 in different stages of E. hayati are related to the parasitic behavior of the wasp in vivo and in vitro deserves further study.

Data Records

Illumina, PacBio and Hi-C data for E. hayati genome sequencing have been deposited in the NCBI Sequence Read Archive with accession number SRR24094107, SRR24099399 and SRR24094118 under BioProject accession number PRJNA951478⁵⁸.

Illumina transcriptome data for egg (SRR24094058, SRR24094059, SRR24094060), 1^st instar larva (SRR24094061, SRR24094062, SRR24094063), later instar larva (SRR24094064, SRR24094050, SRR24094051), female adult (SRR24094052, SRR24094053, SRR24094054), male adult (SRR24094055, SRR24094056, SRR24094057), female head (SRR24094065) and male head (SRR24094066) are available under Bioproject PRJNA951478⁵⁸.

This Whole Genome Shotgun project has been deposited at GenBank under the accession JARUXH000000000⁵⁹. The version described in this paper is version JARUXH010000000.

The annotation file is available in figshare^60,61.

Technical Validation

After extraction, the DNA purity, concentration and integrity were detected using NanoDrop 2000&8000, Qubit fluorescence photometer, and Agilent 4200 Bioanalyzer (Agilent Technologies, CA, USA), respectively. RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). High-quality DNA and RNA were used for sequencing.

Code availability

All software and scripts were executed according to user manual, and default parameters were applied if not mentioned in the Methods described above.

References

Peters, R. S. et al. Evolutionary history of the hymenoptera. Curr. Biol. 27, 1013–1018 (2017).
PubMed CAS Google Scholar
Wittmeyer, K. T., Oppenheim, S. J. & Hopper, K. R. Assemblies of the genomes of parasitic wasps using meta-assembly and scaffolding with genetic linkage. G3 (Bethesda, Md.) 12 (2022).
Xiong, X. et al. Long-read assembly and annotation of the parasitoid wasp Muscidifurax raptorellus, a biological control agent for filth flies. Front. Genet. 12, 748135 (2021).
PubMed PubMed Central CAS Google Scholar
Werren, J. H. et al. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science 327, 343–348 (2010).
PubMed CAS Google Scholar
Dalla Benetta, E. et al. Genome elimination mediated by gene expression from a selfish chromosome. Sci Adv 6, eaaz9808 (2020).
ADS PubMed PubMed Central Google Scholar
Ye, X. H. et al. A chromosome‐level genome assembly of the parasitoid wasp Pteromalus puparum. Mol. Ecol. Resour. 20, 1384–1402 (2020).
PubMed CAS Google Scholar
Xiao, S. et al. Genome assembly of the ectoparasitoid wasp Theocolax elegans. Sci. Data 10, 159 (2023).
PubMed PubMed Central CAS Google Scholar
Chen, L. et al. A chromosome-level genome assembly of the pollinating fig wasp Valisia javana. DNA Res. 29 (2022).
Ye, X. H. et al. Genomic signatures associated with maintenance of genome stability and venom turnover in two parasitoid wasps. Nat. Commun. 13, 6417 (2022).
ADS PubMed PubMed Central CAS Google Scholar
Kuang, J. G. et al. Chromosome-level de novo genome assembly of two conifer-parasitic wasps, Megastigmus duclouxiana and Megastigmus sabinae, reveals genomic imprints of adaptation to hosts. Mol Ecol Resour 23, 1142–1154 (2023).
PubMed CAS Google Scholar
De Barro, P. J., Liu, S. S., Boykin, L. M. & Dinsdale, A. B. Bemisia tabaci: a statement of species status. Annu. Rev. Entomol. 56, 1–19 (2011).
PubMed Google Scholar
Rose, M. & Zolnerowich, G. Eretmocerus Haldeman (Hymenoptera: Aphelinidae) in the United States, with descriptions of new species attacking Bemisia (tabaci complex) (Homoptera: Aleyrodidae). Proc. Entomol. Soc. Wash. 99, 1–27 (1997).
Google Scholar
Zolnerowich, G. & Rose, M. Eretmocerus Haldeman (Hymenoptera: Aphelinidae) imported and released in the United States for control of Bemisia (tabaci complex)(Homoptera: Aleyrodidae). Proc. Entomol. Soc. Wash. 100, 310–323 (1998).
Google Scholar
Abd-Rabou, S. Biological control of Bemisia tabaci biotype “B” (Homoptera: Aleyrodidae) by introduction, release and establishment of Eretmocerus hayati (Hymenoptera: Aphelinidae). J. Pest Sci. 77, 91–94 (2004).
Google Scholar
De Barro, P. J. & Coombs, M. T. Post-release evaluation of Eretmocerus hayati Zolnerowich and Rose in Australia. Bull. Entomol. Res. 99, 193–206 (2009).
PubMed Google Scholar
Liu, T. X., Stansly, P. A. & Gerling, D. Whitefly parasitoids: distribution, life history, bionomics, and utilization. Annu. Rev. Entomol. 60, 273–292 (2015).
PubMed CAS Google Scholar
Gerling, D., Quicke, D. L. J. & Orion, T. Oviposition mechanisms in the whitefly parasitoids Encarsia transvena and Eretmocerus mundus. Biocontrol 43, 289–297 (1998).
Google Scholar
Gerling, D. & Blackburn, M. B. Immature development of Eretmocerus mundus (Hymenoptera: Aphelinidae). Arthropod. Struct. Dev. 42, 309–314 (2013).
PubMed Google Scholar
Gerling, D., Orion, T. & Delarea, Y. Eretmocerus penetration and immature development: a novel-approach to overcome host immunity. Arch. Insect Biochem. Physiol. 13, 247–253 (1990).
Google Scholar
Galbraith, D. W. et al. Rapid flow cytometric analysis of the cell cycle in intact plant tissues. Science 220, 1049–1051 (1983).
ADS PubMed CAS Google Scholar
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
PubMed PubMed Central CAS Google Scholar
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Meth. 18, 170–175 (2021).
CAS Google Scholar
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
PubMed PubMed Central CAS Google Scholar
van Berkum, N. L. et al. Hi-C: a method to study the three-dimensional architecture of genomes. J. Vis. Exp. (2010).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Meth. 9, 357–359 (2012).
CAS Google Scholar
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
PubMed PubMed Central Google Scholar
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
ADS PubMed PubMed Central CAS Google Scholar
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
PubMed PubMed Central CAS Google Scholar
Gokhman, V. E. Chromosomes of parasitic wasps of the superfamily Chalcidoidea (Hymenoptera): An overview. Comparative Cytogenetics 14, 399–416 (2020).
PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems 3, 95–98 (2016).
PubMed PubMed Central CAS Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
PubMed PubMed Central CAS Google Scholar
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
PubMed PubMed Central CAS Google Scholar
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO Update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of Eukaryotic, Prokaryotic, and Viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
PubMed PubMed Central CAS Google Scholar
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
PubMed PubMed Central CAS Google Scholar
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
PubMed PubMed Central CAS Google Scholar
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 11 (2015).
PubMed PubMed Central Google Scholar
Mei, Y. et al. InsectBase 2.0: a comprehensive gene resource for insects. Nucleic Acids Res. 50, D1040–D1045 (2022).
ADS PubMed CAS Google Scholar
Bruna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics 3, lqaa108 (2021).
PubMed PubMed Central Google Scholar
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–439 (2006).
PubMed PubMed Central CAS Google Scholar
Lomsadze, A., Burns, P. D. & Borodovsky, M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res. 42, e119 (2014).
PubMed PubMed Central Google Scholar
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
PubMed PubMed Central Google Scholar
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
PubMed PubMed Central Google Scholar
Birney, E. & Durbin, R. Using GeneWise in the Drosophila annotation experiment. Genome Res. 10, 547–548 (2000).
PubMed PubMed Central CAS Google Scholar
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Meth. 12, 357–360 (2015).
CAS Google Scholar
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
PubMed PubMed Central CAS Google Scholar
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
PubMed PubMed Central Google Scholar
Toronen, P., Medlar, A. & Holm, L. PANNZER2: a rapid functional annotation web server. Nucleic Acids Res. 46, W84–W88 (2018).
PubMed PubMed Central CAS Google Scholar
Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).
PubMed CAS Google Scholar
Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).
PubMed CAS Google Scholar
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
PubMed PubMed Central CAS Google Scholar
Cantalapiedra, C. P., Hernandez-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: Functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
PubMed PubMed Central CAS Google Scholar
Chen, C. et al. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol. Plant 13, 1194–1202 (2020).
PubMed CAS Google Scholar
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
PubMed PubMed Central CAS Google Scholar
Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. TrimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
PubMed PubMed Central CAS Google Scholar
Nguyen, L. T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
PubMed CAS Google Scholar
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Meth. 14, 587–589 (2017).
CAS Google Scholar
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
PubMed PubMed Central CAS Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP431306 (2023).
Zhong, Y. W., Liu, S. S. & Liu, Y. Q. Eretmocerus hayati isolate ZJU_SS_LIU_2023, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JARUXH000000000 (2023).
Zhong, Y. W. Annotation files of Eretmocerus hayati. figshare https://doi.org/10.6084/m9.figshare.22709485 (2023).
Zhong, Y. W. Repeat sequence analyze result of Eretmocerus hayati. figshare https://doi.org/10.6084/m9.figshare.22709494 (2023).

Download references

Acknowledgements

This work was supported by the Key R&D Program of Zhejiang Province (Grant No. 2021C02045) and the earmarked fund for China Agriculture Research System (CARS-23-C05).

Author information

Authors and Affiliations

Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Institute of Insect Sciences, Zhejiang University, 310058, Hangzhou, China
Yu-Wei Zhong, Yun-Yun Fan, Zhang-Qi Zuo, Run-Guo Shu, Yin-Quan Liu, Fei Li & Shu-Sheng Liu
College of Plant Protection, Shenyang Agricultural University, 110866, Shenyang, China
Jun-Bo Luan

Authors

Yu-Wei Zhong
View author publications
You can also search for this author in PubMed Google Scholar
Yun-Yun Fan
View author publications
You can also search for this author in PubMed Google Scholar
Zhang-Qi Zuo
View author publications
You can also search for this author in PubMed Google Scholar
Run-Guo Shu
View author publications
You can also search for this author in PubMed Google Scholar
Yin-Quan Liu
View author publications
You can also search for this author in PubMed Google Scholar
Jun-Bo Luan
View author publications
You can also search for this author in PubMed Google Scholar
Fei Li
View author publications
You can also search for this author in PubMed Google Scholar
Shu-Sheng Liu
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

S.S.L. supervised the project. F.L., S.S.L. and Y.Q.L. planned and coordinated the project. Y.W.Z. and Y.Y.F. prepared the samples for PacBio, Illumina and Hi-C sequencing as well as tissue samples for transcriptome sequencing. Y.W.Z. and R.G.S. prepared and conducted flow cytometry experiments. Z.Q.Z. and Y.W.Z. performed the genome annotation. J.B.L. provided the annotation of Encarsia formosa. Y.W. Z performed P450 gene analysis. Y.W.Z., Y.Y.F., Y.Q.L., F.L. and S.S.L. wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shu-Sheng Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Zhong, YW., Fan, YY., Zuo, ZQ. et al. A chromosome-level genome assembly of the parasitoid wasp Eretmocerus hayati. Sci Data 10, 585 (2023). https://doi.org/10.1038/s41597-023-02450-2

Download citation

Received: 01 May 2023
Accepted: 07 August 2023
Published: 06 September 2023
DOI: https://doi.org/10.1038/s41597-023-02450-2
Springer Nature Limited

This article is cited by

An expanded odorant-binding protein mediates host cue detection in the parasitic wasp Baryscapus dioryctriae basis of the chromosome-level genome assembly analysis
- Xiaoyan Zhu
- Yi Yang
- Bingzhong Ren
BMC Biology (2024)

A chromosome-level genome assembly of the parasitoid wasp Eretmocerus hayati

Abstract

Similar content being viewed by others

Chromosome-level genome assembly of Microplitis manilae Ashmead, 1904 (Hymenoptera: Braconidae)

Chromosome-level genome assembly of the bethylid ectoparasitoid wasp Sclerodermus sp. ‘alternatusi’

Chromosome-level genome assembly of predatory Arma chinensis

Background & Summary