Background & Summary

Nilaparvata muiri (Hemiptera: Delphacidae) is a sibling species of the brown planthopper, Nilaparvata lugens, a destructive insect pest of rice (Fig. 1), and is found primarily in China, Japan, South Korea, and Vietnam. N. muiri and N. lugens are morphologically very similar, have broadly overlapping distributions, and are active during similar periods (April to October)1. In the early 1980s, N. muiri was misidentified as N. lugens, leading to an academic debate: does the initial generation of N. lugens in China stem from local populations or from migration from Southeast Asia? Proponents of a local origin cited observations of N. lugens overwintering on Leersia hexandra in China2. Subsequent research clarified that it was N. muiri, not N. lugens, that overwinters3. N. lugens cannot survive cold conditions and does not overwinter in the colder parts of North Asia, such as Japan, Korea, and northern China4.

Fig. 1

N. muiri and N. lugens. N. muiri and N. lugens are very similar morphologically. (a) N. muiri long-winged (macropterous) male adult. (b) N. lugens long-winged (macropterous) male adult63.

Despite their high morphological similarity, the two species differ markedly in feeding behavior. N. muiri feeds only on gramineous weeds such as Leersia sayanuka and L. hexandra and not on rice, whereas N. lugens feeds only on rice5. In addition to field surveys, black light trapping is an essential technique for monitoring N. lugens populations6. A recent report from Luhe, Nanjing showed that N. muiri accounted for 84.0%, 71.6%, and 84.5% of the Nilaparvata specimens captured in black light traps in 2019, 2020, and 2021, respectively7. Traditional species-level identification of rice planthoppers relies on morphological characteristics; it is expertise-dependent, labor-intensive, and time-consuming, and occasionally leads to misidentification. Large populations of N. muiri can therefore seriously compromise the accurate monitoring and forecasting of the target pest, N. lugens8.

In this study, we assembled the first high-quality chromosome-level genome of N. muiri using a combination of PacBio HiFi sequencing, Illumina short-read sequencing, and Hi-C sequencing technologies. The genome assembly (525 Mb) is anchored to 15 pseudochromosomes (Fig. 2), with a scaffold N50 of 43.32 Mb (Fig. 3). This assembly provides a valuable resource for studies of feeding behavior, overwintering strategies, and mechanisms of growth and development in N. muiri and N. lugens. Uncovering the genomic differences between the two species could aid the development of more effective biological control methods and more eco-friendly pesticides.

Fig. 2

Hi-C contact map of N. muiri. Genome-wide all-by-all Hi-C interactions identified 15 pseudochromosome linkage groups in the N. muiri genome.

Fig. 3

Circos plot of distribution of the genomic elements in N. muiri. The tracks indicate (a) length of the chromosome, (b) distribution of transposable element (TE) density, (c) gene density ranges, and (d) GC density. The densities of TEs, genes, and GC were calculated in 100 kb windows.

Methods

Sample collection and sequencing

N. muiri adults were provided by the College of Life Sciences, Zhejiang Normal University, and reared on their host L. hexandra in climate chambers at 25 °C (±2 °C) under a 16:8 h (light:dark) photoperiod with 70% relative humidity (RH). One male and one female were selected for sibling inbreeding for 15 generations. Insects were collected with an aspirator and stored in liquid nitrogen. To reduce possible contamination, the midguts of all adult insects used for genome sequencing were removed before the insects were stored in liquid nitrogen.

For PacBio sequencing, DNA was extracted from about 50 mixed-sex adult individuals. A single-end 20 kb library was constructed with the SMRTbell Express Template Prep Kit 2.0. In total, 65.77 Gb (126 × coverage) of PacBio HiFi reads were generated from two cells sequenced on the PacBio Sequel IIe platform (Table 1), with a mean read length of 19.59 kb (N50 = 19.61 kb). For Hi-C sequencing, 50 mixed-sex adult individuals were used as input following previously described standard protocols9, and the library was sequenced on an Illumina NovaSeq 6000 platform. In total, 126.46 Gb of paired-end 150 bp Hi-C reads (241 × coverage) were generated (Table 1). For Illumina short-read sequencing, genomic DNA was extracted separately from 30 female adults and 30 male adults. Paired-end libraries with an insert size of 350 bp were constructed with the NEBNext Ultra DNA Library Prep Kit (NEB, USA) and sequenced on an Illumina NovaSeq 6000 platform. A total of 38.50 Gb (73 × coverage) of paired-end 150 bp reads were generated to estimate genome size and identify the sex chromosome (Table 1). All DNA extraction, library construction, and sequencing procedures were performed by the Novogene Company (Tianjin, China) according to the manufacturers' protocols.
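The coverage values quoted above follow from dividing the sequenced bases by the genome size. The short Python sketch below illustrates this arithmetic. It assumes the 525 Mb assembly size as the denominator; the exact denominator used for the reported values is not stated, so the results may differ from the reported figures by about 1 ×. The function and variable names are illustrative only.

```python
def fold_coverage(total_bases_gb: float, genome_size_mb: float) -> float:
    """Return sequencing depth (fold coverage) from yield in Gb and genome size in Mb."""
    return (total_bases_gb * 1e9) / (genome_size_mb * 1e6)


# Assumption: the 525 Mb assembly size is used as the genome size.
GENOME_SIZE_MB = 525.0

# Sequencing yields (Gb) as reported in the text.
yields_gb = {"PacBio HiFi": 65.77, "Hi-C": 126.46, "Illumina": 38.50}

coverage = {name: fold_coverage(gb, GENOME_SIZE_MB) for name, gb in yields_gb.items()}
for name, cov in coverage.items():
    print(f"{name}: {cov:.1f}x")
```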

Table 1 Statistics of sequencing read data.

Total RNA was isolated from N. muiri at different developmental stages (eggs, 400 1st-2nd instar nymphs, 150 3rd-5th instar nymphs, and 50 mixed-sex adults) using TRIzol reagent (Thermo Fisher Scientific, USA). Paired-end libraries were constructed with the NEBNext Ultra RNA Library Prep Kit (NEB, USA) and sequenced on an Illumina NovaSeq 6000 platform by the Novogene Company (Tianjin, China). A total of 31.95 Gb of RNA-seq reads were generated (Table 1).

Genome assembly

Genome size, heterozygosity, and repetitiveness were estimated using KMC v3.2.210 and GenomeScope 2.011 with 17-mer frequencies, which gave an estimated genome size of 528 Mb (Fig. 4). The PacBio HiFi reads were used to produce a draft assembly with Hifiasm 0.19.7-r59812,13 using the “--primary” parameter. Purge_Dups v1.2.314 was used to remove redundant heterozygous contigs from the Hifiasm primary assembly, with custom cutoff values (-l 17 -m 80 -h 200). Hi-C reads were mapped to the draft assembly with Chromap v0.2.5-r47315 using the Hi-C preset. YaHS v1.2a116 was used to anchor primary contigs to chromosomes with the “-e GATC” parameter. Juicebox v2.1.317 was used to manually correct errors and remove remaining redundancy. Microbial contamination was detected and removed with NCBI FCS-GX v0.4.018 (r2023-01-24 database), with the source organism set to N. muiri (--tax-id 706586). All contaminants were unanchored contigs that either showed no interaction with N. muiri chromosomes in the Hi-C contact map or had no Hi-C signal; they were also validated with Blobtoolkit v4.2.119 using BLAST hits, GC content, and sequencing coverage. BUSCO v5.7.120 was used to assess the genome assembly with the insecta_odb10 database. The Hi-C heatmap was generated with HiCExplorer v3.7.221. The genome Circos plot was generated with TBtools-II Advanced Circos22. In addition, the mitochondrial genome of N. muiri was assembled with the MitoHiFi23 pipeline. A summary of the N. muiri genome assembly is shown in Table 2.
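For readers unfamiliar with the scaffold N50 statistic used to summarize the assembly, the following minimal Python sketch shows how it is commonly computed: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The scaffold lengths used here are hypothetical toy values, not those of the N. muiri assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical scaffold lengths (in Mb) used only to illustrate the calculation.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

In practice, the lengths would be read from the assembly FASTA index rather than typed in by hand.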

Fig. 4

Estimated characteristics of the N. muiri genome based on a 17-mer count histogram of Illumina short-read data. Genome size was estimated to be 528.51 Mb, with a heterozygosity rate of 2.56%.
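GenomeScope 2.0 fits a mixture model to the k-mer histogram. As a rough illustration of the underlying idea only, and not of the GenomeScope model itself, genome size can be approximated as the total number of k-mer occurrences divided by the depth of the main coverage peak. The Python sketch below applies this simplification to a hypothetical toy histogram; the error cutoff `min_depth` and the histogram values are assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Approximate genome size from a k-mer histogram.

    histogram: dict mapping depth -> number of distinct k-mers observed at that depth.
    Depths below min_depth are treated as sequencing errors and ignored (an assumption).
    """
    kept = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    # Depth with the most distinct k-mers, taken here as the main coverage peak.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences among the retained depths.
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth


# Hypothetical toy histogram: low-depth error k-mers plus a single peak at depth 20.
toy_hist = {1: 900, 2: 100, 18: 50, 19: 120, 20: 200, 21: 110, 22: 40}
size = estimate_genome_size(toy_hist)
print(size)
```

In a highly heterozygous genome the tallest peak may correspond to heterozygous k-mers at half the homozygous depth, which is one reason a model-based tool is preferable to this shortcut.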

Table 2 Summary of N. muiri genome assembly.

Genome annotation

Repeat masking was conducted using the EarlGrey pipeline v4.0124, which leverages widely used tools such as RepeatModeler225 and RepeatMasker26 and integrates automated curation and filtering. The Arthropoda library from the Dfam database v3.727 and RepBase 2018102628 were used in the pipeline with “-r Arthropoda”. Putative spurious TE annotations < 100 bp were removed with the ‘-m yes’ parameter. Repeat sequences make up 35.83% of the genome (Table 3). Gene prediction was conducted with the BRAKER v3.0.629 pipeline. RNA-seq reads were aligned to the genome assembly using HISAT v2.2.130, and open reading frames (ORFs) were predicted using StringTie v2.2.131 within the BRAKER pipeline as transcriptome-based evidence. Protein sequences of N. lugens were downloaded from the NCBI RefSeq database32, and protein sequences of Sogatella furcifera and Laodelphax striatellus were downloaded from InsectBase V233. These proteins were combined with the OrthoDB v1134 Arthropoda clade as homology-based evidence. Genes predicted by AUGUSTUS v3.5.035 and GeneMark-ETP36 were combined with TSEBRA37. The predicted gene sets were functionally annotated using a DIAMOND38 BLASTP search against the NR database, the eggNOG-mapper39,40 web service, InterProScan v5.6541,42, and the PANNZER243 web service. Infernal v1.1.544 was used to identify 808 tRNAs, 1 sRNA, 443 rRNAs, 143 snRNAs, and 67 miRNAs based on alignment with the Rfam45 library (Table 4).

Table 3 Repetitive element contents of N. muiri genome.
Table 4 Statistics of non-coding RNAs in N. muiri genome.

Identification of X chromosome

Previous studies have reported different karyotypes in closely related planthoppers46,47,48,49. The karyotype of N. lugens is 14 + XY for males and 14 + XX for females, whereas the karyotype of S. furcifera, L. striatellus, and Nilaparvata bakeri is 14 + X0 for males and 14 + XX for females. The karyotype of N. muiri was therefore inferred to be 14 + X0 or 14 + XY for males and 14 + XX for females. We used multiple methods to identify the X chromosome and to determine whether the male karyotype is 14 + X0 or 14 + XY.

Synteny analysis between N. muiri and N. lugens was carried out using TBtools-II v2.07750 with default parameters. Chromosomes were then renamed and reordered based on the synteny analysis. Chromosomes 1–14 and X of N. muiri showed conserved synteny with chromosomes 1–14 and X of N. lugens. No scaffold or contig exhibited synteny with the N. lugens Y chromosome. The result, visualized with Circos51, supports our assembly (Fig. 5).

Fig. 5

Synteny plot between N. muiri and N. lugens. Chromosomes 1–14 and X of N. muiri show conserved synteny with chromosomes 1–14 and X of N. lugens.

Following established methods for identifying sex chromosomes52, the Log2(female:male sequencing depth ratio) values of the X chromosome, Y chromosome, and autosomes are expected to be around 1, −1, and 0, respectively. To calculate the female:male sequencing depth ratio, reads from each sex were aligned to the chromosome-level assembly with BWA-MEM2 v2.2.153 using default parameters. The resulting SAM files were converted to BAM, sorted, and indexed with SAMtools v1.1854. The BAM and BAI files were then used to analyse the chromosomes in sliding windows of 500 bp using BEDTools v2.31.055. In each 500 bp window, we normalized the read counts and then calculated the Log2(female:male ratio) value. The X chromosome has a mean value around 1, and the autosomes have mean values around 0 (Fig. 6). No Y chromosome was detected. We also tried the Redkmer56 pipeline, but no Y chromosome could be assembled. Based on these results, the X chromosome was identified and the karyotype of N. muiri was determined to be 14 + X0 for males and 14 + XX for females.
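The windowed depth-ratio calculation can be illustrated with a minimal Python sketch. The normalization used in this study is not specified beyond "normalized", so the sketch assumes that each sample is scaled by its median per-window read count and that a pseudocount of 1 is added to avoid division by zero; both choices, the function name, and the toy input are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, median


def mean_log2_fm(windows, pseudocount=1.0):
    """Return {chromosome: mean log2(female/male)} over fixed-size windows.

    windows: iterable of (chromosome, female_reads, male_reads) tuples, one per window.
    Each sample is scaled by its median window count (an assumed normalization).
    """
    windows = list(windows)
    female_scale = median(f for _, f, _ in windows)
    male_scale = median(m for _, _, m in windows)
    per_chrom = defaultdict(list)
    for chrom, female_reads, male_reads in windows:
        female_norm = (female_reads + pseudocount) / female_scale
        male_norm = (male_reads + pseudocount) / male_scale
        per_chrom[chrom].append(math.log2(female_norm / male_norm))
    return {chrom: mean(values) for chrom, values in per_chrom.items()}


# Toy data: autosomal windows with equal depth, X windows with half the male depth.
toy = [("Chr1", 100, 100)] * 8 + [("ChrX", 100, 50)] * 2
result = mean_log2_fm(toy)
print(result)
```

With these toy counts the autosome averages zero and the X chromosome averages close to 1, matching the pattern expected for an X chromosome present in two copies in females and one copy in males.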

Fig. 6

Identification of the X chromosome of N. muiri. (a) Average Log2(female:male ratio) per chromosome. (b) The X chromosome has a Log2(female:male ratio) around 1.

Data Records

PacBio, Illumina, Hi-C, and transcriptome sequencing raw data for N. muiri have been deposited in the NCBI Sequence Read Archive under accession number SRP49850357. The genome assembly has been deposited at GenBank under accession number GCA_039955075.158. It has also been deposited in the Genome Warehouse (GWH) at the National Genomics Data Center under accession number GWHESPY0000000059. The gene annotation and TE consensus library are available in Figshare60.

Technical Validation

All Illumina raw reads were filtered with fastp61 using default parameters before further analysis. To assess the genome assembly, Illumina short reads were mapped to the assembly with BWA-MEM2 v2.2.153 and PacBio HiFi reads were mapped with Minimap2 v2.26-r117562, giving mapping rates of 95.3% and 99.8%, respectively. The Hi-C heatmap revealed a well-organized interaction pattern at the chromosomal level, which indirectly confirmed the accuracy of the chromosome assembly. In total, 99.1% and 98.5% of BUSCO genes (insecta_odb10) were identified in the genome assembly and annotation, respectively, indicating a highly complete assembly of the N. muiri genome.