Abstract
Chironomids are one of the most abundant aquatic insects and are widely distributed in various biological communities. However, the lack of high-quality genomes has hindered our ability to study the evolution and ecology of this group. Here, we used Nanopore long reads and Hi-C data to produce two chromosome-level genomes from mixed genomic data. The genomes of Smittia aterrima (SateA) and Smittia pratorum (SateB) were assembled into three chromosomes, with sizes of 78.45 Mb and 71.56 Mb, scaffold N50 lengths of 25.73 and 23.53 Mb, and BUSCO completeness of 98.5% and 97.8% (n = 1,367), 5.68 Mb (7.24%) and 1.94 Mb (2.72%) of repetitive elements, and predicted 12,330 (97.70% BUSCO completeness) and 11,250 (97.40%) protein-coding genes, respectively. These high-quality genomes will serve as valuable resources for comprehending the evolution and environmental adaptation of chironomids.
Similar content being viewed by others
Background & Summary
The non-biting chironomid midges (Diptera: Chironomidae) have developed unique adaptations enabling them to endure and prosper in severe abiotic conditions, such as extreme temperatures, oxygen deprivation, pollution, high salinity, and varying pH levels, as well as complete desiccation (Henriques-Oliveira et al., 2003; Osbourne et al., 2000; Vos et al., 2002; Pinder, 1986). As remarkably resilient aquatic organisms, research has indicated that chironomid survival in diverse water habitats necessitates specific detoxification enzymes1,2,3,4.
To gain a better understanding of the fundamental mechanisms of tolerance, researchers have increasingly focused on the genomes of organisms capable of thriving in extreme environments. Three notable examples include the chironomid midges Polypedilum vanderplanki, which can survive near complete water loss in its larval form, Belgica antarctica, which exhibits remarkable freeze tolerance, and Propsilocerus akamusi, a species recognized for its ability to tolerate pollution5,6,7. Such genomes had shaped by their environment and the adaptations they undergo to survive and reproduce. The Orthocladiinae, which is the largest subfamily within Chironomidae, is characterized by a remarkable diversity of species that have evolved varied ecological and physiological adaptations to their respective environments. Among the Orthocladiinae, the genus Smittia, encompassing species such as S. aterrima and S. pratorum, was notable for its thriving in littoral environments owing to their unique terrestrial/amphibious way of life, as the majority of chironomid larvae are aquatic8,9,10. Besides, research on Smittia sp. has been instrumental in advancing our understanding of parthenogenesis, polytene chromosomes, embryonic development, and nucleolar RNA synthesis, demonstrating significant scientific importance11,12,13,14,15,16.
With the multiple purposes of generating genomic resources to investigate chironomid genome evolution, chromosomal composition, and tolerant specialization, we utilized Oxford Nanopore Technologies (ONT) and high-throughput chromosome conformation capture (Hi-C) sequencing to produce two chromosome-level reference genomes for S. aterrima and S. pratorum. We performed genome assembly, and annotation, and conducted evolutionary analyses. Our work generates a valuable chromosome-level genomic resource for chironomids, establishing a foundation for future research into the environmental tolerance mechanisms of these insects.
Methods
Sample collection and sequencing
Live male adult specimens of S. aterrima and S. pratorum were collected at Baitan Lake (Huanggang City, Hubei Province, China, 30.463250°N, 114.942184°E). After sample collection, the whole bodies of samples were immediately immersed into liquid nitrogen and stored at −80 °C. There were 300 male adults (about 200 male adults of S. aterrima and 100 male adults of S. pratorum were mixed together) used for genome sequencing.
DNA was extracted using the 1D DNA Ligation Sequencing kit SQK-LSK109. RNA was extracted with the TRIzol™ Reagent kit. After the determination of the DNA quality and quantity, a paired-end sequencing library (350 bp in length) was constructed and sequenced using the Beijing Genomics Institute (BGI), and the library construction was completed by Berry Genomic Corporation (Beijing, China). In addition, a Single Molecule Real-Time DNA library was prepared for sequencing using SQK-LSK109 Kit with an insert size of 30 kb. The Oxford Nanopore third-generation sequencing was completed by BenaGen Corporation in Wuhan, China. The RNA library was constructed with Illumina TruSeq RNA v2 Kit according to the manufacturer’s instructions, and the three-generation full-length (ONT) RNA was extracted by (DP441) RNA prep Pure Plant Plus Kit to construct an ONT PromethION library, was completed by Berry Genomic Corporation in Beijing, China. Hi-C libraries were constructed according to the improved Hi-C procedures17. including cross-linking of formaldehyde, restriction enzyme digestion, ends repair of fragments, DNA cyclization, DNA purification, and other steps with MboI as the restriction enzyme. Finally, we obtained 93.95 Gb of sequencing data, comprising 28.11 Gb of Illumina reads, 21.16 Gb of Nanopore reads, 13.40 Gb of Hi-C data, and 31.28 Gb of RNA data, which consisted of 21.78 Gb of Illumina sequencing and 9.50 Gb of ONT sequencing. The mean/N50 lengths of the Nanopore and ONT reads were 6.01/21.65 kb and 0.99/1.41 kb, respectively (Table 1). The 28.11 Gb Illumina DNA data was retained after the quality control process and then used for the genome survey. The k-mer (k = 21) analysis demonstrated that the genomes with a low heterozygous ranging from 0.70%‐0.85% (Fig. 1a, Table 2), and the estimated size was about 83.52‐84.51 Mb.
Genome size estimation and assembly
Quality control of the BGI data was carried out by BBTools v38.8218: “clumpify.sh” is used to remove repeats; “bbduk.sh” is used for specific quality control, i. e. removing sites with a base mass score below 20 (>Q20), filtering sequences less than 15 bp, removing poly-A/G/C ends over 10 bp, and correcting bases using the overlap region (overlapping reads). The k-mer frequencies were assessed using “khist.sh” (BBTools) with a length set to 21 k-mer. The k-mer analysis was then performed using GenomeScope v2.019, with a maximum k-mer coverage of 1,000 (“-m 1000”).
For genomic contig assembly, the ONT raw data were error-corrected by NextDenovo v2.5.0 (https://github.com/Nextomics/NextDenovo), filtered for contaminated sequences using Kraken v2.1.220, and then assembled using NextDenovo software with parameters read_cutoff = 1k. Sequences below 1 kb in the raw data were filtered. One round of long sequence correction using Inspector v1.221 and two rounds of short sequence correction with NextPolish v1.3.022 to obtain the corrected genome sequence and further improve the assembly accuracy (Table 3). Minimap2 v.2.1723 was employed as the read Fundamental mapper during long and short-read polishing stages.
In order to obtain clean data, the adapter sequences of raw reads were trimmed and low-quality reads were removed using Juicer v1.6.224. Subsequently, the clean reads were mapped to the draft genome into the chromosome using 3D-DNA. Juicebox v1.11.0824 was used to correct possible errors (such as misjoins, translocations, and inversions) in the candidate assembly by visualizing Hi-C heatmaps. Judging from the Hi-C heatmap, information on both species (S. aterrima and S. pratorum) was obtained simultaneously (Table 4, Fig. 1b). Possible contaminants were detected using MMseqs. 2 v1125, which performed BLASTN-like searches based on the NCBI nucleotide (nt) and UniVec databases. The completeness of the genome was evaluated using BUSCO v3.0.226 with insecta_odb10 dataset (n = 1,367 single-copy orthologues) and BUSCO v5.4.427 with diptera_odb10 dataset (n = 3,285 single-copy orthologues). To calculate the mapping rate, we mapped ONT long reads and BGI short reads to the assembly using Minimap2. We then calculated the mapping rate using SAMtools v.1.1028 with the ‘flagstat’ parameter. Finally, the genomes of S. aterrima and S. pratorum were assembled into three chromosomes with sizes of 78.45 Mb and 71.56 Mb, the scaffold N50 lengths evaluated with insecta_odb10 dataset were 25.73 Mb and 23.53 Mb, while the GC content was 36.93% and 41.72%, respectively. (Table 5). The results evaluated with diptera_odb10 dataset in Table 6.
Genome annotation
Genomes are often annotated with repeat sequences, protein-coding genes, and non-coding RNA.
We used the software RepeatModeler v2.0.2a29 with an LTR discovery pipeline (-LTRStruct) to construct a repeat DNA library. Then the Dfam 3.330 and RepBase-2018102631 databases were merged into a custom library, and finally the software RepeatMasker v4.1.2p132 with the default commands was used to predict the repeat sequence according to the custom library. The genomes of S. aterrima and S. pratorum produced a total of 27,699 repeats (5.68 Mb) and 15,775 repeats (1.94 Mb), respectively, resulting in a repeat sequence ratio of 7.24% and 2.72%. The five most prevalent classes of repeat sequences were unknown (4.40% and 1.25%), LTR elements (1.13% and 0.62%), DNA elements (0.68% and 0.23%), Simple repeats (0.44% and 0.20%), and LINEs (0.30% and 0.19%). Statistical results are shown in Tables S1, S2.
The protein-coding genes were annotated by integrating the evidence of ab initio, transcriptome-based prediction, and homology-based annotations. The protein coding gene structures were predicted using MAKER v3.01.03 with the default commands33. For the predictions of ab initio, BRAKER v2.1.634 and GeMoMa v1.835 were used to integrate the transcriptomic and protein evidence and to integrate the predicted results of both as the input file for MAKER ab initio (ab. gff3). The transcriptome was aligned with the RNA-seq data to the genome by HISAT2 v2.2.036 to generate BAM files. Augustus v3.3.437 and GeneMark-ES/ET/EP 4.68_3.60_lic38 were automatically trained by BRAKER39, and integrate arthropod protein sequences (OrthoDB10 v1 database40) to improve the prediction accuracy. We used RNA-seq alignments produced from HISAT2 to perform genome-guided assembly by StringTie v2.1.641. For the homology-based approach, GeMoMa with GeMoMa. c = 0.4 GeMoMa. p = 10 parameter was used to perform the annotation of protein-coding based on the annotation of genes of Anopheles arabiensis (GCF_016920715.1), Bradysia coprophila (GCF_014529535.1), Culex quinquefasciatus (GCF_015732765.1), Drosophila melanogaster (GCF_000001215.4), and Hermetia illucens (GCF_905115235.1) from GenBank. Finally, we predicted a total of 12,330 and 11,250 protein-coding genes in S. aterrima and S. pratorum, respectively. These genes had an average of 5.2/5.2 exons per gene, with an average exon length of 429.3/414.5 bp, and an average of 4.1/4.1 introns per gene, with an average intron length of 416.7/414.5 bp. Further, each gene contained 4.9/5.0 CDS, with an average CDS length of 344.4/342.8 bp. BUSCO completeness of the protein sequences was 97.7%/97.4% (n = 1,367), including 75.4%/74.4% single-copy, 22.3%/23.0% duplicated, 0.1%/0.4% fragmented, and 2.2%/2.2% missing BUSCOs, suggesting high-quality predictions.
Non-coding RNAs including transfer RNAs (tRNAs), microRNAs (miRNAs), ribosome RNAs (rRNAs), and small nuclear RNAs (snRNAs) were also identified. The rRNAs, snRNAs, and miRNAs were detected from the Rfam database (release 13.0)42 using Infernal v1.1.443. The tRNAs were predicted using tRNAscan-SE v2.0.944 with the script “EukHighConfidenceFilter”. The rRNAs and subunits were predicted using RNAmmer v1.245. We identified a total of 273 and 273 noncoding RNA sequences were annotated for S. aterrima and S. pratorum. These included 35 and 34 microRNAs (miRNAs), 26 and 42 ribosomal RNAs (rRNAs), 26 and 20 small nucleolar RNAs (snRNAs), 137 and 123 tRNAs, and 45 and 50 other RNA sequences, respectively. The snRNAs identified included 15 and 9 spliceosomal RNAs (U1, U2, U4, U5, U6), 10 C/D box snoRNAs, and 1 HACA-box snoRNA in each species, respectively (Tables S3, S4).
Two strategies were used for the annotation of gene functions. We conducted the gene functional annotation search against the UniProtKB (SwissProt + TrEMBL)46 and the nonredundant protein sequence database (NR) using the sensitive mode of Diamond v2.0.11.14947 in sensitive mode with the parameters “–very-sensitive -e 1e-5”. We further employed eggNOG-mapper v2.1.548 and InterProScan 5.53‐87.049 to assign Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathway annotations and to identify protein domains. Four databases including protein families (Pfam)50, SMART51, Superfamily52, CDD53 were searched by InterProScan. The results predicted by the above tools were integrated to obtain the final prediction of gene functions. For S. aterrima and S. pratorum, a high percentage of annotated genes matched the UniProtKB database, with 11,740 (95.21%) and 10,835 (96.31%) genes respectively. The InterProScan database identified protein domains in 9,419/8,811 protein-coding genes, while 10,135/9,447 GO and 4,746/4,491 KEGG were identified by InterProScan and eggNOG-mapper. Furthermore, 7140/6695 genes were annotated as GO terms, 7673/7202 as KEGG ko terms, 2749/2614 as Enzyme Codes, 4746/4491 as KEGG pathways, 9419/8811 as Reactome pathways, and 10762/10047 as COG functional categories (Table 7).
Data Records
All raw sequencing data and the genome assembly of S. aterrima and S. pratorum underlying this article are available at the NCBI and can be accessed with Bioproject ID PRJNA809421. The Nanopore, Illumina, Hi-C, and transcriptome data can be found under identifcation numbers SRR23797681-SRR2379768554,55,56,57,58. The assembled genome has been deposited in the NCBI assembly with the accession number GCA_033063855.159 and GCA_033064975.160. All annotations data for repeated sequences, gene structure, and functional prediction are available for download through Figshare61 (https://doi.org/10.6084/m9.figshare.22762118).
Technical Validation
Two genome assembly methods, NextDenovo, and 3D-DNA, were executed and subsequently compared (Table 3). The NextDenovo assembly genome was slightly larger than predicted at 151.48 Mb and contained 78 primary contigs, with an N50 contig length of 3.39 Mb, the longest contig of 9.63 Mb, and 98.40% of BUSCO genes. The average GC content was 39.16%. After haplotig purging and genome polishing, the genome length was 151.31 Mb, contained 78 primary contigs with an N50 contig length of 3.38 Mb, the longest contig of 9.62 Mb, and 99.10% of BUSCO genes. The 3D-DNA assembly showed a remarkable improvement, yielding 89 scaffolds with a scaffold N50 length of 25.73 Mb, including the longest scaffold of 29.79 Mb. Additionally, the BUSCO completeness percentage achieved in the 3D-DNA version was 99.2%, with negligible levels of fragmentation and missing genes at 0.00% and 0.80% respectively. After utilizing Hi-C long-range scaffolding to enhance our assembly, we securely anchored the assembled scaffolds into six chromosomes that can be categorized into two species; S. aterrima (SateA) and S. pratorum (SateB) (Fig. 1b). Subsequently, we imported the 3D-DNA assembly to obtain the final results of the chromosome assembly.
The final genome assembly showed a BUSCO completeness of 98.5% and 97.8% (n = 1,367), and duplicated BUSCOs were 1.3% and 0.9%, respectively. Additionally, the high mapping ratios of both BGI and ONT data were 93.38% and 92.35%, respectively. These indicators collectively demonstrate that the assembly has achieved a remarkable degree of continuity and integrity (Table 5).
Code availability
The versions of software used were included in the methods section. No specific script was utilized for this work. All commands and pipelines employed in data processing were executed in accordance with the manual and protocols of the corresponding bioinformatic software.
References
Andersen, T., Baranov, V. & Hagenlund, L. K. Blind Flight? A New Troglobiotic Orthoclad (Diptera, Chironomidae) from the Lukina Jama‐Trojama Cave in Croatia. PloS One. 11, e0152884 (2016).
Londoño, D. K., Siegfried, B. D. & Lydy, M. J. Atrazine induction of a family 4 cytochrome P450 gene in Chironomus tentans (Diptera: Chironomidae). Chemosphere. 56, 701–706 (2004).
Londoño, D. K. et al. Cloning and expression of an atrazine inducible cytochrome P450, CYP4G33, from Chironomus tentans (Diptera: Chironomidae). Pestic. Biochem. Physiol. 89, 104–110 (2007).
Sun, Z., Liu, Y., Xu, H. & Yan, C. Genome-Wide Identification of P450 Genes in Chironomid Propsilocerus akamusi Reveals Candidate Genes Involved in Gut Microbiota-Mediated Detoxification of Chlorpyrifos. Insects. 13, 765 (2022).
Gusev, O. et al. Comparative genome sequencing reveals genomic signature of extreme desiccation tolerance in the anhydrobiotic midge. Nat. Commun. 5, 4784 (2014).
Shaikhutdinov, N. & Gusev, O. Chironomid midges (Diptera) provide insights into genome evolution in extreme environments. Curr Opin Insect Sci. 49, 101–107 (2022).
Sun, X. et al. A chromosome level genome assembly of Propsilocerus akamusi to understand its response to heavy metal exposure. Mol. Ecol. Resour. 21, 1996–2012 (2021).
Cranston, P. S., Oliver, D. R. & Sæther, O. A. in Chironomidae of Holarctic region. Keys and diagnoses (ed. Wiederholm, T.) Part 1. Larvae. (Ent. Scand. Suppl. 19, 1983).
Delettre, Y. R. Short-range spatial patterning of terrestrial Chironomidae (Insecta: Diptera) and farmland heterogeneity. Pedobiologia. 49, 15–27 (2005).
Frouz, J. The effect of vegetation patterns on oviposition habitat preference: A driving mechanism in terrestrial chironomid (Diptera: Chironomidae) succession? Res Popul Ecol. 39, 207–213 (1997).
Brown, P. M. & Kalthoff, K. Inhibition by ultraviolet light of pole cell formation in Smittia sp (Chironomidae, Diptera): Action spectrum and photoreversibility. Dev. Biol. 97, 113–122 (1983).
Hägele, K. Studies on polytene chromosomes of Smittia parthenogenetica (Chironomidae, Diptera). Chromosoma. 76, 47–55 (1980).
Jacob, J. An electron microscope autoradiographic study of the site of initial synthesis of RNA in the nucleolus of Smittia. Exp. Cell Res. 48, 276–282 (1967).
Jäckle, H. & Kalthoff, K. Proteins foretelling head and abdomen development in the embryo of Smittia spec. (Chironomidae, Diptera). Dev. Biol. 85, 287–298 (1981).
Kalthoff, K., Ran, K.-G. & Edmond, J. C. Modifying effects of UV irradiation on the development of abnormal body patterns in centrifuged insect embryos (Smittia spec., Chironomidae, Diptera). Dev. Biol. 91, 413–422 (1982).
Ripley, S. & Kalthoff, K. Changes in the apparent localization of anterior determinants during early embryogenesis (Smittia spec., Chironomidae, Diptera). Wilhelm Roux’s Arch. Dev. Biol. 192, 353–361 (1983).
Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Bushnell, B. BBtools. Retrieved from https://sourceforge.net/projects/bbmap/ (2014).
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Chen, Y., Zhang, Y. X., Wang, A. Y., Gao, M. & Chong, Z. C. Accurate long-read de novo assembly evaluation with Inspector. Genome Biol. 22, 312 (2021).
Hu, J., Fan, J., Sun, Z. Y., Liu, S. L. & Berger, B. NextPolish: a fast and efficient genome polishing tool for long read assembly. Bioinformatics. 36, 2253–2255 (2020).
Li, H. Minimap2: pairwise alignment for nucleotide sequences Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34, 3094–3100 (2018).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Steinegger, M. & Söding, J. MMseqs. 2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 38, 4647–4654 (2021).
Li, H. et al. The Sequence Alignment/Map Format and SAMtools. Bioinformatics. 25, 2078–2079 (2009).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. 117, 9451–9457 (2020).
Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 44, D81–D89 (2016).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA. 6, 1–6 (2015).
Smit, A. F. A., Hubley, R., & Green, P. RepeatMasker Open-4.0. Available online: http://www.repeatmasker.org (accessed on 1 October 2022) (2013‐2015).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinf. 12, 491 (2011).
Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M. & Stanke, M. BRAKER1: unsupervised RNA-Seq-Based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 32, 767–769 (2016).
Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol. Biol. 1962, 161–177 (2019).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods. 12, 357–360 (2015).
Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312 (2004).
Brůna, T., Lomsadze, A. & Borodovsky, M. GeneMark-EP+: Eukaryotic gene prediction with self-training in the space of genes and proteins. Nar Genomics Bioinf. 2, lqaa26 (2020).
Tomas, B., Katharina, J. H., Alexandre, L., Mario, S. & Mark, B. BRAKER2: Automatic eukaryotic genome annotation with GeneMark- EP+ and AUGUSTUS supported by a protein database. Nar Genomics Bioinf. 3, lqaa108 (2021).
Kriventseva, E. V. et al. OrthoDB v10: Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 47, D807–D811 (2019).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Kalvari, I. et al. Rfam 13.0: Shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res. 46, D335–D342 (2018).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 29, 2933–2935 (2013).
Chan, P. P. & Lowe, T. M. TRNAscan-SE: Searching for tRNA genes in genomic sequences. Methods Mol Biol. 1962, 1–14 (2019).
Lagesen, K. et al. RNAmmer: Consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
Morgat, A. et al. Enzyme annotation in UniProtKB using Rhea. Bioinformatics 36, 1896–1901 (2020).
Buchfink, B., Reuter, K. & Drost, H. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods. 18, 366–368 (2021).
Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
Finn, R. D. et al. InterPro in 2017—beyond protein family and domain annotations. Nucleic Acids Res. 45, D190–D199 (2017).
El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).
Letunic, I. & Bork, P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 46, D493–D496 (2018).
Wilson, D. et al. SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380–D386 (2009).
Marchler-Bauer, A. et al. CDD/SPARCLE: Functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 45, D200–D203 (2017).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23797681 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23797682 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23797683 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23797684 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR23797685 (2023).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_033063855.1 (Smittia aterrima) (2023).
NCBI Assembly https://identifiers.org/ncbi/insdc.gca:GCA_033064975.1 (Smittia pratorum) (2023).
Fu, Y. Genome assembly and annotations of Smittia aterrima and Smittia pratorum (Diptera, Chironomidae). figshare https://doi.org/10.6084/m9.figshare.22762118 (2023).
Acknowledgements
This study was supported by the National Natural Science Foundation of China (NSFC) (Grant No. 32070483, 31460572), Natural Science Foundation of Hubei Province (Grant No. 2020CFB757), Scientific Research Starting Foundation for Ph.D. of Huanggang Normal University (Grant No. 2020010).
Author information
Authors and Affiliations
Contributions
Yue Fu conceived the study and wrote the manuscript, Yue Fu and Xiangliang Fang collected the samples. Bin Mao and Zigang Xu performed bioinformatics analysis. Yunli Xiao and Mi Shen extracted the DNA. Xinhua Wang put forward suggestions for modification. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fu, Y., Fang, X., Xiao, Y. et al. Two chromosome-level genomes of Smittia aterrima and Smittia pratorum (Diptera, Chironomidae). Sci Data 11, 165 (2024). https://doi.org/10.1038/s41597-024-03010-y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-024-03010-y
- Springer Nature Limited