Chromosomal level genome assemblies of two Malus crabapple cultivars Flame and Royalty

Li, Hua; Zhai, Xuyang; Peng, Haixu; Qing, You; Deng, Yulin; Zhou, Shijie; Bei, Tairui; Tian, Ji; Zhang, Jie; Hu, Yujing; Qin, Xiaoxiao; Lu, Yanfen; Yao, Yuncong; Wang, Sen; Zheng, Yi

doi:10.1038/s41597-024-03049-x

Data Descriptor
Open access
Published: 13 February 2024

Volume 11, article number 201, (2024)

Scientific Data

Hua Li^1,2 (contributed equally),
Xuyang Zhai^1,2 (contributed equally),
Haixu Peng^1,2,
You Qing^1,2 (ORCID: orcid.org/0000-0002-7273-3364),
Yulin Deng^1,2,
Shijie Zhou^1,2,
Tairui Bei²,
Ji Tian¹,
Jie Zhang¹,
Yujing Hu¹,
Xiaoxiao Qin¹,
Yanfen Lu¹,
Yuncong Yao¹ (ORCID: orcid.org/0000-0001-5512-1668),
Sen Wang^1,2 &
Yi Zheng^1,2

Abstract

Malus hybrid ‘Flame’ and Malus hybrid ‘Royalty’ are representative ornamental crabapples that are rich in flavonoids and are preferred materials for studying coloration mechanisms. We generated high-quality, chromosome-level, haplotype-resolved genome assemblies for both cultivars: the two haplotypes of ‘Flame’ are 688.2 Mb and 675.7 Mb, and those of ‘Royalty’ are 674.1 Mb and 663.6 Mb. All four assemblies were anchored to 17 chromosomes and have BUSCO completeness scores of nearly 99.0%. A total of 47,833 and 47,307 protein-coding genes were annotated in the two haplotype genomes of ‘Flame’, and 46,305 and 46,920 in those of ‘Royalty’. These assemblies offer new resources for studying the origin and adaptive evolution of crabapples and the molecular basis of flavonoid and anthocyanin accumulation, and should facilitate molecular breeding of Malus plants.

Background & Summary

Malus hybrid ‘Flame’ (‘Flame’) and Malus hybrid ‘Royalty’ (‘Royalty’) are representative ornamental crabapples of the genus Malus in the rose family (Rosaceae). ‘Flame’ belongs to the ever-green leaf category, with green leaves and white flowers, while ‘Royalty’ belongs to the ever-red leaf category, with purple-red leaves, flowers and fruits, and the fruit is fetal red¹. Both cultivars are rich in flavonoids²: 17, 17, 15 and 9 kinds of flavonoids were detected in the leaves, flowers, peel and flesh of ‘Royalty’, respectively, and 15, 17, 11 and 9 kinds in the corresponding tissues of ‘Flame’. In addition, Li et al. discovered a putative transcription factor associated with flavonol biosynthesis, MdMYB8, by analyzing transcriptomes of ‘Flame’ fruit from five consecutive developmental stages³. Flavonoids are an important class of natural organic compounds with a wide range of biological activities. Previous studies have shown that flavonoids exhibit strong antioxidant activity and possess various pharmacological functions, such as antibacterial, anti-inflammatory, anti-tumor, and anti-diabetic effects^4,5,6. As natural carriers for the synthesis and accumulation of flavonoids, ‘Royalty’ and ‘Flame’ therefore have significant utilization value and strong development potential^7,8, and studying their genomes contributes to research on the pathways of flavonoid accumulation.

In addition, ‘Royalty’ and ‘Flame’ are preferred materials for studying plant coloration mechanisms because the colors of their tissues differ markedly. For example, the key anthocyanin regulator McMYB10 was identified in crabapple leaves and petals and is related to anthocyanin accumulation in ‘Royalty’, a cultivar with red leaves and flowers^9,10. Investigation of leaf development in the two crabapples then identified the McMYB10 target gene McF3’H¹⁰, the McDFR1 promoter¹¹ and the McMYB10-associated ubiquitin E3 ligases McCOP1-1 and McCOP1-2¹², as well as the transcription factor McMYB12, which promotes the accumulation of proanthocyanidins¹³. Furthermore, the endogenous McCHS gene was shown to be a critical factor in petal coloration by comparing the flavonoid and anthocyanin contents of three typical crabapple cultivars with different petal colors¹⁴. The genomic data obtained in this study thus lay the foundation for subsequent multi-omics investigations of the molecular mechanisms of anthocyanin synthesis, which is of great significance for understanding coloration and for improving color breeding of these ornamental crabapples.

In this study, we present high-quality genomes for Malus hybrid ‘Royalty’ and Malus hybrid ‘Flame’ generated using PacBio, Illumina, and Hi-C technologies. K-mer analysis showed that ‘Flame’ had a heterozygosity of 2.89% and an estimated genome size of ~691.2 Mb, while ‘Royalty’ had a heterozygosity of 1.78% and an estimated genome size of ~685.4 Mb, confirming that both cultivars are highly heterozygous diploids (Fig. 1). The larger haplotype assembly of ‘Flame’ (hapA) was 688.2 Mb with a contig N50 of 31.6 Mb, and the other (hapB) was 675.7 Mb with a contig N50 of 35.6 Mb. The two haplotype genomes of ‘Royalty’ were 674.1 Mb with a contig N50 of 23.7 Mb (hapA) and 663.6 Mb with a contig N50 of 28.7 Mb (hapB) (Table 1). The assembled contigs were further anchored to 17 pseudo-chromosomes per haplotype, with anchoring rates of 93.4% in ‘Flame’-hapA, 96.4% in ‘Flame’-hapB, 92.2% in ‘Royalty’-hapA and 95.4% in ‘Royalty’-hapB (Table 1, Fig. 2). In both haplotype genomes of ‘Flame’, 5 chromosomes had a telomere assembled at one end, 11 had telomeres assembled at both ends, and only 1 had no assembled telomere. In ‘Royalty’-hapA, 4 chromosomes had a telomere at one end and the rest had telomeres at both ends, while in ‘Royalty’-hapB, 8 chromosomes had a telomere at one end and the other 9 had telomeres at both ends (Fig. 3). A total of 47,833 and 47,307 protein-coding genes were identified in the two haplotype genomes of ‘Flame’, respectively, almost all of which were annotated. All of the 46,305 and 46,920 protein-coding genes in the two haplotype genomes of ‘Royalty’ could be functionally annotated (Tables 2, 3). The final assemblies showed high gene completeness (‘Royalty’: 98.9% for hapA and 99.0% for hapB; ‘Flame’: 98.9% for hapA and 99.0% for hapB).
The assembled high-quality genomes of Malus hybrid ‘Royalty’ and Malus hybrid ‘Flame’ should be a valuable resource for future conservation genomics studies and for investigations of flavonoid accumulation and anthocyanin synthesis.

Table 1 Summary of Malus hybrid ‘Flame’ and Malus hybrid ‘Royalty’ genome assembly data.

Table 2 Overview of genome assembly and annotation.

Table 3 Summary of Malus hybrid ‘Flame’ and Malus hybrid ‘Royalty’ genome annotations.

Methods

Sample preparation and DNA sequencing

Fresh leaves of Malus hybrid ‘Royalty’ and Malus hybrid ‘Flame’ were collected from plants growing at Beijing University of Agriculture (36°49′N 128°37′E; 575-m altitude), Beijing, China. DNA was isolated from the samples using the cetyltrimethylammonium bromide (CTAB) method and purified with AMPure PB beads (PacBio 100-265-900) to obtain high-molecular-weight genomic DNA (gDNA, ≥100 ng/μl and ≥10 μg) for subsequent library construction.

For HiFi sequencing, SMRTbell DNA libraries were constructed according to the PacBio HiFi library construction protocol using the following steps: (i) shearing of gDNA (10 μg, mostly longer than 40 kb) to a target size of 15 kb using a Megaruptor (Diagenode, B06010001), followed by concentration with AMPure® PB Beads (PacBio 100-265-900); (ii) DNA damage repair; (iii) blunt-end ligation with hairpin adapters; (iv) enzyme digestion using the SMRTbell® Express Template Prep Kit 2.0 (PacBio, PN 101-853-100); and (v) size selection using the SageELF (Sage Science ELF000) or the BluePippin Size Selection System (Sage Science BLU0001). HiFi sequencing was then performed on a PacBio Sequel II platform (PacBio, CA, USA) for 30 hours.

For Hi-C, leaves were fixed in 1% (vol/vol) formaldehyde for library construction. Hi-C library construction, including cell lysis, chromatin digestion, proximity ligation, DNA recovery and subsequent DNA manipulations, was performed according to a previously described method¹⁵. DpnII was used as the restriction enzyme for chromatin digestion. The Hi-C library was sequenced on the Illumina NovaSeq 6000 platform to generate 150 bp paired-end reads.

Genome survey and analysis

A total of 36 Gb and 26 Gb of high-quality HiFi reads were obtained for ‘Flame’ and ‘Royalty’, respectively, on the PacBio Sequel II platform and used for genome size and ploidy analysis. Jellyfish (v2.2.10)¹⁶ was used to count k-mers in the reads of each genome: the reads were cut into 21-base sequences, the total number of 21-mers and the frequency of each 21-mer were counted, and the distribution of 21-mer frequencies was plotted. The resulting 21-mer frequency distribution was then used by GenomeScope (v2.0)¹⁷ to estimate the haplotype genome size and heterozygosity of ‘Flame’ and ‘Royalty’ and to predict ploidy. The genome sizes of ‘Flame’ and ‘Royalty’ were estimated to be 691,264,141 bp and 685,420,647 bp, and the heterozygosity rates 2.89% and 1.78%, respectively. The k-mer analysis indicated that both ‘Flame’ and ‘Royalty’ were highly heterozygous diploids (Fig. 1).
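The k-mer counting step can be illustrated with a minimal Python sketch. It is not the Jellyfish implementation: it counts forward-strand k-mers only (whether canonical k-mers were counted in this study is not stated), and the function names and toy reads are hypothetical. The model fitting performed by GenomeScope is not reproduced here.

```python
from collections import Counter

def count_kmers(reads, k=21):
    """Count every k-mer (substring of length k) across the reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts

def kmer_histogram(counts):
    """Map each frequency to the number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))

# Toy reads (hypothetical) with k=3 so the result can be checked by hand.
toy_reads = ["ACGTAC", "CGTACG"]
toy_counts = count_kmers(toy_reads, k=3)
toy_hist = kmer_histogram(toy_counts)
print(toy_hist)
```

The histogram (frequency versus number of distinct k-mers) is the kind of input from which genome size and heterozygosity are estimated.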

Genome assembly

Contigs were assembled de novo with hifiasm (v0.16.1)¹⁸, which builds a phased assembly graph from the PacBio HiFi reads and then uses Hi-C reads to link unitigs that share mapped fragments, with the parameters --hom-cov 34 --n-weight 6 -s 0.45 -O 2. The contigs were then anchored into 34 chromosomes in total for each cultivar (17 per haplotype) using Juicer¹⁹ and 3D-DNA²⁰ (-m haploid -r 0) based on Hi-C interaction data (‘Flame’: 60 Gb, ~100×; ‘Royalty’: 80 Gb, ~133×) (Fig. 2). Subsequently, the assembly was manually corrected with Juicebox²¹, including correcting chromosome boundaries, fixing misjoins, and resolving inversions and translocations, and the final genome was generated using the agp2fa mode of RagTag²² from the AGP file recording the contigs of each chromosome. The total lengths of the two chromosome-level haplotype-resolved genomes of ‘Flame’ were 642.9 Mb (hapA), with a contig N50 of 31.6 Mb, and 651.8 Mb (hapB), with a contig N50 of 35.6 Mb; those of ‘Royalty’ were 628.5 Mb (hapA), with a contig N50 of 23.7 Mb, and 637.4 Mb (hapB), with a contig N50 of 28.7 Mb. The anchoring rate of every haplotype genome exceeded 92% (Table 1).
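For readers unfamiliar with the contig N50 statistic reported above, the following Python sketch shows the standard definition: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative and are not taken from the assemblies.

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")

# Toy contig lengths (hypothetical): total 30, so half is 15.
# Cumulative sums from the longest contig are 10, then 18, so the N50 is 8.
toy_lengths = [2, 4, 6, 8, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```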

Telomere sequences were detected with TRF²³, and most chromosomes were assembled to the telomeres. For example, in ‘Flame’, 5 chromosomes had a telomere assembled at one end, 11 had telomeres assembled at both ends, and only 1 had no assembled telomere, while in ‘Royalty’-hapA, 4 chromosomes had a telomere at one end and 13 had telomeres at both ends, confirming the high integrity and continuity of the assembled genomes (Fig. 3).
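The classification into single-ended, double-ended and telomere-free chromosomes can be sketched as follows. This is a simplified motif search, not the TRF-based procedure used in the study. It assumes the plant-type telomeric repeat TTTAGGG (reverse complement CCCTAAA), and the window size, minimum copy number and toy sequences are hypothetical choices made for illustration.

```python
def has_telomere(segment, motif="TTTAGGG", min_copies=3):
    """True if the segment holds at least min_copies tandem motif copies on either strand."""
    complement = str.maketrans("ACGT", "TGCA")
    rev_motif = motif.translate(complement)[::-1]  # CCCTAAA for the default motif
    segment = segment.upper()
    return motif * min_copies in segment or rev_motif * min_copies in segment

def classify_chromosome(seq, window=100):
    """Label a chromosome by how many of its two ends carry telomeric repeats."""
    start_hit = has_telomere(seq[:window])
    end_hit = has_telomere(seq[-window:])
    return {0: "none", 1: "single-ended", 2: "double-ended"}[start_hit + end_hit]

# Toy chromosomes (hypothetical sequences).
body = "ACGT" * 50
toy_chromosomes = {
    "chrA": "CCCTAAA" * 5 + body + "TTTAGGG" * 5,
    "chrB": body + "TTTAGGG" * 5,
    "chrC": body,
}
toy_labels = {name: classify_chromosome(seq) for name, seq in toy_chromosomes.items()}
print(toy_labels)
```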

Genome annotation

Repeat sequences were annotated de novo. A repeat library was first constructed using RepeatModeler (v1.0.11)²⁴ with the parameters -database -pa 5. The library was then supplied to RepeatMasker (v4.1.2)²⁵ to identify transposons and low-complexity repeats in the DNA sequences, and TRF (v4.09)²³ was used to identify tandem repeats. Both genomes were highly repetitive. In ‘Flame’, repetitive sequences made up 64.76% of the genome, and LTR retrotransposons were the largest class at about 37.74%. In ‘Royalty’, repetitive sequences made up 64.59% of the genome, and LTR retrotransposons were again the largest class at about 34.62%.
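A repeat percentage of this kind is typically the fraction of genome bases covered by at least one repeat annotation, with overlapping annotations counted once. The sketch below shows that calculation on toy data. The half-open interval convention, the function names and the numbers are assumptions for illustration and do not reproduce the RepeatMasker summary itself.

```python
def merged_length(intervals):
    """Total bases covered by half-open (start, end) intervals, counting overlaps once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before starting a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

def repeat_percentage(intervals, genome_length):
    """Percentage of the genome covered by repeat annotations."""
    return 100 * merged_length(intervals) / genome_length

# Toy annotations (hypothetical) on a 100-bp sequence: (0, 10) and (5, 20) overlap.
toy_intervals = [(0, 10), (5, 20), (30, 40)]
toy_pct = repeat_percentage(toy_intervals, genome_length=100)
print(toy_pct)
```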

To annotate complete and accurate gene structures, a strategy combining transcriptome-based, protein homology-based, and ab initio prediction was employed²⁶. For transcript-based prediction, published transcriptome data (BioProject PRJNA546094²⁷ for ‘Flame’ and BioProject PRJNA546107²⁸ for ‘Royalty’) were mapped to the respective assembled genomes with HISAT2 (v2.2)²⁹. The mapped reads were assembled with StringTie (v1.3)³⁰, and the longest transcripts were retained as EST evidence. For protein homology-based prediction, the protein sequences of the sequenced apple genomes of ‘Golden Delicious’ (Malus domestica cv. Golden Delicious), ‘Hanfu’ (Malus domestica cv. Hanfu), ‘Gala’ (Malus domestica cv. Gala), European wild apple (Malus sylvestris) and wild apple (Malus sieversii) were used for homology prediction with Exonerate³¹. For ab initio prediction, the assemblies were hard-masked according to the repeat annotation, and Augustus (v3.4)³² and BRAKER2³³ were used to train a gene prediction model based on the transcripts; protein-coding genes were then predicted using BRAKER2 with the trained model. Finally, the predictions generated by the above methods were integrated into the final annotation file using MAKER (v3.1)³⁴. BUSCO (v4.1)³⁵ comparison of the protein-coding genes against databases of conserved single-copy orthologs showed that the two haplotype gene sets of ‘Flame’ contained about 99.0% and 98.8% of the conserved plant orthologs as complete genes, and those of ‘Royalty’ about 98.9% and 99.0%, respectively (Table 2).
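The step of retaining the longest transcripts can be pictured as choosing, for each gene locus, the assembled transcript with the greatest length. The sketch below assumes that transcripts have already been reduced to (gene, transcript, length) records. The record layout and identifiers are hypothetical, and the exact selection rule used in the study is not specified beyond "longest".

```python
def longest_transcripts(records):
    """Keep, for each gene, the transcript with the greatest length."""
    best = {}
    for gene_id, transcript_id, length in records:
        # On ties, the first transcript encountered is kept.
        if gene_id not in best or length > best[gene_id][1]:
            best[gene_id] = (transcript_id, length)
    return best

# Toy records (hypothetical identifiers and lengths in bp).
toy_records = [
    ("gene1", "gene1.t1", 1200),
    ("gene1", "gene1.t2", 1850),
    ("gene2", "gene2.t1", 900),
]
toy_longest = longest_transcripts(toy_records)
print(toy_longest)
```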

Functional annotation of the annotated protein-coding genes followed a standard workflow: (i) Diamond (v2.0)³⁶ was run with an E-value threshold of 1e-4 against the GenBank-NR³⁷, Swiss-Prot³⁸, TrEMBL³⁹ and Arabidopsis protein⁴⁰ databases; (ii) InterProScan (v5.59)^41,42 was used to identify functional protein domains against the InterPro⁴² database; (iii) the alignments against GenBank-NR were combined with the identified InterPro domains for Gene Ontology (GO)⁴³ annotation using Blast2GO (v2.2)⁴⁴; (iv) the annotation results from Swiss-Prot, TrEMBL, and the Arabidopsis protein database were combined using AHRD (v3.3); (v) Kyoto Encyclopedia of Genes and Genomes (KEGG)⁴⁵ functional annotations were also obtained in Blast2GO (v2.2); and (vi) transcription factors (TFs), transcriptional regulators (TRs) and protein kinases (PKs) were predicted among the protein-coding genes using iTAK⁴⁶.
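Step (i) of this workflow can be illustrated by a small post-processing sketch that applies the 1e-4 E-value cutoff and keeps the best-scoring hit per gene. It assumes alignments in the 12-column BLAST-style tabular format, in which the E-value is the 11th column and the bit score the 12th. The output format used in the study is not stated, and the function name and toy rows are hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-4):
    """Return the highest-bit-score hit per query among hits passing the E-value cutoff."""
    best = {}
    for line in tabular_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is weaker than the threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Toy alignments (hypothetical): geneA has two passing hits, geneB has none.
toy_rows = [
    ["geneA", "prot1", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-30", "180.0"],
    ["geneA", "prot2", "95.0", "100", "5", "0", "1", "100", "1", "100", "1e-40", "200.0"],
    ["geneB", "prot3", "40.0", "50", "30", "0", "1", "50", "1", "50", "0.01", "30.0"],
]
toy_lines = ["\t".join(row) for row in toy_rows]
toy_best = best_hits(toy_lines)
print(toy_best)
```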

The final annotation showed that the hapA and hapB genomes of ‘Flame’ contain 47,833 and 47,307 genes, respectively. For ‘Royalty’, 46,305 genes were annotated in the hapA genome and 46,920 in the hapB genome (Table 2). For the functional annotation of ‘Flame’, the protein-coding genes were compared against the GenBank-NR, SwissProt, Arabidopsis protein, and TrEMBL databases, which annotated 94,621, 66,487, 76,886, and 91,313 genes, respectively. A total of 44,046 genes were matched to the GO database and 41,447 genes were linked to pathway annotations. Of all genes, 6.58% were identified as TFs/TRs and 3.16% as PKs. For ‘Royalty’, 93,225, 66,215, 76,165 and 90,160 genes were matched to the GenBank-NR, SwissProt, Arabidopsis protein, and TrEMBL databases, respectively. Additionally, 43,776 and 41,123 genes were annotated by GO and KEGG, respectively. The total number of predicted TFs and TRs was similar to that of ‘Flame’, but 276 more PKs were identified than in ‘Flame’, accounting for 3.58% of all genes (Table 3).
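The database hit counts above exceed the gene number of either single haplotype, so they appear to be pooled over both haplotypes of each cultivar. Under that assumption (an inference from the numbers, not stated explicitly in the text), the GenBank-NR annotation rates can be checked with a short calculation:

```python
# Gene counts per haplotype (hapA, hapB) and GenBank-NR hit counts, taken from the text.
gene_counts = {"Flame": (47_833, 47_307), "Royalty": (46_305, 46_920)}
nr_annotated = {"Flame": 94_621, "Royalty": 93_225}

def annotation_rate(annotated, total):
    """Percentage of genes with an annotation, rounded to two decimals."""
    return round(100 * annotated / total, 2)

# Assumption: hit counts are pooled across the two haplotypes of each cultivar.
totals = {name: sum(pair) for name, pair in gene_counts.items()}
rates = {name: annotation_rate(nr_annotated[name], totals[name]) for name in totals}
print(totals)
print(rates)
```

With pooled totals of 95,140 genes for ‘Flame’ and 93,225 for ‘Royalty’, the GenBank-NR rates come to 99.45% and 100%, which agrees with the statements that the ‘Flame’ genes were almost fully annotated and that all ‘Royalty’ genes could be annotated.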

Data Records

The raw data (PacBio HiFi reads and Hi-C sequencing reads) used for genome assembly were deposited in the NCBI database under BioProject accession PRJNA1026659⁴⁷. The chromosomal assemblies and gene annotation datasets have been deposited at Figshare (https://doi.org/10.6084/m9.figshare.24276916)⁴⁸. The assembled diploid genome of ‘Flame’ was deposited in the GenBank database under accession numbers GCA_036218565.1⁴⁹ (hapA) and GCA_036220445.1⁵⁰ (hapB). The genome assemblies of ‘Royalty’ were deposited under accessions GCA_036320615.1 (hapA)⁵¹ and GCA_036320635.1 (hapB)⁵².

Technical Validation

First, the Hi-C heatmap supports the accuracy of the genome assembly, with relatively independent Hi-C signals observed among the 17 pseudo-chromosomes (Fig. 2). Second, the completeness of the genomes was evaluated using the BUSCO pipeline with the embryophyta_odb10 database. For the final hapA genome of ‘Flame’, 99.0% of the 1,614 highly conserved orthologs were present as complete genes, comprising 61.3% single-copy and 37.7% duplicated BUSCOs; the completeness score of the hapB genome was 98.8%, with 62.1% single-copy BUSCOs. For the final hapA genome of ‘Royalty’, 98.8% of the 1,614 highly conserved orthologs were present as complete genes, comprising 63.3% single-copy and 35.5% duplicated BUSCOs, and the hapB genome performed similarly. Finally, the chromosome telomere location map shows that all chromosomes were assembled to telomeres except chromosome 13 of ‘Flame’, and that most chromosomes had telomeres assembled at both ends (Fig. 3).
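The BUSCO figures are related by simple arithmetic: the complete score is the sum of the single-copy and duplicated percentages. The sketch below checks the reported values and derives the duplicated fraction implied for ‘Flame’ hapB, which is a derived value rather than one reported in the text. The approximate ortholog count is computed from a rounded percentage and is indicative only.

```python
def busco_complete(single_pct, duplicated_pct):
    """Complete BUSCO percentage as the sum of single-copy and duplicated percentages."""
    return round(single_pct + duplicated_pct, 1)

def busco_count(pct, n_orthologs=1614):
    """Approximate number of orthologs corresponding to a rounded percentage."""
    return round(n_orthologs * pct / 100)

flame_hapA = busco_complete(61.3, 37.7)        # reported as 99.0%
royalty_hapA = busco_complete(63.3, 35.5)      # reported as 98.8%
flame_hapB_duplicated = round(98.8 - 62.1, 1)  # implied by the text, not reported
flame_hapA_genes = busco_count(flame_hapA)     # approximate count out of 1,614

print(flame_hapA, royalty_hapA, flame_hapB_duplicated, flame_hapA_genes)
```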

Code availability

No custom code was used in this study. All software and pipelines were executed according to the manuals and protocols of the published bioinformatics tools. The versions and parameters of the software are described in the Methods.

References

1. Wang, Z., Wang, W., Zhang, J., Song, T. & Yao, Y. Genetic diversity and phylogenetic relationships analysis of major ornamental crabapple species. Journal of Fruit Science 31, 1005–1016 (2014).
2. Tian, J. et al. The Balance of Expression of Dihydroflavonol 4-reductase and Flavonol Synthase Regulates Flavonoid Biosynthesis and Red Foliage Coloration in Crabapples. Sci Rep 5, 12228 (2015).
3. Li, H. et al. MdMYB8 is associated with flavonol biosynthesis via the activation of the MdFLS promoter in the fruits of Malus crabapple. Hort. Res. 7 (2020).
4. He, X. & Liu, R. H. Phytochemicals of apple peels: isolation, structure elucidation, and their antiproliferative and antioxidant activities. J Agr Food Chem 56, 9905–9910 (2008).
5. Boyer, J. & Liu, R. H. Apple phytochemicals and their health benefits. Nutr. J. 3, 1–15 (2004).
6. Lu, Y. et al. Flavonoid accumulation plays an important role in the rust resistance of Malus plant leaves. Front Plant Sci 8, 1286 (2017).
7. Liu, F., Wang, M. & Wang, M. Phenolic compounds and antioxidant activities of flowers, leaves and fruits of five crabapple cultivars (Malus Mill. species). Sci. Hortic. 235, 460–467 (2018).
8. Wang, Y.-R. et al. Different coloration patterns between the red-and white-fleshed fruits of malus crabapples. Sci. Hortic. 194, 26–33 (2015).
9. Jiang, R., Tian, J., Song, T., Zhang, J. & Yao, Y. The Malus crabapple transcription factor McMYB10 regulates anthocyanin biosynthesis during petal coloration. Sci. Hortic. 166, 42–49 (2014).
10. Tian, J. et al. McMYB10 regulates coloration via activating McF3’H and later structural genes in ever‐red leaf crabapple. Plant Biotechnol. J. 13, 948–961 (2015).
11. Tian, J. et al. Characteristics of dihydroflavonol 4-reductase gene promoters from different leaf colored Malus crabapple cultivars. Hort. Res. 4 (2017).
12. Li, K.-T. et al. McMYB10 modulates the expression of a Ubiquitin Ligase, McCOP1 during leaf coloration in crabapple. Front Plant Sci 9, 704 (2018).
13. Tian, J. et al. McMYB12 transcription factors co-regulate proanthocyanidin and anthocyanin biosynthesis in Malus crabapple. Sci. Rep. 7, 43715 (2017).
14. Tai, D., Tian, J., Zhang, J., Song, T. & Yao, Y. A Malus crabapple chalcone synthase gene, McCHS, regulates red petal color and flavonoid biosynthesis. PLoS One 9, e110570 (2014).
15. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
16. Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
17. Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
18. Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 40, 1332–1335 (2022).
19. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
20. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
21. Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258.e251 (2018).
22. Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 23, 258 (2022).
23. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
24. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
25. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358 (2005).
26. Zhang, L. et al. A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour. Nat. Commun. 10, 1494 (2019).
27. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP200472 (2019).
28. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP200468 (2019).
29. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
30. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
31. Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinform. 6, 31 (2005).
32. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
33. Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR genomics and bioinformatics 3, lqaa108 (2021).
34. Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinform. 12, 491 (2011).
35. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
36. Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).
37. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
38. Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).
39. Consortium, U. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
40. Mergner, J. et al. Mass-spectrometry-based draft of the Arabidopsis proteome. Nature 579, 409–414 (2020).
41. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
42. Mulder, N. & Apweiler, R. InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol Biol. 396, 59–70 (2007).
43. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
44. Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).
45. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
46. Zheng, Y. et al. iTAK: a program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Mol. Plant 9, 1667–1670 (2016).
47. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP465516 (2023).
48. Peng, H.-X. Haplotype-resolved genome assembly and annotation of Malus hybrid cultivar Flame and Malus hybrid cultivar Royalty. figshare https://doi.org/10.6084/m9.figshare.24276916.v1 (2023).
49. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_036218565.1 (2024).
50. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_036220445.1 (2024).
51. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_036320615.1 (2024).
52. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_036320635.1 (2024).

Acknowledgements

This work was supported by grants from the Beijing University of Agriculture (Start-up fund) to Y.Z., Young Teachers’ Research and Innovation Capacity Enhancement Program QJKC2022044 and Beijing Municipal Education Commission Scientific Research Plan Project KM202310020010 to S.W.

Author information

These authors contributed equally: Hua Li, Xuyang Zhai.

Authors and Affiliations

Beijing Key Laboratory for Agriculture Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing, 102206, China
Hua Li, Xuyang Zhai, Haixu Peng, You Qing, Yulin Deng, Shijie Zhou, Ji Tian, Jie Zhang, Yujing Hu, Xiaoxiao Qin, Yanfen Lu, Yuncong Yao, Sen Wang & Yi Zheng
Bioinformatics Center, Beijing University of Agriculture, Beijing, 102206, China
Hua Li, Xuyang Zhai, Haixu Peng, You Qing, Yulin Deng, Shijie Zhou, Tairui Bei, Sen Wang & Yi Zheng

Contributions

Y.Z. and S.W. conceived the project and designed the experiments; X.Z. collected and prepared the samples; X.Z., H.P., Y.Q., Y.D., S.Z., T.B., J.T., J.Z., Y.H., X.Q., Y.L., H.L. and S.W. performed the genome assembly, gene annotation, and other bioinformatics analyses; H.L. drafted the manuscript; H.L., Y.Z., S.W. and Y.Y. revised the manuscript. All authors contributed to the article and approved the submitted version.

Corresponding authors

Correspondence to Sen Wang or Yi Zheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, H., Zhai, X., Peng, H. et al. Chromosomal level genome assemblies of two Malus crabapple cultivars Flame and Royalty. Sci Data 11, 201 (2024). https://doi.org/10.1038/s41597-024-03049-x

Received: 07 November 2023
Accepted: 05 February 2024
Published: 13 February 2024
DOI: https://doi.org/10.1038/s41597-024-03049-x
Springer Nature Limited
