Haplotype-resolved chromosome-scale genomes of the Asian and African Savannah Elephants

Shi, Minhui; Chen, Fei; Sahu, Sunil Kumar; Wang, Qing; Yang, Shangchen; Wang, Zhihong; Chen, Jin; Liu, Huan; Hou, Zhijun; Fang, Sheng-Guo; Lan, Tianming

doi:10.1038/s41597-023-02729-4

Data Descriptor
Open access
Published: 11 January 2024

Volume 11, article number 63, (2024)
Scientific Data

Minhui Shi^1,2,3,na1,
Fei Chen^4,5,na1,
Sunil Kumar Sahu (ORCID: orcid.org/0000-0002-4742-9870)^2,na1,
Qing Wang^2,
Shangchen Yang^6,
Zhihong Wang^4,5,
Jin Chen^7,8,
Huan Liu (ORCID: orcid.org/0000-0003-3909-0931)^1,2,7,
Zhijun Hou^9,
Sheng-Guo Fang^6 &
Tianming Lan (ORCID: orcid.org/0000-0002-6934-0439)^1,2,9

Abstract

The Proboscidea, which include modern elephants, comprise the largest terrestrial animals among extant species, and the order suffered mass extinction during the Ice Age. As a unique branch on the evolutionary tree, the Proboscidea are of great significance for the study of living animals. In this study, we generate chromosome-scale and haplotype-resolved genome assemblies for two extant Proboscidea species (the Asian elephant, Elephas maximus, and the African savannah elephant, Loxodonta africana) using PacBio, Hi-C, and DNBSEQ technologies. The assembled genome sizes of the Asian and African savannah elephants are 3.38 Gb and 3.31 Gb, with scaffold N50 values of 130 Mb and 122 Mb, respectively. Using Hi-C technology, ~97% of the scaffolds are anchored to 29 pseudochromosomes. Additionally, we identify ~9 Mb of Y-linked sequences for each species. The high-quality genome assemblies in this study provide a valuable resource for future research on the ecology, evolution, biology and conservation of Proboscidea species.

Background & Summary

In recent decades, there has been growing interest in the body size of proboscideans, as it is closely associated with a variety of biological functions due to its high correlation with mass¹. Currently, Proboscidea comprises two extant genera and three species: the Asian elephant, the African savannah elephant, and the African forest elephant (Loxodonta cyclotis). Populations of proboscideans have been decreasing rapidly due to factors such as poaching and hunting, and the three species are now classified as endangered or critically endangered on the IUCN Red List (https://www.iucnredlist.org/). Demand for ivory has also driven some unusual evolutionary changes in proboscideans, such as a substantial increase in the proportion of tuskless female African elephants and a gradual decrease in tusk size in male African elephants². In addition, the swift expansion of cash-crop cultivation has led to habitat fragmentation, which has emerged as a significant peril to wild populations³. An increasing number of elephants are leaving the forest and frequently venturing into villages and residential areas. As a result, there have been occasional occurrences of crop damage, as well as harm to humans and animals. The escalating human-elephant conflict poses a significant challenge to conservation efforts and is detrimental to the healthy development of elephant populations. Additionally, changes in the populations of large mammals exert a strong impact on other animals within their habitat. Therefore, the protection and conservation of elephants has become a focus of biodiversity conservation efforts.

In the era of transition from conservation genetics to conservation genomics^4,5,6,7, a high-quality reference genome is of vital importance for evaluating the full spectrum of genomic diversity, inbreeding and outbreeding depression, local adaptation and genetic load^8,9,10,11. Furthermore, such genome assemblies provide a valuable resource for studying the ecology and evolution of specific species^12,13.

Rapid advances in high-throughput sequencing technologies over the past decade have opened new avenues for addressing the genetic basis of adaptation and speciation in natural populations¹⁴. Genetic data have proven valuable in delineating taxa that cannot be identified based on morphology alone^15,16,17. In endangered animals, haplotype analysis can help detect hidden signals of inbreeding depression, providing crucial insights for conservation initiatives¹⁸. Therefore, obtaining high-quality elephant genomes will be important for elucidating the genetic mechanisms underlying these species’ distinct biological characteristics, as well as for informing conservation strategies aimed at preserving them. Although draft genomes of the two elephants have been released before^19,20, recent HiFi sequencing technology greatly improves genome quality and enables haplotype-resolved reference genomes^20,21,22.

In this study, we generated two chromosome-level and haplotype-resolved genome assemblies of the Asian elephant and African savannah elephant using PacBio HiFi long reads, DNBSEQ short reads, and Hi-C sequencing data. The assembled genome sizes were 3.38 Gb and 3.31 Gb for the Asian elephant and African savannah elephant, with scaffold N50 lengths of 130 Mb and 122 Mb, respectively. These values are a substantial improvement over the previously published genomes^19,20. Approximately 97% of the assembled sequences were anchored to 29 pseudochromosomes. The collinearity analysis of the chromosome-level genomes of the two species is consistent with the results of published karyotype studies²³, which supports the accuracy of the genome assemblies in this study. Using a combination of de novo prediction, homology-based search, and transcriptome-assisted methods, we annotated 22,177 and 22,142 protein-coding genes in the genomes of the Asian elephant and African savannah elephant, respectively. Additionally, we identified ~9 Mb of Y-linked sequences in each of the two elephant genomes by combining evidence from the sex-determining region Y (SRY) gene and chromosomal synteny. The two high-quality elephant reference genomes produced in this study are a valuable resource for future research on the ecology, evolution, biology, and conservation of Proboscidea species, and they offer an opportunity to deepen our understanding of these animals and to strengthen efforts for their conservation.

Methods

Sample collection and ethics statement

Blood samples from E. maximus and tissue samples from L. africana were provided by the Asian Elephant Research Center of National Forestry and Grassland Administration of China and Harbin North Forest Zoo, Heilongjiang Province, China. A portion of each fresh sample (blood from the Asian elephant and muscle tissue from the African savannah elephant) was treated with formaldehyde to cross-link the chromatin and then stored at −80 °C for Hi-C sequencing. The remaining sample was immediately frozen in liquid nitrogen for 30 min and then transferred to a −80 °C freezer for PacBio sequencing, DNBSEQ sequencing and RNA-seq. Sample collection, follow-up experiments and research design in this study were all approved by the Institutional Review Board of BGI (BGI-IRB E22017).

Nucleic acid extraction, library construction and sequencing

Total genomic DNA was extracted using a DNeasy Blood and Tissue Kit (Qiagen, USA) for whole-genome sequencing (WGS) library construction. Total RNA from blood and muscle tissue was extracted using TRIzol reagent (Invitrogen, USA), and cDNA libraries were reverse-transcribed from 200–400 bp RNA fragments (Supplementary Table 1). Nucleic acid concentration was measured with a Qubit 2.0 Fluorometer (Life Technologies, USA), and RNA integrity was evaluated using an Agilent 2100 Bioanalyzer System (Agilent, USA). These two types of libraries were subjected to paired-end sequencing on a DNBSEQ-T1 sequencer (MGI Tech, Shenzhen, Guangdong, China). A 15 kb library was constructed from high-quality DNA samples (main band > 30 kb) and sequenced on a PacBio Sequel II platform (Novogene, Tianjin, China). Low-quality reads and reads contaminated with sequencing adaptors were removed. Finally, a total of ~100 GB of clean data was used to assemble the two genomes (Table 1). Cross-linked samples were digested with the DpnII restriction endonuclease for Hi-C library preparation and paired-end sequenced on an Illumina HiSeq platform.

Table 1 Sequencing stats.

Genome assembly

To estimate the genome size, a total of ~100 Gb of DNBSEQ short reads were analysed with kmerfreq (v5.0)²⁴. The final estimated genome size is 3.44 Gb for E. maximus and 3.50 Gb for L. africana (Supplementary Fig. 1). The heterozygous and haplotype-resolved draft genomes of the two elephants were assembled from Hi-C and PacBio sequencing data with hifiasm (v0.16.1)²⁵. In the polishing stage, minimap2 (v2.17)²⁶ and NextPolish (v1.4.0)²⁷ were used to improve single-base accuracy through three rounds of polishing with HiFi reads and two rounds with DNBSEQ reads. Redundant sequences were removed with Purge_dups (v1.2.5)²⁸. The Burrows-Wheeler Aligner (BWA, v0.7.17) mem algorithm²⁹ was used to map Hi-C reads to the primary genome. Juicer (v1.5)³⁰ was used for Hi-C data quality control, and the 3D-DNA pipeline (v190716)³¹ was then used to concatenate and review the scaffolds into chromosome-scale genomes. Finally, two hybrid genomes composed of 29 pseudo-chromosomes and two sets of haplotigs composed of 28 pseudo-chromosomes were obtained, and the average Hi-C anchoring rate reached 97.28 ± 0.60% (Fig. 1, Supplementary Tables 3, 4). Basic assembly statistics, including scaffold N50 values of 130 Mb and 122 Mb, show a substantial improvement over the published elephant genomes (Table 2, Supplementary Table 4)^19,20.
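As a rough illustration of the k-mer approach to genome size estimation, the sketch below divides the total number of k-mer occurrences by the depth of the main peak in a k-mer depth histogram. This relation is commonly used and is assumed here for illustration only; it is not the kmerfreq implementation, and the function name, the error cutoff and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors (an assumed cutoff).
    Returns (estimated size in k-mers, depth of the main peak).
    """
    kept = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers above the error cutoff")
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences: each distinct k-mer contributes its depth.
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical numbers): low-depth error k-mers plus a peak at 30x.
toy_histogram = {1: 900, 2: 300, 28: 40, 29: 70, 30: 100, 31: 70, 32: 40}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth {peak}, estimated size {size:.0f} k-mers")
```

In a highly heterozygous genome the tallest peak can be the heterozygous one, so the histogram is usually inspected before a peak is chosen.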

Table 2 Comparison of the assembly statistics among the genomes assembled in this study (EmaxG and LafrG) and the previously published elephant genomes^19,20.
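Scaffold N50, one of the assembly statistics reported here, is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative and are not taken from the assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequences")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half is the N50.
        if running >= half_total:
            return length


# Toy scaffold lengths in Mb (hypothetical): total 400, half 200.
toy_lengths = [130, 120, 90, 40, 20]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```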

By identifying the sex-determining region of the Y chromosome (SRY) and examining chromosomal synteny between species using MUMmer (v4.0.0rc1)³², we also discovered two Y-linked regions of ~9 Mb each, which were verified by the DNBSEQ read-depth distribution (Supplementary Fig. 2).
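The read-depth check can be pictured as follows. The sketch assumes the sequenced individual is male, so that Y-linked sequence is hemizygous and should show roughly half the autosomal depth. The tolerance bounds, the function name and the depth values are hypothetical and are not the thresholds used in this study.

```python
from statistics import median


def half_depth_windows(window_depths, autosomal_depths, low=0.3, high=0.7):
    """Return indices of windows whose depth is about half the autosomal median.

    low and high are assumed tolerance bounds on the depth ratio.
    """
    baseline = median(autosomal_depths)
    if baseline <= 0:
        raise ValueError("autosomal baseline must be positive")
    return [
        index
        for index, depth in enumerate(window_depths)
        if low <= depth / baseline <= high
    ]


# Toy per-window mean depths (hypothetical values).
autosomal = [30, 31, 29, 30, 32]   # median is 30
candidate = [15, 14, 30, 16, 2]    # windows from a putative Y-linked scaffold
flagged = half_depth_windows(candidate, autosomal)
print(flagged)
```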

Repeat regions prediction

Transposable elements (TEs) and other repetitive elements were identified using a combination of homology-based and de novo approaches. For the homology-based approach at both the DNA and protein levels, the genome assembly was searched against the known repeat database Repbase (v21.01) using RepeatMasker³³ (v4.0.5) and RepeatProteinMask³³, and tandem repeats were identified with Tandem Repeats Finder (TRF)³⁴ (v4.07b). For the de novo approach, RepeatModeler³⁵ (v2.0) and LTR_retriever³⁴ were used to construct a de novo repeat library. We found that the Asian elephant and African savannah elephant genomes contained 69.16% and 70.32% TEs, respectively, with the proportions of each type being similar between the two species (Table 3, Supplementary Tables 5, 6). Long Interspersed Nuclear Elements (LINEs) accounted for most TEs, occupying ~54% of the genome. All repetitive elements were masked for gene annotation.
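Because homology-based and de novo predictions can overlap, a repeat percentage is normally computed on merged intervals so that no base is counted twice. The sketch below shows this for a single sequence. It assumes 0-based, end-exclusive coordinates, and the function name and toy intervals are hypothetical; it is not the procedure used to produce Table 3.

```python
def masked_fraction(intervals, sequence_length):
    """Fraction of one sequence covered by repeat intervals.

    intervals are (start, end) pairs, 0-based and end-exclusive.
    Overlapping or adjacent annotations are merged before counting.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before opening a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / sequence_length


# Toy annotations (hypothetical): two overlapping hits and two adjacent hits.
toy_repeats = [(0, 100), (50, 150), (300, 400), (400, 450)]
fraction = masked_fraction(toy_repeats, 1000)
print(f"{fraction:.1%}")
```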

Table 3 Statistics of the repeat elements.

Annotation of protein-coding genes

We combined homology-based, de novo and transcriptome-based methods to predict the gene content of the assemblies. In the homology-based approach, GeneWise³⁶ (v2.4.1) was used to map protein sequences from 14 closely related or high-quality reference species (Homo sapiens, Mus musculus, Suncus etruscus, Equus caballus, Felis catus, Phyllostomus discolor, Sus scrofa, Choloepus didactylus, Dasypus novemcinctus, Trichechus manatus latirostris, Orycteropus afer afer, Elephantulus edwardii, Echinops telfairi, and Chrysochloris asiatica), available in the NCBI database, to the two assembled genomes with an E-value cutoff of 1e-5. In the de novo method, we ran Augustus³⁷ (v3.0.3) on the repeat-masked genomes. In the transcriptome-based method, transcripts were assembled using StringTie³⁸ (v1.3.3b) from clean RNA-seq data. The final protein-coding gene set was generated using the MAKER pipeline³⁹ (v3.01.03) by combining high-quality homology-based, de novo and RNA-seq-supported genes. Based on the above methods, 22,177 genes were annotated in the Asian elephant genome and 22,142 genes in the African savannah elephant genome (Table 4).

Table 4 Protein-coding gene statistics.

Annotation of gene function

Functional annotation of protein-coding genes was carried out using BLAST (E-value cutoff of 1e-5) against publicly available databases, including Swiss-Prot, TrEMBL, Gene Ontology (GO) and KEGG. InterProScan⁴⁰ (v5.52–86.0) was used to predict domains and motifs. In total, 99.81% of the genes in the gene sets of both elephant species were functionally annotated in the five above-mentioned databases (Fig. 2a,b, Supplementary Table 7). In addition, noncoding RNA (ncRNA) genes, including miRNA, tRNA, snRNA and rRNA genes, were predicted in the assembled genomes. tRNA genes were identified using tRNAscan-SE⁴¹ (v1.3.1). snRNA and miRNA genes were detected by searching the genome sequences against the Rfam database (Release 12.0) using BLAST (Supplementary Table 8).
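An E-value cutoff of this kind is typically applied to tabular BLAST output. The sketch below assumes the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and keeps the best-scoring hit per query. The gene and protein identifiers are hypothetical, and choosing the best hit by bit score is an assumption rather than a rule stated in this study.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Keep the highest-bit-score hit per query with E-value <= max_evalue.

    Expects tab-separated lines in the 12-column BLAST tabular layout.
    Returns {query: (subject, evalue, bitscore)}.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy hits (hypothetical identifiers and scores).
toy_hits = [
    "geneA\tsp|P1\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t390",
    "geneA\tsp|P2\t60.0\t150\t60\t2\t1\t150\t5\t154\t1e-20\t120",
    "geneB\tsp|P3\t35.0\t50\t30\t3\t1\t50\t1\t50\t0.5\t25",
]
hits = best_hits(toy_hits)
for query, (subject, evalue, bitscore) in hits.items():
    print(query, subject, evalue, bitscore)
```

Here geneB is left without an annotation because its only hit exceeds the cutoff.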

Phylogenetic comparative analysis

We performed a comparative genomic analysis of E. maximus, L. africana and the 14 reference species used in the previous step, with Homo sapiens set as the outgroup. First, the longest transcript of each gene from each species was used to perform an all-to-all BLAST⁴² (v2.2.26) analysis with the parameters “-p blastp -m8 -e 1e-5 -F F”. Then, genes were clustered using the TreeFam⁴³ (v1.4) pipeline with hierarchical clustering on a sparse graph. Finally, 2,365 single-copy genes were identified (Supplementary Fig. 3). These single-copy genes were used to construct a maximum-likelihood (ML) phylogenetic tree using IQ-TREE⁴⁴ (v1.6.12) with the best-fit substitution model (GTR + F + R4) selected by ModelFinder⁴⁵. To estimate divergence times among the two elephants and the other 14 species, we used MCMCTree⁴⁶ (v4.5) implemented in the PAML package. Sequences of the 2,365 single-copy genes were used as the input for MCMCTree, and multiple fossil calibration times were taken from TimeTree (http://www.timetree.org/). The Markov chain Monte Carlo (MCMC) process was run for 1,500,000 iterations after a burn-in of 500,000 iterations, with a sampling frequency of 150 (Fig. 2c).
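Selecting single-copy genes from clustered gene families amounts to keeping the families that contain exactly one gene from every species. A minimal sketch is given below. The data structure, species labels and gene identifiers are hypothetical, and the code does not reproduce the TreeFam pipeline itself.

```python
def single_copy_families(families, species):
    """Return ids of families with exactly one gene in every species.

    families maps family id -> {species name: [gene ids]}.
    """
    wanted = set(species)
    return sorted(
        family_id
        for family_id, members in families.items()
        if set(members) == wanted
        and all(len(genes) == 1 for genes in members.values())
    )


# Toy clustering result (hypothetical labels).
toy_species = ["Emax", "Lafr", "Hsap"]
toy_families = {
    "fam1": {"Emax": ["e1"], "Lafr": ["l1"], "Hsap": ["h1"]},        # single-copy
    "fam2": {"Emax": ["e2", "e3"], "Lafr": ["l2"], "Hsap": ["h2"]},  # duplicated in Emax
    "fam3": {"Emax": ["e4"], "Lafr": ["l3"]},                        # missing in Hsap
}
single_copy = single_copy_families(toy_families, toy_species)
print(single_copy)
```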

Data Records

The chromosome-scale genome sequences of the two elephant species are available at NCBI GenBank under the accession numbers GCA_033060105.1⁴⁷ (EmaxG) and GCA_033060095.1⁴⁸ (LafrG), and the haplotype-resolved genome sequences are also available at NCBI (EmaxH1: GCA_032718755.1⁴⁹, EmaxH2: GCA_032718585.1⁵⁰, LafrH1: GCA_032717405.1⁵¹, LafrH2: GCA_032717415.1⁵²). The annotation files generated in the current study are available in the figshare database⁵³. The raw data that support the findings of this study have been deposited in the National Genomics Data Center (NGDC)⁵⁴ Genome Sequence Archive (GSA)⁵⁵ database under accession number CRA012221⁵⁶ and BioProject accession number PRJCA018778. All of the above sequencing and analysis data are also available in the CNGB Sequence Archive (CNSA)⁵⁷ of the China National GeneBank DataBase (CNGBdb)⁵⁸ under accession number CNP0004258.

Technical Validation

The completeness of the elephant genomes was evaluated by BUSCO⁵⁹ (v5.2.2) analysis with the mammalia_odb10 data set, scoring 95.1 ± 1.1% (Table 5). Merqury⁶⁰ (release 20200430) k-mer analysis and PacBio long-read alignments (genome regions with PacBio long-read coverage over 10× were considered accurately assembled regions⁶¹) were used to evaluate the accuracy of the genome assemblies (Table 5, Supplementary Table 9). The completeness of the genome and gene set was also evaluated against the mammalia_odb10 database using BUSCO. The two chromosome-level genomes scored 96.3% and 95.2%, respectively (Supplementary Table 10). The NUCmer program from MUMmer³² (v4.0.0rc1) was used to screen for syntenic blocks, and the identified syntenic blocks were filtered using the delta-filter program from MUMmer³² (v4.0.0rc1) with the parameters “-i 90 -l 5000” to help demonstrate the haplotype effect (Supplementary Fig. 4).
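The effect of the “-i 90 -l 5000” thresholds (minimum alignment identity of 90% and minimum alignment length of 5,000 bp) can be mimicked on a list of alignment records, as sketched below. This is only an analogue for illustration, not a reimplementation of delta-filter; the record type, the 1-based inclusive coordinates and the toy values are assumptions.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    ref_start: int    # 1-based, inclusive (assumed convention)
    ref_end: int      # 1-based, inclusive
    identity: float   # percent identity, 0-100


def filter_alignments(alignments, min_identity=90.0, min_length=5000):
    """Keep alignments that meet both the identity and the length threshold."""
    return [
        aln
        for aln in alignments
        if aln.identity >= min_identity
        and abs(aln.ref_end - aln.ref_start) + 1 >= min_length
    ]


# Toy alignments (hypothetical values).
toy_alignments = [
    Alignment(1, 6000, 95.0),       # kept: long and similar enough
    Alignment(1, 4000, 99.0),       # dropped: shorter than 5,000 bp
    Alignment(10000, 20000, 85.0),  # dropped: identity below 90%
    Alignment(1, 5000, 90.0),       # kept: exactly at both thresholds
]
kept = filter_alignments(toy_alignments)
print(len(kept), "of", len(toy_alignments), "alignments kept")
```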

Table 5 Summary of genome quality assessments.

Code availability

No custom scripts were used in this work. All data-processing commands and pipelines were executed according to the manuals and protocols of the corresponding bioinformatics software. The specific software versions are described in Methods.

References

1. Larramendi, A. Shoulder height, body mass, and shape of proboscideans. Acta Palaeontologica Polonica 61, 537–574 (2015).
2. Campbell-Staton, S. C. et al. Ivory poaching and the rapid evolution of tusklessness in African elephants. Science 374, 483–487 (2021).
3. Dai, Y. The overlap of suitable tea plant habitat with Asian elephant (Elephus maximus) distribution in southwestern China and its potential impact on species conservation and local economy. Environmental Science and Pollution Research 29, 5960–5970 (2022).
4. Supple, M. A. & Shapiro, B. Conservation of biodiversity in the genomics era. Genome Biology 19, 131 (2018).
5. Ouborg, N. J., Pertoldi, C., Loeschcke, V., Bijlsma, R. K. & Hedrick, P. W. Conservation genetics in transition to conservation genomics. Trends in Genetics: TIG 26, 177–187 (2010).
6. Primmer, C. R. From conservation genetics to conservation genomics. Annals of the New York Academy of Sciences 1162, 357–368 (2009).
7. Formenti, G. et al. The era of reference genomes in conservation genomics. Trends in Ecology & Evolution 37, 197–202 (2022).
8. Zhang, L. et al. Chromosome-scale genomes reveal genomic consequences of inbreeding in the South China tiger: A comparative study with the Amur tiger. Molecular Ecology Resources 23, 330–347 (2022).
9. Yang, S. et al. Genomic investigation of the Chinese alligator reveals wild-extinct genetic diversity and genomic consequences of their continuous decline. Molecular Ecology Resources 23, 294–311 (2022).
10. Wang, Q. et al. Whole-genome resequencing of Chinese pangolins reveals a population structure and provides insights into their conservation. Communications Biology 5, 821 (2022).
11. Dussex, N. et al. Population genomics of the critically endangered kākāpō. Cell Genomics 1, 100002 (2021).
12. Guang, X. et al. Chromosome-scale genomes provide new insights into subspecies divergence and evolutionary characteristics of the giant panda. Science Bulletin 66, 2002–2013 (2021).
13. Lan, T. et al. The chromosome-scale genome of the raccoon dog: Insights into its evolutionary characteristics. iScience 25, 105117 (2022).
14. Vijay, N. et al. Evolution of heterogeneous genome differentiation across multiple contact zones in a crow species complex. Nature Communications 7, 1–10 (2016).
15. Spinks, P. Q. & Shaffer, H. B. Range‐wide molecular analysis of the western pond turtle (Emys marmorata): cryptic variation, isolation by distance, and their conservation implications. Molecular Ecology 14, 2047–2064 (2005).
16. Rodríguez, A. et al. Cryptic differentiation in the Manx shearwater hinders the identification of a new endemic subspecies. Journal of Avian Biology 51 (2020).
17. Wenner, T. J., Russello, M. A. & Wright, T. F. Cryptic species in a Neotropical parrot: genetic variation within the Amazona farinosa species complex and its conservation implications. Conservation Genetics 13, 1427–1432 (2012).
18. Miller, W. et al. Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil). Proceedings of the National Academy of Sciences 108, 12348–12353 (2011).
19. Palkopoulou, E. et al. A comprehensive genomic history of extinct and living elephants. Proceedings of the National Academy of Sciences 115, E2566–E2574 (2018).
20. Tollis, M. et al. Elephant genomes reveal accelerated evolution in mechanisms underlying disease defenses. Molecular Biology and Evolution 38, 3606–3620 (2021).
21. Flicek, P. et al. Ensembl 2014. Nucleic Acids Research 42, D749–D755 (2014).
22. Sahu, S. K. & Liu, H. Long-read sequencing (method of the year 2022): the way forward for plant omics research. Molecular Plant 16, 791–793 (2023).
23. Yang, F. et al. Reciprocal chromosome painting among human, aardvark, and elephant (superorder Afrotheria) reveals the likely eutherian ancestral karyotype. Proceedings of the National Academy of Sciences 100, 1062–1066 (2003).
24. Liu, B. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv preprint arXiv:1308.2012 (2013).
25. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).
26. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
27. Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics (2020).
28. Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
29. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
30. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems 3, 95–98 (2016).
31. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
32. Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLoS Computational Biology 14, e1005944 (2018).
33. Chen, N. Using Repeat Masker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics 5, 4.10.1–4.10.14 (2004).
34. Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology 176, 1410–1422 (2018).
35. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457 (2020).
36. Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Research 14, 988–995 (2004).
37. Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Research 32, W309–W312 (2004).
38. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33, 290–295 (2015).
39. Campbell, M. S., Holt, C., Moore, B. & Yandell, M. Genome annotation and curation using MAKER and MAKER‐P. Current Protocols in Bioinformatics 48, 4.11.1–4.11.39 (2014).
40. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
41. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 25, 955–964 (1997).
42. Mount, D. W. Using the basic local alignment search tool (BLAST). Cold Spring Harbor Protocols 2007, pdb.top17 (2007).
43. Li, H. et al. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Research 34, D572–D580 (2006).
44. Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32, 268–274 (2015).
45. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K., Von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nature Methods 14, 587–589 (2017).
46. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586–1591 (2007).
47. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_033060105.1 (2023).
48. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_033060095.1 (2023).
49. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_032718755.1 (2023).
50. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_032718585.1 (2023).
51. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_032717405.1 (2023).
52. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_032717415.1 (2023).
53. Shi, M. Annotation files for two elephant genome assemblies. Figshare https://doi.org/10.6084/m9.figshare.23641053 (2023).
54. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Research 50, D27–D38 (2022).
55. Chen, T. et al. The genome sequence archive family: toward explosive data growth and diverse data types. Genomics, Proteomics & Bioinformatics 19, 578–583 (2021).
56. NGDC Genome Sequence Archive https://bigd.big.ac.cn/gsa/browse/CRA012221 (2023).
57. Guo, X. et al. CNSA: a data repository for archiving omics data. Database 2020 (2020).
58. Chen, F. et al. CNGBdb: China National GeneBank DataBase. Hereditas (Beijing) 42, 799–809 (2020).
59. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. arXiv preprint arXiv:2106.11799 (2021).
60. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 1–27 (2020).
61. Qi, W. et al. The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features. GigaScience 11, giac028 (2022).

Acknowledgements

This study was supported by the International Cooperation Fund Project of the National Forestry and Grassland Administration: A Study on Population Structure and Genetic Traits of Asian Elephants (hudonghan[2021]No.126), Scientific Research Project of the National Forestry and Grassland Administration: Research on the Driving Factors for the Northward Movement of Asian Elephants in Yunnan, China/Research on the Investigation, Monitoring and Evaluation of Asian Elephant Resources (2021-252), the Fundamental Research Funds for the Central Universities (2572020DR10) and the Guangdong Provincial Key Laboratory of Genome Read and Write (grant No. 2017B030301011). This work was also supported by China National GeneBank (CNGB).

Author information

These authors contributed equally: Minhui Shi, Fei Chen, Sunil Kumar Sahu.

Authors and Affiliations

BGI Life Science Joint Research Center, Northeast Forestry University, Harbin, 150040, China
Minhui Shi, Huan Liu & Tianming Lan
State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
Minhui Shi, Sunil Kumar Sahu, Qing Wang, Huan Liu & Tianming Lan
College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
Minhui Shi
Southwest Survey and Planning Institute of National Forestry and Grassland Administration, Kunming, 650031, China
Fei Chen & Zhihong Wang
Asian Elephant Research Center of National Forestry and Grassland Administration, Kunming, 650031, China
Fei Chen & Zhihong Wang
MOE Key Laboratory of Biosystems Homeostasis & Protection, State Conservation Centre for Gene Resources of Endangered Wildlife, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
Shangchen Yang & Sheng-Guo Fang
Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen, 518083, China
Jin Chen & Huan Liu
China National GeneBank, BGI Research, Shenzhen, 518083, China
Jin Chen
College of Wildlife and Protected Area, Northeast Forestry University, Harbin, 150040, China
Zhijun Hou & Tianming Lan

Contributions

T.L. and S.G.F. designed the project. F.C., Z.W. and Z.H. collected the elephant samples. M.S., Q.W., S.Y. and J.C. led and finished the DNA and RNA extraction, library preparation, and genome sequencing. M.S., S.K.S. and Q.W. performed the bioinformatics analysis and interpreted the data. M.S. and S.K.S. wrote the manuscript. T.L., H.L. and S.G.F. revised the manuscript. All authors have read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Sheng-Guo Fang or Tianming Lan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shi, M., Chen, F., Sahu, S.K. et al. Haplotype-resolved chromosome-scale genomes of the Asian and African Savannah Elephants. Sci Data 11, 63 (2024). https://doi.org/10.1038/s41597-023-02729-4

Received: 10 July 2023
Accepted: 07 November 2023
Published: 11 January 2024
DOI: https://doi.org/10.1038/s41597-023-02729-4
Springer Nature Limited
