Background

The genus Pulmonaria L. (Boraginaceae, sensu [1]) is a taxonomically complex group of species in which the rather similar morphology contrasts with striking karyological variation. In total, 16 different somatic chromosome counts ranging from 2n = 14 to 2n = 38 are currently reported in the genus [Kobrlová unpubl.], with x = 7 proposed as the basic chromosome number [e.g. 2–6]. According to the traditional morphology- and karyology-based taxonomy [cf. 3, 6], about 30 taxa are recognized (16 species and 14 subspecies), all found in Europe, with the exception of the P. mollis group, which extends as far as north-eastern and eastern Asia [5, 7,8,9]. Although almost nothing is known about the mechanisms of chromosomal evolution in the genus, such karyological diversity is indicative of extensive chromosomal rearrangements and is consistent with the hypothesis of ancient episodes of polyploidization/hybridization followed by dysploid chromosome reorganizations [cf. 6, 10].

To date, only a few attempts have been made to explore the evolutionary history of the genus Pulmonaria, focusing on a limited or specific group of species [10,11,12,13,14]. These studies have highlighted the significant role of hybridization (e.g. the recent hybrid origin of P. helvetica Bolliger [14] or the P. hirta complex [10]), and provided evidence for the involvement of introgression and dysploidy in the speciation process [10, 13]. In horticulture, Pulmonaria species are known to be able to cross (no geographical isolation, flowering synchrony, similar pollinator preferences), giving rise to a wide variety of cultivars [see 15]. Even in the contact zones of some taxa, intermediate chromosome numbers have rarely been documented [2, 3, 16, 17]. However, most of the evolutionary relationships are still unknown, and the question remains as to what lies behind the observed variability in their morphology and chromosome number.

Along with chromosome number, genome size is another relevant biodiversity trait that can indicate significant genomic events and evolutionary changes [18, 19]. Diversity in genome size is driven by multiple evolutionary processes such as polyploidy, the proliferation of repetitive DNA sequences and/or drastic reduction of non-genic DNA [18, 20,21,22]. Typically, post-polyploid diploidization is thought to be associated with extensive loss of DNA or ‘genome downsizing’ [23,24,25,26,27,28,29]. However, the mechanisms, rates and selection pressures driving these changes in DNA content remain unknown [cf. 28]. Although considerable karyological variation is evident in several Boraginaceae genera [e.g. 30–34], there have been almost no complex analyses of genome size variation and the evolutionary pathways behind the observed diversity [but see 35–37].

Our study is focused on the P. officinalis group, the most widespread European species complex [3, 38, 39]. This group contains two morphologically relatively distinct species differing in chromosome numbers, namely P. obscura Dumort. (2n = 14) and P. officinalis L. (2n = 16 [e.g. 3, 17, 40–42]). Within the latter, two subspecies are sometimes recognized [43,44,45]. As the P. officinalis complex has been cultivated for medicinal and ornamental purposes for a very long time [46, 47], it has probably become even more widespread. In some regions it may also be non-native as a result of frequent cultivation and possible garden escapes [42, 47]. “Pure” Pulmonaria species have often been crossed in cultivation to produce more attractive varieties [cf. 15], which are rarely found in nature. Many of these plants are listed under the name P. saccharata in the horticultural trade. Their origin is not clear, but it can be assumed that some of them are derived from P. officinalis (esp. plants with distinctly white-spotted leaves, cordate at the base). However, they are not “true” P. saccharata sensu Miller, whose taxonomic status has long been debated, and are most likely related to the P. hirta complex [cf. 10, 48, 49].

Despite the morphological and karyological differences observed, the origin of this species group remains unknown. To uncover the differences in the karyotype evolution, we used complex methodological approaches involving genome size estimation, genome skimming followed by bioinformatic analysis and characterization of repeatomes in the P. officinalis group. Specifically, we have analyzed repeatomes and karyotypes of P. obscura and P. officinalis s.str. individuals from pure separate populations, their putative hybrid accessions with 2n = 15, which have been collected in a mixed population where P. obscura and P. officinalis grew together, and three populations of ornamental cultivars, morphologically similar to P. officinalis, that have escaped into the wild where they were collected (here listed as P. saccharata-like). We aimed to identify a new set of chromosome-specific cytogenetic markers that can be used to identify individual chromosomes, and to compare karyotypes of closely related species and their inter-specific hybrids. We also studied the impact of repetitive DNA sequences on genome size in the P. officinalis group. This pilot study can serve as a springboard for future cytogenetic and genomic studies, to understand the role of chromosomal rearrangements in the evolution of this genus.

Results

Genome size and chromosome number variation in the P. officinalis group

A total of 196 accessions from 65 populations of the Pulmonaria officinalis group were collected throughout Europe (Fig. 1A). The flow cytometry analyses resulted in good quality histograms with distinct peaks and low coefficients of variation (below 5%) for both PI and DAPI staining (Supplementary Table 1). The flow cytometry data were accompanied by chromosome counts, some of which were obtained de novo (see below), but previously published reports were also revised. A total of 758 published chromosome records were found and revised for the P. officinalis complex (Supplementary Table 2). Of these, 289 chromosome reports with 2n = 14 were related to P. obscura and 451 with 2n = 16 to P. officinalis s.str. (Fig. 1A). Three records of 2n = 16 have been published for the subspecies P. officinalis subsp. marzolae G.Astuti, Peruzzi, Cristof. & P.Pupillo. In one case, 2n = 17 has been reported for P. officinalis. In addition, there are karyological records (2n = 15) that refer to hybrids of P. obscura and P. officinalis (Supplementary Table 2).

Fig. 1

Distribution map of the Pulmonaria officinalis group. (A) Distribution map of the P. officinalis group sampled and analyzed in this study (large dots; see Supplementary Table 1), including published chromosome reports (758 records in total, small dots; see Supplementary Table 2): P. obscura in yellow (2n = 14) and P. officinalis s.str. (2n = 16) in red. Illustrative images of (B) P. obscura (B473.1), (C) P. officinalis s.str. (B470.1) and (D) P. saccharata-like accession (B15.1)

Pulmonaria obscura (2n = 14; mean 2C = 2.92 ± 0.10 pg) and P. officinalis s.str. (2n = 16; mean 2C = 3.21 ± 0.13 pg) differed in holoploid genome size (2C value: ANOVA, F = 93.69, P < 0.001, Fig. 2A) as well as in the genomic GC content (Fig. 2B). Their DNA amount ranges are listed in Table 1. The genome size value and GC content of the presumed hybrid plants (B481) with 2n = 15 (2C = 3.08 ± 0.02 pg and 35.42%, respectively) were between those of the supposed parents (i.e., within the theoretically expected intermediate genome size and GC content based on the genome sizes of P. obscura and P. officinalis; Table 1). Among P. saccharata-like populations, B465.1 with 2n = 16 had a genome size (2C = 3.13 pg) in the range of P. officinalis. The remaining two populations (B15: 2C = 3.74 ± 0.03 pg; B472.1: 2C = 3.63 pg) had larger genome sizes, but differed in the number of chromosomes (B15: 2n = 15 vs. B472.1: 2n = 16) and GC content (Table 1).
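The intermediate position of the 2n = 15 plants can be checked with simple arithmetic. The following Python sketch assumes that an F1 hybrid inherits one haploid (1C) genome from each parent, so that its expected 2C value is the mean of the parental 2C values. The function name and the use of species means as parental values are illustrative assumptions and are not part of the analysis reported here.

```python
def expected_hybrid_2c(parent_a_2c, parent_b_2c):
    """Expected 2C value (pg) of an F1 hybrid that receives
    one 1C genome from each parent (assumption of this sketch)."""
    return parent_a_2c / 2 + parent_b_2c / 2


# Mean 2C values (pg) quoted in the text above
P_OBSCURA_2C = 2.92
P_OFFICINALIS_2C = 3.21
OBSERVED_HYBRID_2C = 3.08  # presumed hybrid population B481

expected = expected_hybrid_2c(P_OBSCURA_2C, P_OFFICINALIS_2C)

# Relative deviation of the observed value from the expectation, in percent
deviation_pct = (OBSERVED_HYBRID_2C - expected) / expected * 100

print(f"expected 2C: {expected:.3f} pg")
print(f"observed 2C: {OBSERVED_HYBRID_2C:.3f} pg")
print(f"deviation:   {deviation_pct:.2f} %")
```

Under this assumption the expected value is approximately 3.065 pg, so the observed mean of 3.08 pg deviates from it by less than 1%.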

Fig. 2

Genome size and GC content variation in Pulmonaria obscura and P. officinalis. (A) Variation in absolute genome size (2C-value); and (B) genomic GC content detected in the P. officinalis group: P. obscura (POBS), P. officinalis (POFF). Rectangles define the 25th and 75th percentiles, horizontal lines show median values, whiskers are 10–90 percentiles

Table 1 Summary of flow cytometric analyses of the Pulmonaria officinalis group (see Supplementary Table 1 for more details). N pop: number of populations analyzed; N ind: number of individuals analyzed; 2n: number of chromosomes; Mean 2C [pg]: 2C DNA content, i.e. mean of the nuclear DNA content (pg/2C); Min/Max: minimum and maximum of the nuclear DNA content; 1C [Mbp]: 1C value in Mbp, 1 pg = 978 Mbp; C/n [pg]: average chromosome size, calculated by dividing total somatic DNA (2C) by somatic chromosome number (2n); GC [%]: genomic GC content
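The derived columns described in the caption follow directly from the measured 2C value and the chromosome count. The Python sketch below applies the two definitions (1 pg = 978 Mbp; average chromosome size = 2C / 2n) to the species means quoted in the text. The function names are illustrative, and the printed values are recomputed from those means, not copied from Table 1.

```python
PG_TO_MBP = 978  # conversion factor stated in the caption: 1 pg = 978 Mbp


def one_c_mbp(two_c_pg):
    """1C value in Mbp, derived from the 2C value in pg."""
    return two_c_pg / 2 * PG_TO_MBP


def mean_chromosome_size_pg(two_c_pg, two_n):
    """Average chromosome size in pg: total somatic DNA (2C)
    divided by the somatic chromosome number (2n)."""
    return two_c_pg / two_n


# Species means taken from the text: (mean 2C in pg, 2n)
species = {
    "P. obscura": (2.92, 14),
    "P. officinalis s.str.": (3.21, 16),
}

for name, (two_c, two_n) in species.items():
    print(
        f"{name}: 1C = {one_c_mbp(two_c):.0f} Mbp, "
        f"C/n = {mean_chromosome_size_pg(two_c, two_n):.3f} pg"
    )
```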

Genome-wide repeatome analysis

To identify the major types of repetitive sequences and to compare their genome representation in the P. officinalis group, the comparative mode of the RepeatExplorer2 pipeline was used. The analysis was performed on genome skimming data of P. obscura (2n = 14; B473.1, Fig. 1B), P. officinalis (2n = 16; B470.3, Fig. 1C), their putative interspecific hybrid (2n = 15; B481.1) and three P. saccharata-like accessions (B15.1; 2n = 15, Fig. 1D and B465.1, B472.1; both 2n = 16). The identified repetitive sequences accounted for 58.48% of the P. obscura genome and 47.86% of the P. officinalis genome. A similar and overall highest proportion of repetitive sequences was found in repeatomes of the putative interspecific hybrid B481.1 (64.10%) and P. saccharata-like accession B15.1 (64.53%), while the repetitive DNA content of the other two P. saccharata-like accessions (B465.1 and B472.1) was around 50% (Table 2; Fig. 3, Supplementary Table 3).

Fig. 3

Repeatome composition in analyzed Pulmonaria accessions. Genome proportion of individual repeat type was obtained as the ratio of reads specific to the individual repeat type to all reads used for the clustering analysis by the RepeatExplorer2 pipeline. P. obscura (POBS; B473.1); P. officinalis (POFF; B470.3); an interspecific natural hybrid P. obscura × P. officinalis (HYBR; B481.1); P. saccharata-like accessions (PSAC1–PSAC3; B465.1; B472.1; B15.1)
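The genome proportion defined in the caption is a simple ratio of read counts. The Python sketch below shows the calculation under the assumption that per-repeat read counts and the total number of reads used for clustering are already available. The function name and all read counts are hypothetical placeholders chosen for illustration; they are not values from this study or output of the RepeatExplorer2 pipeline.

```python
def genome_proportion(reads_in_repeat, total_reads_analyzed):
    """Genome proportion (%) of one repeat type: reads assigned to that
    repeat type divided by all reads used for the clustering analysis."""
    if total_reads_analyzed <= 0:
        raise ValueError("total_reads_analyzed must be positive")
    return 100.0 * reads_in_repeat / total_reads_analyzed


# Hypothetical read counts, for illustration only
total_reads = 1_000_000
example_counts = {
    "Ty3/Gypsy": 300_000,
    "Ty1/Copia": 150_000,
    "satellite": 20_000,
}

proportions = {
    repeat: genome_proportion(count, total_reads)
    for repeat, count in example_counts.items()
}

for repeat, pct in proportions.items():
    print(f"{repeat}: {pct:.2f} % of the genome")
```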

Table 2 Proportion of repetitive DNA sequences identified de novo in Pulmonaria taxa. POBS: P. obscura (2n = 14; B473.1), POFF: P. officinalis (2n = 16; B470.3), HYBR: P. obscura × P. officinalis, putative interspecific hybrid (2n = 15; B481.1), PSAC: P. saccharata-like accessions, i.e. PSAC1 (2n = 16; B465.1), PSAC2 (2n = 16; B472.1) and PSAC3 (2n = 15; B15.1)

In all studied taxa, LTR retroelements were the most abundant repeats, accounting for 46.68% of the P. obscura genome, 34.52% of the P. officinalis genome and 33.80–49.96% of the other genomes. The genome proportion of Ty3/Gypsy elements, which were mainly represented by the Tekay lineage, ranged from 23.90% in the P. saccharata-like accession B472.1 to 35.06% in the putative interspecific hybrid (B481.1). Ty3/Gypsy elements were nearly twice as abundant as the Ty1/Copia superfamily, which was mostly represented by elements from the SIRE and Angela clades (Table 2; Fig. 3). DNA transposons and long interspersed nuclear elements (LINE elements) were found in low copy numbers in all analyzed groups, with genome proportions ranging from 0.16 to 0.22% and from 0.14 to 0.23%, respectively (Table 2). rDNA sequences accounted for about 3.00–5.00%, and other tandemly organized repeats represented 1.25% to more than 4.00% of the Pulmonaria genomes (Table 2).

Variation in the Satellite DNAs (satDNAs) and rDNA clusters

Five putative satellites (tandemly organized repeats) were identified by the TAREAN program [50] in the individual P. obscura and P. officinalis datasets. Three of them (PulTR01_29, PulTR03_308, PulTR05_70) were shared by both species, whereas PulTR02_305 was found only in P. obscura, and PulTR04_420 was detected only in the P. officinalis genome. All identified tandem repeats produced chromosome-specific signals and were used together with rDNA sequences to create karyotypes of the Pulmonaria accessions studied.

The Pulmonaria 45S rDNA unit varied from 9.1 kb to 9.6 kb in length and was found to be highly conserved at the sequence level within all accessions studied (Supplementary Fig. 1). The intergenic spacer (IGS) region contained two different tandemly organized repeats (with repetitive units 117 and 150 nt long) which were identical in all the accessions studied, with the exception of the P. saccharata-like plant B472.1, which contained tandem regions with shorter repetitive units (79 and 150 nt long). A higher variability was found in the IGS, which contained two relatively large INDEL regions that differed between P. obscura and P. officinalis (Supplementary Fig. 2A). To support this observation, we mapped reads of the Pulmonaria accessions to the assembled P. obscura 45S rDNA reference (Supplementary Fig. 2B). The differences in read coverage along the 45S rDNA unit are evident in the IGS, indicating variability at the sequence level. It should be pointed out that the assembly of the 45S rDNA unit was obtained from short Illumina sequences, so the resulting consensus sequence may represent only the most abundant sequence type.

The graph-based clustering algorithm of the RepeatExplorer2 pipeline allowed complete reconstruction of the 5S rDNA unit, even in the comparative analysis (Fig. 4). The graph region representing the 5S rRNA gene was shared by all Pulmonaria accessions studied (Fig. 4B). Three variable graph loops emanating from this conserved graph region correspond to three types of 5S rDNA unit, differing in the length of their IGS (Fig. 4A, B). Mapping of sample-specific sequencing reads onto the graph topology revealed that P. obscura and P. officinalis carry different 5S rDNA units (Fig. 4C, D). The cluster layout of the presumed hybrid between P. obscura and P. officinalis (B481.1) was represented by all three 5S rDNA graph loops specific to its putative progenitors (Fig. 4E). A similar situation was observed for the P. saccharata-like accession B15.1 (Fig. 4F). The other P. saccharata-like individuals (B465.1 and B472.1) shared the same graph layout as P. officinalis (i.e. B470.3; Fig. 4G, H).

Fig. 4

Graph structure of 5S rDNA sequence reads from the comparative analysis of RepeatExplorer2. (A) Graph structure obtained from all sequence reads homologous to the 5S rDNA. (B) The position of the 5S genic region on the graph topology is highlighted in yellow. (C–H) Cluster graph with annotated read origin: P. obscura in green (C); P. officinalis in orange (D); reads specific for P. obscura × P. officinalis (B481.1) in blue (E); reads specific for P. saccharata-like accession B15.1 are highlighted in purple (F), B465.1 in pink (G), and B472.1 in red (H)

The analysis of 5S rDNA-specific graph layouts obtained by clustering analysis of individual accessions confirmed the results of the comparative analysis. All Pulmonaria individuals studied possessed at least two different types of 5S rDNA units, which differed in the IGS (Supplementary Fig. 3).

Comparative karyotyping in the P. officinalis group

Molecular karyotyping was performed using newly identified satellites (PulTR01_29, PulTR02_305, PulTR03_308, PulTR04_420, and PulTR05_70) and 5S and 45S rDNA sequences. In general, in situ hybridization confirmed the results obtained by repeatome analysis. FISH analysis with the probes for rDNAs and four satDNAs resulted in clearly visible cluster signals on specific chromosomes in the genome of the analyzed plants of the P. officinalis complex.

P. obscura, P. officinalis and their putative natural interspecific hybrid

FISH analysis of P. obscura plants (all 2n = 14) collected from three different populations (B467, B469, B473; Supplementary Table 1) showed highly consistent results, with only a minor difference in the number of PulTR01_29 signals observed in plant B473.1 (Fig. 5A). The 45S rDNA was located in terminal NOR regions on four chromosome pairs, while 5S rDNA signals were detected on two chromosome pairs in pericentromeric regions (Figs. 5A, B and 6A, B, F and G). One chromosome pair contained signals of 5S, 45S rDNA and satellites PulTR02_305 and PulTR05_70 (Figs. 5A, B and 6B, D, F and G). In B473.1, the probe for PulTR01_29 provided very strong signals in subtelomeric regions only on one chromosome pair (Fig. 6E, G). The same chromosome pair also contained a signal of PulTR03_308 in the pericentromeric region. In B467.2 and B469.1, one additional subtelomeric signal of PulTR01_29 was found, located on a chromosome arm with a 5S rDNA cluster (Figs. 5B and 6E and G). As expected from the results of the RepeatExplorer2 analysis, signals of PulTR04_420 were not detected in any of the P. obscura plants analyzed.

Fig. 5

Idiograms of analyzed Pulmonaria accessions

Fig. 6

Chromosomal localization of newly identified satDNAs and rDNA sequences in P. obscura (2n = 14). (A, B, C, D) P. obscura (B473.1) with probes for: (A) 45S rDNA (yellow), PulTR01_29 (red) and 5S rDNA (green); (B) 45S rDNA (yellow), PulTR02_305 (red) and 5S rDNA (green); (C) PulTR01_29 (red), PulTR03_308 (orange), and 5S rDNA (green); (D) PulTR02_305 (yellow), 5S rDNA (red) and PulTR05_70 (green): yellow arrows indicate signals of PulTR02_305 and green arrows indicate signals of PulTR05_70. (E, F, G, H) P. obscura (B467.2) with probes for: (E) 45S rDNA (yellow), PulTR01_29 (red) and 5S rDNA (green): red arrows indicate subtelomeric signals of PulTR01_29; (F) 45S rDNA (yellow), PulTR02_305 (red) and 5S rDNA (green): red arrows indicate signals of PulTR02_305; (G) PulTR01_29 (red), PulTR03_308 (orange), and 5S rDNA (green); and (H) PulTR02_305 (yellow), 5S rDNA (red) and PulTR05_70 (green): yellow arrows indicate signals of PulTR02_305 and green arrows indicate signals of PulTR05_70. White arrows indicate 5S rDNA in all figures. Chromosomes were counterstained with DAPI (blue). Bars = 5 μm

The genome of P. officinalis contained one additional pair of chromosomes (i.e. 2n = 16) compared to P. obscura. The molecular karyotype was studied in individuals from two different populations (B100, B470; Supplementary Table 1) and, in contrast to P. obscura, a higher variation in the chromosomal distribution of the probes was found even between individuals within the same population (Fig. 5C, D; Supplementary Fig. 4). Terminal NORs (45S rDNA) were found only on three chromosome pairs in both analyzed individuals (B100.2, B470.1). These terminal 45S rDNA loci were often fragile and distended from the chromosomes (Fig. 7A, H, I). Interstitial 45S rDNA clusters were detected on one chromosome pair, on the same arm which also contained signals of satDNA PulTR04_420, PulTR01_29 and PulTR05_70 (Figs. 5C, D and 7B, E, H and I). Additional signals of PulTR05_70 were detected on two chromosome pairs that also contained 45S rDNA loci (Fig. 7E, I). The genome of B100.2 contained an additional weak signal of interstitial 45S rDNA (Figs. 5D and 7F, G and H). Variation in the number of signals specific to 5S rDNA was detected. While signals of 5S rDNA were identified in pericentromeric regions on seven chromosomes of B470.3 (Fig. 7A, E), only six chromosomes bore these signals in B100.2 (Fig. 7G, H). The signal of PulTR03_308 was detected on one NOR-bearing chromosome pair in both individuals with various combinations of co-localization with PulTR04_420 and 5S rDNA (Figs. 5C, D and 7C, D and G). Signals of PulTR02_305 were not detected on chromosomes of P. officinalis, supporting the results of comparative repeatome analysis by RepeatExplorer2.

Fig. 7

Chromosomal localization of newly identified satDNAs and rDNA sequences in P. officinalis (2n = 16). (A, B, C, D, E) P. officinalis (B470.3) with probes for: (A) 45S rDNA (yellow), PulTR01_29 (red) and 5S rDNA (green); (B) 45S rDNA (yellow), PulTR01_29 (red) and PulTR04_420 (green): green arrows indicate PulTR04_420; (C, D) the same plate with the signals for (C) 45S rDNA (red) and PulTR03_308 (green), and (D) co-localization of PulTR03_308 (green) and PulTR04_420 (red): green arrows indicate PulTR03_308, and red arrows indicate PulTR04_420; (E) 45S rDNA (yellow), PulTR05_70 (red), 5S rDNA (green): red arrows indicate signals of PulTR05_70. (F, G, H, I) P. officinalis (B100.2) with probes for: (F) 45S rDNA (yellow) and PulTR01_29 (red); (G) 45S rDNA (yellow), PulTR03_308 (red) and 5S rDNA (green): red arrows indicate PulTR03_308; (H) 45S rDNA (yellow), PulTR04_420 (red), and 5S rDNA (green): red arrows indicate PulTR04_420; and (I) 45S rDNA (yellow), PulTR05_70 (red) and PulTR03_308 (green): red arrows indicate signals of PulTR05_70 and green arrows point to PulTR03_308. White arrows indicate signals of 5S rDNA and yellow arrows point at interstitial 45S rDNA clusters. Chromosomes were counterstained with DAPI (blue). Bars = 5 μm

Karyotype analysis of the putative natural hybrid between P. obscura and P. officinalis (B481.1; Supplementary Table 1) confirmed the expected chromosome number 2n = 15, as well as the presence and distribution pattern of P. obscura and P. officinalis species-specific satDNAs (Fig. 5E). In this case, eight 45S rDNA clusters were found, seven in terminal chromosomal regions and one in an interstitial position (Figs. 5E and 8A and C). 5S rDNA clusters were detected on five chromosomes in pericentromeric regions (Fig. 8A, B). One chromosome pair contained signals of PulTR01_29 and 5S rDNA (Fig. 8B, C), another individual chromosome contained 45S and 5S rDNA, PulTR02_305 (found only as one single locus in the genome) and PulTR05_70 (Fig. 8A, D). Two additional chromosomes contained one or two signals of 5S rDNA, respectively (Figs. 5E and 8A and B). Five signals of the most abundant satellite PulTR01_29 were detected on five chromosomes (Fig. 8B, C), one of which co-localized with the 45S rDNA locus, one signal was detected on the chromosome bearing weak interstitial signal of 45S and PulTR05_70 and one with signal of PulTR03_308 (Fig. 8B) and PulTR04_420 (Fig. 8C), respectively. One chromosome pair with the remaining two signals of PulTR01_29 also contained a signal of 5S rDNA (Fig. 8B). SatDNAs PulTR03_308 and PulTR04_420 co-localized on one chromosome pair, containing also PulTR01_29, or joint signal of 45S rDNA and PulTR01_29 (Fig. 8C). Finally, two additional signals of PulTR05_70 were found on chromosomes with terminal 45S rDNA clusters (Fig. 8D).

Fig. 8

Chromosomal localization of new satDNAs and rDNA sequences in natural hybrid P. obscura × P. officinalis (B481.1; 2n = 15). (A) Probes for 45S rDNA (yellow), PulTR02_305 (red) and 5S rDNA (green): red arrow points at the locus of PulTR02_305; (B) PulTR01_29 (red), PulTR03_308 (orange) and 5S rDNA (green): orange arrows indicate PulTR03_308; (C) 45S rDNA (yellow), PulTR01_29 (red) and PulTR04_420 (green): green arrows indicate PulTR04_420; red arrow points to PulTR01_29 and yellow arrow points at 45S rDNA locus indicating co-localization of these two probes; (D) 45S rDNA (yellow), PulTR05_70 (red) and 5S rDNA (green): red arrows indicate signals of PulTR05_70. White arrows indicate 5S rDNA signals. Chromosomes were counterstained with DAPI (blue). Bars = 5 μm

Ornamental garden escapes, morphologically similar to P. officinalis

Karyotype analysis of P. saccharata-like plants from three populations (B15, B465, B472; Supplementary Table 1) revealed variability in chromosome number, as well as variation in localization of satDNAs and rDNA sequences (Fig. 5F, G, H). While B465.1 and B472.1 were characterized by 2n = 16 and had very similar karyotypes resembling P. officinalis (Figs. 5F, G and 9A – I), 2n = 15 was detected in B15.1 with a pattern of cluster signals similar to that of the putative interspecific hybrid (Figs. 5H and 9J – M). The karyotypes of B465.1 and B472.1 were very similar, containing terminal NORs on three chromosome pairs and one interstitial 45S rDNA cluster on one additional chromosome pair (Fig. 9A, C, F – I). Two chromosome pairs bore PulTR01_29 in subtelomeric regions and two other chromosome pairs contained 5S rDNA clusters in pericentromeric regions (Fig. 9A, B, F). The signal of PulTR04_420 was found on two chromosome pairs (Fig. 9A, D, F, G). One of these chromosome pairs also contained signals of PulTR01_29, an interstitial signal of 45S rDNA and PulTR05_70 (Fig. 9E, I), while the other chromosome pair bearing terminal 45S rDNA and the PulTR04_420 co-localized with PulTR03_308 (Figs. 5F, G and 9A and D – G). Two additional chromosome pairs with terminal loci of 45S rDNA also contained signals of PulTR05_70 (Fig. 9E, I). However, these two accessions differed in the presence of an additional signal of 5S rDNA and PulTR01_29 (Figs. 5F, G and 9).

Fig. 9

Chromosomal localization of new satDNAs and rDNA sequences in P. saccharata-like accessions. (A, B, C, D, E) P. saccharata-like accession B465.1 (2n = 16) with probes for: (A) 45S rDNA (yellow), PulTR01_29 (red) and PulTR04_420 (green): green arrows indicate PulTR04_420; (B) PulTR01_29 (red) and 5S rDNA (green); (C) 45S rDNA (red) and PulTR03_308 (green): green arrows point at PulTR03_308; and (D) PulTR03_308 (green) and PulTR04_420 (red): green arrows point at PulTR03_308 and red arrows indicate PulTR04_420; (E) 45S rDNA (yellow), PulTR05_70 (red) and 5S rDNA (green): red arrows point to signals of PulTR05_70. (F, G, H, I) P. saccharata-like accession B472.1 (2n = 16) with probes for: (F) 45S rDNA (yellow), PulTR01_29 (green) and PulTR04_420 (red): red arrows point at PulTR04_420; (G) 45S rDNA (yellow), PulTR03_308 (green) and PulTR04_420 (red): green arrows point at PulTR03_308, and red arrows indicate PulTR04_420; (H) 45S rDNA (yellow), PulTR03_308 (red) and 5S rDNA (green); (I) 45S rDNA (yellow), PulTR05_70 (red) and PulTR03_308 (green): red arrows indicate signals of PulTR05_70 and green arrows point to PulTR03_308. (J, K, L, M) P. saccharata-like accession B15.1 (2n = 15) with probes for: (J) 45S rDNA (yellow), PulTR02_305 (red) and 5S rDNA (green): red arrow points at a signal of PulTR02_305; (K) 45S rDNA (yellow), PulTR01_29 (red) and PulTR04_420 (green): green arrows point at PulTR04_420; (L) PulTR03_308 (red) and 5S rDNA (green); (M) 45S rDNA (yellow), PulTR05_70 (red) and 5S rDNA (green): red arrows point to signals of PulTR05_70. White arrows indicate 5S rDNA signals and yellow arrows point at interstitial 45S rDNA clusters. Chromosomes were counterstained with DAPI (blue). Bars = 5 μm

The karyotype of B15.1 was similar to that of the putative interspecific hybrid (B481.1) with the same number of chromosomes (Fig. 5H). 45S rDNA clusters were located as strong signals in terminal regions of three chromosome pairs; one additional chromosome contained a weak terminal signal and another contained an interstitial signal (Fig. 9J, K). An odd number of signal locations was detected for 5S rDNA (five chromosomes with interstitial signals) (Fig. 9J, K), PulTR01_29 (three chromosomes with the signals in terminal regions), PulTR04_420 (three chromosomes with interstitial signals), and for PulTR02_305 (found only on one chromosome) (Fig. 9H – J). PulTR03_308 provided signals on one chromosome pair co-localizing with PulTR04_420 (Figs. 5H and 9K and L). Finally, four signals of PulTR05_70 were found on three chromosomes with terminal 45S rDNA loci (on one of them co-localizing with 5S rDNA and PulTR02_305), and one chromosome with interstitial signal of 45S rDNA, PulTR04_420 and PulTR01_29 (Fig. 9M).

Discussion

The genus Pulmonaria is karyologically highly variable [e.g. 3, 6], but the origin and evolutionary consequences of genome size and karyotype variation remain unexplored [but see 10]. It appears that chromosomal rearrangements have played an important role in the evolution of this genus [cf. 10, 14, 51], but how and to what extent has never been clearly demonstrated. Therefore, we performed a pilot analysis of genome size and a comparative analysis of the repeatomes in the P. officinalis species group.

Impact of DNA repeats dynamics on genome size

Genome size can reflect some aspects of the evolutionary history of taxa by allowing us to understand the influence of DNA gain/loss between related species [e.g. 52, 53]. Our study represents the first large-scale investigation of interspecific genome size variation in Pulmonaria. As already shown in a pilot study by Kobrlová & Hroneš [31], genome size is effective in delimiting morphologically similar taxa of the Boraginaceae, which is also true for the P. officinalis group. The suitability of using flow cytometry to revise the distribution of the P. officinalis group (i.e. relative genome size) has already been documented in the Bohemian Forest and adjacent foothills [42].

So far, the genome size has only been estimated for eight Pulmonaria taxa, including P. officinalis and P. obscura, ranging from 2.27 to 4.27 pg (i.e. very small/small genomes according to the categories of Leitch et al. [54], Supplementary Table 5). Only minor differences were observed when comparing previously analyzed genome sizes of P. obscura and P. officinalis with our data, most likely due to different methodologies used (i.e. nuclei isolation buffer, reference standard, plant organ [cf. 31, 51]). The only exception is the study by Šmarda et al. [55], where almost the same 2C values are presented for P. obscura and P. officinalis, probably as a consequence of taxa misidentification.

There is enormous variation in the size of plant genomes, with much of this diversity driven by differences in the abundance of DNA repeats [e.g. 22, 56,57,58,59,60]. We found that most of the repetitive elements in the genomes of the Pulmonaria taxa studied were dispersed repeats represented by LTR retrotransposons [cf. 22, 58], with a higher proportion of Ty3/Gypsy elements, which were about twice as abundant as Ty1/Copia. Ty3/Gypsy elements represent one of the major classes of LTR retrotransposons and are dominant in many plant groups, such as the family Poaceae [e.g. 59, 61,62,63] or the tribe Fabeae [22, 64]. Unfortunately, a genome-wide analysis of DNA repeats and their impact on genome size has not been performed for any member of the Boraginaceae family. However, a higher proportion of Ty3/Gypsy retroelements has also been found in genera of the closely related Solanaceae family, such as Solanum, Nicotiana and Capsicum [65,66,67,68,69,70]. In contrast, recent studies in the genus Salvia, a member of the closely related Lamiaceae family, have shown that the nuclear genomes of different species contain different proportions of Ty3/Gypsy and Ty1/Copia retroelements [71,72,73], indicating a proliferation of different types of DNA repeats during the evolution of individual species. In comparison, the studied Pulmonaria species contained a similar proportion of the repeat lineages and individual clusters were represented by reads from all specimens analyzed. This indicates a high degree of genome homology within the P. officinalis complex, suggesting that the evolution of this species group was not accompanied by a dramatic diversification of DNA repeats, as previously shown in other plant species [e.g. 59]. To better understand the proliferation of DNA repeats during genome evolution and its impact on genome size variation and speciation, analysis of a larger data set of Pulmonaria species from different phylogenetic groups is required.

Satellite DNAs and their use in comparative karyotyping

The almost identical cytogenetic pattern of satDNAs and rDNA sequences in P. obscura (2n = 14), collected from three different populations, suggests karyotype stability in this diploid species. In comparison, the chromosome structure in P. officinalis appears to be more dynamic, as individuals from two different populations differ slightly in the cytogenetic pattern of the satDNAs and rDNA sequences. An odd number of signals of some satDNAs and rDNA sequences, as well as interstitial 45S rDNA clusters, were found in both diploid accessions (2n = 16), indicating that chromosomal structural changes were involved in the origin and evolution of P. officinalis. It is generally accepted that n = 7 is the basic chromosome number in Pulmonaria [e.g. 2, 3, 4, 5, 6], which raises the question of how the species represented by different chromosome numbers arose. Unfortunately, the available data do not allow us to answer this question. Only a robust phylogeny of the whole genus and comparative analysis of genome structure would give a clearer idea of the chromosomal changes that could explain the origin of Pulmonaria species with different chromosome numbers.

Evidence of hybridization within the P. officinalis complex

Several molecular studies have highlighted the important role of hybridization and introgression in the evolution of the genus Pulmonaria [10, 13, 14]. Some species groups exhibit weak ecological and geographic isolation [see 56, 75], near-synchronous phenology and pollinator sharing, all of which may facilitate hybridization [cf. 13]. So far, however, natural hybrids have only occasionally been identified by chromosome counting [3, 16] or distinguished on the basis of intermediate morphology [e.g. 74]. This is particularly true for the P. officinalis complex, which is widespread in Europe and therefore often in secondary contact with other Pulmonaria species [cf. 38, 75]. As the ranges of P. obscura and P. officinalis partly overlap (Fig. 1A), the co-occurrence of both species in the same habitat can be expected. Some authors have occasionally reported mixed populations, with morphological intermediates rarely observed [16, 17]. However, the extent of hybridization between these two species is still controversial and has not been clearly confirmed. Nevertheless, several karyological records referred to as P. obscura × P. officinalis, with an intermediate number of chromosomes (i.e. 2n = 15), may provide convincing evidence of ongoing hybridization between these two species [13, 16, 17].

In this study, we analyzed presumed hybrids from a mixed population (B481) of P. obscura and P. officinalis. Chromosome counting in all three analyzed individuals confirmed 2n = 15, and their hybrid origin was also supported by their genome sizes, which were halfway between those of the parents (Table 1). The cytogenetic mapping of the set of satDNAs and rDNA sequences also supports their hybrid origin: the P. obscura and P. officinalis species-specific satDNAs were present in the haploid state, and their pattern on chromosomes was consistent with a hybrid karyotype. This was further supported by a detailed analysis of 5S rDNA sequences. As recently shown, graph-based clustering in the RepeatExplorer pipeline enables reconstruction of complete 5S rDNA sequences from genome skimming data and provides clues to the evolutionary history of interspecific hybrids and allopolyploids [76, 77].
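The expectation behind the first two lines of evidence can be written out as simple arithmetic. The sketch below is only an illustration, not the procedure used in the study: the function names are ours, and the 2C values in the usage example are hypothetical placeholders rather than the measurements reported in Table 1. Only the chromosome numbers (2n = 14 and 2n = 16) are taken from the text.

```python
def hybrid_chromosome_number(parent_a_2n: int, parent_b_2n: int) -> int:
    """Expected 2n of an F1 hybrid formed from one reduced gamete of each parent."""
    if parent_a_2n % 2 or parent_b_2n % 2:
        raise ValueError("parental 2n values are assumed to be even")
    return parent_a_2n // 2 + parent_b_2n // 2


def midparent_genome_size(parent_a_2c: float, parent_b_2c: float) -> float:
    """Expected 2C value of an F1 hybrid: the mean of the parental 2C values."""
    return (parent_a_2c + parent_b_2c) / 2.0


def relative_deviation(observed: float, expected: float) -> float:
    """Deviation of an observed value from the expectation, as a fraction."""
    return (observed - expected) / expected


if __name__ == "__main__":
    # Chromosome numbers from the text: P. obscura 2n = 14, P. officinalis 2n = 16.
    print(hybrid_chromosome_number(14, 16))

    # Hypothetical 2C values in pg (placeholders, NOT the values in Table 1).
    expected = midparent_genome_size(5.0, 6.0)
    print(expected, relative_deviation(5.6, expected))
```

A hybrid whose measured genome size lies close to the mid-parent value, and whose chromosome number equals the sum of the parental gametic numbers, is consistent with a first-generation origin; larger deviations would point to backcrossing or other processes.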

Origin of ornamental cultivars morphologically similar to P. officinalis

As a valuable medicinal and ornamental plant, P. officinalis is represented in horticulture by several cultivars and has also been used to generate new artificial hybrids [cf. 15]. This seems to be the case for plants with distinctly white-spotted leaves, cordate at the base, which are sometimes offered commercially as P. saccharata. However, they are not “true” P. saccharata sensu Miller [see 10, 48, 49]. These plants often escape into the wild and are sometimes confused with P. officinalis. The origin of these cultivars is unknown; they only resemble the P. officinalis complex in their morphology.

Our cytogenetic analysis and detailed examination of the reconstructed 5S rDNA sequence indicate that two analyzed P. saccharata-like accessions with 2n = 16 (B465.1 and B472.1) are probably derived from P. officinalis.

On the other hand, an interesting cytogenetic pattern was observed in the third P. saccharata-like plant analyzed (B15.1). The karyotype and reconstructed 5S rDNA units of this plant were similar to those of the interspecific hybrid B481.1, with the same chromosome number. In contrast, the genome size of B15.1, and of the whole B15 population, was the largest in the whole data set presented (Table 1). Moreover, unlike population B481, population B15 was collected in an area where only P. obscura occurs naturally and where no population of P. officinalis has been confirmed (Kobrlová, pers. obs.). The morphology of the plants was also typical for cultivated P. saccharata-like plants. Their origin therefore requires further investigation (e.g. phylogenetic revision and analysis of a larger data set of Pulmonaria species from different phylogenetic groups), although the cytogenetic data presented suggest a hybrid origin between P. obscura and P. officinalis. As this population is a garden escape, its geographical origin is unclear and it cannot be ruled out that it was originally collected from a mixed population of both species.

Conclusions

Our study provides comprehensive information on genome size variability and repeatome dynamics of the two morphologically similar species of the P. officinalis group. Large-scale genome size analysis using flow cytometry confirmed a significant difference in DNA content between P. obscura and P. officinalis, corresponding to the number of chromosomes. Genome skimming of six accessions, including a putative natural hybrid of P. obscura and P. officinalis and ornamental garden escapes resembling P. officinalis, showed that a large proportion of their genomes is represented by various types of transposable elements, with Ty3/Gypsy elements being the most abundant. Comparative analysis of the repeatomes revealed no species-specific retrotransposons or striking differences in their copy number between the species, suggesting a common evolutionary history. Comparative karyotyping supported the hybrid origin of putative hybrids with 2n = 15, collected from a mixed population of P. obscura and P. officinalis, and also outlined the origin of ornamental garden escapes morphologically similar to the P. officinalis complex. Finally, databases of repeats were created and can be used for repeat identification (or masking) in future sequencing projects.

Materials and methods

Plant material

A total of 196 plants from 65 populations of the Pulmonaria officinalis group (Fig. 1A), representing typical populations of P. obscura and P. officinalis s. str. (Fig. 1B and C), including their potential hybrids and several garden escapes of cultivars morphologically similar to P. officinalis (in horticulture often referred to as P. saccharata, here listed as P. saccharata-like, Fig. 1D), were included in this study (see Supplementary Table 1). An Italian taxon, P. officinalis subsp. marzolae, was not part of this study and is therefore not discussed further; it is only mentioned in the karyological review (see Supplementary Table 2). The samples were collected between 2014 and 2023 from natural populations across Europe; some of them were cultivated in the experimental garden of Palacký University in Olomouc, Czech Republic, or deposited in the Herbarium of Palacký University in Olomouc (OL).

Flow cytometry: genome size and GC content

The nuclear DNA content, i.e. absolute genome size (AGS [78]), and DNA base composition (GC content [79]) were estimated using Partec PAS and Partec ML instruments, with PI (propidium iodide) and DAPI (4′,6-diamidino-2-phenylindole) staining. The same methodology as in Kobrlová and Hroneš [31] was followed, using fresh, rarely silica-dried, leaves for sample preparation. Pisum sativum L. ‘Ctirad’ (2C = 9.09 pg [80]; GC content = 38.5% [81]) was selected as the primary internal standard, since its genome size overlaps with neither the G1 nor the G2 phase peak of any of the studied samples. The conversion from picograms (pg) to base pairs (bp) followed Doležel et al. [82], using 1 pg DNA = 978 Mbp. DNA base content was estimated using the protocol and GC content calculation tool of Šmarda et al. [79]. One-way ANOVA was used to test for differences between population means of genome size/GC content of P. obscura and P. officinalis. The data analyses were performed using the NCSS 9 statistical software [83].
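For orientation, the calculation behind an internal-standard estimate can be sketched as follows. This is a minimal illustration of the standard peak-ratio approach, which we assume here; it is not the software used in the study. The standard's 2C value and the pg-to-Mbp factor are those given above, whereas the function names and the peak means in the usage example are hypothetical.

```python
STANDARD_2C_PG = 9.09   # Pisum sativum 'Ctirad', 2C value stated in the text
MBP_PER_PG = 978.0      # 1 pg DNA = 978 Mbp, conversion stated in the text


def genome_size_2c_pg(sample_peak_mean: float, standard_peak_mean: float,
                      standard_2c_pg: float = STANDARD_2C_PG) -> float:
    """2C DNA content (pg) from the ratio of sample and standard G1 peak means."""
    if sample_peak_mean <= 0 or standard_peak_mean <= 0:
        raise ValueError("peak means must be positive")
    return sample_peak_mean / standard_peak_mean * standard_2c_pg


def pg_to_mbp(picograms: float) -> float:
    """Convert DNA mass in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


if __name__ == "__main__":
    # Hypothetical fluorescence peak means (arbitrary channel units).
    two_c = genome_size_2c_pg(sample_peak_mean=120.0, standard_peak_mean=200.0)
    one_c_mbp = pg_to_mbp(two_c / 2.0)
    print(f"2C = {two_c:.3f} pg; 1C = {one_c_mbp:.0f} Mbp")
```

Only PI fluorescence is suitable for this ratio, because PI intercalates without base preference; the DAPI measurements enter the separate GC content calculation of Šmarda et al. [79], which is not reproduced here.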

DNA extraction and sequencing

Genomic DNA was isolated using alkyltrimethylammonium bromide (MATAB) lysis. After sorbitol washes, the ground plant material was incubated in 2% (w/v) MATAB for 20 min at 65 °C. Immediately after the incubation, an equal volume of chloroform:isoamyl alcohol (24:1) was added, and the sample was gently but thoroughly mixed and centrifuged at 10,000 g for 3 min at 4 °C. After centrifugation, the upper aqueous phase was transferred to a new tube, and this step was repeated until the upper phase was clear. Genomic DNA was precipitated by adding 0.7 volume of isopropanol and centrifuged at 10,000 g for 3 min at 4 °C. Finally, the pellet was washed with cold 70% and 96% ethanol, air-dried and dissolved in TE buffer, pH 8.

Genomic DNA was sheared with a Bioruptor Plus (Diagenode, Liege, Belgium) to achieve an insert size of about 500 bp. Libraries for sequencing were prepared from 2 µg of fragmented DNA using the TruSeq® DNA PCR-free kit (Illumina) and sequenced on a NovaSeq 6000 (Illumina), producing 2 × 100-bp or 2 × 150-bp paired-end reads to achieve at least 3 Gb of nucleotide sequence per genotype. Raw data were trimmed for low-quality bases and adapter sequences, and to the same length, using fastp v.0.20.1 [84].

Analysis and characterization of DNA repeats

Random datasets corresponding to 0.1× coverage (Supplementary Table 6) of the individual accessions were used for reconstruction and characterization of DNA repeats using RepeatExplorer2 [85], which includes the TAREAN analysis tool for identification of tandemly organized repeats [50]. The long queue was used for the comparative analysis as well as for single-species clustering, with the following settings: -l select=1:ncpus=16:mem=112gb:scratch_local=50gb -l walltime=336:00:00 -q elixirre@pbs.elixir-czech.cz -v TAREAN_MAX_MEM=64000000,TAREAN_CPU=150. RepeatExplorer2 and TAREAN analyses were also used to perform comparative analysis of Pulmonaria repeatomes on a merged dataset containing all studied individuals (1 mil. reads per accession; Supplementary Table 6), marked by specific prefixes. In both cases, the resulting clusters of repeats were characterized by various tools, including BLASTN and BLASTX, and by phylogenetic analysis of the repetitive elements’ coding domains [86, 87]. The presence of tandemly organized repeats within the clusters identified by TAREAN was confirmed with Dotter [88].
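The size of a random dataset for a given coverage follows directly from genome size and read length. The sketch below shows this arithmetic together with the subsampling and prefixing steps; it is an illustration under our own assumptions, not the scripts used in the study. Coverage is taken relative to the 1C genome size, the function names are ours, and the genome size, read identifiers and the prefix "OBS_" in the usage example are hypothetical.

```python
import random


def reads_for_coverage(genome_size_1c_mbp: float, read_length_bp: int,
                       coverage: float = 0.1) -> int:
    """Number of reads whose total length equals coverage x 1C genome size."""
    if genome_size_1c_mbp <= 0 or read_length_bp <= 0 or coverage <= 0:
        raise ValueError("all arguments must be positive")
    total_bases = coverage * genome_size_1c_mbp * 1_000_000
    return round(total_bases / read_length_bp)


def subsample(read_ids: list, n_reads: int, seed: int = 1) -> list:
    """Draw a reproducible random sample of read identifiers without replacement."""
    if n_reads >= len(read_ids):
        return list(read_ids)
    return random.Random(seed).sample(read_ids, n_reads)


def add_prefix(read_ids: list, prefix: str) -> list:
    """Tag reads with an accession prefix so a merged dataset stays traceable."""
    return [f"{prefix}{read_id}" for read_id in read_ids]


if __name__ == "__main__":
    # Hypothetical accession: 1C = 3000 Mbp, 100-bp reads, 0.1x coverage.
    print(reads_for_coverage(3000, 100, 0.1))

    toy_reads = [f"read{i}" for i in range(10)]
    print(add_prefix(subsample(toy_reads, 4), "OBS_"))
```

In a comparative run, the prefixes allow the share of each accession in every repeat cluster to be counted after clustering, which is what makes the per-cluster comparison between accessions possible.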

The results of the clustering were then used to create databases of repeats. Illumina reads were deposited in the Sequence Read Archive (project number: PRJNA1076467). Assembled contigs from different types of repetitive DNA elements are publicly available online (https://olomouc.ueb.cas.cz/en/content/dna-repeats). The sequences of newly identified tandemly organized repeats and 5S rDNA, which were used as cytogenetic markers, were deposited in GenBank (accessions: PP457292–PP457296). Cluster graphs of 5S rDNA sequences were visualized using the SeqGrapheR visualization tool [85]. The reconstruction of the whole 45S rDNA unit was performed according to Kapustová et al. [89].

Preparation of chromosome spreads

Mitotic metaphase chromosome spreads were prepared from root meristems by a dropping method according to Šimoníková et al. [90]. Briefly, actively growing root tips of Pulmonaria were collected and pre-treated in 0.05% (w/v) colchicine for three hours at room temperature, fixed in 3:1 ethanol:acetic acid fixative overnight at 4 °C and stored in 70% ethanol at −20 °C. Chromosome preparations were made using the drop technique according to Kato et al. [91, 92], with minor modifications: after washing in 75 mM KCl and 7.5 mM EDTA (pH 4), root tip segments were digested in a mixture of 2% (w/v) cellulase and 2% (w/v) pectinase in 75 mM KCl and 7.5 mM EDTA (pH 4) for 45 min at 37 °C. The cell suspension was dropped onto glass slides in a box lined with wet paper towels and left to dry.

Probe design and fluorescence in situ hybridization

Consensus sequences from the TAREAN analysis that contained tandemly organized repeats were used for specific primer design using the Primer3 program [93]. PCR products were sequenced using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, USA) according to the manufacturer’s instructions and run on an ABI 3730xl DNA analyzer (Applied Biosystems) to confirm the accuracy of the sequences used for FISH analyses. Sequences are publicly available online (https://olomouc.ueb.cas.cz/en/content/dna-repeats). Probes for newly identified tandem repeats were labeled by PCR either directly with Cy5 fluorochrome (Thermo Fisher Scientific) or DEAC (Jena Bioscience, Jena, Germany), or indirectly with biotin-dUTP or digoxigenin-dUTP (Sigma Aldrich/Roche Applied Science, Mannheim, Germany), using primers listed in Supplementary Table 4 and P. obscura DNA as template. The 25 µl PCR mix contained 30 ng of genomic DNA, 200 µM dNTPs including directly or indirectly labeled dUTP, 1 µM primers, 0.5 U of Q5 High-Fidelity DNA polymerase and the appropriate reaction buffer (New England Biolabs, Massachusetts, USA). Plasmid pTa71 (45S rDNA), containing a 9-kb fragment from Triticum aestivum with 18S-5.8S-26S rDNA and intergenic spacers [94], was labeled by nick translation (Sigma Aldrich) using Cy5 fluorochrome (Thermo Fisher Scientific).

A hybridization mixture containing 50% (v/v) formamide, 10% (w/v) dextran sulfate in 2 × saline-sodium citrate (SSC) and 10 ng/µl of labeled probes was added onto the slide and denatured for 30 s at 80 °C, followed by overnight hybridization performed in a humid chamber at 37 °C. If the chromosome structure was damaged after the denaturation step, the slides with chromosome spreads were post-fixed in 4% (v/v) formaldehyde in 2 × SSC for 10 min at room temperature, washed in 2 × SSC for 2 × 5 min, and dehydrated using an ethanol series. The sites of digoxigenin- and biotin-labeled probes were detected using anti-digoxigenin-FITC (Sigma Aldrich/Roche Applied Science) and streptavidin-Cy3 (Thermo Fisher Scientific/Invitrogen, Carlsbad, CA, USA), respectively. Chromosomes were counterstained with DAPI and mounted in Vectashield Antifade Mounting Medium (Vector Laboratories, Burlingame, CA, USA).

Microscopic and image analysis

Slides were examined using Axio Imager Z.2 Zeiss microscope (Zeiss, Oberkochen, Germany) equipped with a Cool Cube 1 camera (Metasystems, Altlussheim, Germany) and appropriate optical filters, and a PC running ISIS software 5.4.7 (Metasystems). The final image adjustment was performed in Adobe Photoshop CS5, and idiograms and final pictures were created in Adobe Photoshop CS5 and GIMP (GNU Image Manipulation Program) v2.10.34. A minimum of ten preparations with mitotic metaphase chromosome spreads and different probe combinations were used for the final karyotype reconstruction of each genotype.