Transcriptome variation in human populations and its potential application in forensics

Daca-Roszak, P.; Zietkiewicz, E.

doi:10.1007/s13353-019-00510-1

Human Genetics • Review
Open access
Published: 10 August 2019

Volume 60, pages 319–328, (2019)

Journal of Applied Genetics

Abstract

This review presents the state of the art in the forensic application of genetic methods driven by research in population transcriptomics. In the first part of the review, the constraints of using classical genomic markers are briefly reviewed. In the second part, developments in the study of inter-population diversity at the transcriptomic level are presented. Subsequently, the potential of population-specific transcriptomic markers in forensic science applications, including ascertaining the population affiliation of human samples and separating cell mixtures, is discussed.

Genetics in forensic identification

Genetics has long been recognized and adopted as an efficient and reliable approach to forensic identification (FID) of human samples. Genetic-based FID may be perceived from different perspectives, depending on the goal of the investigation. These goals vary considerably and may concern:

determination of family links
identification of individuals from whom forensic biological traces derive
assessment of the ancestral contribution and of individual’s affiliation with continental/ethnic groups
finding clues about the inherited or acquired phenotype
identification of the tissue source of a biological material

Each of the goals in FID requires appropriate genetic markers and a specific methodology to counteract the numerous and variable constraints associated with the analysis (see Fig. 1). Besides the intrinsic constraints associated with the limited information content of genetic markers, there are practical concerns in genetic FID, related to any of the following: lack of reference samples, scarcity and/or degradation of the genetic material in forensic traces, and the non-homogeneous character of the material (mixed samples).

There is ample literature describing the application of classical, DNA-based genetic markers (microsatellites, SNPs, and haplotypes) for resolving family links (paternity, family relations) and individual’s identity as compared with the reference (for the review, see for example Zietkiewicz et al. 2012).

Considerable progress has also been achieved in using DNA markers to assess ancestral contribution to the genome make-up of individuals and thus population affiliation of unknown samples (e.g., Elhaik et al. 2014; Santos et al. 2016). Rapid development of the analysis of human transcriptome and epigenome variability has opened further perspectives, both in the context of tissue identification and in the determination of phenotypes; these aspects have been extensively applied in forensic studies (e.g., Frumkin et al. 2011; Xu et al. 2014; Kader and Ghai 2015; Park et al. 2016; Zubakov et al. 2016). Importantly, both transcriptomic and epigenomic markers may become useful in those FID endeavors that require determination of the population affinity of the forensic material.

The aim of this review is to present the state of the art in the forensic application of genetic methods driven by research in population genomics and transcriptomics, in the context of some of the major FID problems: the lack of reference samples and the non-homogeneity of the biological material.

Genome diversity of human population in FID

The majority of genetic methods used in FID rely on the comparison of a material under investigation with reference samples (from suspected individuals, forensic archives, family members, etc.); sometimes, the information stored in a variety of specialized genetic databases can be used. However, the reference data are often unavailable to an investigator. In such cases, an alternative tactic may be applied: to assign an unidentified biological sample to a specific human population by comparing it with population-specific data. While indirect, this approach allows narrowing the focus of the investigation. Consequently, ascertaining population/ethnic affiliation of human samples based on DNA profiling has recently become an important goal in many forensic fields: e.g., crime perpetrator detection, identification of victims of mass disasters or terrorist attacks (Zietkiewicz et al. 2012; Chakraborty et al. 1999; Budowle et al. 2005; Phillips et al. 2009; Mamedov et al. 2010; Bamshad et al. 2003). The basic shortcoming of population-differentiating genomic markers is related to the low diversity of the human species: the majority of genetic variance is shared by all human groups, reflecting the relatively young evolutionary age of our species and/or the recent gene flow (admixture) among extant populations (e.g., Shriver 1997; Zietkiewicz and Labuda 2001; Tishkoff and Williams 2002). In consequence, what actually differentiates human populations is differences in allele frequencies rather than the presence or absence of marker alleles. Markers with significant frequency differences between human populations are often referred to as ancestry informative markers (AIMs) (Frudakis et al. 2003; Shriver and Kittles 2004; Nassir et al. 2009). Relatively few AIMs are needed for differentiating populations that diverged a long time ago (e.g., continental groups). However, in the case of closely related populations, which share a very recent evolutionary history, sufficient discrimination power can be achieved only by analyzing a very large number of markers distributed over the whole genome (Lao et al. 2006). These analyses usually rely on microarray-based technology (e.g., Novembre et al. 2008; Barbosa et al. 2017).
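
To make the notion of an ancestry informative marker concrete, the following minimal Python sketch scores a single biallelic marker by two common summary statistics: the absolute allele frequency difference between two populations and Wright's F_ST. The function names and the example frequencies are illustrative assumptions and are not taken from any of the cited studies; the F_ST formula assumes two equally weighted populations.

```python
def allele_frequency_delta(p1: float, p2: float) -> float:
    """Absolute difference in the frequency of one allele between two populations."""
    return abs(p1 - p2)


def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's F_ST for a biallelic marker and two equally weighted populations.

    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S is the mean within-population heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # marker is monomorphic in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
informative = fst_two_populations(0.9, 0.1)    # large frequency difference
uninformative = fst_two_populations(0.5, 0.5)  # identical frequencies
```

A marker with similar allele frequencies in both populations scores close to zero on both statistics, which is why many such markers must be combined when closely related populations are compared.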

The majority of population-differentiating markers are selectively neutral, and they mostly reflect the demographic history that shaped the present-day population diversity. AIMs used to infer the ethnic origin of individuals are usually selected from a variety of genomic SNPs, SNP-based haplotypes or CNVs (copy number variants); they may be diploid or haploid (mtDNA, Y-chromosome). Of note, fast-mutating microsatellites (simple tandem repeats (STRs)), which are the most informative markers for the identification of individuals compared with the reference samples, are rarely used as AIMs. This is due to the fact that distinguishing alleles identical by descent (IBD) from those identical by state (IBS) is a challenging task, and conclusions on the population affiliation or the ancestry of the sample are not straightforward.

Many effective population-specific tests have been designed based on markers linked with the genes subjected to selection, e.g., involved in the metabolism of xenobiotics, immune response, fertility, or pigmentation (e.g., Phillips et al. 2007; Rogalla et al. 2015). While these markers can be successfully used to differentiate populations, it has to be remembered that some of the allele frequency similarities, rather than reflecting common ancestry, could be a result of polyphyletic mechanisms that depend on the environment and act in multiple populations independently.

The aforementioned constraints seriously limit the efficiency of DNA-based markers in applications that require population-based discrimination of the biological material. Recently, intense efforts have been directed toward the search for non-DNA markers that would exhibit population specificity.

Transcriptome variation among populations

The application of expression microarrays (from Affymetrix or Illumina) targeting thousands of gene transcripts has allowed exploration of the transcriptional variation in humans at an unprecedented scale. First, the levels of gene expression were shown to differ not only among cell/tissue types, but also among individuals (Cheung et al. 2003; Monks et al. 2004; Morley et al. 2004; Stranger and Dermitzakis 2005). Soon after, numerous studies demonstrated that, while the bulk of variation in the expression level is observed between individuals, significant differences across continental populations also exist (Spielman et al. 2007; Stranger et al. 2007; Storey et al. 2007; Price et al. 2008; Zhang et al. 2008; Ye et al. 2014; Armengol et al. 2009; Fan et al. 2009; Lappalainen et al. 2013; Yin et al. 2014; Mele et al. 2015; Dimas et al. 2009; Li et al. 2014a).

LCL-based studies

The majority of data supporting the notion of inter-population differences in gene expression are based on the model of lymphoblastoid cell lines (LCLs) (EBV-immortalized human B-lymphocytes). The majority of LCLs, commercially available from the Coriell depository and previously used in the International HapMap Project, represent ethnically homogeneous continental populations: CEU (Utah individuals of European ancestry), CHB (Han Chinese), JPT (Japanese), YRI (Nigerians), and AA (admixed African Americans). In spite of the common source of the cell lines used in human transcriptome diversity studies, the direct comparison of the results is difficult for several reasons. First, not all the studies compared the same populations; most often, only limited pairwise comparisons were performed (CEU-CHB; CEU-YRI; YRI-CHB, etc.), and the numbers of individuals representing the populations differed. Second, different methodologies have been applied to determine the expression level (various microarray platforms from Affymetrix and Illumina, or next-generation sequencing (NGS)). Third, estimating and reporting the significance of the results was not uniform (e.g., different statistical models were used; the fold-difference was not always reported; not all the studies provided the names of best-differentiating genes; etc.).

In the seminal study of Spielman, an Affymetrix HG-Focus microarray addressing over 4000 genes expressed in lymphoblastoid cells was used to compare expression in Europeans (60 CEU) and Asians (41 CHB and 41 JPT) (Spielman et al. 2007). Over 1000 genes (25%) were found to be differentially expressed between Europeans and Asians (t test, p < 0.05), while only 27 genes differentiated Chinese and Japanese. Among 35 genes displaying at least a 2-fold expression difference between Europeans and Asians, the best were: UGT2B17 and ROBO1 (with 22- and 4-fold higher expression in CEU, respectively) and CLECSF2 (with 4-fold higher expression in Asians).
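
The two quantities quoted for such comparisons, the fold-difference in mean expression and a t statistic for the difference between groups, can be sketched as follows. This is a minimal illustration that assumes expression values on a linear (not log-transformed) scale and uses Welch's unequal-variance form of the t statistic; the function names and the toy values are hypothetical, and the cited study may have used a different variant of the test.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(group_a, group_b):
    """Ratio of mean expression in group A to mean expression in group B."""
    return mean(group_a) / mean(group_b)


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in mean expression (A minus B)."""
    var_a = variance(group_a)  # sample variance (n - 1 in the denominator)
    var_b = variance(group_b)
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Toy expression values for one gene in two populations (illustration only).
population_a = [8.0, 10.0, 12.0]
population_b = [4.0, 5.0, 6.0]
fc = fold_change(population_a, population_b)
t_stat = welch_t(population_a, population_b)
```

Converting the t statistic to a p value additionally requires the t distribution with Welch-Satterthwaite degrees of freedom, which is omitted here to keep the sketch free of external dependencies.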

In another study using an Affymetrix microarray (addressing 5190 genes expressed in LCLs), expression levels in Europeans and Africans (16 CEU and YRI) were compared using models accounting for differences between individuals as well as populations (Storey et al. 2007). Approximately 17% of the genes were differentially expressed in the two populations; the differences in 50 genes were significant at FDR < 20%, with the average fold-change of 1.65. Many of the differentially expressed genes were associated with the immunological response (e.g., genes encoding cytokines and chemokine receptors: CCL22, CCL5, CCR2, CXCR3).
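
When thousands of genes are tested at once, significance is usually reported at a false discovery rate (FDR) threshold rather than at a per-gene p value. The sketch below illustrates the idea with the Benjamini-Hochberg procedure, which is a commonly used stand-in and not necessarily the estimator applied in the cited study; the function name and the example p values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p values (illustration only).
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.20 for q in q_values]  # genes passing an FDR threshold of 20%
```

With an FDR threshold of 20%, up to one in five of the genes declared significant is expected to be a false positive, which is worth keeping in mind when candidate markers are selected from such lists.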

The levels of expression in Europeans and Africans were further analyzed in a larger sample of LCLs (30 CEU and 30 YRI family trios) using the Affymetrix Human Exon array (over 9100 transcript clusters) and two independent statistical approaches taking into account the presence of SNPs in the probes (Zhang et al. 2008). About 4.2% of the transcript clusters displayed significantly different expression between the CEU and YRI (247 and 136 with higher expression in YRI and CEU, respectively), with the average fold-change of 1.3. Biological processes found to be enriched in the differential transcripts included ribosome biogenesis and antimicrobial humoral response, as well as cell-cell adhesion, mRNA catabolism, and tRNA processing. Nine of the genes (DPYSL2, CTTN, PLCG1, SS18, SH2B3, CPNE9, CMAH, CXCR3, and MRPS7) were earlier reported among the top 50 genes differentially expressed in CEU and YRI (Storey et al. 2007).

The impact of SNPs and CNV on transcriptome variation has been extensively studied using Illumina whole-genome array in 270 LCLs from CEU, CHB, JPT, and YRI populations (Stranger et al. 2007). Over 5300 genes exceeded the threshold of 16% difference in the median expression in one or more of the population pairs; assuming about 12,000 genes expressed in LCLs, the fraction of genes with significant expression differences between any two populations was estimated between 17 and 29%.

In another study, variation in gene expression was explored in 210 LCLs from four ethnic groups (CEU, CHB, JPT, and AFR), using an Illumina microarray addressing more than 11,000 transcripts (Fan et al. 2009). Expression of 427 genes was characterized by higher inter-ethnic than inter-individual variance. Ten of these genes were characterized by the overall variance in expression > 8% (CXXC4, KIF21A, LOC376138, RGS20, TBC1D4, TUBB, UGT2B11, UGT2B17, UGT2B7, and UTS2); two of these genes (UGT2B17, RGS20) had been reported earlier as differentially expressed in Asians and Europeans (Spielman et al. 2007).
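
The comparison of inter-ethnic with inter-individual variance can be illustrated by a one-way partition of the sum of squares for a single gene. The following is a minimal sketch under the assumption that each sample carries a single population label; the function name and the toy data are hypothetical, and the statistical model actually used in the cited study may differ.

```python
from statistics import mean


def between_population_fraction(groups):
    """Fraction of the total sum of squares explained by population labels.

    `groups` maps a population label to a list of expression values for one
    gene. The result is the eta-squared statistic of a one-way ANOVA.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    if ss_total == 0:
        return 0.0  # no variation at all in this gene
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Toy expression values for one gene (illustration only).
example = {"POP1": [1.0, 2.0, 3.0], "POP2": [4.0, 5.0, 6.0]}
fraction = between_population_fraction(example)
```

A value above 0.5 means that, for this gene, the differences between population means outweigh the variation among individuals within populations.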

After the initial studies based on microarray analysis of transcriptome variability, the dynamic development of high-throughput NGS techniques resulted in a number of studies that analyzed even more transcripts. Besides confirming population differences in the level of expression of a considerable number of genes, they also shed more light upon the mechanisms underlying these differences.

In one of the NGS-based studies (Lappalainen et al. 2013), transcriptomic variation was characterized in over 460 LCLs from Africans (YRI) and four European subpopulations (CEU, FIN, GBR, and TSI). The inter-population differences accounted for only a small fraction (3%) of the total variation in expression. In spite of this, the number of genes displaying significant expression differences between Africans and Europeans was impressively high, ranging from ~ 1300 to 4300 (depending on which European subpopulation was compared with YRI). A much lower number of differentially expressed genes was seen when European subpopulations were compared with each other.

In another NGS-based study, expression was examined in 20 LCLs from CEU and CHB (Li et al. 2014a). Over 400 differentially expressed genes were identified (including 132 and 291 with higher and lower expression in CHB, respectively); the magnitude of expression differences was modest (with the median of 2 and 0.4 for the genes up- and downregulated in CHB, respectively). Interestingly, new ethnic-specific isoforms of the known transcripts were revealed in over 200 genes (199 in CHB and 28 in CEU); eleven of those were found in the genes characterized by differential expression in the examined populations (CLEC2B, ARL4C, ZBP1, ITM2B, c11orf21, UTS2, VCAN, CACNA1E, EFNA5, NR2F2, and MGLL). Ethnic-specific splice junctions were found in only eight genes (NASP, MTIF3, CCDC47, and TBCA in CHB and ITGB7, CRTAP, ERO1LB, and NSUN2 in CEU) (Li et al. 2014a).

In an RNA-sequencing analysis of 45 LCLs from seven non-European populations (Namibian San, Mbuti Pygmies, Algerian Mozabites, Pakistan, East Asia, Siberia, Mexico), 44 differentially expressed genes were identified, the vast majority representing immunity pathways. The highest inter-population gene expression variation was obtained for the THNSL2, DRP2, VAV3, IQUB, BC038731, RAVER2, SYT2, LOC100129055, AK126080, and TTN genes (Martin et al. 2014).

The inter-population differences in the expression level have been repeatedly shown to be heritable and linked to the variation across the human genome. Potential mechanisms include INDELs or copy number variations (CNV) (Spielman et al. 2007; Armengol et al. 2009), SNPs (e.g., Stranger et al. 2007; Storey et al. 2007; Zhang et al. 2008), or alternative splicing (Zhang et al. 2008; Lappalainen et al. 2013; Li et al. 2014a).

Genetic variants in the cis- or trans-acting regulatory elements that affect transcript abundance can be mapped as expression quantitative trait loci (eQTLs). The inter-population variation in the expression of the affected genes is often associated with population differences in the allele frequencies at eQTLs (Albert and Kruglyak 2015; Kelly et al. 2017; Park et al. 2018). For example, the spectacular difference in UGT2B17 expression between Asians and Europeans was shown to be associated with the higher frequency of the gene deletion in Asians (Spielman et al. 2007). The differential expression of the UGT2B17 locus among populations has also been demonstrated in a study aiming to characterize population differences in copy number variation (CNV) (Armengol et al. 2009).

Non-LCL studies

All the examples discussed above concerned population differences in gene expression studied in LCLs. In the last few years, several studies have demonstrated that population differences, similar to those reported in LCLs, are also observed in the cell types other than immortalized B lymphocytes.

In one of these studies, population patterns of gene expression were examined in epidermal samples from 30 individuals representing three continental populations (Yin et al. 2014). Microarray analysis has revealed 14 genes with more than 1.5-fold expression differences between Africans, Caucasians, and Asians. Not surprisingly, the strongest effect was seen between Africans and non-Africans, with 15- and 9-fold differences in the transcription of two best-discriminating genes (CCL18 and ADRA2C, respectively). The differences between Caucasians and Asians were less pronounced, with only one gene, NINL, displaying a 3-fold difference in the expression level (Altshuler et al. 2012; Yin et al. 2014).

In another study, focused on population differences in the transcriptional responses of CD4+ T lymphocytes to conditions that mimic activation through the antigen-specific receptors (Ye et al. 2014), a set of 236 transcripts was analyzed by Nanostring profiling in 348 donors of African, European, and Asian origin. A trend towards higher T cell activation in donors of African ancestry was found to be associated with population differences in the mean expression of a number of genes. For example, expression of IL2RA (cytokine receptor) in activated T cells from Africans was approximately 15% higher than in Europeans; other differentially responsive genes included IL17 family cytokines (over-induced in Africans) and IFNG (over-induced in non-Africans).

To exclude the influence of environmental factors (e.g., donor age, time of sampling) on gene expression patterns, the inter-population gene expression variation in placenta was examined in samples from four human populations: African Americans, European-Americans, South Asian Americans, and East Asian Americans (Hughes et al. 2015). The analyses revealed approximately 8% of variation in gene expression among the studied groups. African and South Asian populations had the highest inter-population variation in gene expression (> 140 genes). Genes characterized by the highest inter-population variation were mainly involved in pathways related to immune response, cell signaling, and metabolism (Hughes et al. 2015).

In a recent study on the multi-tissue transcriptomic patterns in Caucasians and Africans, population differences in the expression of over 220 protein-coding genes and 150 lncRNAs (long non-coding RNAs) were reported (Mele et al. 2015). However, some of these differentiating markers were specific to individual tissue types. This is consistent with an earlier study, in which the direct comparison of gene expression profiles in three types of cells (LCLs, T cells, and primary fibroblasts) revealed that the majority (80–90%) of genomic variants affecting gene regulation act in a cell type–specific manner (Dimas et al. 2009).

The latter studies have indicated that further surveys are needed to elucidate whether any of the reported population differences in gene expression is common to different cell types. Expression profiling aiming to distinguish ethnic affiliation of the forensic samples would therefore require that the levels of transcripts are compared in the corresponding tissues.

The Genotype-Tissue Expression (GTEx) project may overcome the scarcity of expression data from human tissues other than LCLs. GTEx catalogs gene expression variation in major tissues and, additionally, provides information regarding the genetic background underlying this variation (Lonsdale et al. 2013). So far, gene expression profiles for more than 50 human tissues have been cataloged and made publicly available in the GTEx database. Hitherto, the GTEx project encompasses only data gathered from the Caucasian cohort, which limits the applicability of the data to the global population context (Lonsdale et al. 2013).

The application of NGS in human population studies, besides revealing differences in gene expression patterns between distinct human groups (discussed above), provided the knowledge about the diversity of mRNA isoforms in human populations (Park et al. 2018; Djebali et al. 2012; Vaquero-Garcia et al. 2016). It is well known that the vast majority of human genes are subjected to alternative splicing, and a number of isoforms from a single gene may be generated (Pan et al. 2008; Wang et al. 2008; Djebali et al. 2012; Vaquero-Garcia et al. 2016; Park et al. 2018). Various mRNA isoforms have distinct stability and biological function. All these differences in the quantitative and qualitative composition of mRNA isoforms may be adopted as potential population markers.

The latest achievements in the research on alternative splicing variation in human populations have been summarized in the review by Park et al. (2018). The landscape of alternative splicing in relation to the genetic variation has been investigated in a few studies (e.g., Martin et al. 2014; Lappalainen et al. 2013; Battle et al. 2014). Most of the studies were conducted on LCL samples, and they concentrated on the mechanisms underlying formation of RNA isoforms (e.g., Montgomery et al. 2010; Pickrell et al. 2010).

For example, over 170 genes with transcript isoform changes were identified in the European population (Kwan et al. 2008). Another study has shown that the majority (75 ± 22%) of population-specific variance in transcript abundance observed among seven global human populations can be explained by variation in overall gene expression, while only a minor part is caused by alternative splicing (Martin et al. 2014). This observation was also confirmed in a study of lymphoblastoid cells from 69 Yoruban and 60 Caucasian individuals, where RNA sequencing identified 44 genes for which the ratios of splicing isoforms were similar within each population but differed more between populations (Gonzalez-Porta et al. 2012).

The literature data presented above clearly indicate that a significant variation in the expression level across populations exists and that it is at least partially caused by the genomic variation.

The use of specific mRNA transcripts allows efficient differentiation of samples that originate from human populations. Our recent study has revealed two such population-discriminating transcripts: UTS2 and UGT2B17 (Daca-Roszak et al. 2018). These mRNA markers exhibited significant population differences in the expression level in both B cell lines and in the peripheral blood and enabled differentiation of Caucasian and Chinese cohorts with high specificity (> 90%) and sensitivity (> 76%) (Daca-Roszak et al. 2018).
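
How sensitivity and specificity are obtained for an expression-based population marker can be sketched with a simple single-transcript threshold rule. This is an illustrative assumption and not the classification procedure of the cited study: the function name, the threshold, and the toy expression values are hypothetical, and both input lists are assumed to be non-empty.

```python
def sensitivity_specificity(expr_pop_a, expr_pop_b, threshold):
    """Evaluate a rule that assigns a sample to population A when its
    expression level is at or above `threshold`.

    Population A is treated as the positive class. Returns a tuple
    (sensitivity, specificity).
    """
    true_pos = sum(x >= threshold for x in expr_pop_a)
    false_neg = len(expr_pop_a) - true_pos
    true_neg = sum(x < threshold for x in expr_pop_b)
    false_pos = len(expr_pop_b) - true_neg
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Toy expression values of one transcript in two cohorts (illustration only).
cohort_a = [5.0, 6.0, 7.0, 2.0]
cohort_b = [1.0, 2.0, 3.0, 6.5]
sens, spec = sensitivity_specificity(cohort_a, cohort_b, threshold=4.0)
```

Moving the threshold trades sensitivity against specificity, so reported values always refer to a particular decision rule and cohort.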

Population-specific transcriptome variation—prospects for FID applications

The aforementioned data indicate that carefully chosen population-specific transcriptomic markers can be used in FID applications in a similar way to the DNA-based AIMs, to indicate the population origin of a forensic sample. However, transcriptomic markers are, just like population-specific DNA markers, more quantitative than qualitative. Moreover, they are expected to be even more susceptible to environmental influences (age, diet). Nevertheless, as discussed below, population-specific transcriptomic markers harbor an important new potential, which may prospectively be used to solve one of the great problems of FID applications, that of sample mixtures.

The effective use of population-specific genetic markers in FID is often hampered by the non-homogeneity of a forensic material. While deconvolution of allelic profiles obtained from mixed samples is possible (e.g., Hu et al. 2014; van der Gaag et al. 2016), it remains a difficult task and often requires using sophisticated mathematical models (e.g., Bille et al. 2014; Bieber et al. 2016). In practice, identification of multiple contributors by genotyping DNA markers in forensic samples is challenging or not feasible if the reference DNA profiles are not available (Fregeau et al. 2003; Westen et al. 2009). All these features limit the direct use of genetic markers for the analysis of evidentiary samples, which often contain mixed genetic material of unknown origin. Mixtures of cells from the same tissue type, originating from different individuals, are often encountered in the forensic evidence; in the absence of the reference samples, distinguishing the population origin of the individuals, who are the source of such material, poses a serious problem in the FID practice (e.g., Bieber et al. 2016; Gill et al. 2006).

Physical separation of the cells contributing to a mixture can be used to address the complex DNA mixture problem. In fact, various single-cell separation technologies have been used before, mainly in sexual offense cases (e.g., Li et al. 2014b; Williamson et al. 2018) and in other examples of tissue separation. However, such an approach has not yet been applied to population discrimination.

A new perspective is related to a potential application of transcripts characterized by population-specific differences in the expression level. The idea relies on the combination of two techniques: labeling transcripts with population-specific probes and separation of the labeled cells.

The cells from donors of different ethnic background could be “barcoded” with FISH probes that specifically hybridize to transcripts characterized by differential expression in the relevant populations. In the next step, specifically labeled cells could be separated using cell sorters, laser capture microdissection (LCM) technology, or any other cell separation technique (e.g., Fend et al. 1999; Datta et al. 2015) (see Fig. 2).

Population affiliation of the separated cell pools can then be confirmed by using markers appropriate for the analyzed populations. Markers may be chosen from among transcriptomic probes or genomic eQTLs (SNPs or INDELs) that underlie or associate with the differential expression; population-specific genomic markers not associated with the expression differences (e.g., Daca-Roszak et al. 2016) could also be used for this purpose. The homogenized cell pools can be further used for downstream profiling using individual-specific or phenotype-specific markers (Vidaki et al. 2013; Zubakov et al. 2010; Zbiec-Piekarska et al. 2015; Koch and Wagner 2011; Bocklandt et al. 2011; Hannum et al. 2013; Weidner et al. 2014).

In most forensic cases, DNA/RNA co-isolation from the biological material is possible. The feasibility of combined DNA and RNA profiling of body fluids and contact traces, providing information about both the cell type and sample donor identity, has been reported (Lindenbergh et al. 2012). One can envision that the simultaneous analysis of two transcriptomic markers, one differentiating populations and the other differentiating tissues, combined with any cell separation technique, e.g., LCM technology, could be the way to examine forensic cell mixtures.

Prospects of combining application of distinct markers (transcriptomic, genomic, and epigenetic) for FID purposes, however tempting, are not without flaws. From a technical point of view, the application of transcriptional probes in LCM-based separation of forensic mixtures is, at the present moment, time-consuming and expensive and requires highly qualified and experienced staff. When the amount of a material in the mixed sample is large enough, LCM can be replaced by cell sorters; however, the cell-sorter technology is predictably less suitable for the forensic purposes, where typically only a scarce amount of the evidentiary material is available.

So far, the use of population-specific transcriptomic markers and probes has not been tested in practical forensic applications. The majority of studies were performed in LCLs, cultured under specific laboratory conditions and derived from a limited set of, mostly continental, populations (e.g., Spielman et al. 2007; Stranger et al. 2007; Storey et al. 2007). Further studies are therefore required to assess the sensitivity, specificity, and stability of population-specific transcriptomic markers in real-life samples, which may contain different cell types, such as whole blood, epithelium, sperm, and hair. Furthermore, an additional search for transcripts differentiating more closely related and/or admixed human groups has to be performed. Other problems are related to the non-uniform biological basis of the transcriptome variance (e.g., some transcripts’ levels depend on environmental factors). Therefore, when selecting transcriptomic markers to be used in the assessment of population affiliation, it will be important to exclude the genes whose expression is known to depend on gender and environmental conditions (diet, stress, etc.).

These limitations notwithstanding, further exploration of the population-specific transcriptome variation should be the goal of research aiming to improve the applicative prospects in the field of forensic identification.

Abbreviations

FID: Forensic identification
AIM: Ancestry informative marker
LCLs: Lymphoblastoid cell lines
EBV: Epstein-Barr virus
IBD: Markers identical by descent
IBS: Markers identical by state
LCM: Laser capture microdissection

References

Albert FW, Kruglyak L (2015) The role of regulatory variation in complex traits and disease. Nat Rev Genet 16(4):197–212. https://doi.org/10.1038/nrg3891
Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG et al (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422):56–65. https://doi.org/10.1038/nature11632
Armengol L, Villatoro S, Gonzalez JR, Pantano L, Garcia-Aragones M, Rabionet R et al (2009) Identification of copy number variants defining genomic differences among major human groups. PLoS One 4(9). https://doi.org/10.1371/journal.pone.0007230
Bamshad MJ, Wooding S, Watkins WS, Ostler CT, Batzer MA, Jorde LB (2003) Human population genetic structure and inference of group membership. Am J Hum Genet 72(3):578–589. https://doi.org/10.1086/368061
Barbosa FB, Cagnin NF, Simioni M, Farias AA, Torres FR, Molck MC et al (2017) Ancestry informative marker panel to estimate population stratification using genome-wide human Array. Ann Hum Genet 81(6):225–233. https://doi.org/10.1111/ahg.12208
Battle A, Mostafavi S, Zhu XW, Potash JB, Weissman MM, McCormick C et al (2014) Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res 24(1):14–24. https://doi.org/10.1101/gr.155192.113
Bieber FR, Buckleton JS, Budowle B, Butler JM, Coble MD (2016) Evaluation of forensic DNA mixture evidence: protocol for evaluation, interpretation, and statistical calculations using the combined probability of inclusion. BMC Genet 17. https://doi.org/10.1186/s12863-016-0429-7
Bille TW, Weitz SM, Coble MD, Buckleton J, Bright JA (2014) Comparison of the performance of different models for the interpretation of low level mixed DNA profiles. Electrophoresis 35(21–22):3125–3133. https://doi.org/10.1002/elps.201400110
Bocklandt S, Lin W, Sehl ME, Sanchez FJ, Sinsheimer JS, Horvath S et al (2011) Epigenetic predictor of age. PLoS One 6(6). https://doi.org/10.1371/journal.pone.0014821
Budowle B, Bieber FR, Eisenberg AJ (2005) Forensic aspects of mass disasters: strategic considerations for DNA-based human identification. Leg Med (Tokyo) 7(4):230–243. https://doi.org/10.1016/j.legalmed.2005.01.001
Chakraborty R, Stivers DN, Su B, Zhong YX, Budowle B (1999) The utility of short tandem repeat loci beyond human identification: implications for development of new DNA typing systems. Electrophoresis 20(8):1682–1696
Cheung VG, Conlin LK, Weber TM, Arcaro M, Jen KY, Morley M et al (2003) Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet 33(3):422–425. https://doi.org/10.1038/ng1094
Daca-Roszak P, Pfeifer A, Zebracka-Gala J, Jarzab B, Witt M, Zietkiewicz E (2016) EurEAs_Gplex-A new SNaPshot assay for continental population discrimination and gender identification. Forensic Sci Int Genet 20:89–100. https://doi.org/10.1016/j.fsigen.2015.10.004
Daca-Roszak P, Swierniak M, Jaksik R, Tyszkiewicz T, Oczko-Wojciechowska M, Zebracka-Gala J et al (2018) Transcriptomic population markers for human population discrimination. BMC Genet 19:54. https://doi.org/10.1186/s12863-018-0663-2
Datta S, Malhotra L, Dickerson R, Chaffee S, Sen CK, Roy S (2015) Laser capture microdissection: big data from small samples. Histol Histopathol 30(11):1255–1269
Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H et al (2009) Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 325(5945):1246–1250. https://doi.org/10.1126/science.1174148
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A et al (2012) Landscape of transcription in human cells. Nature 489(7414):101–108. https://doi.org/10.1038/nature11233
Elhaik E, Tatarinova T, Chebotarev D, Piras IS, Calo CM, De Montis A et al (2014) Geographic population structure analysis of worldwide human populations infers their biogeographical origins. Nat Commun 5. https://doi.org/10.1038/ncomms4513
Fan HPY, Di Liao C, Fu BY, Lam LCW, Tang NLS (2009) Interindividual and interethnic variation in genomewide gene expression: insights into the biological variation of gene expression and clinical implications. Clin Chem 55(4):774–785. https://doi.org/10.1373/clinchem.2008.119107
Fend F, Emmert-Buck MR, Chuaqui R, Cole K, Lee J, Liotta LA et al (1999) Immuno-LCM: laser capture microdissection of immunostained frozen sections for mRNA analysis. Am J Pathol 154(1):61–66. https://doi.org/10.1016/s0002-9440(10)65251-0
Fregeau CJ, Brown KL, Leclair B, Trudel I, Bishop L, Fourney RM (2003) AmpFlSTR® Profiler Plus™ short tandem repeat DNA analysis of casework samples, mixture samples, and nonhuman DNA samples amplified under reduced PCR volume conditions (25 μL). J Forensic Sci 48(5):1014–1034
Frudakis T, Venkateswarlu K, Thomas MJ, Gaskin Z, Ginjupalli S, Gunturi S et al (2003) A classifier for the SNP-based inference of ancestry. J Forensic Sci 48(4):771–782
Frumkin D, Wasserstrom A, Budowle B, Davidson A (2011) DNA methylation-based forensic tissue identification. Forensic Sci Int Genet 5(5):517–524. https://doi.org/10.1016/j.fsigen.2010.12.001
Gill P, Brenner CH, Buckleton JS, Carracedo A, Krawczak M, Mayr WR et al (2006) DNA commission of the International Society of Forensic Genetics: recommendations on the interpretation of mixtures. Forensic Sci Int 160(2–3):90–101. https://doi.org/10.1016/j.forsciint.2006.04.009
Gonzalez-Porta M, Calvo M, Sammeth M, Guigo R (2012) Estimation of alternative splicing variability in human populations. Genome Res 22(3):528–538. https://doi.org/10.1101/gr.121947.111
Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S et al (2013) Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell 49(2):359–367. https://doi.org/10.1016/j.molcel.2012.10.016
Hu N, Cong B, Li S, Ma C, Fu L, Zhang X (2014) Current developments in forensic interpretation of mixed DNA samples (review). Biomed Rep 2:309–316
Hughes DA, Kircher M, He ZS, Guo S, Fairbrother GL, Moreno CS et al (2015) Evaluating intra- and inter-individual variation in the human placental transcriptome. Genome Biol 16. https://doi.org/10.1186/s13059-015-0627-z
Kader F, Ghai M (2015) DNA methylation and application in forensic sciences. Forensic Sci Int 249:255–265. https://doi.org/10.1016/j.forsciint.2015.01.037
Kelly DE, Hansen MEB, Tishkoff SA (2017) Global variation in gene expression and the value of diverse sampling. Curr Opin Syst Biol 1:102–108. https://doi.org/10.1016/j.coisb.2016.12.018
Koch CM, Wagner W (2011) Epigenetic-aging-signature to determine age in different tissues. Aging (Albany NY) 3(10):1018–1027
Kwan T, Benovoy D, Dias C, Gurd S, Provencher C, Beaulieu P et al (2008) Genome-wide analysis of transcript isoform variation in humans. Nat Genet 40(2):225–231. https://doi.org/10.1038/ng.2007.57
Lao O, van Duijn K, Kersbergen P, de Knijff P, Kayser M (2006) Proportioning whole-genome single-nucleotide-polymorphism diversity for the identification of geographic population structure and genetic ancestry. Am J Hum Genet 78(4):680–690. https://doi.org/10.1086/501531
Lappalainen T, Sammeth M, Friedlander MR, t Hoen PAC, Monlong J, Rivas MA et al (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501(7468):506–511. https://doi.org/10.1038/nature12531
Li JW, Lai KP, Ching AKK, Chan TF (2014a) Transcriptome sequencing of Chinese and Caucasian population identifies ethnic-associated differential transcript abundance of heterogeneous nuclear ribonucleoprotein K (hnRNPK). Genomics 103(1):56–64. https://doi.org/10.1016/j.ygeno.2013.12.005
Li XB, Wang QS, Feng Y, Ning SH, Miao YY, Wang YQ et al (2014b) Magnetic bead-based separation of sperm from buccal epithelial cells using a monoclonal antibody against MOSPD3. Int J Legal Med 128(6):905–911. https://doi.org/10.1007/s00414-014-0983-3
Lindenbergh A, de Pagter M, Ramdayal G, Visser M, Zubakov D, Kayser M et al (2012) A multiplex (m)RNA-profiling system for the forensic identification of body fluids and contact traces. Forensic Sci Int Genet 6(5):565–577. https://doi.org/10.1016/j.fsigen.2012.01.009
Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S et al (2013) The genotype-tissue expression (GTEx) project. Nat Genet 45(6):580–585. https://doi.org/10.1038/ng.2653
Mamedov IZ, Shagina IA, Kurnikova MA, Novozhilov SN, Shagin DA, Lebedev YB (2010) A new set of markers for human identification based on 32 polymorphic Alu insertions. Eur J Hum Genet 18(7):808–814. https://doi.org/10.1038/ejhg.2010.22
Martin AR, Costa HA, Lappalainen T, Henn BM, Kidd JM, Yee M-C et al (2014) Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture. PLoS Genet 10(8). https://doi.org/10.1371/journal.pgen.1004549
Mele M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M et al (2015) The human transcriptome across tissues and individuals. Science 348(6235):660–665. https://doi.org/10.1126/science.aaa0355
Monks SA, Leonardson A, Zhu H, Cundiff P, Pietrusiak P, Edwards S et al (2004) Genetic inheritance of gene expression in human cell lines. Am J Hum Genet 75(6):1094–1105. https://doi.org/10.1086/426461
Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J et al (2010) Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464(7289):773–777. https://doi.org/10.1038/nature08903
Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS et al (2004) Genetic analysis of genome-wide variation in human gene expression. Nature 430(7001):743–747. https://doi.org/10.1038/nature02797
Nassir R, Kosoy R, Tian C, White PA, Butler LM, Silva G et al (2009) An ancestry informative marker set for determining continental origin: validation and extension using human genome diversity panels. BMC Genet 10. https://doi.org/10.1186/1471-2156-10-39
Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A et al (2008) Genes mirror geography within Europe. Nature 456(7218):98–101. https://doi.org/10.1038/nature07331
Pan Q, Shai O, Lee LJ, Frey J, Blencowe BJ (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 40(12):1413–1415. https://doi.org/10.1038/ng.259
Park J-L, Kim JH, Seo E, Bae DH, Kim S-Y, Lee H-C et al (2016) Identification and evaluation of age-correlated DNA methylation markers for forensic use. Forensic Sci Int Genet 23:64–70. https://doi.org/10.1016/j.fsigen.2016.03.005
Park E, Pan ZC, Zhang ZJ, Lin L, Xing Y (2018) The expanding landscape of alternative splicing variation in human populations. Am J Hum Genet 102(1):11–26. https://doi.org/10.1016/j.ajhg.2017.11.002
Phillips C, Salas A, Sanchez JJ, Fondevila M, Gomez-Tato A, Alvarez-Dios J et al (2007) Inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs. Forensic Sci Int Genet 1(3–4):273–280. https://doi.org/10.1016/j.fsigen.2007.06.008
Phillips C, Prieto L, Fondevila M, Salas A, Gomez-Tato A, Alvarez-Dios J et al (2009) Ancestry analysis in the 11-M Madrid bomb attack investigation. PLoS One 4(8). https://doi.org/10.1371/journal.pone.0006583
Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E et al (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464(7289):768–772. https://doi.org/10.1038/nature08872
Price AL, Butler J, Patterson N, Capelli C, Pascali VL, Scarnicci F et al (2008) Discerning the ancestry of European Americans in genetic association studies. PLoS Genet 4(1). https://doi.org/10.1371/journal.pgen.0030236
Rogalla U, Rychlicka E, Derenko MV, Malyarchuk BA, Grzybowski T (2015) Simple and cost-effective 14-loci SNP assay designed for differentiation of European, east Asian and African samples. Forensic Sci Int Genet 14:42–49. https://doi.org/10.1016/j.fsigen.2014.09.009
Santos C, Phillips C, Fondevila M, Daniel R, van Oorschot RAH, Burchard EG et al (2016) Pacifiplex: an ancestry-informative SNP panel centred on Australia and the Pacific region. Forensic Sci Int Genet 20:71–80. https://doi.org/10.1016/j.fsigen.2015.10.003
Shriver MD, Smith MW, Jin J, Marcini A, Akey JM, Deka R, Ferrell RE (1997) Ethnic-affiliation estimation by use of population-specific DNA markers. Am J Hum Genet 60:957–964
Shriver MD, Kittles RA (2004) Genetic ancestry and the search for personalized genetic histories. Nat Rev Genet 5(8):611–618. https://doi.org/10.1038/nrg1405
Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG (2007) Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet 39(2):226–231. https://doi.org/10.1038/ng1955
Storey JD, Madeoy J, Strout JL, Wurfel M, Ronald J, Akey JM (2007) Gene-expression variation within and among human populations. Am J Hum Genet 80(3):502–509. https://doi.org/10.1086/512017
Stranger BE, Dermitzakis ET (2005) The genetics of regulatory variation in the human genome. Hum Genomics 2(2):126–131
Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N et al (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315(5813):848–853. https://doi.org/10.1126/science.1136678
Tishkoff SA, Williams SM (2002) Genetic analysis of African populations: human evolution and complex disease. Nat Rev Genet 3(8):611–621. https://doi.org/10.1038/nrg865
van der Gaag KJ, de Leeuw RH, Hoogenboom J, Patel J, Storts DR, Laros JFJ et al (2016) Massively parallel sequencing of short tandem repeats: population data and mixture analysis results for the PowerSeq™ system. Forensic Sci Int Genet 24:86–96. https://doi.org/10.1016/j.fsigen.2016.05.016
Vaquero-Garcia J, Barrera A, Gazzara MR, Gonzalez-Vallinas J, Lahens NF, Hogenesch JB et al (2016) A new view of transcriptome complexity and regulation through the lens of local splicing variations. Elife 5. https://doi.org/10.7554/eLife.11752
Vidaki A, Daniel B, Court DS (2013) Forensic DNA methylation profiling-potential opportunities and challenges. Forensic Sci Int Genet 7(5):499–507. https://doi.org/10.1016/j.fsigen.2013.05.004
Wang ET, Sandberg R, Luo SJ, Khrebtukova I, Zhang L, Mayr C et al (2008) Alternative isoform regulation in human tissue transcriptomes. Nature 456(7221):470–476. https://doi.org/10.1038/nature07509
Weidner CI, Lin Q, Koch CM, Eisele L, Beier F, Ziegler P et al (2014) Aging of blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biol 15(2). https://doi.org/10.1186/gb-2014-15-2-r24
Westen AA, Matai AS, Laros JFJ, Meiland HC, Jasper M, de Leeuw WJF et al (2009) Tri-allelic SNP markers enable analysis of mixed and degraded DNA samples. Forensic Sci Int Genet 3(4):233–241. https://doi.org/10.1016/j.fsigen.2009.02.003
Williamson VR, Laris TM, Romano R, Marciano MA (2018) Enhanced DNA mixture deconvolution of sexual offense samples using the DEPArray™ system. Forensic Sci Int Genet 34:265–276. https://doi.org/10.1016/j.fsigen.2018.03.001
Xu Y, Xie JH, Cao Y, Zhou HG, Ping Y, Chen LK et al (2014) Development of highly sensitive and specific mRNA multiplex system (XCYR1) for forensic human body fluids and tissues identification. PLoS One 9(7). https://doi.org/10.1371/journal.pone.0100123
Ye CJ, Feng T, Kwon H-K, Raj T, Wilson M, Asinovski N et al (2014) Intersection of population variation and autoimmunity genetics in human T cell activation. Science 345(6202):1311. https://doi.org/10.1126/science.1254665
Yin L, Coelho SG, Ebsen D, Smuda C, Mahns A, Miller SA et al (2014) Epidermal gene expression and ethnic pigmentation variations among individuals of Asian, European and African ancestry. Exp Dermatol 23(10):731–735. https://doi.org/10.1111/exd.12518
Zbiec-Piekarska R, Spolnicka M, Kupiec T, Makowska Z, Spas A, Parys-Proszek A et al (2015) Examination of DNA methylation status of the ELOVL2 marker may be useful for human age prediction in forensic science. Forensic Sci Int Genet 14:161–167. https://doi.org/10.1016/j.fsigen.2014.10.002
Zhang W, Duan S, Kistner EO, Bleibel WK, Huang RS, Clark TA et al (2008) Evaluation of genetic variation contributing to differences in gene expression between populations. Am J Hum Genet 82(3):631–640. https://doi.org/10.1016/j.ajhg.2007.12.015
Ziętkiewicz E, Labuda D (2001) Modern human origins in light of the nuclear DNA diversity in world populations. In: Donnelly P, Foley RA (eds) Genes, fossils and behaviour: an integrated approach to human evolution. IOS Press, Amsterdam, The Netherlands, pp 79–97
Zietkiewicz E, Witt M, Daca P, Zebracka-Gala J, Goniewicz M, Jarzab B et al (2012) Current genetic methodologies in the identification of disaster victims and in forensic analysis. J Appl Genet 53(1):41–60. https://doi.org/10.1007/s13353-011-0068-7
Zubakov D, Liu F, van Zelm MC, Vermeulen J, Oostra BA, van Duijn CM et al (2010) Estimating human age from T-cell DNA rearrangements. Curr Biol 20(22):R970–R971. https://doi.org/10.1016/j.cub.2010.10.022
Zubakov D, Liu F, Kokmeijer I, Choi Y, van Meurs JBJ, van Ijcken WFJ et al (2016) Human age estimation from blood using mRNA, DNA methylation, DNA rearrangement, and telomere length. Forensic Sci Int Genet 24:33–43. https://doi.org/10.1016/j.fsigen.2016.05.014

Author information

Authors and Affiliations

Institute of Human Genetics, Polish Academy of Sciences, Strzeszynska 32, 60-479, Poznan, Poland
P. Daca-Roszak & E. Zietkiewicz

Authors

P. Daca-Roszak
E. Zietkiewicz

Corresponding author

Correspondence to P. Daca-Roszak.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by: Michal Witt

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Daca-Roszak, P., Zietkiewicz, E. Transcriptome variation in human populations and its potential application in forensics. J Appl Genetics 60, 319–328 (2019). https://doi.org/10.1007/s13353-019-00510-1

Received: 14 April 2019
Revised: 22 July 2019
Accepted: 24 July 2019
Published: 10 August 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s13353-019-00510-1
