Abstract
Lack of efficacy and adverse drug responses are common phenomena in pharmacological therapy, causing considerable morbidity and mortality. It is estimated that 20–30% of this variability in drug response stems from variations in genes encoding drug targets or factors involved in drug disposition. Leveraging such pharmacogenomic information for the preemptive identification of patients who would benefit from dose adjustments or alternative medications thus constitutes an important frontier of precision medicine. Computational methods can be used to predict the functional effects of variants of unknown significance. However, their performance on pharmacogenomic variant data has been lackluster. To overcome this limitation, we previously developed an ensemble classifier, termed APF, specifically designed for pharmacogenomic variant prediction. Here, we aimed to further improve predictions by leveraging recent key advances in the prediction of protein folding based on deep neural networks. Benchmarking of 28 variant effect predictors on 530 pharmacogenetic missense variants revealed that structural predictions using AlphaMissense were most specific, whereas APF exhibited the most balanced performance. We then developed a new tool, APF2, by optimizing algorithm parametrization of the top performing algorithms for pharmacogenomic variations and aggregating their predictions into a unified ensemble score. Importantly, APF2 provides quantitative variant effect estimates that correlate well with experimental results (R² = 0.91, p = 0.003) and predicts the functional impact of pharmacogenomic variants with higher accuracy than previous methods, particularly for clinically relevant variations with actionable pharmacogenomic guidelines. We furthermore demonstrate better performance (92% accuracy) on an independent test set of 146 variants across 61 pharmacogenes not used for model training or validation.
Application of APF2 to population-scale sequencing data from over 800,000 individuals revealed drastic ethnogeographic differences with important implications for pharmacotherapy. We thus think that APF2 holds the potential to improve the translation of genetic information into pharmacogenetic recommendations, thereby facilitating the use of Next-Generation Sequencing data for stratified medicine.
Introduction
Inter-individual variability in drug response that manifests as lack of efficacy or toxicity remains a common concern in pharmacological therapy. It is estimated that only half of all patients respond to first-line antidepressant therapies [1] and only 4% to 25% of individuals taking the top ten highest-grossing drugs in the United States showed the intended drug response [2]. In addition, adverse drug reactions (ADRs) account for around 6.5% of hospital admissions in the UK [3] and up to 12% in Sweden [4]. This number increases to over 15% when patients are older, take multiple medications concurrently or have comorbidities [5, 6]. Furthermore, ADRs are responsible for 9% of overall healthcare costs [7] and affect approximately one third of novel therapeutics after receiving regulatory approval [8]. It is estimated that 20–30% of such differential drug response stems from genetic factors, i.e., variations in genes that are involved in drug absorption, distribution, metabolism and excretion (ADME), or that encode drug targets [9, 10]. Over the last few decades, a multitude of genetic variants significantly associated with drug response have been identified using forward genetics, particularly for cytochrome P450 (CYP) genes. While these variants are promising pharmacogenomic biomarkers to improve the efficacy and safety of some drugs, increasing evidence suggests that they are not sufficient to exhaustively explain the genetically encoded variability in drug response [11,12,13].
Major developments in sequencing technologies have enabled the profiling of human genomic sequences at the population scale. Whole exome and whole genome sequencing (WES and WGS, respectively) projects have so far identified more than 650 million high-quality variants across over 75,000 human genomes, more than half of which are rare with minor allele frequencies (MAFs) below 0.1% [14]. More than 70,000 variants have been identified in pharmacogenes, of which 80% were novel at the time of analysis and more than 98% were rare with global MAFs <1% [15, 16]. These rare and novel pharmacogenomic variants are suspected to explain at least part of the missing heritability in drug response [17]; however, establishing their causal effects is intrinsically difficult. For this reason, computational interpretation of rare variations remains of great importance in order to consider the full spectrum of pharmacogenomic variation for personalized drug response predictions.
Importantly, most of the available computational algorithms for variant effect predictions are trained on pathogenic variants and consider evolutionary conservation as a key parameter. While these models may perform well for the prioritization of variants implicated in genetic disease, their performance is overall poor when applied to pharmacogenomic variant sets [18, 19]. The reason for this phenomenon is that such algorithms are based on the hypothesis that biologically important regions should be conserved and that variants occurring in these regions are thus more likely to exert deleterious effects on the function of the respective gene product. To overcome this limitation, we previously developed an ADME-optimized prediction framework (APF), an ensemble method based on pharmacogenomic reparameterization [20]. The resulting score outperformed previous methods on experimentally characterized pharmacogenomic variants, including variants in genes not included in APF training [21], and allowed quantitative estimates of the magnitude of variant effects on enzyme activity.
A major drawback of conventional prediction algorithms is the limited consideration of structural context due to the lack of high-resolution experimental structures for many pharmacogene products. Recent advances in the prediction of protein folding based on deep learning now allow, for the first time, protein structures to be modeled computationally to near experimental accuracy, with a Cα root-mean-square deviation at 95% residue coverage of 0.96 Å [22]. By leveraging these advancements, a novel variant effect predictor termed AlphaMissense was established that integrates information about the structural consequences of missense variations with labels inferred from population frequency to predict the functional effects of variants [23]. AlphaMissense performed excellently when flagging pathogenic variants; however, its performance on pharmacogenomic data remained to be evaluated.
Here, we leveraged the recent developments in the modeling of protein folding and optimized APF by integrating structural effect predictions. The resulting tool, APF2, predicts the functional impact of pharmacogenomic variants with higher accuracy than APF or AlphaMissense, which is particularly apparent for clinically relevant variations with CPIC guidelines. We furthermore validate APF2 performance on an independent test set of 146 variants with high-confidence functional annotations. By applying APF2 to newly released genomic data from 807,162 individuals, we provide an updated overview of the population-scale variability in pharmacogenes and refine population-specific estimates of the risk of non-response or adverse drug events for medications with established pharmacogenomic guidelines.
Methods
Variant data sources
In total, we generated three non-overlapping variant sets for training, validation and testing of APF2.
Set 1: All missense variants with phenotypic annotations by the Clinical Pharmacogenetics Implementation Consortium (CPIC) were extracted as a high-evidence set (n = 145 variants across 10 genes; Supplementary Table 1). Variants with decreased function or loss-of-function were considered deleterious, all other variants functionally neutral.
Set 2: Variants with high-quality experimental characterization data were extracted from the literature. After excluding variants that were already part of Set 1, a total of 385 unique missense variants across 45 pharmacogenes were collected (Supplementary Table 2). Variants with a measured intrinsic clearance <50% and ≥50% of the wild-type enzyme were considered deleterious and functionally neutral, respectively.
Set 3: As a test set, we extracted variants from the PharmGKB and ClinVar databases (n = 146; Supplementary Table 3). For PharmGKB, variants that impacted drug response or pharmacokinetics with an evidence level of 1 or 2 were annotated as deleterious. For ClinVar, we included variants with “drug response” annotation that were reviewed by an expert panel or included in a practice guideline (3 or 4 stars). Variants in pharmacogenes without evidence for an impact on pharmacokinetics or pharmacodynamics were considered functionally neutral.
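The labeling rules for Set 1 and Set 2 can be written down compactly. The following is a minimal sketch, not the authors' code: the function names and the annotation strings are hypothetical, and only the 50% clearance cutoff and the "decreased function or loss-of-function" rule are taken from the text.

```python
# Hypothetical annotation strings; only the rule itself comes from the text.
DELETERIOUS_ANNOTATIONS = {"decreased function", "loss-of-function"}


def label_from_cpic(function_annotation):
    """Set 1 rule: decreased function or loss-of-function is deleterious."""
    if function_annotation.strip().lower() in DELETERIOUS_ANNOTATIONS:
        return "deleterious"
    return "neutral"


def label_from_clearance(relative_clearance):
    """Set 2 rule: intrinsic clearance below 50% of wild type is deleterious.

    relative_clearance is a fraction of wild-type clearance, e.g. 0.35 for 35%.
    """
    return "deleterious" if relative_clearance < 0.5 else "neutral"
```

A variant measured at exactly 50% of wild-type clearance falls on the neutral side of the cutoff, in line with the "≥50%" wording for Set 2.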
Computational variant effect predictions using available algorithms
Variant prediction was performed using a total of 28 computational algorithms. AlphaMissense scores were extracted from https://console.cloud.google.com/storage/browser/dm_alphamissense. Scores for SIFT [24], PolyPhen-2 [25], PROVEAN [26], MutationTaster [27], MutationAssessor [28], VEST [29], ClinPred [30], MutPred [31], APF [20], MetaRNN [32], MetaSVM [33], MetaLR [33], FATHMM [34], FATHMM-MKL [35], CADD [36], DANN [37], REVEL [38], Eigen [39], LRT [40], LIST-S2 [41], DEOGEN2 [42], MVP [43], M-CAP [44], PrimateAI [45], MPC [46], fitCons [47] and GenoCanyon [48] were computed using ANNOVAR [49].
Definitions
We considered 208 pharmacogenes for consistency with the previous literature [15]. Performance of each of the algorithms was evaluated based on the number of true positives (TP; number of deleterious variants that were correctly identified as deleterious by the respective algorithm), true negatives (TN; number of neutral variants that were correctly identified as neutral by the respective algorithm), false positives (FP; number of neutral variants that were erroneously identified as deleterious by the respective algorithm) and false negatives (FN; number of deleterious variants that were incorrectly identified as neutral by the respective algorithm). The following key metrics were calculated for each algorithm: sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), positive predictive value (PPV) = TP/(TP + FP) and accuracy = (TP + TN)/(TP + TN + FP + FN).
Youden’s J was defined on the basis of a receiver operating characteristic (ROC) curve as J = max_x {Sensitivity(x) + Specificity(x) − 1}, i.e., the maximum over all classification thresholds x. In addition, the area under the ROC curve (AUC) was calculated for each algorithm in RStudio (version 2023.09.1).
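As an illustration of these definitions, the sketch below derives the confusion counts and the resulting metrics from paired truth labels and binary predictions. It is a minimal example under stated assumptions rather than the analysis code used here (the AUC analysis was run in RStudio); the class and function names are hypothetical, and the standard textbook formulas for sensitivity, specificity, PPV and accuracy are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # deleterious variants called deleterious
    tn: int  # neutral variants called neutral
    fp: int  # neutral variants called deleterious
    fn: int  # deleterious variants called neutral


def count_outcomes(is_deleterious, predicted_deleterious):
    """Tally TP, TN, FP and FN from paired boolean sequences."""
    tp = tn = fp = fn = 0
    for truth, call in zip(is_deleterious, predicted_deleterious):
        if truth and call:
            tp += 1
        elif truth and not call:
            fn += 1
        elif not truth and call:
            fp += 1
        else:
            tn += 1
    return ConfusionCounts(tp, tn, fp, fn)


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class or call type is absent.
    return numerator / denominator if denominator else float("nan")


def summarize(counts):
    """Compute the metrics for a single, fixed classification threshold."""
    sensitivity = _ratio(counts.tp, counts.tp + counts.fn)
    specificity = _ratio(counts.tn, counts.tn + counts.fp)
    total = counts.tp + counts.tn + counts.fp + counts.fn
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(counts.tp, counts.tp + counts.fp),
        "accuracy": _ratio(counts.tp + counts.tn, total),
        "informedness": sensitivity + specificity - 1,
    }
```

For a fixed threshold, informedness is simply sensitivity plus specificity minus one; a sensitivity of 91% and a specificity of 93%, for example, give J = 0.84.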
Population-scale sequencing data
Genetic variability data for pharmacogenes was extracted from gnomAD v4.0 [50] from 730,947 whole exome and 76,215 whole genome sequences. The data set encompassed information from 37,545 Africans and African Americans, 30,019 Admixed Americans, 14,804 Ashkenazi Jews, 22,448 East Asians, 32,026 Finns, 590,031 non-Finnish Europeans, 45,546 South Asians and 3031 Middle Easterners. The gnomAD dataset was aggregated from different projects and reprocessed using uniform pipelines for increased consistency. The overall population is balanced between males and females. Variant carrier frequency was calculated by aggregating variant frequencies using the Hardy-Weinberg equation. We considered variants with MAFs ≥1% as common and variants with MAFs <1% as rare.
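The carrier frequency calculation can be sketched as follows. Under Hardy-Weinberg equilibrium, a variant with allele frequency q is carried by a fraction 2pq + q² = 1 − (1 − q)² of individuals. How frequencies were aggregated across variants is not specified in detail above; the sketch assumes that variants segregate independently, which is an assumption of this example, and all function names are hypothetical.

```python
def carrier_frequency(allele_frequency):
    """Fraction of individuals with at least one copy under Hardy-Weinberg.

    Equals 2pq + q^2, written here as 1 - (1 - q)^2.
    """
    return 1.0 - (1.0 - allele_frequency) ** 2


def aggregated_carrier_frequency(allele_frequencies):
    """Fraction of individuals carrying at least one of several variants.

    Assumption (not from the source): variants are independent, so the
    probability of carrying none is the product of per-variant probabilities.
    """
    non_carrier = 1.0
    for q in allele_frequencies:
        non_carrier *= (1.0 - q) ** 2
    return 1.0 - non_carrier


def is_common(minor_allele_frequency):
    """Variants with MAF >= 1% are common, those with MAF < 1% are rare."""
    return minor_allele_frequency >= 0.01
```

Variants in strong linkage disequilibrium violate the independence assumption, in which case the product formula would overestimate the aggregated carrier frequency.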
Results
Benchmarking pharmacogenomic variant predictions
To evaluate the performance of available computational effect predictors on pharmacogenomic data, we first benchmarked 28 commonly used algorithms on 145 variants with pharmacogenomic annotations from CPIC (Set 1) and 385 variants with high-quality experimental functionality data (Set 2; see Methods). The best performing methods were AlphaMissense and APF (Fig. 1; Table 1), which achieved the overall highest AUC. Notably, AlphaMissense exhibited excellent specificity (94%) and PPV (85%), while its sensitivity was relatively low (33%). In contrast, APF was more balanced with an overall higher sensitivity (79%) than specificity (66%). MutPred, a method that models a broad range of structural and functional properties, including secondary structure, transmembrane topology and macromolecular binding, also performed well; however, 49% of variants could not be predicted. Among the algorithms integrated into APF (MutationAssessor, PROVEAN, VEST, CADD and LRT), the former three were among the top performing methods (AUC ≥ 0.75), whereas predictions of CADD (AUC = 0.7) and LRT (AUC = 0.67) were less accurate when using standard parameters. Combined, these results suggested that incorporation of structural information based on AlphaMissense might improve algorithmic performance, particularly by increasing test specificity.
Developing APF2 by integrating structural information into the established ADME Prediction Framework
The incorporation of structural predictions into a new prediction framework was conducted in two steps - first, we optimized algorithm parametrization for pharmacogenomic variations and second, we aggregated the predictions from optimized algorithms into a unified ensemble score (Fig. 2). Based on the benchmarking results, we selected the five top performing algorithms, i.e. AlphaMissense, PolyPhen-2, PROVEAN, MutationAssessor and VEST, for parameter optimization. MutPred was not considered due to its large number of missing predictions. To leverage the full potential of the benchmarking set, we performed 5-fold cross-validation by randomly partitioning the combined set (combined Set 1 and Set 2 with a total of n = 530 variants) into five subsets and iterated the training on four partitions and validation on the remaining partition.
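The cross-validation scheme described above can be sketched in a few lines. This is an illustrative example only: the function name, the seed and the use of integer identifiers are assumptions, and the actual partitioning procedure may have differed in detail.

```python
import random


def five_fold_splits(variant_ids, n_folds=5, seed=0):
    """Yield (training, validation) lists for each of n_folds iterations.

    Variants are shuffled once and dealt into n_folds disjoint partitions;
    each partition serves as the validation set exactly once.
    """
    shuffled = list(variant_ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        validation = folds[held_out]
        training = [
            variant
            for index, fold in enumerate(folds)
            if index != held_out
            for variant in fold
        ]
        yield training, validation


# Example with 530 placeholder identifiers standing in for Set 1 plus Set 2.
splits = list(five_fold_splits(range(530)))
```

With n = 530 variants, each iteration trains on 424 variants and validates on the remaining 106.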
For threshold optimization, we used Youden’s J, which quantifies the likelihood of making an informed decision (see Methods), and selected the threshold for which J was maximal (Fig. 3). In the training sets, informedness improved considerably for AlphaMissense (ΔJ = 0.19) and PROVEAN (ΔJ = 0.09), whereas only marginal improvements were observed for PolyPhen-2, VEST and MutationAssessor (ΔJ ≤ 0.03). Similar results were obtained when the optimized algorithms were applied to the entire combined set (Table 2). These findings suggest that the optimization approach was robust and improved the predictive accuracy of individual algorithms for pharmacogenomic predictions. We then aggregated the weighted prediction results of the optimized algorithms into the new ensemble score, APF2. The APF2 score can adopt values in the interval [0, 1]. For binary classifications, variants with APF2 scores ≤0.367 were classified as deleterious and variants with a score >0.367 as functionally neutral.
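A threshold search of this kind can be sketched as follows. The example assumes that lower scores indicate a more deleterious variant, matching the APF2 convention stated above; several of the individual predictors score in the opposite direction and would need the comparison reversed. Function names are hypothetical, the ensemble weights are not reproduced, and only the 0.367 cutoff is taken from the text.

```python
def youden_j(scores, is_deleterious, threshold):
    """Sensitivity + specificity - 1 when scores <= threshold are deleterious.

    Assumes both deleterious and neutral variants are present.
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_deleterious):
        called_deleterious = score <= threshold
        if truth:
            if called_deleterious:
                tp += 1
            else:
                fn += 1
        else:
            if called_deleterious:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn) + tn / (tn + fp) - 1


def best_threshold(scores, is_deleterious):
    """Return the observed score value that maximizes Youden's J."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: youden_j(scores, is_deleterious, t))


APF2_CUTOFF = 0.367  # binary classification cutoff reported in the text


def classify_apf2(score):
    """Binary call from an APF2 score in [0, 1]."""
    return "deleterious" if score <= APF2_CUTOFF else "neutral"
```

In a cross-validation setting, `best_threshold` would be fitted on the training partitions only and then evaluated on the held-out partition.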
Benchmarking of APF2
To test APF2 performance, we first compared its classification accuracy to that of the original APF, which does not consider structural information. Notably, APF2 consistently outperformed all individual algorithms (Fig. 4A). The improved accuracy was particularly apparent for clinically relevant variants with CPIC annotations. Trailing APF2 were the original APF, PolyPhen-2 and VEST, whereas algorithms developed on the basis of evolutionary conservation, such as PrimateAI and fitCons, performed very poorly on pharmacogenomic data. Next, we tested the quantitative association between APF2 scores and enzyme function. Importantly, we find that ensemble scores significantly correlate with measured enzyme activity (R² = 0.907, p = 0.003; Fig. 4B). These results suggest that APF2 can provide useful information about the magnitude of functional effects of a given pharmacogenetic missense variant.
To demonstrate the molecular basis for the improved performance, we analyzed variations that were correctly identified by APF2, but not by the original APF algorithm that does not consider structural information. CYP2C9*13 (rs72558187; NM_000771.4:c.269T>C) encodes a p.L90P amino acid exchange that abrogates enzyme activity of the respective gene product [51]. Position 90 is upstream of a B-C loop (residues 106–108), which interacts with CYP2C9 substrates, such as S-warfarin. The replacement of leucine by proline distorts this B-C loop and causes side chain turnover of p.A106 and p.R108, thereby reducing substrate interactions and reducing enzyme function (Fig. 4C; and ref. [52]). Another example is the p.T450P amino acid exchange in CYP21A2, which results in a loss of 17-hydroxyprogesterone clearance in vitro [53] and is associated with congenital adrenal hyperplasia due to 21-hydroxylase deficiency [54]. The exchange of a threonine to a proline affects the respective β-sheet structure that is likely required to stabilize the adjacent loop, resulting in destabilization and drastically reduced protein activity (Fig. 4D). Combined, these examples lend support to the importance of structural information for the functional interpretation of pharmacogenetic variations.
Lastly, we tested APF2 on an independent variant set as the gold-standard for evaluating predictive performance. To this end, we obtained pharmacogenomic variants with high level of evidence (n = 146) from PharmGKB and ClinVar (Set 3; see Methods). Compared to the commonly used algorithms as well as the best performing tools on the benchmarking set, APF2 was most accurate (92%) with a balanced sensitivity (91%) and specificity (93%) and the highest degree of informedness (J = 0.84). In contrast, the two trailing methods, APF and AlphaMissense, predicted this testing set with a lower accuracy of 85% (Table 3). Overall, these results demonstrated the robustness of APF2 in predicting the functional impact of variations and we thus suggest that it is suitable for application to pharmacogenomic variants with uncertain significance.
Interrogating functionality of pharmacogenomic variants from population-scale sequencing data
Next, we utilized APF2 to evaluate the functional impact of genetic variants across the human pharmacogenome. To this end, we mined variant data of 208 genes from 730,947 whole exome and 76,215 whole genome sequences provided by gnomAD v4.0. Overall, APF2 predicted over 100,000 variants to be deleterious (Supplementary Table 4), resulting in each individual harboring on average a total of 35.1 pharmacogenomic variants, of which 10.6% (3.7) were rare variants with a minor allele frequency (MAF) below 1% (Fig. 5A). Notably, compared to a previous exome data set of 60,706 individuals [55], the aggregated frequencies of deleterious variants increased only modestly at the global scale. When we further stratified the analysis by human populations, we found that Africans on average carry the highest number of functional pharmacogenomic variants (40.4 per individual), followed by Ashkenazi Jews (36.7 per individual) and Middle Easterners (36 per individual), whereas numbers were more than 20% lower in European individuals (31.8 functional variants; Fig. 5B). Importantly, the functional contribution from rare variants was very high in Africans (32%), possibly due to the large number of rare variants derived from population admixture [56]. These results emphasize the increased variability of the African superpopulation and underscore the importance of genetic profiling with high ethnogeographic resolution.
We then utilized the population-specific variant data to estimate the number of deleterious variants per individual for actionable pharmacogenes (Fig. 5C). Owing to the globally highly prevalent splicing defect variant CYP3A5*3, all populations harbored on average at least one deleterious CYP3A5 variant per individual. Similarly, more than one deleterious ABCB1 variant per individual was observed in Africans, Ashkenazi Jews, East Asians, Admixed Americans and Middle Easterners, primarily due to the synonymous variant at rs1045642 (NP_000918.2:p.Ile1145=; sometimes referred to as C3435T) for which there is some evidence of functional importance [57, 58]. In contrast, the ABC transporter BCRP, encoded by ABCG2, is >50-fold less variable. We then leveraged these genetic variation data to infer the fraction of individuals at risk based on established CPIC recommendations (Fig. 5D). Individuals at risk in this context were defined as the fraction of each ethnogeographic group who would be expected to benefit from dose adjustments or alternative medications based on their genotype. Over 70% of individuals across populations are estimated to exhibit poor response or avoidable adverse events upon treatment with standard doses of amitriptyline, aligning with clinical trial data [59]. Similarly, a large number of patients are predicted to benefit from dose adjustments of warfarin; however, the fraction of at-risk individuals differs substantially between populations from 40% in Africans and South Asians to 99% in East Asians. These findings match the 59–75% reduced doses prescribed in East Asia compared to the rest of the world (2.45 mg daily vs. 4–8.76 mg daily) [60].
For clopidogrel, our results predict that on average 30% of individuals of European ancestry taking clopidogrel would benefit from dose adjustments or alternative antiplatelet treatment, which again matches well with previous clinical findings that showed that up to 37% of patients do not respond appropriately to clopidogrel [61]. The predicted numbers of individuals at risk can differ substantially between populations. For instance, for irinotecan, the fraction differed between 0.1/1000 individuals in the Middle East compared to 280/1000 for Africans. Similarly, relative risk differed >7-fold for tacrolimus where 95/1000 European individuals would benefit from genetically guided prescriptions compared to 717/1000 Africans. These results indicate the importance of considering ethnogeographic information for drug response and the personalization of treatment.
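The fold differences quoted above follow directly from the per-1000 rates given in the text. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def fold_difference(rate_a, rate_b):
    """Ratio of the larger to the smaller of two positive rates."""
    high, low = max(rate_a, rate_b), min(rate_a, rate_b)
    return high / low


# Individuals at risk per 1000, as stated in the text.
tacrolimus_fold = fold_difference(717, 95)    # Africans vs. Europeans
irinotecan_fold = fold_difference(280, 0.1)   # Africans vs. Middle Easterners
```

The tacrolimus ratio is about 7.5, consistent with the ">7-fold" statement, whereas the irinotecan rates differ by more than three orders of magnitude.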
Discussion
Over the last decades, a multitude of computational algorithms have been developed to predict the functional consequences of missense variants. These either leverage statistical methods based on sequence information or adopt machine learning on existing data to infer variant effect. Evolutionary conservation has been the predominant feature for either of these approaches. However, pharmacogenomic variations are often not highly conserved, which reduces the predictive power of conservation-based tools [62]. Moreover, as many algorithms consider variants that are very common in the general population as functionally neutral, there are multiple functionally important variations, such as the variants defining haplotypes CYP2C8*2 (MAF = 15.2% in Africans), CYP2C19*2 (MAF > 26% in Asian populations), CYP2D6*4 (MAF = 19.6% in Europeans) and CYP2D6*10 (MAF = 57.3% in East Asians) that are incorrectly assigned during training, reducing algorithm performance and accuracy [63]. As a consequence, most tools are designed to predict pathogenic rather than deleterious variations.
To overcome this issue, both gene-specific [64,65,66] and general purpose pharmacogenomic algorithms [20, 67] have been developed. However, these tools are based on supervised learning and critically rely on deleterious and neutral variant sets with accurate labels, with no or only limited consideration of structural information. Here, we extended these approaches by integrating AlphaMissense scores, which were derived without using labeled pathogenic variants for training. Indeed, the resulting APF2 score improved predictive performance on an unseen variant set that also included variations in genes that were not part of model training. As such, high accuracy is achieved while avoiding model circularity, which constitutes a common issue known to result in inflated performance estimates [68]. The use of an ensemble method is moreover in line with current guidelines by the American College of Medical Genetics (ACMG), which recommend using a concordance-based approach that integrates predictive results from several prediction algorithms [69].
Notably, high-quality functional data from in vitro experiments are required for the training and validation of variant effect predictors. For APF2, we utilized a total of 530 variants distributed across 45 pharmacogenes for model development, including genes encoding CYPs and other phase I enzymes, phase II enzymes as well as drug transporters. While this is an increase of almost 60% compared to the original APF, the extent of pharmacogenomic training data remains limited. Notably, recent advances in deep mutational scanning drastically increase the numbers of variants with functional annotations. For instance, activity data has been generated for >6000 variants in CYP2C9 [70] as well as for >2900 in NUDT15 [71]. However, we here decided to omit these data to avoid skewing the data towards one or few genes. With the accumulation of such data sets, we anticipate that these can provide powerful resources for the further refinement of pharmacogenomic prediction algorithms.
Analysis of genome-scale sequencing data from a total of 807,162 individuals (gnomAD v4.0) revealed an average of 31.4 common and 3.7 rare pharmacogenetic variants with putative functional consequences per individual. These numbers are similar to previous estimates based on 60,706 exomes from the legacy ExAC project derived by APF [15] or APF2, suggesting that a further increase in cohort sizes has, if at all, only minor impacts on the overall number of aggregated pharmacogenetic variations per individual. The current analyses also confirm pronounced population differences. Consequently, although estimates for the overall aggregated frequency of pharmacogenetic variants might reach saturation at the global scale, there is clearly a need for more refined genetic signatures with high ethnogeographic resolution.
While APF2 outperformed previous algorithms, multiple limitations remain. Firstly, APF2 can only predict quantitative variant effects for reduced function of the gene product, whereas gain-of-function effects cannot be predicted by the current version. Secondly, APF2, like all other algorithms, cannot identify substrate-specific effects. For instance, CYP2C8*3 is thought to increase clearance of glitazones, whereas the same allele is associated with reduced metabolism of ibuprofen and paclitaxel [72]. APF2 only provides a single score and, in the case of CYP2C8*3, identifies the underlying variants as deleterious, thus not hinting at potential differences between substrates. Furthermore, APF2 only evaluates single variants and is currently not able to provide interpretations of combinatorial variant effects. This limitation might be particularly pronounced for CYP2A6 and CYP2D6 genes where complex haplotype structures are routinely identified. Lastly, the tool is limited to missense variants, whereas regulatory variants or variations that result in splicing defects or ribosomal stalling cannot be detected.
Translating genomic sequence data into treatment recommendations faces the important challenge of how to consider variants for which no epidemiological or experimental functionality data are available. Currently, the application of computational variant interpretation is limited to research settings. In this context, it has been suggested that, when sequencing data are available, predicting the function of variants of unknown significance can act as one piece of evidence to assist clinical decision making, for instance by flagging carriers of putatively deleterious variants for high-intensity surveillance or therapeutic drug monitoring [73]. Indeed, application of APF2 to population-scale pharmacogenomic data identified that up to 90% of patients of European and Ashkenazi Jewish ancestry would benefit from dose adjustments when undergoing amitriptyline therapy and, overall, 78% of African patients are at risk of poor response or adverse events when treated with efavirenz. However, stringent trials with high ethnogeographic resolution are needed to evaluate whether sequencing coupled with computational evaluation of unknown variants can improve personalized response predictions in well-defined therapeutic contexts and whether those interventions constitute a cost-effective allocation of health care resources.
In summary, we have extended the established APF algorithm by integrating structural information inferred by AlphaMissense. The resulting tool, APF2, predicts the functional impact of pharmacogenomic variants with higher accuracy, particularly for clinically relevant variations with CPIC guidelines. We furthermore demonstrate 92% accuracy with balanced sensitivity and specificity on an independent test set of 146 variants from PharmGKB and ClinVar. Combined, these findings suggest that integration of structural data provides a further step towards reliable pharmacogenetic variant effect prediction, which might facilitate the translation of personal sequencing data into personalized pharmacogenetic advice.
Data availability
Aggregated variant data are available via gnomAD (https://gnomad.broadinstitute.org/); AlphaMissense scores can be accessed at https://console.cloud.google.com/storage/browser/dm_alphamissense. Both repositories are publicly available. All relevant information on the variants used for model training, validation and testing is provided in Supplementary Tables 1–3.
References
Trivedi MH, Rush AJ, Wisniewski SR, Nierenberg AA, Warden D, Ritz L, et al. Evaluation of outcomes with citalopram for depression using measurement-based care in STAR*D: implications for clinical practice. Am J Psychiatry. 2006;163:28–40. https://doi.org/10.1176/appi.ajp.163.1.28.
Schork NJ. Personalized medicine: Time for one-person trials. Nature. 2015;520:609–11. https://doi.org/10.1038/520609a.
Funding
This work was supported by the Swedish Research Council [grant numbers: 2021-02801 and 2023-03015], Cancerfonden [grant 23-0763PT], the Robert Bosch Stiftung (Stuttgart, Germany), and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2180 (project number 390900677) and EXC 2064/1 (project number 390727645). SP was supported by a Twinning Grant of the German Cancer Research Center (DKFZ) and the Robert Bosch Center for Tumor Diseases (RBCT). Open access funding provided by Karolinska Institute.
Author information
Contributions
YZ compiled and analyzed the data, wrote the first draft of the manuscript and prepared all figures. SP conducted data curation and contributed to data visualization. VML conceived and supervised the project. All authors reviewed the manuscript.
Ethics declarations
Competing interests
YZ and VML are co-founders and shareholders of Shanghai Hepo Biotechnology Ltd. VML is CEO and shareholder of HepaPredict AB. SP declares no conflicts of interest.
Ethics approval and consent to participate
All data analyzed in this work was extracted from gnomAD, a repository that makes aggregated genomic data publicly available under the Fort Lauderdale Agreement. The contributing studies (https://gnomad.broadinstitute.org/about) were conducted in accordance with the Declaration of Helsinki. Details about the responsible ethics committees can be found in the descriptions of the contributing projects.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhou, Y., Pirmann, S. & Lauschke, V.M. APF2: an improved ensemble method for pharmacogenomic variant effect prediction. Pharmacogenomics J 24, 17 (2024). https://doi.org/10.1038/s41397-024-00338-x