Understanding and predicting the functional consequences of missense mutations in BRCA1 and BRCA2

Aljarf, Raghad; Shen, Mengyuan; Pires, Douglas E. V.; Ascher, David B.

doi:10.1038/s41598-022-13508-3

Article
Open access
Published: 21 June 2022

Volume 12, article number 10458, (2022)

Scientific Reports

Abstract

BRCA1 and BRCA2 are tumour suppressor genes that play a critical role in maintaining genomic stability via DNA repair. DNA repair defects caused by BRCA1 and BRCA2 missense variants increase the risk of developing breast and ovarian cancers. Accurate identification of these variants is therefore clinically relevant as a means to guide personalized patient management and early detection. Next-generation sequencing efforts have significantly increased data availability but also the discovery of variants of uncertain significance that need interpretation. Experimental approaches used to measure the molecular consequences of these variants, however, are usually costly and time-consuming. Therefore, computational tools have emerged as faster alternatives for assisting in the interpretation of the clinical significance of newly discovered variants. To better understand and predict variant pathogenicity in BRCA1 and BRCA2, various machine learning algorithms have been proposed, but these have shown limited performance. Here we present BRCA1 and BRCA2 gene-specific models and a generic model for quantifying the functional impacts of single-point missense variants in these genes. Across tenfold cross-validation, our final models achieved a Matthews Correlation Coefficient (MCC) of up to 0.98 and comparable performance of up to 0.89 across independent, non-redundant blind tests, outperforming alternative approaches. We believe our predictive tool will be a valuable resource for understanding and interpreting the functional consequences of missense variants in these genes, for guiding the interpretation of newly discovered variants, and for prioritizing mutations for experimental validation.

Introduction

The breast cancer susceptibility gene 1 (BRCA1) and 2 (BRCA2) are tumour suppressor genes required in pathways responsible for repairing damaged DNA, transcriptional regulation, and maintaining genomic stability, as these are crucial mechanisms for cells to avoid apoptosis and chromosomal rearrangement¹. Consequently, variants in these genes can predispose to multiple types of cancer².

Genetic testing is widely used in the clinic to identify individuals at high risk of developing breast, ovarian, and other types of cancers and these individuals are frequently carriers of germline pathogenic variants that disrupt BRCA1 and BRCA2 DNA repair function³.

Germline variants in BRCA1 and BRCA2 contribute to 20–25% of hereditary breast and ovarian cancer⁴, while BRCA1/2 somatic variants account for 5–7% of ovarian cancers⁵ and up to 10% of breast cancers⁶. Individuals with BRCA1/2 variants have an increased risk of developing both breast (84% increased risk) and ovarian (45% increased risk) cancers^6,7. Pathogenic variants of BRCA1/2 genes are associated with approximately 15–40% of hereditary breast cancers⁸. Individuals carrying BRCA1 pathogenic variants have a 59% elevated risk of developing breast cancer and a 34% risk of developing ovarian cancer by age 70. In contrast, carriers of BRCA2 pathogenic variants have a 51% risk of breast cancer and an 11% risk of ovarian cancer by the age of 80 years⁹. Even though establishing the definitive pathogenicity status of a missense variant can better inform treatment, prevention and clinical management⁴, most missense variants identified by clinical genetic testing and reported in public databases are listed as variants of uncertain significance (VUS)¹⁰. Thus, there is a need for accurate approaches to establish and predict variant pathogenicity and its impact on protein function.

Failure to precisely predict the consequences of missense variants in BRCA1 and BRCA2 genes confounds our understanding of sequencing data and impacts clinical care. To date, as only a limited number of missense variants have been functionally evaluated experimentally, the interpretation of variant pathogenicity has relied on applying in silico tools for predicting functional effects together with family-based data¹¹.

Despite significant effort dedicated over the years to the development of accurate and general computational methods capable of identifying deleterious variants at genomic scale^12,13,14,15, these have presented variable performance and reliability at the gene level^12,16,17,18. In the particular case of BRCA1/2, Ernst et al. evaluated the performance of Align-GVGD^19,20, SIFT¹², PolyPhen-2¹⁵ and MutationTaster2²¹ on a set of well-characterized BRCA1/2 variants and concluded that the results obtained using in silico tools are insufficient to be applied as stand-alone evidence in clinical diagnostics¹⁸. Thus, the availability of experimentally characterized effects of variants would allow us to overcome this limitation by tailoring gene-specific predictive methods to uncover mutation-structure–function relationships.

With advances in bioinformatics and computational biology, several computational attempts have been made to explore the functional impacts of missense variants in BRCA1 and BRCA2 genes. Hart et al. implemented an in silico model BRCA-ML for understanding the functional impact of missense variants in BRCA1 and BRCA2 genes and VUS classification¹¹. In addition, Arshad and colleagues investigated the structural and functional consequences of BRCA1 variants on cellular mechanisms by applying well-established in silico approaches²². Finally, Ernst et al. evaluated the reliability of employing computational tools to predict the pathogenicity of BRCA1 and BRCA2 missense variants as the basis for clinical decision-making¹⁸. They analysed performance improvement effects by combining various in silico prediction approaches on a data set of well-characterized BRCA1/2 missense variants in comparison to stand-alone tools.

Here we have developed a new machine learning method capable of accurately predicting the functional effect of missense variants in the BRCA1 and BRCA2 genes and implemented a computational saturation mutagenesis approach to classify all VUSs within these genes. We believe that our predictive models could be valuable for interpreting BRCA1 and BRCA2 variants and overcoming the challenge of classifying variants of uncertain significance, in addition to improving the clinical utility of genetic testing on these genes.

Results

Variant distribution in BRCA1 and BRCA2

To visualize the distributions of missense variants curated from ClinVar¹⁰ for BRCA1 and BRCA2, lollipop plots were generated and are depicted in Fig. 1. Most pathogenic variants observed were concentrated in well-known functional domains of both genes (the BRCT and RING domains of BRCA1 and the DNA binding domain of BRCA2), consistent with previous findings⁴. Benign variants were uniformly distributed across both genes, covering 62% and 74% of BRCA1 and BRCA2 residues, respectively.

Exploring the functional consequences of BRCA variants using statistical analysis and feature engineering

To distinguish between pathogenic and benign variants, we performed a qualitative analysis to investigate the relationship between different molecular properties and variant consequences. These included protein stability effects upon mutation, amino acid biophysical properties, effects on post-translational modifications and evolutionary conservation. A total of 197 features were calculated (Suppl. Table 1).

We conducted a Welch two-sample t-test to identify features that could differentiate between the two classes, pathogenic and benign, in both BRCA1 (Suppl. Figure 1) and BRCA2 (Suppl. Figure 2) genes. For BRCA1, one of the most descriptive attributes was sequence conservation given as ConSurf scores²³ (p < 2.2e-16), indicating that pathogenic variants tend to occur in conserved regions, consequently leading to function impairment, in agreement with previous studies²⁴. Other features highlighting the molecular differences between the two classes include amino acid physicochemical properties²⁵. In particular, features representing statistical potentials (KESO980102: p < 6.6e-06, MIRL960101: p < 1.1e-05 and MIYT790101: p < 1.1e-05) presented a significant difference between benign and pathogenic variants.

For BRCA2, highly discriminating features included sequence evolutionary conservation properties (PANTHER²⁶: p < 6.9e-13, ConSurf²³: p < 2.3e-15), suggesting that pathogenic variants tend to occur in conserved positions, as previously observed²⁴. The stability analysis by the SAAFEC-SEQ²⁷ tool (p < 0.007) revealed that pathogenic variants were likely to be highly destabilizing, as shown before²⁴. Furthermore, pathogenic variants displayed differential patterns in terms of amino acid physicochemical properties²⁵ in comparison to benign variants (MUET020101: p < 0.003). These results highlight the importance of considering a range of properties when assessing the functional impacts of variants on protein function.

For model optimization, Welch's t-test was also conducted on all the features used in the final model (BRCA1/2 combined) to provide biological insight into which features characterize the functional consequences of single amino acid substitutions in BRCA1 and BRCA2 (Fig. 2). Among the most differentiating attributes were sequence-based conservation scores (PolyphenScore²⁸) and amino acid physicochemical properties²²: HENS920101 (represents the BLOSUM45 substitution matrix), WEIL970101 (represents amino acid comparative profiles) and LUTR910107 (represents mutation matrices for the various protein secondary structure classes²²).
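As an illustration of the test used above, the following is a minimal sketch of Welch's t statistic and its Welch–Satterthwaite degrees of freedom using only the Python standard library. The function name `welch_t` and the score lists are hypothetical and do not come from the study's data; the p-value would additionally require the Student t distribution (for example via SciPy's `scipy.stats.ttest_ind(..., equal_var=False)`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group squared standard errors; variance() uses the n-1 denominator.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # The degrees of freedom are estimated rather than fixed at n_a + n_b - 2,
    # because the two groups are allowed to have different variances.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical conservation scores, for illustration only.
pathogenic_scores = [8.0, 9.0, 9.0, 7.0, 8.0]
benign_scores = [3.0, 5.0, 4.0, 6.0, 2.0, 4.0]
t_stat, dof = welch_t(pathogenic_scores, benign_scores)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Welch's variant is a natural choice here because the pathogenic and benign groups differ in size and need not share a common variance.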

Following the elimination of redundant features, a greedy feature selection approach was performed, based on the Matthews Correlation Coefficient (MCC). Our final optimal model included 15 features (Suppl. Table 2). These features, representative of the varied classes considered, included conservation scores from Provean²⁹ and PolyphenScore²⁸. In addition, MetaSVM_score, MPC-rankscore, MutationTaster_score, ClinPred-score²⁸, and physicochemical amino acid properties (AA-index)²² were included, as well as functional annotation scores from the AWESOME tool (which predicts the effect of SNPs on post-translational modification levels), including ubiquitination, acetylation and AWESOME Score³⁰.

Notably, while AA-index²² provides a measure of numerical indices that represent different physicochemical properties of amino acids, only six of these features were selected by the greedy feature selection approach: MIYS990107 and THOP960101 are representations of the amino acid pair-wise contact potentials, while LUTR910107, HENS920101, WEIL970102, and WEIL970101 denote amino acid mutation matrices.
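The greedy selection step can be sketched as follows. This is a minimal forward-selection variant under stated assumptions: the text does not specify whether selection ran forwards or backwards, and `greedy_forward_selection`, `toy_score` and the `TOY_GAIN` values are hypothetical stand-ins. In practice the scoring callback would return the cross-validated MCC of a model trained on the candidate subset.

```python
def greedy_forward_selection(features, score_subset, max_features=None):
    """Add one feature at a time, keeping the addition that most improves the score."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every candidate extension of the current subset.
        scored = [(score_subset(selected + [f]), f) for f in remaining]
        candidate_score, candidate = max(scored)
        if candidate_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Toy scorer standing in for "cross-validated MCC of a model trained on
# this subset"; the gains below are invented for illustration.
TOY_GAIN = {"conservation": 0.60, "polyphen": 0.20, "aaindex": 0.05, "noise": -0.10}


def toy_score(subset):
    return sum(TOY_GAIN[name] for name in subset)


chosen, score = greedy_forward_selection(list(TOY_GAIN), toy_score)
print(chosen, round(score, 2))
```

With the toy scorer, the uninformative `noise` feature is never added because it lowers the score, which mirrors how only a handful of the many AA-index descriptors survive selection.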

Developing BRCA1 and BRCA2 gene-specific pathogenicity predictors

Different supervised learning algorithms were assessed to build gene-specific predictive models that can accurately identify pathogenic variants in BRCA1 and BRCA2 genes.

After greedy feature selection, the best-performing models were obtained using the Random Forest classifier (n_estimators = 300) for both genes. For BRCA1-combined and BRCA2-combined (where pathogenic or likely pathogenic variants were grouped as pathogenic, and benign or likely benign variants were grouped as benign), the best-performing models were the ensemble classifiers Extra Trees (n_estimators = 300) and Gradient Boosting (n_estimators = 300), respectively.

BRCA1 and BRCA2 gene-specific predictors achieved Matthews Correlation Coefficient (MCC) values ranging from 0.89 to 0.98 across tenfold cross-validation and comparable performance of up to 0.89 across independent, non-redundant blind tests (Table 1). Furthermore, the final classification models achieved an AUC of up to 0.99 across tenfold cross-validation (Fig. 3) and comparable performance of up to 0.98 on independent, non-redundant blind tests.
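For readers less familiar with the AUC, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one. The function name `roc_auc`, the labels and the scores are hypothetical and serve only to illustrate the metric.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities (1 = pathogenic, 0 = benign).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Unlike the MCC, the AUC does not depend on a classification threshold, which is why the two metrics are reported side by side.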

Table 1 Comparative performance of BRCA1/2 models across cross-validation and non-redundant blind test sets.

When comparing the predictions made by the BRCA1 and BRCA2 gene-specific models, the BRCA1 model correctly identified 94 out of 97 pathogenic variants and wrongly classified 5 out of 150 benign missense variants. In contrast, the BRCA2 model was more accurate in classifying benign variants; it misclassified only one benign variant as pathogenic.
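The MCC can be computed directly from confusion-matrix counts. The sketch below applies the standard formula to the BRCA1 counts quoted above (94 of 97 pathogenic variants identified, 5 of 150 benign variants called pathogenic). The helper name `mcc_from_counts` is hypothetical, and the resulting value is derived only from these quoted counts, so it need not coincide with any particular entry in Table 1.

```python
from math import sqrt


def mcc_from_counts(tp, fp, tn, fn):
    """MCC from confusion-matrix counts; returns 0.0 if any margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Counts taken from the BRCA1 sentence in the text.
tp, fn = 94, 97 - 94   # pathogenic variants: correctly called / missed
fp, tn = 5, 150 - 5    # benign variants: wrongly called pathogenic / correct
mcc = mcc_from_counts(tp, fp, tn, fn)
print(f"MCC = {mcc:.2f}")
```

Because the MCC uses all four cells of the confusion matrix, it remains informative when the classes are imbalanced, as they are in these datasets.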

Predicting the clinical significance of BRCA1/2 variants using ENIGMA data

We built gene-specific predictive models to increase the reliability of predictions and to evaluate the clinical impact of BRCA1/2 missense variants. To this end, we assessed different supervised learning algorithms to train a binary classifier and optimise the predictive capability of each model in classifying pathogenic variants in BRCA1 and BRCA2 genes using the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) data³¹.

After greedy feature selection, the best-performing models were obtained using the ensemble classifier Gradient Boosting (n_estimators = 300) for both genes.

BRCA1 and BRCA2 gene-specific predictors achieved Matthews Correlation Coefficient (MCC) values ranging from 0.82 to 0.96 across tenfold cross-validation and comparable performance of up to 1.00 across independent, non-redundant blind tests (Table 1). Similarly, the final classification models achieved an AUC of up to 0.99 across tenfold cross-validation (Suppl. Figure 3) and equivalent performance of up to 1.00 on independent, non-redundant blind tests.

When looking closely at the predictions made by BRCA1 and BRCA2 (ENIGMA) gene-specific models, the BRCA1 model accurately categorised 28 out of 29 pathogenic variants, and it incorrectly classified 1 out of 112 benign missense variants.

The misclassified variant, S1715R, is located in the BRCT domain of BRCA1 and has previously been shown to disrupt BRCA1 interaction with Abraxas, BRIP1 and CtIP29³². It was also misclassified by other tools, including PolyPhen-2¹⁵ and Align-GVGD^19,20, suggesting that including structural information in these predictions could further improve accuracy by capturing additional molecular consequences of variants.

Developing a general BRCA1/2 pathogenicity predictor

We investigated whether a general predictive tool could be developed to accurately classify pathogenic variants in BRCA1 and BRCA2 genes by combining all missense variants of both genes.

For the general BRCA1/2 predictor (where all variants of both genes were combined), the final model with the best performance was obtained using the Random Forest classifier (n_estimators = 300). It achieved an accuracy of 0.96 on tenfold cross-validation, with an AUC of 0.96, MCC of 0.91, and precision of 0.96. This was comparable with the performance across the non-redundant blind test, achieving an AUC of 0.95, MCC of 0.76, and precision of 0.93, providing confidence in the generalizability of the final model (Table 1 and Suppl. Figure 4). When tested on all BRCA1/2 variants in the BRCA1/2-combined training dataset (n = 406), our initial model correctly classified 91% of pathogenic and 98% of benign variants.

Table 1 shows the performance of the classification models across tenfold cross-validation and blind test sets. The performance of our BRCA1 and BRCA2 gene-specific and general pathogenicity predictors was consistent on both tenfold cross-validation and blind test sets highlighting the robustness of the predictive models, and their capability of accurately differentiating between pathogenic and benign variants.

To better guide the interpretation of novel variants, we tested the applicability of our general model to predict the likelihood of pathogenicity of the Variants of Unknown Significance (VUS, n = 5716) in BRCA1 and BRCA2. Our model predicted 13% of these as pathogenic and 87% as benign. According to the BRCA1/2-general model, the total proportion of all potential pathogenic variants in BRCA1 and BRCA2 is ~3% (891 out of 30,616). Nearly all of them are located in well-known functional domains (the BRCT and RING domains of BRCA1 and the DNA binding domain of BRCA2), consistent with previous findings⁴.

Interestingly, our model predicted the variant W31S, located in the PALB2-binding domain of BRCA2, as pathogenic, which is consistent with a recent study³³. The tryptophan residue at position 31 of BRCA2 is one of the essential residues for BRCA2 interaction with PALB2, as it is known to create a polar bridge with Ser1065 of PALB2³⁴. Consequently, changing tryptophan to serine would abolish BRCA2 binding to PALB2, as demonstrated previously in vitro^34,35.

BRCA1/2-combined predicted scores for all possible single-nucleotide variants (SNVs) are provided in Supplementary Data Set 1.

Using the molecular consequences of BRCA variants to identify distinguishing features

The main purpose of this study was to build an accurate and efficient model that can predict BRCA1/2 pathogenic variants. Therefore, identifying a set of informative features is crucial for adequate model performance and improving our understanding of the molecular basis of variant pathogenicity.

The final features acquired via greedy feature selection resembled the initial results of the qualitative analysis. To assess how each of the feature categories contributed to the final model, we trained predictive models using different feature subsets: evolutionary conservation, missense variant prediction models from dbNSFP²⁸, physicochemical properties, and changes in post-translational modifications.

MCC values representing the performance on the blind test for each subset model were compared (Suppl. Table 3). Notably, the physicochemical properties WEIL970102 and HENS920101²⁵ (MCC = 0.76) were the main contributing features to the final model (BRCA1/2 combined), followed by other features that contributed to a moderate extent: changes in post-translational modifications³⁰ (MCC = 0.75), and ClinPred_score and MutationTaster_score²⁸ (MCC = 0.74).

As a final analysis, we explored the feature importance of the combined BRCA1/2 model. Suppl. Figure 5 shows that the sequence conservation feature PANTHER²⁶ contributed the most, followed by PolyphenScore²⁸ (a deleteriousness scoring method). On the other hand, most measures of conservation (SIFT¹², SNAP2³⁶, and Provean²⁹) contributed to a moderate extent to the final model.

Validation of BRCA1/2 general pathogenicity predictor using Functional Data

To evaluate the robustness of the BRCA1/2-general model, several types of functional data reported by Hart^4,11, Starita³⁷, and Findlay³⁸, comprising BRCA1 and BRCA2 variants and their functional scores (with previously established cut-off points for pathogenic variants), were applied as an independent blind test set to validate our model. The combined experimental functional data contained 2,882 BRCA1 SNVs from RING and BRCT domains evaluated using different functional assays^4,37,38 and 229 BRCA2 SNVs from the DNA binding domain assessed using the HDR assay^4,11. Of the 3,135 BRCA1/2 variants reported in these studies, 2,906 were not present in our training dataset.

Our model achieved an accuracy of 92% and F1-score of 0.93 for those variants not incorporated in the training data (2,906 variants), highlighting the robustness of our predictive model, and providing confidence in the generalizability of the final model. Figure 4 shows the confidence scores distribution of the functionally evaluated pathogenic and benign VUSs in BRCA1/2, demonstrating a good separation between classes.

To showcase the performance of our method, we assessed two variants. P34L and T1684P are currently classified as VUSs and were predicted as pathogenic with very high probabilities (0.88 and 0.91, respectively). Consistent with these results, a previous study designated these two variants as non-functional, based on functional scores obtained by a saturation genome editing functional assay³⁸. Furthermore, the P34L and T1684P variants are present in the RING and BRCT domains of BRCA1, respectively. The P34L variant is predicted to destabilise the structure (-0.84 kcal/mol, mCSM-Stability³⁹), with the conversion to leucine (Leu) altering the backbone structure, leading to loss of rigidity and steric clashes to accommodate Leu (Suppl. Figure 6a). Interestingly, the T1684P variant was also predicted to cause destabilisation of the protein (-0.23 kcal/mol, mCSM-Stability³⁹). The proline substitution could disturb the α-helical conformation through loss of intramolecular hydrogen bonding (the main-chain to side-chain H-bond) and altered flexibility, as proline lacks the amide hydrogen required for hydrogen bonding (Suppl. Figure 6b). Suppl. Figure 6 illustrates the interatomic interactions of the P34L and T1684P variants.

A previous study using a multiplex HDR reporter assay showed that amino acids 2–96 tended to have the highest proportion of non-functional variants, as the RING domain is encoded almost entirely by these positions, which are involved in the stability, folding, and function of the full-length protein^37,38. Additionally, BRCA1 missense variants that are known to predispose to cancer map to either the RING or BRCT domain³⁷.

Comparison with other available methods

We compared the performance of our model (on both cross-validation and blind test sets) with well-established predictors designed to predict the functional effects of missense variants (PolyPhen-2¹⁵, SIFT¹², Align-GVGD^19,20, REVEL¹³ and CADD⁴⁰). Additionally, we compared the performance of our models with other available studies^11,38,41,42. The comparative prediction performance of the classification models on cross-validation is shown in Table 2. Our models significantly outperformed alternative approaches, with the BRCA1 model obtaining an accuracy of 0.96 compared to 0.75 for MLR-CAGI⁴², while the BRCA2 model achieved 0.97. Table 3 illustrates the comparative performance of the classification models on blind test sets. Our BRCA1/2 general model obtained AUCs of 0.96 and 0.95 on cross-validation and blind test sets, respectively, outperforming PolyPhen-2¹⁵ (0.66, 0.77), SIFT¹² (0.78, 0.79), Align-GVGD^19,20 (0.53, 0.59), REVEL¹³ (0.79, 0.86) and CADD⁴⁰ (0.84, 0.79). The predictive models show a significant improvement in robustness and predictive power compared to previous methods in both data sets (Tables 2 and 3).

Table 2 Comparative Performance on cross-validation between BRCA1/2 classification models and other available approaches.

Table 3 Comparative Performance on blind test sets between BRCA1/2 classification models and other alternative predictors.

Comparison with alternative approaches that rely on genetic data

As our study aims to predict the molecular consequences of coding (missense) variants in BRCA1 and BRCA2, we compared the performance of our BRCA1 and BRCA2 models with other studies that rely solely on genetic data and likelihood ratios to identify pathogenic variants.

Easton et al.⁴³ built a logistic regression model to evaluate the clinical significance of 1,433 sequence variants of unknown significance (VUSs) in BRCA1 and BRCA2, reporting an AUC of 0.80 and 0.70 on their BRCA1 and BRCA2 models, respectively. Similarly, previous studies (Lindor, 2011⁹; Parsons, 2019³¹) employed a Multifactorial Probability-Based Model (posterior probability model) for classifying VUSs in BRCA1 and BRCA2 that incorporates different forms of genetic evidence. For instance, Parsons et al.³¹ achieved an AUC of 0.78 and an accuracy of 0.80 on their BRCA1/2 model. In comparison, Lindor et al. (2011)⁹ obtained an AUC of up to 0.93 and an accuracy of up to 0.92 on their BRCA1 and BRCA2 models.

Similarly, MS et al. (2020)³ employed logistic regression to indicate carrier level based on personal and family history of cancer and to calculate likelihood ratios denoting pathogenicity. By analysing ~138,000 individuals carrying 2,383 BRCA1/2 variants tested by multigene panel testing (MGPT), their model achieved an AUC of up to 0.83 for BRCA1 and up to 0.70 for BRCA2.

Our models significantly outperformed these alternative approaches, with the BRCA1 model obtaining an AUC of 0.98 and an accuracy of 0.96, while the BRCA2 model achieved an AUC of 0.97 and an accuracy of 0.98. The considerably higher performance of our method highlights the need to consider protein information when predicting pathogenic variants in BRCA1/2.

Comparison of BRCA1/2-general predictor with ACMG/AMP classification

To demonstrate the robustness of our final model (BRCA1/2 general) in classifying VUSs, we compared our final model classification results with the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) scoring⁴⁴, by applying a bioinformatics tool, InterVar⁴⁵.

It was possible to compare most of the BRCA1/2 missense variants with InterVar⁴⁵. Among the BRCA1 and BRCA2 VUSs classified as pathogenic by our final model, none were categorised into this class by InterVar⁴⁵. Instead, the missense variants our model classified as pathogenic were categorised by InterVar as likely pathogenic or likely benign, or remained VUSs.

Notably, only ~2% of BRCA1/2 missense variants (VUSs) classified as benign by our final model were categorised as likely pathogenic by InterVar⁴⁵. The majority of the remaining missense variants classified as benign by our model remained VUSs or were categorised as likely benign by the InterVar tool⁴⁵.

We observed many dissimilarities between our final model predictions and the InterVar ACMG/AMP variant scoring. This observation is in line with a recent study³³ that revealed differences between a classification based on a multifactorial model and ACMG/AMP scoring.

Discussion

Achieving reliable estimations of the cancer risk and functional consequences of BRCA1 and BRCA2 sequence variants represents an opportunity to improve the management, diagnosis, and clinical decisions of inherited breast and ovarian cancers^38,46, and computational approaches can enable and support these estimations.

Our study aims to classify and comprehensively estimate the functional consequences of missense variants in BRCA1/2 genes. We have shown that incorporating machine learning approaches with general pathogenicity scoring systems and mutation physicochemical properties is an effective strategy to obtain accurate predictive models for identifying deleterious missense variants in BRCA1 and BRCA2, which might lead to assisting classification of variants of uncertain significance that currently restrict the interpretation of genomic testing data. The final models obtained for each gene presented statistically significant improvements in comparison with other available approaches.

Wide-scale experimental mutational scanning methods, as in the cases illustrated by Starita³⁷ and Findlay³⁸, have provided a broader view of the mutational landscape in BRCA1/2. Although these studies functionally classified thousands of variants (1056 and 3893, respectively), there are still over 12,520 and 22,772 possible unclassified missense variants in BRCA1 and BRCA2⁹, respectively, that can be investigated efficiently using computational tools.

There are, however, still many limitations to applying these models. The number of experimentally validated deleterious variants in BRCA1 and BRCA2, necessary for model development, is limited, which poses a challenge for machine learning methods and restricts their generalization capabilities. In addition, training data are restricted to variants in protein regions known to be involved in impaired DNA repair. For instance, the only BRCA2 missense variants known to be disease-causing are in the DNA-binding domain. It is not understood whether variants located in other domains, which our model and others predict to be disease-causing, can repress DNA repair.

Nevertheless, the BRCA1/2 combined model used for predicting the functional impact of all possible missense variants in BRCA1 and BRCA2 demonstrated a sensitivity of 96% and 98% specificity, implying that extrapolation beyond the identified domains could be possible. Employing additional pathogenic and neutral measures could determine whether other components of these genes reflect pathogenicity as well as predict their functional impacts.

Here we demonstrate that our final model (BRCA1/2 combined) is a reliable approach to classify thousands of missense variants in a clinically actionable gene. We anticipate that the in-silico saturation mutagenesis methods would become applicable and reliable for interpreting variants of unknown significance, as well as for providing direct functional estimations for newly observed variants. Moreover, the improved performance in our predictive models could assist researchers in prioritising potential SNVs in BRCA1 and BRCA2 for further exploration and validation. The results of the computational saturation mutagenesis were made available to researchers (see Supplementary Data Set 1 for all potential SNVs in both genes).
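To make the idea of computational saturation mutagenesis concrete, the sketch below enumerates every missense change reachable by a single-nucleotide substitution in a coding sequence, using the standard genetic code. It illustrates only the enumeration step, not the authors' pipeline; the function names `missense_by_snv` and `saturate` and the toy sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def missense_by_snv(codon):
    """Amino acids reachable from `codon` by one nucleotide change (missense only)."""
    wild_type = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            # Skip synonymous changes and premature stop codons.
            if mutant != wild_type and mutant != "*":
                reachable.add(mutant)
    return reachable


def saturate(coding_sequence):
    """Yield (residue_number, wild_type, mutant) for each SNV-accessible missense change."""
    for i in range(0, len(coding_sequence) - 2, 3):
        codon = coding_sequence[i:i + 3]
        for mutant in sorted(missense_by_snv(codon)):
            yield i // 3 + 1, CODON_TABLE[codon], mutant


# Toy three-codon sequence (Met-Trp-Pro), not a real BRCA1/2 fragment.
variants = list(saturate("ATGTGGCCT"))
print(len(variants), variants[:3])
```

Each enumerated substitution would then be scored by the trained classifier. Since tryptophan is encoded only by TGG, a Trp-to-Ser change such as W31S corresponds to the single substitution TGG to TCG, which is why it appears among the SNV-accessible variants.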

Methods

Data sets

To build gene-specific models as well as a generic model for predicting the functional impact of missense variants in BRCA1 and BRCA2, variants of both genes that had been reviewed by an expert panel (3 stars) and had no conflicting interpretations were curated from the ClinVar¹⁰ database. Two different datasets were used for each gene to build and train a predictive model. The primary datasets comprised 247 missense variants (pathogenic: 97; benign: 150) for BRCA1 and 189 missense variants (pathogenic: 43; benign: 146) for BRCA2. In the combined datasets, benign or likely benign variants retrieved from ClinVar (with no conflicting interpretations) were grouped into the benign category, and variants interpreted as pathogenic or likely pathogenic were grouped as pathogenic. The combined datasets consisted of a total of 335 missense variants for BRCA1 and 297 missense variants for BRCA2.

Furthermore, we utilised BRCA1/2 missense variants that ENIGMA³¹ quantitatively and qualitatively classified as pathogenic or benign to increase the reliability of our gene-specific models. The classification of these variants was initially derived from a multifactorial model and causality score ranking to assess their clinical significance. We included missense variants if they fulfilled the following criteria: pathogenic or benign labels, posterior probability score from multifactorial likelihood analysis ≥ 0.99 (pathogenic) or < 0.99 (benign), or International Agency for Research on Cancer (IARC) class 1 (benign) or class 5 (pathogenic)⁴⁷. (See Supplementary Data Set 2 for more details on the variants used and analysed in the calculations.)

The ENIGMA datasets comprised 141 missense variants (pathogenic: 29; benign: 112) for BRCA1 and 118 missense variants (pathogenic: 11; benign: 107) for BRCA2. The functional validation datasets used in our study were from Hart⁴,⁹, Starita³⁷, and Findlay³⁸. We retained only the variants with a functional impact at the protein level, i.e., nonsynonymous missense variants, and excluded splicing variants.

All datasets were divided into a training set (85%) and a blind test set (15%) to train the predictive models and to evaluate their generalisation performance on the classification task.
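
The 85/15 division can be illustrated with a minimal sketch. It assumes the split was made per class so that the pathogenic-to-benign ratio is preserved, which the text does not state; the function name `stratified_split`, the seed, and the use of the BRCA1 primary dataset counts as input are illustrative choices, not the authors' code.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.15, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        # Hold out roughly test_fraction of this class for the blind test.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Class counts taken from the BRCA1 primary dataset described above.
labels = ["pathogenic"] * 97 + ["benign"] * 150
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps the minority (pathogenic) class represented in the blind test, which matters for datasets as imbalanced as the BRCA2 ones.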

Feature engineering and selection

In this study, a range of features was calculated using different in silico tools to evaluate and predict the molecular and functional consequences of missense variants in BRCA1 and BRCA2. These features fall into distinct categories, including evolutionary conservation, changes in protein post-translational modifications (PTMs), sequence properties, biophysical characterization, and evaluation of variant deleteriousness and pathogenicity. Supplementary Table 1 summarises the investigated features.

1. Conservation and sequence-based: We estimated the degree of residue conservation using ConSurf²³. Scores from substitution matrices (PAMs, BLOSUMs)⁴⁸ and AAindex²⁵ were used to account for evolutionary conservation and the physicochemical attributes of amino acids, respectively. Sequence-based scores from SAAFEC-SEQ²⁷ were used to evaluate the impact of single point mutations on protein stability and thermodynamics. Additionally, we applied the Missense Tolerance Ratio (MTR)⁴⁹ to measure the deleteriousness of a missense variant by considering the intolerance of its surrounding region.
2. Protein post-translational modifications (PTMs) changes: We used the AWESOME³⁰ tool to systematically assess the functional mechanism underlying missense variants and their impact on PTMs, including ubiquitination, phosphorylation, glycosylation, methylation, and acetylation.
3. Biophysical characterization: We applied the version of Align-GVGD¹⁹,²⁰ available at http://agvgd.hci.utah.edu/agvgd_input.php, which classifies missense substitutions as neutral or deleterious by combining the biophysical properties of amino acids with protein multiple sequence alignments and does not account for splicing.
4. Prediction based on deleteriousness and pathogenicity scoring methods: Deleteriousness scoring methods from dbNSFP²⁸ (Suppl. Table 1) were employed to quantify the deleterious effects of missense variants. We estimated the functional consequences of each variant using the pathogenicity-based features SNAP2³⁶, PANTHER²⁶, SIFT¹², and PROVEAN²⁹.

Selecting the best set of features to train predictive models is a challenging problem. A bottom-up greedy feature selection method was employed to reduce dimensionality and the noise it introduces. This approach considers each feature independently and iteratively, keeping only the set of features with the best performance⁵⁰.
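
A bottom-up greedy search of this kind can be sketched as follows. This is a generic forward-selection loop, not the authors' implementation or the exact algorithm of ref. 50: the function `greedy_forward_selection`, the stopping rule, the feature names, and the toy scoring function are all assumptions made for illustration. In practice `score_fn` would return a cross-validated performance measure for a model trained on the candidate feature set.

```python
def greedy_forward_selection(features, score_fn, min_gain=1e-9):
    """Add, one at a time, the feature that most improves score_fn(selected)."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current set.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda t: t[0])
        if top_score - best_score <= min_gain:
            break  # no candidate improves the score; stop growing the set
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score

# Hypothetical per-feature contributions; a stand-in for model performance.
toy_gain = {"conservation": 0.50, "provean": 0.30, "blosum62": 0.05,
            "noise_a": 0.0, "noise_b": 0.0}

def toy_score(subset):
    # Reward informative features, penalise every extra feature slightly.
    return sum(toy_gain[f] for f in subset) - 0.02 * len(subset)

selected, best = greedy_forward_selection(toy_gain, toy_score)
print(selected, round(best, 2))
```

In this toy setting the uninformative features are never added, because each one lowers the score by the per-feature penalty.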

Qualitative analysis

To statistically catalog features that differentiate between the two classes (pathogenic and benign), a two-sided Welch two-sample t-test was carried out on the primary and combined datasets with a significance cutoff of p < 0.05, using the ggsignif package in RStudio.
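
For reference, the statistic behind Welch's test can be computed in a few lines. The sketch below is an illustration in Python rather than the R workflow used in the study; the function name `welch_t` and the sample values are made up. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; the two-sided p-value then follows from the t distribution with that many degrees of freedom (for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`).

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on samples a and b."""
    # Squared standard errors of the two sample means.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up feature values for two small groups of variants.
pathogenic_scores = [2.0, 4.0, 6.0, 8.0]
benign_scores = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(pathogenic_scores, benign_scores)
print(round(t_stat, 3), round(dof, 3))
```

Unlike Student's t-test, this form does not assume equal variances in the two classes, which suits groups of different size such as the pathogenic and benign sets here.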

Machine learning approaches

To obtain predictive classification models, we first evaluated several classification algorithms, including Random Forest, Extremely Randomized Trees, Gradient Boosting, and AdaBoost, using the implementations available in the scikit-learn library⁵¹. The predictive models were trained using stratified tenfold cross-validation and evaluated on non-redundant blind tests.
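
The stratified tenfold scheme can be sketched without any particular classifier. The code below is an illustration under stated assumptions: `stratified_kfold`, `cross_val_predictions`, and the toy `midpoint_classifier` are hypothetical names, the data are synthetic, and the class counts are only roughly what an 85% training portion of the BRCA1 primary dataset would contain. In the study, a scikit-learn classifier would take the place of the toy `fit_predict` function.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold keeps class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        # Deal indices round-robin so every fold gets its share of class y.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val

def midpoint_classifier(X_train, y_train, X_val):
    """Toy model: threshold a single score halfway between the class means."""
    pos = [x for x, y in zip(X_train, y_train) if y == 1]
    neg = [x for x, y in zip(X_train, y_train) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return [1 if x > cut else 0 for x in X_val]

def cross_val_predictions(X, y, fit_predict, k=10, seed=0):
    """Pool the out-of-fold predictions from a stratified k-fold loop."""
    preds = [None] * len(y)
    for train, val in stratified_kfold(y, k, seed):
        out = fit_predict([X[i] for i in train], [y[i] for i in train],
                          [X[i] for i in val])
        for i, p in zip(val, out):
            preds[i] = p
    return preds

# Synthetic, well-separated scores: 82 pathogenic (1) and 128 benign (0).
y = [1] * 82 + [0] * 128
X = [0.6 + 0.004 * j for j in range(82)] + [0.1 + 0.003 * j for j in range(128)]
preds = cross_val_predictions(X, y, midpoint_classifier)
print(sum(p == t for p, t in zip(preds, y)), "of", len(y), "correct")
```

Because every variant is predicted exactly once by a model that never saw it during fitting, the pooled predictions give a single performance estimate for the whole training set.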

Model evaluation metrics

The performance of the classification models was evaluated using well-established metrics, including the area under the ROC curve (AUC), Matthews correlation coefficient (MCC), precision, F1 score, sensitivity, and specificity. AUC summarises a model's performance on a classification task across all decision thresholds. A higher AUC means that the model is better at distinguishing between the two classes, pathogenic and benign. AUC ranges from 0 to 1: a perfect model has an AUC of 1, and an AUC of 0.5 indicates a random classifier. MCC is a balanced metric for assessing a classifier's performance. It ranges between −1 and 1, where −1 represents total disagreement between predictions and observations, 0 is no better than random prediction, and 1 indicates perfect prediction. The F1 score is the harmonic mean of the precision and recall (sensitivity) of a classifier. Precision is the proportion of variants predicted as positive (pathogenic) that are truly positive. Recall is the proportion of all positive (pathogenic) variants in a dataset that are correctly predicted. Sensitivity (true positive rate) and specificity (true negative rate) estimate the proportion of the positive (pathogenic) and negative (benign) classes that are correctly classified, respectively.
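
These definitions can be made concrete with a short sketch that computes each metric from confusion-matrix counts (TP, FP, TN, FN) and the AUC from ranked scores. The function names and the example counts and scores are illustrative only and are not results from the study.

```python
from math import sqrt

def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1, "mcc": mcc}

def auc(pos_scores, neg_scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical blind test: 15 pathogenic and 22 benign variants.
m = classification_metrics(tp=13, fp=2, tn=20, fn=2)
a = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print({k: round(v, 3) for k, v in m.items()}, round(a, 3))
```

MCC uses all four cells of the confusion matrix, so it stays informative when benign variants greatly outnumber pathogenic ones, as in the BRCA2 datasets.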

References

1. Joosse, S. A. BRCA1 and BRCA2: A common pathway of genome protection but different breast cancer subtypes. Nat. Rev. Cancer 12, 372. https://doi.org/10.1038/nrc3181-c2 (2012).
2. Cavanagh, H. & Rogers, K. M. The role of BRCA1 and BRCA2 mutations in prostate, pancreatic and stomach cancers. Hered Cancer Clin. Pract. 13, 16. https://doi.org/10.1186/s13053-015-0038-x (2015).
3. Li, H. et al. Classification of variants of uncertain significance in BRCA1 and BRCA2 using personal and family history of cancer from individuals in a large hereditary cancer multigene panel testing cohort. Genet. Med. 22, 701–708. https://doi.org/10.1038/s41436-019-0729-1 (2020).
4. Hart, S. N. et al. Comprehensive annotation of BRCA1 and BRCA2 missense variants by functionally validated sequence-based computational prediction models. Genet. Med. 21, 71–80. https://doi.org/10.1038/s41436-018-0018-4 (2019).
5. Moschetta, M., George, A., Kaye, S. B. & Banerjee, S. BRCA somatic mutations and epigenetic BRCA modifications in serous ovarian cancer. Ann. Oncol. 27, 1449–1455. https://doi.org/10.1093/annonc/mdw142 (2016).
6. Campeau, P. M., Foulkes, W. D. & Tischkowitz, M. D. Hereditary breast cancer: New genetic developments, new therapeutic avenues. Hum. Genet. 124, 31–42. https://doi.org/10.1007/s00439-008-0529-1 (2008).
7. Oh, M. et al. BRCA1 and BRCA2 gene mutations and colorectal cancer risk: Systematic review and meta-analysis. J. Natl. Cancer Inst. 110, 1178–1189. https://doi.org/10.1093/jnci/djy148 (2018).
8. Zayas-Villanueva, O. A. et al. Analysis of the pathogenic variants of BRCA1 and BRCA2 using next-generation sequencing in women with familial breast cancer: A case-control study. BMC Cancer 19, 722. https://doi.org/10.1186/s12885-019-5950-4 (2019).
9. Lindor, N. M. et al. A review of a multifactorial probability-based model for classification of BRCA1 and BRCA2 variants of uncertain significance (VUS). Hum. Mutat. 33, 8–21. https://doi.org/10.1002/humu.21627 (2012).
10. Landrum, M. J. et al. ClinVar: Public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868. https://doi.org/10.1093/nar/gkv1222 (2016).
11. Hart, S. N., Polley, E. C., Shimelis, H., Yadav, S. & Couch, F. J. Prediction of the functional impact of missense variants in BRCA1 and BRCA2 with BRCA-ML. NPJ Breast Cancer 6, 13. https://doi.org/10.1038/s41523-020-0159-x (2020).
12. Sim, N. L. et al. SIFT web server: Predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 40, W452–W457. https://doi.org/10.1093/nar/gks539 (2012).
13. Ioannidis, N. M. et al. REVEL: An ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885. https://doi.org/10.1016/j.ajhg.2016.08.016 (2016).
14. Pires, D. E. V., Rodrigues, C. H. M. & Ascher, D. B. mCSM-membrane: Predicting the effects of mutations on transmembrane proteins. Nucleic Acids Res. 48, W147–W153. https://doi.org/10.1093/nar/gkaa416 (2020).
15. Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. Chapter 7, Unit 7.20. https://doi.org/10.1002/0471142905.hg0720s76 (2013).
16. Poon, K. S. In silico analysis of BRCA1 and BRCA2 missense variants and the relevance in molecular genetic testing. Sci. Rep. 11, 11114. https://doi.org/10.1038/s41598-021-88586-w (2021).
17. Moghadasi, S. et al. Variants of uncertain significance in BRCA1 and BRCA2 assessment of in silico analysis and a proposal for communication in genetic counselling. J. Med. Genet. 50, 74–79. https://doi.org/10.1136/jmedgenet-2012-100961 (2013).
18. Ernst, C. et al. Performance of in silico prediction tools for the classification of rare BRCA1/2 missense variants in clinical diagnostics. BMC Med. Genomics 11, 35. https://doi.org/10.1186/s12920-018-0353-y (2018).
19. Tavtigian, S. V. et al. Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral. J. Med. Genet. 43, 295–305. https://doi.org/10.1136/jmg.2005.033878 (2006).
20. Mathe, E. et al. Computational approaches for predicting the biological effect of p53 missense mutations: A comparison of three sequence analysis based methods. Nucleic Acids Res. 34, 1317–1325. https://doi.org/10.1093/nar/gkj518 (2006).
21. Schwarz, J. M., Cooper, D. N., Schuelke, M. & Seelow, D. MutationTaster2: Mutation prediction for the deep-sequencing age. Nat. Methods 11, 361–362. https://doi.org/10.1038/nmeth.2890 (2014).
22. Arshad, S., Ishaque, I., Mumtaz, S., Rashid, M. U. & Malkani, N. In-silico analyses of nonsynonymous variants in the BRCA1 gene. Biochem. Genet. https://doi.org/10.1007/s10528-021-10074-7 (2021).
23. Ashkenazy, H. et al. ConSurf 2016: An improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 44, W344–W350. https://doi.org/10.1093/nar/gkw408 (2016).
24. Yadegari, F. & Majidzadeh, K. In silico analysis for determining the deleterious nonsynonymous single nucleotide polymorphisms of BRCA genes. Mol. Biol. Res. Commun. 8, 141–150. https://doi.org/10.22099/mbrc.2019.34198.1420 (2019).
25. Kawashima, S. et al. AAindex: Amino acid index database, progress report 2008. Nucleic Acids Res. 36, D202–D205. https://doi.org/10.1093/nar/gkm998 (2008).
26. Tang, H. & Thomas, P. D. PANTHER-PSEP: Predicting disease-causing genetic variants using position-specific evolutionary preservation. Bioinformatics 32, 2230–2232. https://doi.org/10.1093/bioinformatics/btw222 (2016).
27. Li, G., Panday, S. K. & Alexov, E. SAAFEC-SEQ: A sequence-based method for predicting the effect of single point mutations on protein thermodynamic stability. Int. J. Mol. Sci. https://doi.org/10.3390/ijms22020606 (2021).
28. Liu, X., Wu, C., Li, C. & Boerwinkle, E. dbNSFP v3.0: A one-stop database of functional predictions and annotations for human nonsynonymous and splice-site SNVs. Hum. Mutat. 37, 235–241. https://doi.org/10.1002/humu.22932 (2016).
29. Choi, Y. & Chan, A. P. PROVEAN web server: A tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31, 2745–2747. https://doi.org/10.1093/bioinformatics/btv195 (2015).
30. Yang, Y. et al. AWESOME: A database of SNPs that affect protein post-translational modifications. Nucleic Acids Res. 47, D874–D880. https://doi.org/10.1093/nar/gky821 (2019).
31. Parsons, M. T. et al. Large scale multifactorial likelihood quantitative analysis of BRCA1 and BRCA2 variants: An ENIGMA resource to support clinical variant classification. Hum. Mutat. 40, 1557–1578. https://doi.org/10.1002/humu.23818 (2019).
32. Anantha, R. W. et al. Functional and mutational landscapes of BRCA1 for homology-directed repair and therapy resistance. Elife. https://doi.org/10.7554/eLife.21350 (2017).
33. Caputo, S. M. et al. Classification of 101 BRCA1 and BRCA2 variants of uncertain significance by cosegregation study: A powerful approach. Am. J. Hum. Genet. 108, 1907–1923. https://doi.org/10.1016/j.ajhg.2021.09.003 (2021).
34. Biswas, K. et al. Functional evaluation of BRCA2 variants mapping to the PALB2-binding and C-terminal DNA-binding domains using a mouse ES cell-based assay. Hum. Mol. Genet. 21, 3993–4006. https://doi.org/10.1093/hmg/dds222 (2012).
35. Julien, M. et al. Intrinsic disorder and phosphorylation in BRCA2 facilitate tight regulation of multiple conserved binding events. Biomolecules. https://doi.org/10.3390/biom11071060 (2021).
36. Hecht, M., Bromberg, Y. & Rost, B. Better prediction of functional effects for sequence variants. BMC Genomics. https://doi.org/10.1186/1471-2164-16-S8-S1 (2015).
37. Starita, L. M. et al. A multiplex homology-directed DNA repair assay reveals the impact of more than 1000 BRCA1 missense substitution variants on protein function. Am. J. Hum. Genet. 103, 498–508. https://doi.org/10.1016/j.ajhg.2018.07.016 (2018).
38. Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222. https://doi.org/10.1038/s41586-018-0461-z (2018).
39. Pires, D. E., Ascher, D. B. & Blundell, T. L. mCSM: Predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 30, 335–342. https://doi.org/10.1093/bioinformatics/btt691 (2014).
40. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: Predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894. https://doi.org/10.1093/nar/gky1016 (2019).
41. Masso, M., Bansal, A., Bansal, A. & Henderson, A. Structure-based functional analysis of BRCA1 RING domain variants: Concordance of computational mutagenesis, experimental assay, and clinical data. Biophys. Chem. 266, 106442. https://doi.org/10.1016/j.bpc.2020.106442 (2020).
42. Padilla, N. et al. BRCA1- and BRCA2-specific in silico tools for variant interpretation in the CAGI 5 ENIGMA challenge. Hum. Mutat. 40, 1593–1611. https://doi.org/10.1002/humu.23802 (2019).
43. Easton, D. F. et al. A systematic genetic assessment of 1,433 sequence variants of unknown clinical significance in the BRCA1 and BRCA2 breast cancer-predisposition genes. Am. J. Hum. Genet. 81, 873–883. https://doi.org/10.1086/521032 (2007).
44. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424. https://doi.org/10.1038/gim.2015.30 (2015).
45. Li, Q. & Wang, K. InterVar: Clinical interpretation of genetic variants by the 2015 ACMG-AMP guidelines. Am. J. Hum. Genet. 100, 267–280. https://doi.org/10.1016/j.ajhg.2017.01.004 (2017).
46. Eccles, D. M. et al. BRCA1 and BRCA2 genetic testing-pitfalls and recommendations for managing variants of uncertain clinical significance. Ann. Oncol. 26, 2057–2065. https://doi.org/10.1093/annonc/mdv278 (2015).
47. Plon, S. E. et al. Sequence variant classification and reporting: Recommendations for improving the interpretation of cancer susceptibility genetic test results. Hum. Mutat. 29, 1282–1291. https://doi.org/10.1002/humu.20880 (2008).
48. Mount, D. W. Comparison of the PAM and BLOSUM amino acid substitution matrices. CSH Protoc. 2008, pdb.ip59. https://doi.org/10.1101/pdb.ip59 (2008).
49. Silk, M., Petrovski, S. & Ascher, D. B. MTR-Viewer: Identifying regions within genes under purifying selection. Nucleic Acids Res. 47, W121–W126. https://doi.org/10.1093/nar/gkz457 (2019).
50. Tsamardinos, I., Borboudakis, G., Katsogridakis, P., Pratikakis, P. & Christophides, V. A greedy feature selection algorithm for Big Data of high dimensionality. Mach. Learn. 108, 149–202. https://doi.org/10.1007/s10994-018-5748-7 (2019).
51. Li, H. & Phung, D. Journal of machine learning research: Preface. J. Mach. Learn. Res. 39, i–ii (2014).

Funding

R.A. is funded with a PhD scholarship from the Kingdom of Saudi Arabia. This work was supported in part by the Medical Research Council (MR/M026302/1 to D.B.A. and D.E.V.P.); the National Health and Medical Research Council of Australia (GNT1174405 to D.B.A.), the Wellcome Trust (093167/Z/10/Z), and the Victorian Government’s Operational Infrastructure Support Program. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Author information

Authors and Affiliations

Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
Raghad Aljarf, Mengyuan Shen, Douglas E. V. Pires & David B. Ascher
Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, 3010, Australia
Raghad Aljarf, Mengyuan Shen, Douglas E. V. Pires & David B. Ascher
Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, VIC, 3052, Australia
Raghad Aljarf, Mengyuan Shen, Douglas E. V. Pires & David B. Ascher
School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, 3053, Australia
Mengyuan Shen & Douglas E. V. Pires
Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge, CB2 1GA, UK
David B. Ascher

Contributions

R.A. performed the machine learning experiments and wrote the manuscript. M.S. performed data curation and assisted with feature calculation. D.E.V.P. helped supervise the machine learning. D.B.A. conceived, designed and supervised all aspects of the study. All authors contributed to manuscript preparation.

Corresponding authors

Correspondence to Douglas E. V. Pires or David B. Ascher.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Aljarf, R., Shen, M., Pires, D.E.V. et al. Understanding and predicting the functional consequences of missense mutations in BRCA1 and BRCA2. Sci Rep 12, 10458 (2022). https://doi.org/10.1038/s41598-022-13508-3

Received: 05 September 2021
Accepted: 25 May 2022
Published: 21 June 2022
DOI: https://doi.org/10.1038/s41598-022-13508-3
Springer Nature Limited
