Introduction

Hepatocellular carcinoma (HCC) is the most common primary liver cancer and ranks as the third leading cause of cancer-related mortality [1]. Chronic hepatitis and liver cirrhosis lead to an accumulation of genetic alterations that drive HCC pathogenesis. In Egypt, HCC constitutes a significant public health problem, where it is responsible for 33.63% and 13.54% of all cancers in males and females, respectively. There is a strong link between HCC and the hepatitis C virus (HCV) epidemic, which affects 10–15% of the Egyptian population and is reported as the highest prevalence of HCV in the world [2]. The molecular mechanisms driving HCC tumorigenesis are extremely complex, and an understanding of the driver mutations involved is essential for prevention, as well as for diagnostic, prognostic, and therapeutic purposes [3, 4]. However, these key genetic drivers remain largely unstudied in the Egyptian population, and this study aims to help clarify the molecular mechanisms underlying the development of HCC in Egyptian patients.

According to the European Association for the Study of the Liver (EASL) guidelines, one of the unmet needs in HCC research is the development of new tools for early detection, including the assessment of liquid biopsy (blood sample) [5]. In addition, when a liver biopsy is not available, the genomic profile can be assessed by a noninvasive liquid biopsy, which provides actionable genomic information without the risk of complications. Using a liquid biopsy, circulating tumor DNA (ctDNA) can be extracted to profile the tumor genome more comprehensively than conventional sampling methods [6, 7].

Hepatocellular carcinoma is a highly heterogeneous malignancy, and recent studies found that HCC commonly presents with mutations in the TERT promoter, TP53, CTNNB1, AXIN1, LAMA2, ARID1A, WWP1, and RPS6KA3 genes. HCC-associated somatic mutations vary extensively among individuals and even within a single tumor [8]. Among the causes of genomic intratumor heterogeneity (ITH), which include somatic mutations, epigenetic changes, and large-scale genomic alterations, somatic mutations are the most abundant and most studied across cancers. Interplay between mutated driver genes is a major determinant of the carcinogenic process [9].

Several studies have attempted to identify driver mutations within the protein-coding regions, or exome, in various types of cancer. Although the exome represents 1% of the human genome, it covers 85% of disease-causing mutations. For this reason, whole exome sequencing (WES) has the potential to discover a large number of variants implicated in many diseases, including cancers [10].

The widespread use of next-generation sequencing (NGS) offers in-depth investigation into the tumor type-specific and context-driven characteristics of ITH. Next generation sequencing using WES enables researchers to create the precise genomic profile of HCC and identify protein-altering mutations per tumor in relation to clinicopathologic criteria [11]. It provides a detailed understanding of cancer pathways and the discovery of molecular mechanisms of cancer. It can throw light on the frequent somatic/genetic alterations in driver genes and the main pathways dysregulated in HCC [12, 13].

Ethnicity could contribute to global differences in the molecular profile of HCC due to the presence of various risk factors such as hepatitis B virus, hepatitis C virus, alcohol, aflatoxin exposure and metabolic syndrome [10]. Consequently, in our study we performed WES on paired HCC and surrounding non-HCC tissue samples from each HCC patient. To survey somatic mutations in the form of single nucleotide polymorphisms (SNPs) and assess their clinical relevance in HCC patients, we analyzed the type and number of SNPs in each gene and their association with clinicopathological information. In addition, we analyzed the pathways affected and the agreement with previous results from other populations in international HCC databases. We also analyzed the variants common to all studied patients and the presence of novel mutations, aiming to generate a genomic profile for Egyptian HCC patients.

Subjects and methods

The subjects of this prospective study were Egyptian patients with HCC attending Ain Shams Centre for Organ Transplantation (ASCOT), and Hepato-pancreatico-biliary unit (HPB) Ain Shams University Hospitals, Cairo, Egypt, between October 2020 and June 2022. Our study was approved by the Faculty of Medicine Ain Shams University Research Ethics Committee (FMASU R92/2020).

Subjects

Sixteen patients underwent surgical intervention (7 patients had surgical resection and 9 patients underwent living donor liver transplantation (LDLT)). Patients were subjected to genomic profiling using whole exome sequencing.

All transplanted patients had HCC on top of HCV cirrhosis. Of the 7 patients who underwent resection, 4 had HCV-related HCC, 2 had HBV-related HCC, and 1 had Budd-Chiari syndrome.

Criteria for Liver transplantation selection for HCC patients (n = 9):

  1) Age from 18–65 years.

  2) Milan criteria (n = 6): single lesion ≥ 2 cm and ≤ 5 cm, or up to 3 lesions, each ≥ 1 cm and ≤ 3 cm [14].

  3) UCSF criteria (n = 1): single tumor ≤ 6.5 cm, or ≤ 3 tumors with the largest tumor diameter ≤ 4.5 cm and total tumor diameter ≤ 8 cm [15].

  4) Patients beyond UCSF criteria, selected on a case-by-case basis after successful downstaging, with radiological criteria showing complete response (CR) or stable disease (SD) according to modified RECIST (mRECIST) criteria [16] and no evidence of macrovascular invasion or distant extrahepatic spread (n = 2).

  5) Alpha-fetoprotein ≤ 200 ng/dl.

Criteria for HCC Eligibility for Hepatic resection (n = 7):

  1) BCLC A (single nodule, or ≤ 3 nodules each ≤ 3 cm; preserved liver function; PS 0) [17].

  2) Intermediate-stage HCC according to the Hong Kong Liver Cancer staging system [18].

  3) Child A or early B [7, 8].

  4) No evidence of portal hypertension.

  5) Patients unfit for, or refusing, liver transplantation.

Methods

Whole exome sequencing was performed using Ion Torrent (Ion Chef and Ion Proton) on HCC and non-HCC samples from 16 Egyptian HCC patients. After the procedure was explained and signed consent was obtained from participating patients, 2 tissue samples were obtained from each HCC patient: one from HCC liver tissue and one from surrounding non-HCC tissue. Liver tissue (LT) samples from four patients (LT2, LT6, LT7, LT8) were prepared as formalin-fixed paraffin-embedded (FFPE) blocks by the pathology department of Ain Shams Specialized Hospital. Samples from patients LT1, LT3, LT4, and LT5 were excluded because of either patient death or technical problems. Because DNA extracted from FFPE samples was of low quality, as evidenced by gel electrophoresis, samples from the subsequent 12 patients were taken intraoperatively by the surgeon as fresh tissue and placed in a stabilizing solution (RNAlater™ Stabilization Solution, ThermoFisher) according to the manufacturer's instructions. Samples were stored at 2–4 °C and processed within a maximum of three days.

DNA Extraction

  • FFPE samples: samples were fixed in 4–10% formalin as quickly as possible after surgical removal, for a fixation time of 14–24 h (longer fixation times lead to more severe DNA fragmentation, resulting in poor performance in downstream assays). As recommended by the manufacturer, samples were thoroughly dehydrated prior to embedding (residual formalin can inhibit proteinase K digestion). Stained sections were examined by the pathologist to ensure more than 20% cancer cells in HCC samples. DNA was extracted from FFPE samples using the QIAamp DNA FFPE kit (Cat. No. / ID: 56404, Qiagen). Starting material for DNA purification consisted of freshly cut sections of FFPE tissue; up to 8 sections, each with a thickness of up to 10 μm and a surface area of up to 250 mm², can be combined in one preparation.

  • Fresh samples: fresh tissue samples were cut to a maximum thickness of 0.5 cm and submerged in 5 volumes of RNAlater as recommended by the manufacturer (RNAlater™ Stabilization Solution, ThermoFisher). Samples were stored at 2–4 °C for a maximum of 3 days until processed. DNA was extracted from fresh tissue samples using the QIAamp DNA Mini Kit (Qiagen), with 25 mg of liver tissue as starting material.

  • After extraction: DNA was quantitated using the Qubit dsDNA HS assay (ThermoFisher Scientific), and purity was tested using a NanoDrop Microvolume Spectrophotometer (ThermoFisher Scientific) (Supplement Table 1). Purity, assessed as the ratio of absorbance at 260 nm to that at 280 nm, was accepted for all samples (ratio ~ 1.8) [19]. The mean concentration was 60.7 ng/µL (14.8–127.6) overall, 58.4 ng/µL (21–120) for FFPE samples, and 60 ng/µL (14.8–127.6) for fresh samples. Sample LT15 was excluded from WES because of exceptionally low DNA concentration and an unacceptable purity value, and LT9 and LT12 were excluded because of a failure in the sequencing step.

Whole Exome Sequencing

The breakdown of steps in whole exome sequencing is shown in Table 1. Library preparation was carried out using the Ion AmpliSeq Exome RDY Library Preparation kit (ThermoFisher Scientific). The amount of DNA required was 50–100 ng in a volume not exceeding 56 µL. We amplified target regions from 50–100 ng of genomic DNA (gDNA) in the Ion AmpliSeq™ Exome RDY plates using the 5X Ion AmpliSeq™ HiFi Mix. Target amplification reactions were combined, and the amplicons were partially digested with FuPa Reagent. Barcode adapters were then ligated using Switch Solution and DNA Ligase, and the libraries were purified. Libraries were quantified by qPCR using the Ion Library TaqMan Quantitation Kit on a ViiA 7 Real-Time PCR machine (ThermoFisher), then normalized and diluted to 100 pM. Template preparation and amplification were performed by emulsion PCR on Ion Sphere Particles (ISPs), and we combined two barcoded exome libraries on a single Ion 540™ Chip using the Ion PI™ Hi-Q™ Chef Kit on the Ion Chef system. Finally, sequencing was performed on the Ion Proton™ system, which uses semiconductor technology based on detection of hydrogen ions released during the DNA synthesis reaction by a highly sensitive microchip pH sensor.
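The dilution of a quantified library to 100 pM follows from the relation C1·V1 = C2·V2. The sketch below is only an illustration of that arithmetic: the function name, the 50 µL final volume, and the 400 pM stock concentration are hypothetical and are not taken from the study protocol.

```python
def dilution_volumes(stock_pm: float, target_pm: float = 100.0,
                     final_ul: float = 50.0) -> tuple[float, float]:
    """Return (library_ul, diluent_ul) needed to reach target_pm in final_ul.

    Uses C1 * V1 = C2 * V2, where C1 is the qPCR-measured stock concentration.
    """
    if stock_pm <= 0 or target_pm <= 0 or final_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_pm < target_pm:
        raise ValueError("stock is already below the target concentration")
    library_ul = target_pm * final_ul / stock_pm
    return library_ul, final_ul - library_ul


# Hypothetical example: a library quantified at 400 pM.
library_ul, diluent_ul = dilution_volumes(stock_pm=400.0)
print(f"library: {library_ul:.1f} uL, diluent: {diluent_ul:.1f} uL")
```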

Table 1 A breakdown outline for the steps used for sequencing HCC tissue samples

Bioinformatics

Reads were initially demultiplexed and aligned to the hg19 human reference sequence using variantCaller v5.12.0.4 (Torrent Server software, Thermo Fisher Scientific), which ran as an embedded instance within the Ion Torrent S5 sequencer. The resulting alignment BAM files were further processed using the Ion Reporter™ Software 5.18 pipeline (Thermo Fisher Scientific), which performed variant calling and annotated variants based on up-to-date databases such as ClinVar, DrugBank, and GO. Potentially significant variants were further reassessed through manual inspection of aligned reads in IGV 2.4.

Available clinical significance annotation was assessed in real time from the Human Gene Mutation Database Professional (https://portal.biobase-international.com/hgmd/pro/), ClinVar (https://www.ncbi.nlm.nih.gov/ClinVar/), and dbSNP (https://www.ncbi.nlm.nih.gov/snp). The predictions for SIFT, PolyPhen and PhyloP tools were retrieved from the Ion Reporter result files (tab-separated files). Frequency data was provided by Ensembl/VEP release 108.0; additionally, the gnomAD v3.1 database was queried for allele frequencies of individual variants (http://gnomad.broadinstitute.org).

Variant annotation

The VCF files were uploaded to the main Galaxy public server (https://usegalaxy.org). SnpEff eff (Galaxy Version 4.3+T.galaxy2) was used to annotate and predict the effects of these variants [20]. The SnpEff 4.3 hg19 genome data were downloaded using SnpEff download and applied as the genome source for SnpEff eff. Default settings were used: an upstream/downstream length of 5000 bases and a splice site size of 2 bases, along with the default spliceRegion settings. For the annotation options, loss of function (LOF) and nonsense-mediated decay (NMD) tags were added. The chromosomal position was based on the input type, and “-chr” was prepended to the chromosome name. SnpSift Extract Fields was then used on the VCF output files from SnpEff eff to extract specific fields and organize them in a tabular file [20].
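To illustrate what the field-extraction step produces, the following minimal sketch splits the SnpEff `ANN` INFO field of one VCF line into tabular rows. It is not the SnpSift implementation; the function name, the column names in `ANN_KEYS`, and the example record (genes `GENE1`/`GENE2`, position, transcripts) are hypothetical and chosen only for illustration. It assumes the standard pipe-delimited `ANN` layout, with multiple annotations separated by commas.

```python
# First eleven sub-fields of a SnpEff ANN entry, in order (names are our own labels).
ANN_KEYS = ["allele", "effect", "impact", "gene", "gene_id", "feature_type",
            "feature_id", "biotype", "rank", "hgvs_c", "hgvs_p"]


def extract_ann_rows(vcf_line: str):
    """Yield one dict per ANN entry found in a single VCF data line."""
    chrom, pos, _id, ref, alt, _qual, _filt, info = vcf_line.rstrip("\n").split("\t")[:8]
    # Locate the ANN key inside the semicolon-separated INFO column.
    ann = next((f[4:] for f in info.split(";") if f.startswith("ANN=")), "")
    for entry in filter(None, ann.split(",")):
        row = {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}
        row.update(zip(ANN_KEYS, entry.split("|")))
        yield row


# Made-up record for illustration only.
record = "\t".join([
    "chr1", "12345", ".", "C", "T", "50", "PASS",
    "DP=100;ANN=T|missense_variant|MODERATE|GENE1|ID1|transcript|TX1|protein_coding|2/5|c.100C>T|p.Arg34Cys"
    ",T|upstream_gene_variant|MODIFIER|GENE2|ID2|transcript|TX2|protein_coding||c.-200C>T|",
])

for row in extract_ann_rows(record):
    print(row["chrom"], row["pos"], row["gene"], row["effect"], row["impact"], row["hgvs_p"], sep="\t")
```

Emitting one row per annotation, rather than one per variant, keeps every gene and transcript effect visible for later filtering by impact.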

Data processing

MUTALISK was used to determine mutational signatures in HCC samples [21]. A web-based version of OpenCRAVAT (https://run.opencravat.org) was used for genomic variant interpretation, including variant impact, annotation, and scoring [22]. Cancer-specific High-throughput Annotation of Somatic Mutations (CHASM) was used as well [23]. CHASMplus discriminates somatic missense mutations as either cancer drivers or passengers; predictions can be made either in a cancer type-specific manner or with a model that considers multiple cancer types together. TARGET is a database of genes that, when somatically altered in cancer, are directly linked to a clinical action. TARGET genes may be predictive of response or resistance to a therapy, prognostic, and/or diagnostic [24]. Oncogenes and tumor suppressor genes were delineated as described by Vogelstein et al., 2013 [25]. Franklin was used to analyze variants according to ACMG classification for somatic variants [26]. The DAVID annotation database was used to assess enrichment of driver pathways in our HCC samples [27].

Results

Liver tissue (HCC and non-HCC) and corresponding blood samples were collected from 16 HCC patients, including 15 males (94%) and 1 female (6%), with a median age of 59 years (IQR 51–60 years). Of these, 13 patients had HCV and 2 patients had HBV (LT9, LT13), while 1 patient suffered from Budd-Chiari syndrome (LT11). Among the 9 transplanted patients, 6 met the Milan criteria (66.7%), 1 met the UCSF criteria (11.1%), and 2 were beyond both (22.2%). According to the BCLC clinical staging system, 7 cases were stage A (43.7%), 6 cases were stage B (37.5%), and 3 cases were stage D (18.8%). Among the sixteen enrolled patients, tissue biopsies were taken from HCC and non-HCC liver tissue from 9 patients during liver transplantation, while 7 were sampled during hepatic resection.

Variant types

Using whole exome sequencing, the exons and surrounding noncoding genomic regions of protein-coding genes were captured in pairs of tumor and non-cancerous liver tissues. Genomic regions of variants ranged from exonic to non-exonic (Supplement Table 2). Among pathogenic variants (missense (27.9%), frameshift (1.6%), stop-gained (1.5%), and splice-site (0.4%)), missense variants accounted for the highest percentage, as shown in Fig. 1 and in Supplement Table 3.

Fig. 1 Percentage of different variant sequence ontologies in HCC samples

Mutational signatures

We analyzed HCC samples with Mutalisk against the 30 COSMIC v2 mutational signatures [21] (Fig. 2; Supplement Table 4, Supplement Table 1). The analysis showed predominance of single base substitution signatures S1, S5, and S23 in FFPE samples (LT2, LT6, LT7, LT8) and S1, S5, S6, and S12 in fresh tissue samples (LT10, LT11, LT13, LT14, LT16, LT17, LT18, LT19, LT20). Signature 1 is the result of an endogenous mutational process initiated by spontaneous deamination of 5-methylcytosine. The number of S1 mutations correlates with age at cancer diagnosis. Signature 5 is found in all cancer types and most cancer samples; its etiology is unknown. Signature 6 is found in many cancer types and is most common in colorectal and uterine cancers. In most other cancer types, S6 is found in less than 3% of examined samples. S6 is associated with defective DNA mismatch repair and is found in microsatellite unstable tumors; it is also associated with high numbers of small (< 3 bp) insertions and deletions at mono/polynucleotide repeats. Signatures 12 and 23 are found in liver cancer; their etiology remains unknown [21]. Full results are shown in Supplement MUTALISK Mutational signatures HCC.
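Single base substitution signatures are defined over 96 channels: six substitution classes referred to the pyrimidine of the base pair, each in the context of its 5' and 3' neighbors. The sketch below shows how a mutation catalog of that kind can be built before signature fitting. It is an illustration only, not Mutalisk's code; the function names and the example mutations are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs_channel(context: str, alt: str) -> str:
    """Map a trinucleotide context (reference base in the middle) and an
    alternate base to a pyrimidine-centred channel such as 'A[C>T]G'."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base context and one alternate base (A/C/G/T)")
    ref = context[1]
    if ref == alt:
        raise ValueError("alternate base equals reference base")
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand.
        context, alt = revcomp(context), COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def build_catalog(mutations) -> Counter:
    """Count mutations per channel; `mutations` is an iterable of (context, alt)."""
    return Counter(sbs_channel(context, alt) for context, alt in mutations)


# Hypothetical mutations: the first two describe the same channel on opposite strands.
catalog = build_catalog([("ACG", "T"), ("CGT", "A"), ("TTA", "C")])
print(catalog)
```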

Fig. 2 Mutational signatures in HCC samples showing predominance of S1, S5, and S23 in FFPE samples (LT2, LT6, LT7, LT8) and S1, S5, S6, and S12 in fresh tissue samples (LT10, LT11, LT13, LT14, LT16, LT17, LT18, LT19, LT20) [21]

Highly mutated genes (highest number of somatic variants)

Analysis of highly mutated genes was done using OpenCRAVAT. Genes showing the highest numbers of somatic mutations in HCC and Non-HCC are shown in Tables 2 and 3. Analysis of highly mutated genes in both HCC and Non-HCC revealed the presence of 10 common highly mutated genes (AHNAK2, MUC6, MUC16, TTN, ZNF717, FLG, MUC12, OBSCN, PDE4DIP, MUC5B, and HYDIN) (Supplement Tables 5, 6). Mapping of variants in the most highly mutated genes (AHNAK2, MUC16, MUC6, FLG, PDE4DIP, and HYDIN) relative to The Cancer Genome Atlas HCC dataset (TCGA LIHC) [28, 29] is shown in Fig. 3. Co-mutation of OBSCN and TTN in HCC and Non-HCC is shown in Table 4.

Fig. 3 Variants in highly mutated genes in HCC from our study (above the bar) and variants from TCGA HCC (below the bar), shown as a lollipop diagram. Protein domains are shown as brown bars. Variants found in multiple samples are taller. Color indicates the sequence ontology of the variant

Table 2 Heatmap for pathogenic variants in highly mutated genes in HCC samples
Table 3 Heatmap for pathogenic variants in highly mutated genes in Non-HCC samples
Table 4 Co-mutated OBSCN and TTN in HCC and Non-HCC samples

The Cancer Genome Atlas (TCGA) significantly mutated genes

The Cancer Genome Atlas (TCGA) network performed a large-scale multi-platform analysis of HCC, including evaluation of somatic mutations in 363 patients. Whole exome sequencing on 363 HCC cases revealed that 12,136 genes had non-silent mutations, and 26 significantly mutated genes (SMGs) were determined using the MutSigCV algorithm [28, 29]. The list includes the 26 significantly mutated HCC genes identified across 10 genome sequencing studies in addition to TCGA and is shown in Supplement file TCGA HCC.

We studied the somatic variants in HCC and Non-HCC tissues in the 26 significantly mutated genes to determine the type and number of mutations in patients’ liver tissue. Data analysis revealed 87 unique variants in HCC and 94 in Non-HCC tissues, most of which were frameshift truncations, as demonstrated in Fig. 4. APOB and RP1L1 showed the highest number of mutations in both HCC and Non-HCC tissues (Fig. 5). Frameshift mutations comprised 77% of variants in HCC and 66% in Non-HCC tissues, while missense mutations comprised 21% and 31%, respectively. Analysis of unique variants in individual samples revealed Tier 1 and Tier 2 variants in SMGs in HCC and Non-HCC (TP53, PIK3CA, CDKN2A, and BAP1); those with a CScape score for driver point mutation exceeding 0.5 [30] are listed in Table 5. CScape predicts the oncogenic status (disease-driver or neutral) of somatic point mutations (missense in our list) specifically in the coding region of the cancer genome; values above 0.5 are predicted to be deleterious, while those below 0.5 are predicted to be neutral or benign [30]. Among Non-HCC TCGA SMGs, only BAP1 had 2 variants with a CScape score exceeding 0.5. Table 6 shows the distribution and frequency of the different variants in TCGA SMGs among our HCC and Non-HCC samples.

Fig. 4 Unique pathogenic variants involving TCGA SMGs in both (a) HCC and (b) Non-HCC

Fig. 5 Unique pathogenic variants in oncogenes and tumor suppressor genes (TSGs) in HCC with Phred score > 20. Fs = frameshift-truncation, Sc = Splice site, T1 = Tier 1, T2 = Tier 2, T3 = Tier 3, T4 = Tier 4. A higher CHASMplus LIHC (liver hepatocellular carcinoma) score in colored areas indicates a higher possibility of being a driver mutation in HCC

Table 5 Unique pathogenic variants (CScape score > 0.5) in TCGA SMGs in HCC and Non-HCC samples
Table 6 Heatmap showing TCGA SMGs unique pathogenic variants among HCC and Non-HCC patient samples and their frequency

Cancer genome landscape

Oncogenes and tumor suppressor genes (TSGs) harbored pathogenic and non-pathogenic mutations. The percentage of unique TSG mutations was higher (63%) than that of unique oncogene mutations (37%) in HCC samples. Missense variants (60%) were more prevalent than frameshift variants (37%) in both, with smaller percentages of stop-gained and splice-site variants and a higher percentage of frameshift variants in TSGs relative to oncogenes (Supplement Table 7). In HCC samples, unique pathogenic variants with a Phred score exceeding 20 are delineated in Fig. 5 and in Supplement Table 8, which includes chromosome position mapped against GRCh37, targeted treatment, Phred score, and zygosity; missense variants were sorted according to CHASMplus LIHC (liver hepatocellular carcinoma) score, and Tier 1 and Tier 2 variants (TP53, CDKN2A, and MSH2) were assigned according to ACMG criteria by Franklin. In Non-HCC samples, unique pathogenic variants with a Phred score exceeding 20 are delineated in Fig. 6 and in Supplement Table 9; missense variants were likewise sorted according to CHASMplus LIHC score, and Tier 1 and Tier 2 variants (KMT2D and ATM) were assigned according to ACMG criteria by Franklin.

Fig. 6 Unique pathogenic variants in oncogenes and tumor suppressor genes (TSGs) in Non-HCC with Phred score > 20. Fs = frameshift-truncation, Ss = Splice site, T1 = Tier 1, T2 = Tier 2, T3 = Tier 3, T4 = Tier 4. A higher CHASMplus LIHC (liver hepatocellular carcinoma) score in colored areas indicates a higher possibility of being a driver mutation in Non-HCC

Pathways in Hepatocellular carcinoma

To reveal the potential role of mutated genes in biological processes, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation in HCC samples using the DAVID annotation tools [27]; the results are listed in Supplement file Go KEGG HCC. The most significant KEGG pathways are shown in Table 7. In the KEGG analysis, the significantly annotated clusters included Notch signaling, Wnt signaling, PI3K-AKT pathway, Hippo signaling, Apelin signaling, Hedgehog (Hh) signaling, and MAPK signaling, in addition to ECM-receptor interaction, focal adhesion, and calcium signaling, which are crucial landmarks in the hepatocellular carcinoma pathway. Pathogenic mutations in genes of the HCC pathway across all HCC samples are illustrated in Supplement Fig. 2.
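Pathway enrichment of this kind rests on an over-representation test: given a gene list, how surprising is the number of genes that fall in a pathway? The sketch below computes the plain one-sided hypergeometric (Fisher exact) tail probability. It is an assumption-labeled illustration, not DAVID's implementation; DAVID reports a more conservative modified Fisher exact statistic (the EASE score), which this code does not reproduce. The function name and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(pop_total: int, pop_hits: int,
                         list_total: int, list_hits: int) -> float:
    """P(X >= list_hits) when list_total genes are drawn without replacement
    from pop_total background genes, pop_hits of which belong to the pathway."""
    if not (0 <= pop_hits <= pop_total and 0 <= list_total <= pop_total):
        raise ValueError("inconsistent counts")
    upper = min(pop_hits, list_total)
    if not 0 <= list_hits <= upper:
        raise ValueError("list_hits out of range")
    favourable = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, upper + 1)
    )
    return favourable / comb(pop_total, list_total)


# Hypothetical counts: 20,000 background genes, 150 in the pathway,
# 500 mutated genes in the list, 12 of them in the pathway.
p_value = hypergeom_upper_tail(20_000, 150, 500, 12)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice the p-values from many pathways would also need multiple-testing correction, which is omitted here.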

Table 7 Enriched KEGG pathways in combined HCC samples

Tier 1 and Tier 2 variants

We performed a database search on Franklin to confirm the somatic classification of variants reported by other studies and to assess their tier according to Association for Molecular Pathology (AMP) standards [26]. Eight variants with a Phred score > 20 were classified as Tier 1 (3 variants) or Tier 2 (5 variants): TP53, CDKN2A, and MSH2 variants in individual HCC samples, a KIT variant common to 3 HCC samples, 3 KMT2D variants in Non-HCC samples, and an ATM variant in Non-HCC samples. A further 15 variants had a Phred score < 20 (KMT2D, NOTCH1, KMT2C, PIK3CA, KIT, SMARCA4, ATM, PTEN, MSH2, and PTCH1). These variants are rare in population studies, with gnomAD frequencies reported by OpenCRAVAT of less than 0.1 (Table 8). All of these mutations are clinically targetable with specific inhibitors. Supplement Table 10 lists 134 low-frequency Tier 1 and Tier 2 variants, including those above and other low-confidence variants.

Table 8 Tier 1 and Tier 2 variants in HCC and Non-HCC samples (GRCh37); all heterozygous, population frequency < 0.1

Discussion

Currently, three types of NGS-based analytical methods are mainly used to identify genomic mutations: (i) whole exome sequencing (WES), (ii) whole-genome sequencing (WGS), and (iii) targeted sequencing (TS). This study was done to explore the mutational landscape in Egyptian patients with HCC using whole exome sequencing on both HCC and Non-HCC liver tissue.

In our study, missense variants were the most prevalent, followed by synonymous variants, with no significant difference between them; frameshift and stop-gained variants showed much lower percentages. This agrees with other cancer studies, in which the most frequently called variants are commonly synonymous, followed by non-synonymous and splice site variants; however, the less common frameshift and stop-gained/stop-loss variants are more likely to have deleterious effects at the protein level, which guides variant prioritization. Although often assumed benign and excluded, synonymous variants have been reported to have pathogenic properties in cancer, particularly through changes in protein expression and splicing [31].

The significantly lower level of variants in FFPE samples than in fresh tissue (FT) in our study can be explained by fragmentation, which is often extensive in formalin fixed samples. Thus, FFPE tissues have a significantly lower amount of amplifiable DNA templates and need special preparation kits for library preparation [32].

Regarding mutational signatures, all the single base substitution signatures detected (S1, S5, S6, S12, S17, S23) were previously observed in liver cancer, but with variable frequencies [21]. Signatures SBS1 (associated with spontaneous deamination) and SBS5 (associated with the activity of transcription-coupled nucleotide excision repair) are clock-like mutational processes that increase with age [33]. These mutational processes are active in normal genomes as well as in tumor cells. Signatures 12, 17, and 23 are of unknown etiology. Signature 6 is associated with defective DNA mismatch repair and is found in microsatellite unstable tumors. It typically constitutes not more than 3% in cancers [34], but in our patients it constituted a high percentage, with a median of 14.2%. The mutational signature of FFPE samples differed from that of fresh samples. In their 2019 study, Bhagwate et al. also found that the DNA quality and quantity of FFPE samples are often sub-optimal and that the resulting NGS-based genetic variant detections are prone to false positives, which could be the cause of the variation in mutational signature [35].

Hepatocellular carcinoma has a higher tumor mutation burden (TMB) than the average for other solid tumors. Tumor mutation burden is defined as the abundance of somatic mutations in a tumor; in tumors with high TMB, immunotherapy can improve overall survival significantly more than in tumors with low TMB. A higher TMB differentiates the tumor from normal tissue and exposes it to immune cells, so it becomes more responsive to immunotherapy [36]. For this reason, we analyzed variants in both HCC and Non-HCC liver tissue to clarify the prevalence and type of variants that contribute to TMB in cancerous and non-cancerous liver tissue. Analysis of highly mutated genes in both HCC and Non-HCC revealed the presence of 8 highly mutated genes in HCC (AHNAK2, MUC16, TTN, MUC6, MUC12, ZNF717, OBSCN, PDE4DIP).
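TMB is commonly expressed as the number of somatic mutations per megabase of sequenced territory. The text does not state how TMB would be computed for these samples, so the sketch below is only an illustration of that common convention; the function name, the mutation count, and the 30 Mb callable region are hypothetical.

```python
def tumor_mutation_burden(n_somatic_mutations: int, callable_megabases: float) -> float:
    """Somatic mutations per megabase of callable (adequately covered) sequence."""
    if n_somatic_mutations < 0 or callable_megabases <= 0:
        raise ValueError("count must be non-negative and region size positive")
    return n_somatic_mutations / callable_megabases


# Hypothetical sample: 150 somatic mutations over a 30 Mb callable region.
tmb = tumor_mutation_burden(150, 30.0)
print(f"TMB = {tmb:.1f} mutations/Mb")
```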

Mutations in AHNAK2 are listed in pathological specimens and databases from The Cancer Genome Atlas (TCGA) [29]. AHNAK2 (AHNAK nucleoprotein 2), also known as C14orf78, is a member of the AHNAK family. AHNAK mediates negative regulation of cell growth and acts as a tumor suppressor through potentiation of TGFβ signaling. Upregulated AHNAK2 activates the PI3K/AKT signaling pathway and promotes proliferation, migration, and invasion [37].

Analysis of highly mutated genes revealed that MUC16 is rich in somatic variants in HCC liver tissue and ranks third in both HCC and Non-HCC in the number of pathogenic variants arising in the gene in our study (Supplement Tables 3 and 4). MUC16, an oncogene that encodes cancer antigen 125 (CA-125), sustains normal cell function, and its role in the development of numerous cancers is explained by its involvement in activation of the p53 and DNA repair pathways. The critical role of MUC16 in HCC is highlighted by the enrichment of MUC16 mutations in various cancer-related pathways, such as cell cycle and metabolic processes. In addition, MUC16 mutation significantly increases TMB and represents an independent marker with high predictive value for HCC [38, 39]. In our study, AHNAK2, ZNF717, MAP2K3, HYDIN, OBSCN, and PDE4DIP were among the highly mutated genes. TTN and OBSCN were the most abundant in HCC and Non-HCC, respectively. Our results partially agree with those of Shen et al. (2020), who performed WES on liver tissues collected from 10 Chinese HCC patients. Among the 25 mutant genes in their study, which showed a different gene profile, the highest mutation frequency was found in HYDIN, TTN, OBSCN, and AHNAK2 [40].

In our data analysis, TTN and OBSCN were the most frequently mutated genes in HCC. Among HCC samples, 11/13 (85%) had co-mutation of OBSCN and TTN, while among Non-HCC samples 8/12 (67%) had mutations in both genes, with the remainder showing only OBSCN pathogenic variants. The same observation was made previously in colorectal cancer (CRC) patients in a TCGA study, in which TTN and OBSCN were commonly mutated and mutations correlated with higher TMB and favorable overall survival. Patients with co-mutation of TTN and OBSCN were categorized as ‘Double-Hit’, patients with a mutation in only one gene (TTN or OBSCN) were labelled ‘Single-Hit’, and patients with both genes wild-type were categorized as ‘Double-WT’. The Double-Hit group showed a low tendency to malignant events and the highest TMB, immune cell infiltration abundance, and immune checkpoint expression compared with the other two phenotypes. This finding may extend to HCC but needs exploration through further studies; researchers suggest that the ‘immune-hot’ category of tumors has better immunotherapeutic efficacy and, as a result, a more favorable prognosis [41].
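The three-way grouping described for the CRC study can be written as a simple rule on the mutation status of the two genes. The sketch below is a minimal illustration of that rule; the function name, sample identifiers, and gene lists are hypothetical and do not correspond to patients in this study.

```python
HIT_GENES = {"TTN", "OBSCN"}
LABELS = {2: "Double-Hit", 1: "Single-Hit", 0: "Double-WT"}


def hit_category(mutated_genes) -> str:
    """Classify a sample by how many of TTN and OBSCN carry a mutation."""
    genes = {gene.upper() for gene in mutated_genes}
    return LABELS[len(genes & HIT_GENES)]


# Hypothetical samples mapped to their lists of mutated genes.
samples = {
    "S1": ["TTN", "OBSCN", "TP53"],
    "S2": ["OBSCN"],
    "S3": ["APOB"],
}
categories = {name: hit_category(genes) for name, genes in samples.items()}
print(categories)
```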

The HCC mutational landscape is characterized by pronounced intra- and inter-tumoral heterogeneity, with a lack of a representative actionable oncogenic driver. Gene-specific filtering was performed on our WES data in both HCC and Non-HCC for the 26 significantly mutated genes in the TCGA study [42] to expand data interrogation across the entire exome and to perform candidate gene analysis based on up-to-date curated databases [43]. Although the Phred score for some of these variants was low, in most cases because of homopolymer errors inherent to the Ion Proton semiconductor-based sequencing detection method, they passed variant assessment and had an acceptable depth of coverage [44]. HCC development is mainly driven by inactivating mutations in various tumor suppressor genes such as TP53, AXIN1, ARID1A, ARID2, and CDKN2A, and by mutations in oncogenes such as CTNNB1, PIK3CA, KRAS, NRAS, and NFE2L2, which predispose hepatocytes to accumulate additional oncogenic changes that drive cancer initiation and tumorigenesis [42]. Our study shows concordant results, with TP53, TERT, AZIN1, KEAP1, BAP1, PIK3CA, APOB, ARID1A, CDKN2A, AXIN1, NFE2L2, CREB3L3, and RP1L1 showing variants with a high probability of being driver mutations (CScape score > 0.5) in a variable number of our HCC cases [30].

Exploring The Cancer Genome Atlas significantly mutated genes (TCGA SMGs), the most frequently mutated genes in HCC samples in our study were APOB (53%), RP1L1 (69%), AHCTF1 (46%), AXIN1 (46%) and AHCTF1 (38%), KEAP1 (31%), NFE2L2 (31%), and PIK3CA (31%). The most frequently mutated genes in Non-HCC samples in our study were RP1L1 (50%), APOB (41.7%), TERT (41.7%), ARID1A (33%), CREB3L3 (33%), and AXIN1 (33%), some of the most established drivers of HCC [42]. The mutated genes in our study were thus among the most frequently mutated HCC driver genes in TCGA. However, in the TCGA study of 363 HCC cases, the two most frequently mutated genes (> 25% of total cases) were the tumor suppressor gene TP53 and the WNT pathway oncogene CTNNB1 [30]. In our study, TP53 had unique pathogenic coding variants in 7.6% of HCC samples and 16.7% of Non-HCC samples, while CTNNB1 had them in 15.4% of HCC and 8.3% of Non-HCC samples. AXIN1, a gene involved in the Wnt pathway, showed unique pathogenic variants in 46% of our HCC samples, and these were mutually exclusive with the activating β-catenin mutations except in one Non-HCC sample, which showed both AXIN1 and CTNNB1 mutations [45]. AXIN1 is a negative regulator of the Wnt/β-catenin pathway. Mutually exclusive mutations occurring in AXIN1 are driver mutations that drive hepatocarcinogenesis at the stage of low-grade dysplastic nodules in cirrhotic liver. TCGA reported a low percentage of mutations in both NFE2L2 (3%) and KEAP1 (5%) and highlighted their vital role in cellular antioxidant defenses [9].

In HCC samples, variants in TCGA SMGs were observed in ARID1A (23%), CREB3L3 (23%), GPATCH4 (23%), IL6ST (23%), LZTR1 (23%), RB1 (23%), ARID2 (15%), AZIN1 (15.4%), BAP1 (15.4%), CTNNB1 (15.4%), EEF1A1 (15.4%), CDKN2A (7.6%), ACVR2A (7.6%), KRAS (7.6%), and TP53 (7.6%). In agreement with our results, driver mutational events were observed in the TCGA study in the LZTR1, EEF1A1, ARID2, AZIN1, GPATCH4, CREB3L3, and ACVR2A genes [29]. Shibata (2021) added that driver genes mutated at low frequency may trigger specific oncogenic phenotypes such as aggressive growth, invasiveness, or metastasis [46]. Driver mutations in ARID1A and ARID2, which are involved in chromatin remodeling, occur early, at the stage of high-grade dysplastic nodules preceding the development of HCC. Key pathways in HCC include cell cycle regulatory pathways driven by mutations in RB1 and CDKN2A [9]. We found no variants in ALB. Shibata noted the uncertain role of the frequent mutations of ALB and APOB in HCC. He explained that mutations in both genes were enriched in indels and that both genes are highly expressed in hepatocytes, so these indels could be caused by replication errors [47]. In agreement with this explanation, 11 of the 15 APOB variants in our HCC samples were indels, and 8 of the 14 variants in Non-HCC samples were frameshifts caused by indels. Both ALB and APOB are key mediators of hepatocyte function through the secretion of albumin and VLDL. This role consumes a large share of the hepatocyte's transcriptional, translational, and energy resources, which might explain APOB mutation and suppression by the malignant hepatocyte to reserve resources for cell division [46].

Analysis of variants against the Cancer Genome Landscape revealed Tier 1 variants in TP53 and CDKN2A, both of which are driver genes in HCC [41], in addition to a Tier 2 variant in MSH2, a DNA mismatch repair gene whose defects can contribute to cancer development. In their 2018 study, Zhu et al. highlighted the role of rs2303428 of MSH2 in HCC prognosis [48]. This intron variant was observed in our study in both the HCC (VAF 63%) and Non-HCC (VAF 54%) samples of one patient with HCV-related HCC. It is mainly identified in hereditary cancers of constitutional germline origin and as a somatic mutation in some sporadic cancers. In our exome analysis, we filtered for pathogenic coding variants in addition to splice site variants. Further studies are needed to determine its impact on prognosis. In addition, a Tier 1 splice site variant was identified in the ATM gene, and two Tier 2 frameshift variants were identified in KMT2D, in Non-HCC samples. These tumor suppressor genes are drivers in HCC. Previous studies revealed many low-frequency somatic mutations affecting multiple genes, including the cell cycle control gene ATM in 6% of HCC patients and recurrent inactivating mutations of members of the chromatin remodeling gene family, such as KMT2D, in 6% of HCC patients [27].

Data analysis of the identified KEGG pathways for somatic unique variants in HCC samples highlights mutated genes’ roles during tumorigenesis, revealing biological processes and pathways implicated in carcinogenesis.

The extracellular matrix (ECM)-receptor interaction (61%) and focal adhesion (46%) pathways were significantly enriched in our HCC samples. Focal adhesions (FAs) are structural links between the ECM and the actin cytoskeleton, formed when integrins interact with multiple proteins at cell-ECM adhesion sites. These adhesions orchestrate significant cancer-related functions, including cell proliferation and survival, cell invasion, and epithelial mesenchymal transition. Understanding the pathogenesis of tumor cell motility can pave the way to effective therapeutic targeting to prevent cancer progression [49]. Calcium signaling was enriched in 23% of HCC samples. This pathway is reported to be associated with liver-specific diseases such as HCC, cholestasis, hepatitis, and nonalcoholic fatty liver disease (NAFLD). Calcium signaling is coordinated by approximately 1600 genes, whose ultimate role is to maintain intracellular calcium homeostasis and normal cell function. Altered expression of calcium signaling genes is implicated in cancer hallmarks such as altered cell metabolism, sustained cell proliferation, cell death resistance, angiogenesis, invasion, and metastasis [50].

The PI3K-Akt and Wnt pathways had the highest numbers of enriched genes (130 and 68, respectively), with PI3K-Akt enriched in 38.4% of HCC samples. Several studies have documented mutations of the phosphatidylinositol 3-kinase (PI3K)/serine-threonine protein kinase (Akt)/mammalian target of rapamycin (mTOR) signaling pathway in HCC. The PI3K/Akt/mTOR pathway is an important signaling mechanism that regulates the cell cycle, proliferation, apoptosis, and metabolism. It is often dysregulated in HCC, which promotes the survival and proliferation of tumor cells [51].

The Notch signaling pathway was enriched in 23% of HCC patients. Mutations in target genes of the Notch pathway have been reported in HCC, and the pathway notably has controversial effects on HCC. It crosstalks with the Wnt pathway for cancer stem cell (CSC) maintenance, with the PI3K/mTOR pathways for HCC proliferation, and with the VEGF pathway for angiogenesis [52].

The apelin and Wnt signaling pathways were each enriched in 15% of HCC patients. Activated β-catenin activates the transcription of target genes that modulate carcinogenesis. These genes are involved in CSC maintenance, proliferation, and epithelial mesenchymal transition (EMT) [53]. The apelin signaling pathway is a G protein-coupled receptor (GPCR) pathway that plays a crucial role in regulating physiological processes involved in carcinogenesis, such as angiogenesis and energy metabolism. It interacts with other pathways, such as the Wnt/β-catenin, PI3K/Akt/mTOR, and Hippo pathways, to regulate the development of HCC [54].

The Hippo, Hedgehog (Hh), and MAPK signaling pathways were enriched in three HCC patients. Hippo is a classical kinase cascade in which the Mst1/2-Sav1 complex phosphorylates and activates the Lats1/2-Mob1A/B complex, which in turn phosphorylates and inactivates Yap and Taz. Yap/Taz is the main downstream effector of the Hippo pathway, and its abnormal activation is related to a variety of human cancers, including HCC [55]. In healthy adult liver, Hh signaling is inactive because mature hepatocytes hardly express Hh ligands. Hh signaling is reactivated in liver diseases, and its activation enhances the transition of quiescent hepatic stellate cells (HSCs) to myofibroblasts (MFs), which regenerate the liver epithelial cells. The Hh pathway can be activated by inactivating mutations of PTCH1 [56]. The mitogen-activated protein kinase/extracellular signal-regulated kinase (MAPK/ERK) pathway is one of the molecular signaling pathways critical to tumor initiation, progression, and metastasis in HCC. It is a cascade of protein kinases that transmits signals from the cell surface to the nucleus. The MAPK/ERK signaling pathway is activated in more than 50% of human HCC cases [57].

In our study, every patient carried different unique mutations in the HCC and Non-HCC samples, and we observed molecular differences between patients. Tumor heterogeneity is a hallmark of HCC and poses a significant challenge to the development of effective therapies. Inter-tumor heterogeneity can be related to de novo independent carcinogenesis on a background of cirrhosis and/or to intrahepatic metastasis. The presence of variants in Non-HCC tissue can be attributed either to intrahepatic metastases, which previous studies reported in 20 to 40% of patients, or to de novo carcinogenesis on cirrhosis [9]. We explored gene variants in significant pathways and demonstrated considerable heterogeneity at the DNA sequence level between the HCC and Non-HCC liver tissues of the same patient. These patients belong to the same geographic region but have different etiologies; nevertheless, they shared most KEGG pathways, except for the Hh pathway in HBV-related HCC. Somatic mutations in similar loci of clinically actionable genes, even when of different sequence ontology, may serve as important therapeutic targets [58].

Tier 1 and Tier 2 mutations

All genes with clinically actionable Tier 1 and Tier 2 variants that represent driver mutations in HCC are involved in the HCC driver pathways discussed earlier: PI3K/mTOR, Wnt, Notch, Hedgehog, and MAPK [52]. In addition, some are involved in carcinogenesis in HCC through cell cycle control, chromatin remodeling, and oxidative stress [28]. We extended the list in the supplementary files to include variants within driver genes that are abundant in COSMIC and are targetable [24, 52]. Although frameshift variants represented only 1.6% of all variants in WES, they all lay within driver genes and genes implicated in carcinogenesis [28] in our patients and represented the dominant sequence ontology among pathogenic variants. Some of them were found, on the region viewer or on Franklin, to lie in homopolymer regions [43]. These variants, in addition to a considerable number of medium-VAF, high- and medium-confidence Tier 3 variants in every HCC patient, require further assessment and consideration of their clinical significance.

Conclusion

The mutational signatures most frequently found in HCC were S1, S5, S6, and S12. Analysis of highly mutated genes revealed the presence of 10 common highly mutated genes in HCC and Non-HCC (AHNAK2, MUC6, MUC16, TTN, ZNF17, FLG, MUC12, OBSCN, PDE4DIP, MUC5B, and HYDIN). Among the 26 significantly mutated HCC genes identified by TCGA, APOB and RP1L1 showed the highest number of mutations in both HCC and Non-HCC tissues. Tier 1 and Tier 2 variants were identified in TCGA SMGs (TP53, PIK3CA, CDKN2A, and BAP1) in HCC and Non-HCC. Cancer Genome Landscape analysis revealed additional variants in MSH2 in HCC and in KMT2D and ATM in Non-HCC. In the KEGG analysis, the significantly annotated clusters in HCC included Notch signaling, Wnt signaling, the PI3K-Akt pathway, Hippo signaling, apelin signaling, Hedgehog (Hh) signaling, and MAPK signaling, in addition to ECM-receptor interaction, focal adhesion, and calcium signaling. Low-frequency Tier 1 and Tier 2 variants were found in KIT, KMT2D, NOTCH1, KMT2C, PIK3CA, SMARCA4, ATM, PTEN, MSH2, and PTCH1 in both HCC and Non-HCC. Further assessment and confirmation of these variants and of other Tier 3 variants are recommended.