Background

Genome-wide association studies (GWAS) enable the simultaneous testing of thousands of genetic variants, usually SNPs, across the genome in order to find variants associated with a trait or a disease [1]. The GWAS methodology has so far generated many robust associations for various traits and diseases and has revolutionized our understanding of the genetic architecture of complex traits. With increasing sample sizes, new sequencing technologies, and the accumulation of large biobanks, our ability to investigate the effects of human genetic variation on complex traits is expected to increase further in the near future [2]. In the first years of the field, efforts were oriented towards the statistical aspects of the analysis [3], which involved thousands of SNPs tested simultaneously, including the methodology for multiple testing and quality control. This effort was successful and enabled the discovery of associations that were replicated in subsequent studies and, in several cases, validated experimentally and functionally using a wide variety of methods [4]. However, it soon became clear that most variants discovered via GWAS have small overall effects on disease susceptibility [5]. Thus, it became evident that integrating data from multiple sources and developing reliable bioinformatics tools was a necessary step towards addressing the complexity of the genetic basis of common human diseases [5].

Soon after the publication of the first GWAS it also became evident that, at least theoretically, individuals could be identified in such cohorts even if only the summary statistics are available [6]. This led to strict access controls for sharing individual patients’ data (IPD) from GWAS. Subsequent works found that although such privacy attacks are possible in theory, they are largely unsuccessful and unconvincing in practice. For instance, even sharing 1,000 SNPs for datasets with more than 500 individuals generally leaves the “attack” with low statistical power [7]. A more thorough investigation is given in [8]. In practice, however, not all studies share their data, at least among those published in the first decade of GWAS. It has been estimated that the overall proportion is only 13%, although it increased from 3% in 2010 to 23% in 2017 [9]. Conversely, researchers who share their summary data have been shown to receive on average 81.8% more citations, an effect that is probably related, at least partially, to the usability of the data in downstream analyses [10]. Summary statistics not only offer additional protection of privacy, but also a significant advantage in the computational cost of downstream analyses, which does not scale with the number of participants in the study [11]. Thus, it is no surprise that in recent years a large variety of methods have been developed to perform so-called post-GWAS analyses using the summary results of a single study or of several studies, in most cases integrating data from other sources [11]. The majority of these methods use the summary data in the form of per-allele SNP effect sizes (log odds ratios or betas) along with their standard errors, or equivalently the z-scores (per-allele effect sizes divided by their standard errors).
These methods seek to go a step further than the simple analysis, or re-analysis, of a study and aim to improve our understanding of the functional role of the identified variants [12]. The most important factors that played a significant role in the development of such methods, in this so-called post-GWAS era, are the linkage disequilibrium (LD) information from population reference panels such as HapMap or the 1000 Genomes Project, the gene expression variation in the form of eQTLs, and the integration of functional information on biological pathways [13,14,15].
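The relationship between effect sizes, standard errors, z-scores, and p-values mentioned above can be illustrated with a short sketch (the numeric values here are hypothetical, chosen only for illustration):

```python
import math

def z_and_p(beta, se):
    """Convert a per-allele effect size (beta or log odds ratio) and its
    standard error to a z-score and a two-sided p-value under the
    standard normal approximation."""
    z = beta / se
    # two-sided p-value from the standard normal survival function
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical example: beta = 0.20, SE = 0.05 gives z = 4.0
z, p = z_and_p(beta=0.20, se=0.05)
```

Because this mapping is invertible given any two of the quantities, most post-GWAS tools can work interchangeably with (beta, SE) pairs or z-scores.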

The methods developed so far cover a broad range of different types of analysis, either for a single trait or for the combined analysis of multiple traits. For a single trait, there are methods for meta-analysis [16, 17], methods for inferring heritability [18, 19], gene-based tests [20], methods for gene set (or pathway) analysis [21], and methods for fine-mapping causal variants [22]. For the analysis of multiple traits there is also a variety of methods [23], ranging from those that estimate the genetic correlation between traits [24] and those for the joint analysis of multiple traits [25], to methods that attempt to infer causality between traits, such as Mendelian randomization [26], transcriptome-wide association studies [27], and colocalization [28]. Of course, the data standards [29] used to facilitate these analyses and the databases in which the results are stored are also of great importance for the community.

In order to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics, we performed a systematic review following the PRISMA guidelines [30]. We conducted an extensive literature search to identify relevant software tools and databases and categorized them by functionality, into categories related to data, single-trait analysis, and multiple-trait analysis, along with the sub-categories mentioned in the previous paragraph. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a wide range of software tools and databases for the analysis of GWAS summary statistics, each with unique strengths and limitations. We describe the key features of each tool and database, including input/output formats, data types, and computational requirements, and we discuss the overall usability and applicability of each tool for different research scenarios. This review is intended to serve as a valuable resource for researchers interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of analyses based on GWAS summary statistics.

The systematic review

In order to collect all the available published papers, we performed a systematic review of the literature following the PRISMA guidelines [30]. The search was performed in PubMed (https://pubmed.ncbi.nlm.nih.gov) with the following query: ("Summary Statistics" OR "Summary Data" OR "Summary Association Statistics" OR "Summary Association Data") AND (GWAS OR genomewide OR genome-wide). The abstracts initially, and subsequently the full articles, were scrutinized in order to collect the necessary information. The inclusion criteria were that the article describes methods, software tools, or databases suitable for the analysis of GWAS summary data. Methods papers that do not provide software, and software whose web pages are no longer available, were excluded. Additional searches were performed in the reference lists of the identified articles in order to find studies that were initially missed. In many cases multiple articles regarding a single tool were found, in which case we kept only one. We decided to include reports deposited in preprint servers like medRxiv/bioRxiv, but since some of these papers were eventually published in peer-reviewed journals, in such cases we retained only the latter reference. Tools for Polygenic Risk Scores (PRSs) and visualization were excluded. For all included tools we recorded the URL, the PMID, and the main functionality (or functionalities), along with comments regarding their main methodological features. The initial search identified 2942 articles (22/12/2023).

In total we identified 305 tools and databases (Fig. 1). We classified them into three broad categories: data, tools for single traits, and tools for multiple traits, along with various sub-categories. The total breakdown is given in Table 1. Several tools perform different tasks and could thus be considered for more than one category; in such cases, we assigned them to the category most closely related to the primary goal of the analysis they claim to perform. Other tools do not fit exactly the general description of any category, but we nevertheless classified them under the most similar one. The largest sub-category consists of the tools for pleiotropy analysis, whereas the smallest one relates to the reconstruction of genotypes and effect sizes. Most tools are written in R (56.4%), with the largest proportion being in the multiple traits category, followed by Python (12.5%) and C/C++ (8.2%) (Fig. 3). Apart from the publicly available databases, only a handful of tools are offered as webservers (6.95%). Most of the tools were published after 2015 (Fig. 2). Nearly 60% of the tools and databases were published in: Bioinformatics, American Journal of Human Genetics, Nature Genetics, Nature Communications, Nucleic Acids Research, and PLoS Genetics (Fig. 4). In the following sections we proceed with a detailed description of the various tools identified, classified into the different categories and sub-categories. The complete list of identified tools along with the relevant information is given in Supplementary Table 1.

Fig. 1

PRISMA Flow diagram for systematic review

Table 1 The broad categories and the sub-categories of tools and databases
Fig. 2

Number of tools and databases included in the review, published per year

Fig. 3

The programming languages used in the various categories of identified tools

Fig. 4

The journals in which the studies included in the review were published

The data

First, we present the tools dedicated to the data themselves. We include here tools for quality control of GWAS summary statistics, tools for imputation and genotype reconstruction, as well as the publicly available databases of summary results.

Standards and quality control

The need for sharing and re-using GWAS summary statistics has been an issue for the community in recent years. It is generally accepted that the minimum (“mandatory”) information contained in GWAS summary statistics should include: the chromosome and the base-pair location, the p-value of the association, the risk allele and the other allele, the risk allele frequency, and an estimate of the effect size (odds ratio or beta) along with its standard error [29]. Other important summary statistics that are nevertheless termed “encouraged” include the sample size, the variant ID, the rsID, the confidence interval of the effect size, and so on. These specifications were incorporated into the GWAS-SSF format [31], which was developed to meet the requirements set by the community. GWAS-SSF consists of a tab-separated data file with well-defined fields and an accompanying metadata file. Most repositories and programs use some variant of GWAS-SSF. However, such tabular formats in several cases lead to ambiguity or incomplete storage of information, or lack essential metadata, resulting in poor performance and an increased risk of errors in downstream analyses. To address these issues, an adaptation of the well-known variant call format (VCF) [32] capable of storing GWAS summary statistics, called GWAS-VCF, was developed along with software tools for applying it in downstream analyses [33]. A VCF file contains a header with metadata and a main body containing variant-level (one locus per row, with one or more alternative alleles/variants) and sample-level (one sample per column) information. The VCF was thus adapted to include GWAS-specific metadata, utilizing the sample column to store variant–trait association data. GWAS-VCF is the standard used by the MRC-IEU OpenGWAS database [34] and it comes with appropriate tools to map GWAS summary statistics to VCF with on-the-fly harmonization (https://github.com/mrcieu/gwas2vcf).
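A minimal sketch of how the mandatory fields of a GWAS-SSF-style tab-separated file could be checked is given below. The column names follow the published field list, but should be verified against the current GWAS Catalog specification before being relied upon:

```python
import csv
import io

# Mandatory GWAS-SSF-style fields (names per the published field list;
# check the current GWAS Catalog documentation before relying on them).
MANDATORY = ["chromosome", "base_pair_location", "effect_allele",
             "other_allele", "beta", "standard_error",
             "effect_allele_frequency", "p_value"]

def validate_ssf(tsv_text):
    """Parse a tab-separated summary-statistics file and return its rows,
    raising ValueError if any mandatory column is missing."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [f for f in MANDATORY if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    return list(reader)

# A one-variant toy file with the mandatory columns only
example = ("chromosome\tbase_pair_location\teffect_allele\tother_allele\t"
           "beta\tstandard_error\teffect_allele_frequency\tp_value\n"
           "1\t12345\tA\tG\t0.05\t0.01\t0.21\t5.7e-7\n")
rows = validate_ssf(example)
```

In practice, dedicated tools such as MungeSumstats and GWASlab perform this kind of validation along with much richer harmonization.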

Despite these efforts, not all available data are in line with the standards, especially data from older studies. Thus, there is a need for additional tools to harmonize the data, as well as to identify and correct errors. Tools of the former class were developed early and focused mainly on harmonizing data in preparation for a meta-analysis. These include QCGWAS [35], GWAtoolbox [36], and EasyQC [37]. GEAR [38] is particularly interesting in that it incorporates ideas from population genetics that allow verification of the genetic origin and geographic location of each cohort and the identification of significant sample overlap. More recent tools like MungeSumstats [39] and GWASlab [40] perform standardization and quality control handling the most common formats, SumStatsRehab [41] can be used for data validation, restoration of missing data, and correction of errors or formatting, and GWASinspector [42] provides extensive QC reports and performs harmonization, being compatible with recent reference panels and handling insertion/deletion and multi-allelic variants. The latter class of methods additionally leverages information from the LD among SNPs. One such tool is GQS [43], which identifies suspicious regions and prevents erroneous interpretations by comparing the significance of the association for each SNP with its LD value for the reported index SNP. Similar functionalities are offered by DENTIST [44], which uses LD to detect and eliminate errors and disagreements between GWAS data and the LD reference panel. EXTminus23andMe [45] evaluates the quality of summary statistics after data removal and the suitability of the down-sampled summary statistics for typical follow-up genetic analyses.

Databases

The publicly available biological databases have played, and continue to play, a central role in bioinformatics and in biological research in general [46,47,48]. The same holds for databases related to human research [49] and in particular those involved in GWAS [50]. The databases we identified can be roughly divided into two categories: databases that contain summary statistics from GWAS, and databases that contain important secondary analyses of those data performed with some of the methods that we describe in later sections.

Regarding the databases of the first category, NCBI’s dbGaP [51] was developed to contain the results of studies investigating the interaction of genotype and phenotype, which include GWAS. One of dbGaP’s primary objectives was to house individual-level GWAS data, but the database contains summary data as well. Summary statistics are generally available to the public, whereas access to IPD requires varying levels of authorization. The NHGRI-EBI GWAS Catalog [52], established in 2008, has for years been considered the central repository of GWAS summary statistics. It is a high-quality curated collection of all published GWAS and, as of 2023-12-20, contains 6,680 publications, 566,798 top associations, and 66,825 full summary statistics (Fig. 5). The database played an important role in the community efforts leading to the development of the GWAS-SSF format. GWAS Central [53], previously known as the Human Genome Variation (HGV) database of Genotype-to-Phenotype information, contains over 72.5 million p-values for over 5,000 studies, with over 7.4 million unique genetic markers involved in more than 1,700 unique phenotypes. The database contains data from several sources (including the NHGRI-EBI GWAS Catalog, OpenGWAS, the Japanese GWASdb, dbGaP, WTCCC, and so on). The MRC-IEU OpenGWAS [34] is a newer addition and contains 346 million genetic associations from 50,037 GWAS summary datasets. It holds complete data from various consortia and the UK Biobank and comes with a suite of tools for harmonizing the data and storing them in the GWAS-VCF format. At the time of writing it includes 4,126 binary traits, 725 metabolites, 3,371 proteins, 3,143 brain imaging phenotypes, and 3,217 other continuous phenotypes. In addition to the complete GWAS summary data, it also contains independent top hits for every dataset, totaling 116,918 independent signals, with 7,109 datasets having at least one hit.
GeneATLAS [54] and GBE [55] contain associations from the UK Biobank cohort. GeneATLAS currently contains data for 452,264 individuals, 778 traits, and 30 million variants, whereas GBE contains summary statistics from over 750,000 individuals, combining data from the UK Biobank, the Million Veteran Program, and Biobank Japan. GTEx [56] and QTLbase [57] are the primary resources for xQTL data. The GTEx project has expanded over time and currently contains data on genetic associations for gene expression and splicing in 838 individuals across 49 tissues. QTLbase, similarly, contains genome-wide QTL summary statistics for many molecular traits across 95 tissue/cell types and multiple conditions; it holds tens of millions of significant genotype–molecular trait associations under different conditions. Other resources of this category, related to various large consortia (GIANT, WTCCC, PGC, etc.) as well as other biobanks (FinnGen, etc.), can be found in Supplementary Table 2.

Fig. 5

A snapshot of the data. A A view of the Type 2 Diabetes Mellitus studies deposited in the NHGRI-EBI GWAS Catalog. B Type 2 Diabetes Mellitus studies contained in GWAS Central, depicting the significant hits in the chromosomes. C The GWAS-SSF format

The second category contains databases of important secondary analyses performed on GWAS summary statistics with some of the methods that we describe in detail in later sections, such as gene-based tests, heritability analysis, TWAS, colocalization, and so on. TSEA-DB [58] and PCGA [59] use information from gene expression in various tissues to perform tissue or cell-type enrichment analysis of the GWAS association statistics. webTWAS [60] and COLOCdb [61] also use eQTL information, but in different ways. webTWAS currently contains data for over 1,389 full GWAS, for which it identifies candidate causal genes using single-tissue expression imputation (with MetaXcan and FUSION) or cross-tissue expression imputation (with UTMOST). COLOCdb, on the other hand, is the most comprehensive colocalization resource, integrating publicly available GWASs with different types of xQTL and different algorithms (COLOC, SMR). GWAS ATLAS [62] contains the results of 4,756 GWAS from 473 unique studies across 3,302 unique traits, accompanied by useful information obtained from downstream analyses. Each study is accompanied by MAGMA results (see also “Gene-based tests”), SNP heritability estimates, and genetic correlations with other traits in the database. GWASROCS [63], on the other hand, contains a large and comprehensive set of SNP-derived AUROCs and heritabilities; it currently includes 579 simulated populations (corresponding to 219 traits) and SNP data (odds ratio, risk allele frequency, and p-values) for 2,886 unique SNPs. Phenome-wide association studies (PheWAS) invert the idea of a GWAS by searching for phenotypes associated with specific variants across the range of thousands of human phenotypes, or the “phenome” [64,65,66]. Thus, a PheWAS is expected to need large databases of GWAS results. PhenoScanner [67] is the most complete such database, with publicly available results from over 65 billion associations and more than 150 million unique genetic variants.
Similar functionalities are also offered by OpenGWAS, GWAS ATLAS, and the PheWAS Catalog [68]. Lastly, we should mention LD Hub [69], a centralized database of publicly available GWAS results for 173 diseases/traits, which offers a web interface that automates the LD score regression (LDSC) analysis pipeline (see also “Genetic correlation”).

Imputation and genotype reconstruction

Although some of the quality-control methods mentioned previously can correct errors and alter the data, the methods used for imputation go one step further. As expected, imputation methods were initially developed for individual-level data, for handling studies genotyped with different platforms [70,71,72]. Such methods can infer missing genotypes using LD information from reference samples genotyped with denser arrays or sequencing. Genotype imputation increases the coverage of SNPs and thus can be used to increase statistical power, improve the accuracy of fine-mapping, and harmonize the data in order to facilitate meta-analysis [70]. Several factors can influence the imputation accuracy: the sample size, the suitability of the reference panel for the particular sample, the genotyping chip, and the allele frequency [71]. In general, however, these methods are time-consuming since they process individuals one at a time, and thus methods that directly impute the summary statistics were developed. These methods utilize only the information provided in the sample regarding the studied population (p-value, z-score, or odds ratio/beta) and require additional information regarding the LD structure. Nearly all of them perform a kind of multiple regression, assuming a multivariate normal distribution for the test statistics and utilizing the theoretical result that the correlation of such test statistics equals the correlation of the corresponding variables [73], that is, the genotype correlation available through the reference panel. Such methods include FAPI [74], ImpG [75], RAISS [76], DIST [77], and SSimp [78], with most of the differences lying in the choice of the reference panel and the exact details of the mathematical methods used to handle matrix inversions in the multivariate normal. DISSCO [79] uses a similar framework but allows for covariates.
Such methods may perform poorly when the sample has a different LD structure from the reference panel. Thus, extensions such as DISTMIX [80] and ARDISS [81] were developed to handle mixed-ethnicity cohorts, improving the imputation performance. Adapt-Mix [82] estimates the correlation structure in both admixed and non-admixed individuals using simulated and real data and allows the use of this matrix with other imputation methods. Other methods such as LS-meta [83] and LSimputing [84] offer additional advantages; LS-meta imputes both genetic and environmental components using information from additional omics–trait association summary data, whereas LSimputing implements a non-parametric method that allows for nonlinear SNP–trait associations and predictions in case a sample of IPD is available. Using the same principles, simGWAS [84] allows the simulation of whole GWAS summary data, without generating individual-level data as an intermediate step.
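The conditional-expectation step shared by these summary-statistic imputation tools can be sketched as follows. Under the multivariate normal model, the expected z-score of an untyped SNP given the typed ones is a linear combination determined by the LD (genotype correlation) matrix; the small toy LD matrix below is invented for illustration, and the ridge term mimics the regularization that real tools apply before inverting the LD matrix:

```python
import numpy as np

def impute_z(z_typed, R_tt, R_it, lam=0.1):
    """Impute z-scores of untyped SNPs from typed ones under the
    multivariate normal model: E[z_untyped | z_typed] = R_it R_tt^{-1} z_typed.
    A small ridge term (lam) regularizes the LD matrix inversion."""
    n = R_tt.shape[0]
    weights = R_it @ np.linalg.inv(R_tt + lam * np.eye(n))
    return weights @ z_typed

# Toy example: two typed SNPs and one untyped SNP in LD with both
R_tt = np.array([[1.0, 0.3],
                 [0.3, 1.0]])     # LD among the typed SNPs
R_it = np.array([[0.8, 0.5]])     # LD of the untyped SNP with the typed ones
z_typed = np.array([4.0, 2.5])
z_untyped = impute_z(z_typed, R_tt, R_it, lam=0.0)
```

The tools above differ mainly in how the reference-panel LD matrix is windowed, shrunk, and inverted, and in which population the panel represents.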

Genotype reconstruction methods take a different approach. Given the summary statistics for a SNP (either directly measured or imputed), one can reconstruct the genotype counts that produced them. This offers many advantages, since with the reconstructed genotypes researchers can perform additional analyses using other statistical methods suitable for grouped data and test different hypotheses [85]. For instance, one can calculate grouped Polygenic Risk Scores (PRS) [85], perform logistic regression for grouped data [85, 86], perform multivariate meta-analysis [87], or implement robust tests for association that are expected to work better when the underlying model of inheritance deviates from the usually assumed additive one [88, 89]. The details and the success of the reconstruction depend heavily on the available summary statistics. As one can easily understand, p-values and z-scores alone cannot be used, and one must rely on available effect sizes such as the odds ratio (OR). When the OR, the standard error, and the sample size are given, methods are available in epidemiology that allow the reconstruction of the allelic 2×2 table [90]. If z-scores, confidence intervals, or p-values are available, one can use them to obtain the standard error. React [85] uses an equivalent method relying on solving a system of nonlinear equations. If the allele frequency in one group (usually the controls) is also known, the allelic counts may easily be obtained with a simple calculation. In all cases the accuracy of the reconstruction may depend on the precision of the available summary statistics. Once the allelic 2×2 table is reconstructed, it is straightforward to obtain the genotype counts assuming HWE (which, as one might expect, adds another source of potential bias).
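The "simple calculation" for the case where the control allele frequency is known can be sketched as below. This is an illustrative simplification, not the algorithm of any specific tool (React, for instance, solves a nonlinear system when the frequency is not available):

```python
def reconstruct_allelic_table(odds_ratio, ctrl_freq, n_cases, n_controls):
    """Reconstruct the allelic 2x2 table from the per-allele odds ratio,
    the risk-allele frequency in controls, and the group sample sizes.
    Counts are returned as (possibly fractional) expected values."""
    c = 2 * n_controls * ctrl_freq           # risk alleles in controls
    d = 2 * n_controls - c                   # other alleles in controls
    ratio = odds_ratio * c / d               # a/b implied by OR = (a/b)/(c/d)
    a = 2 * n_cases * ratio / (1 + ratio)    # risk alleles in cases
    b = 2 * n_cases - a                      # other alleles in cases
    return a, b, c, d

# Hypothetical example: OR = 1.5, control risk-allele frequency 0.30,
# 1,000 cases and 1,000 controls
a, b, c, d = reconstruct_allelic_table(1.5, 0.30, 1000, 1000)
```

Plugging the reconstructed counts back into OR = (a/b)/(c/d) recovers the input odds ratio, which is a useful internal consistency check.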
MetaSubtract [91] is a tool that analytically recreates the results of the validation cohort from meta-analysis summary statistics, allowing researchers to compute meta-analysis summary statistics that are independent of the validation cohort without requiring access to the IPD. Spkmt [92] works in a similar fashion but for families; it can be used to derive the summary statistics of one parent from the data of the offspring and the other parent. Finally, we should mention two tools that work in somewhat different modes. OATH [93] is used to reproduce reported results from a GWAS and recover underreported results from alternative models with different combinations of nuisance parameters, whereas LMOR [94] transforms the genetic effects estimated under the linear mixed model to odds ratios relying only on summary statistics.

Analysis of a single trait

In this section we present the various types of methods and tools dedicated to the analysis of a single trait. These include tools for meta-analysis, tools for the estimation of heritability, tools implementing gene-based tests, gene set methods, and fine-mapping methods.

Meta-analysis

One of the most obvious uses of GWAS summary data is to combine them in a meta-analysis. Meta-analysis is the statistical procedure used to combine evidence from multiple studies in order to increase statistical power, and it has been widely used in medical research for decades [95]. A meta-analysis can be performed with various methods [16] using IPD or summary data; the former offers many advantages, but the latter is far easier to perform given the various restrictions imposed on sharing GWAS IPD and the logistical difficulties of such a project [17]. Moreover, given the large samples usually encountered in GWAS, it has been shown both theoretically and empirically that meta-analysis using summary statistics has the same efficiency as the joint analysis of IPD [96]. A compromise between these two extremes arises when a research group has access to individual-level genotype data of a limited sample size and wants to integrate these with existing summary data available in the databases. Such methods have been in use in epidemiology for years [97], and several tools have been developed especially for handling GWAS data, for instance IGESS [98], metaGIM [99], and LEP [100]. PolyGIM [101] can be applied with or without IPD and uses polytomous logistic regression to investigate disease subtype heterogeneity in situations where only summary data are available.

Regarding summary-data meta-analysis of GWAS, the most commonly used approaches include standard methods, such as combining p-values, z-statistics, or effect sizes like odds ratios (for binary traits) or mean differences (for continuous traits) using fixed- or random-effects models [16, 102]. These statistical methods are straightforward to implement and are available in general-purpose statistical packages such as STATA and R. However, there are several specialized tools that facilitate the process and provide integration with useful bioinformatics or visualization functions. Such widely used tools include METAL [103], GWAMA [104], and PLINK [105]. Other tools are oriented towards more specialized cases, offering advanced options. For instance, YAMAS performs meta-analysis including missing SNPs identified via LD without performing imputation [106], and rareMETALS [107] uses a partial-correlation-based score to perform meta-analysis in the presence of large amounts of missing values. There is also a class of tools that focus on the replication of GWAS and the combined analysis of data from primary and replication studies. Such tools include rfdr [108] and Jlfdr [109], which control the False Discovery Rate (FDR), Rrate [110], which determines the sample size of the replication study and checks the consistency between the primary and the replication study, and MAJAR [111], which jointly tests prognostic and predictive effects in meta-analysis without the need for an independent cohort. metaGAP [112] is an online tool for calculating the statistical power of a meta-analysis of GWAS (Fig. 6). METACARPA works with overlapping or related samples, even when the details of the overlap or relatedness are unknown [113], MAGENTA [114] performs meta-analysis with gene set enrichment analysis (GSEA), whereas GWASmeta [115] and MetABF [116] work in a Bayesian framework, calculating the Approximate Bayes Factor (ABF).
Other tools offer more advanced options, such as meta-analysis with multiple traits (see also “multiple traits”), like nGWAMA [117], metaCCA [118], CPASSOC [119], metaUSAT [120], and CPBayes [114] (and its extension GCPBayes [121]). Others are designed for meta-analysis under different genetic models, like GWAR [89], which uses robust methods (such as MIN2 or MAX) to handle the uncertainty in the underlying genetic model, or the simulation tool of [122], which implements an alternative strategy for the additive genetic model by simulating data for the individual studies. Finally, we should mention sPLINK [123], which performs privacy-aware GWAS on distributed datasets, and XPEB [124], an empirical Bayes approach designed to improve the power of GWAS in minority populations by exploiting information from GWASs performed in populations of different origin.
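The fixed-effect inverse-variance-weighted combination at the core of the standard approach above can be sketched in a few lines (the input numbers are hypothetical; real tools such as METAL and GWAMA add allele harmonization, heterogeneity statistics, and genomic control on top of this):

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-allele
    effect sizes (betas or log odds ratios) from multiple studies.
    Returns the pooled effect, its standard error, and the z-statistic."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, beta / se

# Hypothetical two-study example with equal precision
beta, se, z = fixed_effect_meta([0.10, 0.14], [0.05, 0.05])
```

With equal standard errors the pooled estimate is simply the mean of the study effects, while the pooled standard error shrinks by a factor of sqrt(2), which is the source of the power gain from meta-analysis.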

Fig. 6
figure 6

Tools for meta-analysis. A GWASmeta (SMetABF) for performing Bayesian meta-analysis. B The MetaGAP power calculator. C GWAR for robust analysis and meta-analysis of GWAS

Inferring heritability

Heritability is generally defined as the fraction of phenotypic variation explained by genetic variation. It is a dimensionless parameter of the population, introduced by Sewall Wright and Ronald Fisher in the previous century. Traditionally, heritability is estimated using family-based designs such as twin studies, although there are controversies regarding the various methodologies for estimation and the interpretation of the results [125]. Despite this, heritability remains an important aspect of research in modern genetics, particularly regarding the prediction of disease risk from genomic data [126]. Technological advancements have facilitated the development of methods that use large samples of unrelated, or related, individuals. Thus, family-based designs using genomic data (trio genome-wide complex trait analysis, and so on) have emerged. Such methods are discussed and compared in [127]. Of course, heritability can also be estimated from the results obtained in a traditional GWAS using unrelated individuals. The gap between these estimates and those obtained from classical heritability estimation methods has been termed the “missing heritability problem” and is an important open question in current research [128]. Recent reviews of the methods that use GWAS data are given in [18, 19], focusing on their modeling assumptions, their similarities, and their applicability.

One of the first and simplest methods to calculate heritability from the allele frequency, the odds ratio, and the prevalence of the disease was implemented in the SumVg package [129]. This method, however, utilizes only the significant SNPs. The same authors later extended the method to allow calculation using the z-statistics from the whole GWAS sample [130]. A disadvantage of this method is that LD is not accounted for, and highly correlated SNPs need to be filtered manually. AVENGEME [131] is a tool that treats causal effect sizes as fixed effects and models the genotypes as random correlated variables. HESS [132], presented later, built upon the same ideas; its estimator can be viewed as a weighted sum of the squares of the projection of effect sizes onto the eigenvectors of the LD matrix at the particular locus, with weights inversely proportional to the corresponding eigenvalues. LD score regression (LDSC) has been frequently applied to summary statistics from GWAS, and one of its functionalities is to estimate the SNP heritability of a trait [133]. LDER [134] extends LDSC by making full use of the information in the LD matrix, providing more accurate estimates, whereas s-LDSC [135] is an extension suitable for partitioning heritability. SumHer [136], presented later, offers the same functionalities, with the main difference being that it allows for different so-called “heritability models”. According to these, a SNP with high MAF is expected to contribute more to the total heritability than one with low MAF, whereas a SNP in a region of low LD is expected to contribute more than one in a region of high LD. In contrast, LDSC estimates are obtained by assuming that all SNPs contribute equally. HEELS [137] is a new tool that uses REML to produce accurate and precise local heritability estimates, and RSS, a multiple-regression-based fine-mapping tool (see “Fine-mapping”), can also calculate SNP heritability from the regression model.
VarExp [138] and GxESum [139] are methods for estimating the phenotypic variance explained by genome-wide gene-environment (GxE) interactions. There are also tools like GWIZ [63] and SummaryAUC [140] that calculate the Receiver Operating Characteristic (ROC) curve and the associated Area Under the Curve (AUC). GWIZ generates ROC curves and the AUC using simulations and then estimates heritability using the square of Somers’ rank correlation D. SummaryAUC, on the other hand, approximates the AUC of a PRS and its variance. HAMSTA [141] is a tool that, among others, estimates the heritability explained by local ancestry using data from admixture mapping studies. Estimating the effect-size distribution is a related important task. GENESIS [142] uses LD and a likelihood-based approach to estimate effect-size distributions. It also allows predictions regarding the yield of future GWAS with larger sample sizes. GWEHS [143] calculates the distribution of effect sizes of SNPs, as well as their contribution to trait heritability. Furthermore, it predicts the change in effect sizes as well as in heritability when new variants are identified. FMR [144] is a method-of-moments approach for estimating the effect-size distribution, and GWAS-Causal-Effects-Model [145] is a random-effects model for estimating the causal variants and their effect-size distribution. Finally, there are tools that implicate gene expression in heritability analysis: MESC [146], which estimates the proportion of heritability mediated by gene expression levels using LD scores and eQTL data, and GCSC [147], which uses results from a TWAS (see “TWAS and Colocalization”) in the so-called gene co-regulation score regression to identify gene sets enriched for disease heritability.
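The link between the AUC and the rank correlation used by GWIZ can be sketched directly: for a binary outcome, the AUC equals the Mann-Whitney probability that a random case outscores a random control, and Somers’ D is simply 2·AUC − 1. The scores below are simulated, not taken from any real PRS.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical polygenic scores for cases and controls (simulated only).
cases = rng.normal(0.5, 1.0, size=2000)
controls = rng.normal(0.0, 1.0, size=2000)

# AUC as the Mann-Whitney probability that a random case outscores a
# random control (ties counted as one half).
diff = cases[:, None] - controls[None, :]
auc = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

# Somers' D of the score with respect to case status: D = 2*AUC - 1.
# GWIZ-style heritability summaries work with D**2.
somers_d = 2 * auc - 1
print(f"AUC ≈ {auc:.3f}, Somers' D ≈ {somers_d:.3f}")
```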

Gene-based tests

Historically, association tests were oriented towards single variants, and this was the case both for traditional association studies and for GWAS. However, this approach has limitations that were noted early on, prompting a call for a shift towards gene-based tests [148]. Gene-based tests aggregate individual variant associations within a gene, providing a more comprehensive assessment of the gene's overall contribution to a trait or disease. This approach helps prioritize genes with multiple associated variants, enhancing the biological relevance of findings, and it has proven useful particularly in the case of low-frequency variants [148]. There are plenty of different methods for combining the association statistics or p-values within a gene, ranging from the simple Fisher’s method or the minimum p-value approach to more advanced methods like the Burden Test (BT) [149] or quadratic tests like SKAT [150], with variations in power [151]. Nevertheless, there is a consensus regarding the importance of incorporating the LD information of nearby variants into these methods in order to control the type I error rate at the desired level [20].
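The two simplest combination rules mentioned above can be sketched in a few lines, assuming (unrealistically) independent SNPs; LD-aware tools instead adjust the null distribution using a reference panel. The p-values are toy values for illustration.

```python
import math

# Toy per-SNP p-values within one gene (illustrative values only).
pvals = [0.01, 0.04, 0.20, 0.55, 0.70]
k = len(pvals)

# Fisher's method: -2 * sum(log p) follows a chi-square with 2k df
# under independence. For even df the survival function has a closed
# form, so no statistics library is needed.
fisher_stat = -2 * sum(math.log(p) for p in pvals)
half = fisher_stat / 2
p_fisher = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

# Simes procedure (the basis of the extended version used by GATES):
# p_gene = min over j of k * p_(j) / j, for the sorted p-values.
p_sorted = sorted(pvals)
p_simes = min(k * p / (j + 1) for j, p in enumerate(p_sorted))

print(f"Fisher p ≈ {p_fisher:.4f}, Simes p ≈ {p_simes:.4f}")
```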

VEGAS, GATES, fastBAT and GCTA are among the oldest tools available for summary data and remain efficient and widely used. SKAT (Sequence Kernel Association Test) is a well-known regression method for testing the association between variants and traits while adjusting for covariates. As a score-based variance-component test, it calculates p-values analytically by fitting the null model containing only the covariates [150]. The original SKAT method uses only IPD, but later implementations like metaSKAT or SKAT-O have been extended to handle summary data. GCTA and VEGAS also use the multivariate normal framework, adjusting the estimates for LD using a reference panel [152, 153]. Of note, GCTA also offers methods for conditional analysis (see “Fine-mapping”), and the same holds for KGG [154], whereas the new version of VEGAS allows for mixed-ethnicity populations. GATES [155], on the other hand, uses an extended Simes procedure that integrates functional information and association evidence to combine p-values, whereas fastBAT [156] offers fast analytical p-value computations. The gene analysis in MAGMA (Multi-marker Analysis of GenoMic Annotation) is based on a multiple linear principal-components regression model to account for LD and uses an F-test to compute the overall gene p-value [157]. Its extension, nMAGMA, expands the lists of genes that can be annotated by integrating local signals, long-range regulation signals, and tissue-specific gene networks. It also provides tissue-specific risk signals, which are useful for understanding disorders with multi-tissue origins [158]. H-MAGMA [159] and eMAGMA [160] are two other extensions. The former integrates 3D chromatin configuration, whereas the latter leverages significant tissue-specific cis-eQTL information to assign SNPs to putative genes.
EPIC [161] and GAMBIT [162] also utilize functional data for gene-based analysis, the former using cell-type-specific gene expression data obtained from single-cell RNA sequencing and the latter using coding and tissue-specific regulatory annotations. Such methods share several features with TWAS methods (see the respective section). AgglomerativLD [163] also captures LD between SNPs of nearby genes, which induces correlation between the gene-based test statistics. DOT [164] is one of the few methods that apply a decorrelation-based approach before combining SNP-level statistics or p-values. Tools like GPA [165], oTFisher [166], TS [167] and aSPU [168] implement some type of so-called adaptive test (AT), that is, they account for possibly varying association patterns across SNPs, whereas some modern tools like MKATR [169], COMBAT [170], MCA [171], OWC [172], FST [173], ACAT [174], HYST [175], GBJ [176] and sumFREGAT [177] perform the analysis with multiple statistical tests and combine the results. Notably, tools like aSPU [168], snpGeneSets [178], Pascal/PascalX [179, 180], MAGMA, chromMAGMA [181] and FUMA [182] also offer the option of performing gene-set analysis after the gene-based analysis (see next section), whereas HSVS-M [168, 183] tests the association of a gene with multiple correlated traits.
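One of the combination rules just mentioned, ACAT’s Cauchy combination test, is simple enough to sketch. The p-values below are made up; the appeal of the transform is that the combined p-value remains approximately valid even when the component tests are correlated.

```python
import math

def cauchy_combine(pvals, weights=None):
    # Transform each p-value to a standard Cauchy variate, average them,
    # and map the average back to a p-value. The heavy Cauchy tail makes
    # the result approximately valid even under correlation.
    k = len(pvals)
    weights = weights or [1.0 / k] * k
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    return 0.5 - math.atan(t / sum(weights)) / math.pi

# Toy p-values for one gene from several component tests (illustrative).
combined = cauchy_combine([0.01, 0.04, 0.20, 0.55, 0.70])
print(f"Cauchy-combined p ≈ {combined:.4f}")
```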

Gene Set analysis

Gene set analysis (GSA), or pathway analysis, extends the concept of gene-based methods by jointly analyzing groups of functionally related genes and identifying biological pathways enriched with trait-associated genes. By considering the collective impact of multiple genes within a pathway, researchers can obtain a clearer picture of the underlying biological mechanisms influencing the phenotype under investigation. The first applications of such methods borrowed ideas from the microarray data analysis literature, and they have since become widespread in the analysis of GWAS [184]. Any GSA method needs to address three issues: first, how to handle SNPs of the same gene; second, how to define the appropriate gene set or pathway; and finally, how to combine the effects from multiple SNPs/genes within the same set/pathway [185]. Thus, the choices made by different methods can be very diverse, leading to a wide variety of approaches. For instance, some methods operate with SNP-level statistics (effect sizes, z-scores, or p-values), assigning each SNP to the closest gene (usually within a range of ± 20 kb), whereas others take as input a gene-level statistic or simply a gene list obtained by a gene-based method (of course, several tools allow for both a gene-based and a GSA approach). Regarding the choice of set, there is a plethora of databases containing biological pathways (KEGG, PANTHER etc.), or other types of gene-set representation like protein–protein interactions (PPIs), ontologies and so on [186]. Finally, regarding the statistical aggregation of evidence, methods differ in how they handle gene-set size and gene length, LD patterns, and the presence of overlapping genes within pathways, as well as in whether they test the so-called competitive null hypothesis or the self-contained one [14, 187]. A tutorial regarding the use of such methods is given in [21].
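The simplest GSA statistic, over-representation analysis under a competitive null, is a hypergeometric tail probability and can be sketched directly. The gene counts below are invented for illustration.

```python
import math

def ora_pvalue(total_genes, pathway_genes, hit_genes, overlap):
    # Hypergeometric upper-tail probability of seeing at least `overlap`
    # pathway genes among the trait-associated genes, under the
    # competitive null of no enrichment.
    def hyper_pmf(k):
        return (math.comb(pathway_genes, k)
                * math.comb(total_genes - pathway_genes, hit_genes - k)
                / math.comb(total_genes, hit_genes))
    upper = min(pathway_genes, hit_genes)
    return sum(hyper_pmf(k) for k in range(overlap, upper + 1))

# Toy numbers (illustrative): 20,000 genes, a 100-gene pathway,
# 300 trait-associated genes, 8 of them falling in the pathway
# (versus about 1.5 expected by chance).
p = ora_pvalue(20_000, 100, 300, 8)
print(f"ORA enrichment p ≈ {p:.2e}")
```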

Among the most easily used and frequently cited are the tools that utilize a webserver. FUMA [182] and i-GSEA4GWAS [188] are tools specialized in GWAS and use SNP-level statistics as inputs, differing in the subsequent analyses: FUMA uses MAGMA for gene-based testing and allows for over-representation analysis (ORA) and the Kolmogorov–Smirnov test (GSEA), whereas i-GSEA4GWAS maps the most significant SNP to a gene and then performs an improved gene-set enrichment analysis (GSEA) with label permutation to obtain accurate p-values. Tools like Enrichr [189], g:Profiler [190], DAVID [191], WebGestalt [192] and PANTHER [193] are general-purpose enrichment tools that provide functionalities for different types of omics data (Fig. 7). They accept gene or SNP lists as input and provide Application Programming Interfaces (APIs) ensuring interoperability, whereas for the statistical analysis they all use some version of ORA and/or GSEA (WebGestalt also uses Network Topology-based Analysis). A major feature of these tools is that they incorporate a large number of biological and pathway databases, with g:Profiler and Enrichr offering the most complete collections. GSA-SNP2 is one of the first methods developed for GWAS and has seen several improvements regarding the calculation of the combined gene score and the execution time, being among the fastest methods [194]. aSPUpath2 [195] and GIGSEA [196] are two methods that integrate expression data (eQTL) into the pathway analysis. The former uses an adaptive test that extends the chi-square-based aSPU methodology, whereas the latter uses a regression-based approach coupled with permutations to calculate accurate p-values. In a similar fashion, deTS [197] and PGCA perform tissue-specific enrichment analysis (TSEA) for detecting tissue-specific genes and for enrichment testing of different forms of query data. Other methods use different definitions of the gene sets, in some cases utilizing additional information.
For instance, dmGWAS [198] integrates PPI networks and uses a search method to identify subnetworks. Compared with standard pathway methods, it offers users flexibility in the definition of a gene set and can utilize local PPI information. GEMB [199] defines the gene sets using gene weights from model predictions and gene ranks from GWAS, and GENOMICper [200] uses permutations of the identified SNPs by rotation with respect to their genomic locations. GWAB [201] uses network connections to reprioritize candidate genes by integrating the GWAS and network data, whereas GenToS [202] searches for trait-associated variants in existing human GWAS. Also worth mentioning is PAPA [203], a flexible tool for pleiotropic pathway analysis. As already mentioned, aSPU, snpGeneSets, PascalX/PASCAL and MAGMA/chromMAGMA are gene-based methods that also perform GSA, whereas MAGENTA is a tool that performs meta-analysis and subsequently GSA (see “meta-analysis”). Lastly, Inferno [204] and Mergeomics [205] are webservers offering a variety of options, extending typical GSA applications. Inferno integrates a variety of functional genomics sources to identify causal noncoding variants using COLOC, WebGestalt, LDSC and MetaXcan. Mergeomics uses summary statistics of multi-omics association studies (GWAS, EWAS, TWAS, PWAS, etc.) and performs correction for LD, GSEA, meta-analysis and identification of regulators of disease-associated pathways and networks.

Fig. 7
figure 7

Enrichment. A Summary view in g:Profiler of the significant SNPs for Type 2 Diabetes Mellitus. B Enrichr results for the same set. C Output of GWAB for Type 2 Diabetes Mellitus SNPs. D Detailed results from g:Profiler

Fine-mapping

While GWAS can identify broad genomic regions associated with a trait, they do not pinpoint the exact causal variant within those regions. Fine-mapping, working in the opposite direction to gene-based approaches, is a process aimed at narrowing down and identifying causal variants, that is, the specific genetic variants responsible for the observed associations between genomic regions and traits of interest. The plethora of statistical methods and study designs makes it difficult to choose an optimal approach. The different approaches that have been proposed to perform fine-mapping can be divided into three broad categories: heuristic methods that select SNPs based on LD patterns, conditional or penalized regression models that perform variable selection, and Bayesian methods that calculate posterior probabilities or Bayes factors. Based on theoretical and empirical evidence, Bayesian methods seem to have superior performance [22]. Several factors may influence the performance of fine-mapping approaches, including the true number of causal SNPs in a region and their effect sizes, the local LD structure, the sample size, and the SNP density [22, 206]. Functional annotations are also of great importance, leading to the so-called functionally informed fine-mapping (FIFM) methods [206]. The hypothesis of a single causal variant is also very restrictive, and several methods have been developed to allow multiple causal variants in a region as well as to incorporate additional layers of functional annotation, like eQTL [207]. Moreover, methods for fine-mapping across multiple datasets have been proposed, either exploiting different LD patterns across ethnic groups or borrowing information between different traits [207].
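The Bayesian machinery shared by many of these tools can be sketched in its simplest form: under a single-causal-variant assumption, each SNP’s summary statistics yield an approximate Bayes factor (Wakefield’s ABF), and normalizing the factors across the locus gives posterior inclusion probabilities. The effect sizes below are invented for illustration, and the flat prior and single-causal assumption are simplifications that tools such as FINEMAP and CAVIAR relax.

```python
import math

def wakefield_abf(beta, se, prior_sd=0.15):
    # Wakefield's approximate Bayes factor for association at one SNP,
    # computed from the summary effect size and its standard error;
    # prior_sd is the assumed prior SD of the true effect.
    v, w = se**2, prior_sd**2
    z2 = (beta / se) ** 2
    return math.sqrt(v / (v + w)) * math.exp(z2 * w / (2 * (v + w)))

# Toy locus (illustrative): summary statistics for five SNPs.
betas = [0.02, 0.11, 0.10, 0.01, -0.03]
ses = [0.02, 0.02, 0.02, 0.02, 0.02]
abfs = [wakefield_abf(b, s) for b, s in zip(betas, ses)]

# Posterior inclusion probabilities under the single-causal-variant
# assumption with a flat prior: PIP_j = ABF_j / sum_k ABF_k.
total = sum(abfs)
pips = [a / total for a in abfs]
print(["%.3f" % p for p in pips])
```

Here the two strongly associated SNPs absorb essentially all of the posterior mass, illustrating how fine-mapping shrinks a broad association signal to a small credible set.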

As already noted, Bayesian methods seem to have superior performance [22], and thus it is of no surprise that most of the currently available methods operate in a Bayesian framework, calculating Posterior Inclusion Probabilities (PIPs) and/or Bayes Factors (BFs) in various settings: PAINTOR [208], DAP [209], fgwas [210], FINEMAP [211], flashfm [212], FINMOM [213], CARMA [214] and CAVIAR/CAVIARBF [215]. MsCAVIAR [216] is an extension of CAVIAR that leverages information from multiple studies, useful in trans-ethnic fine-mapping. Similarly, XMAP [217] performs cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias. BEATRICE [218] is a unique method that combines a hierarchical Bayesian model with a deep learning-based inference procedure, whereas RIVIERA-beta [219] performs Bayesian fine-mapping using epigenomic reference annotations. On a different level, PolyFun/PolyLoc [220] do not perform fine-mapping per se but are used for estimating the prior causal probabilities of SNPs, which can then be used by other Bayesian fine-mapping methods. susieR [221], BVS-PICA [222] and JAM [223] also operate in a Bayesian regression framework, performing variable selection and penalized regression. Other regression-based methods, like SOJO [224] and ANNORE [225], work in a frequentist framework and perform lasso-type and differential shrinkage via random effects, respectively, whereas GSR utilizes a gene score regression approach [226] and RSS performs multiple regression utilizing the so-called summary statistics likelihood [227]. AHIUT [228] performs an intersection–union test based on a joint/conditional regression model with all the SNPs in a region.
Lastly, we need to mention PICS2 [229], which performs probabilistic identification of causal SNPs and is the only one of these methods available as a web server, and echolocatoR [230], which requires minimal input from users and integrates a suite of fine-mapping tools to identify consensus variants, test enrichment and visualize the results.

Analysis of multiple traits

In this section we discuss methods developed for handling multiple traits. Depending on the type of data and the purpose of the analysis, the methods can be divided into pleiotropy methods, methods that calculate the genetic correlation, methods for Mendelian randomization, and transcriptome-wide association and colocalization methods.

Pleiotropy

Pleiotropy is the phenomenon in which a single variant influences several traits [231]. Methods for detecting it are of great importance in genetic research, and several have been developed in recent years. A major goal of such methods is to increase statistical power over single-trait methods. Imagine, for instance, a variant that produces a near-significant effect when analyzed separately for two or three traits; a method that combines these estimates may produce significant results. Another application of a joint analysis is to identify variants that influence both traits, or variants that influence only one of them. When all the relevant variants are considered, one can also estimate the kind of relationship between the traits (see “Genetic correlation”). A review of the statistical methods to detect pleiotropy in complex traits can be found in [25]. Usually, the methods that allow for multiple-trait analysis are oriented towards quantitative traits like BMI, SBP, DBP and so on, which traditionally are measured on a single cohort, resulting in cross-trait correlation that needs to be taken into account in the analysis. However, there are also methods for performing the same analysis with summary estimates derived from different cohorts, as well as methods that allow for binary traits with the case–control design, using overlapping or non-overlapping controls.

All methods base their inference on the assumption that the z-statistics follow a multivariate normal distribution (MVN) and perform different types of tests and/or different procedures to estimate or approximate the correlation structure. ACA [232], one of the first methods, estimates the trait covariance from a subset of the phenotypic data or from published studies; p_ACT [233] integrates the MVN using the trait correlation; PAT [234] uses a likelihood-ratio test; and PLEI [235] uses the union-intersection testing method but, in addition to the likelihood-ratio test, also applies generalized estimating equations under the working independence model, and can be applied for both marginal and conditional analysis. USAT [236] uses a score-based test, JaSPU [237] uses an adaptive test which is robust to violations of the MVN assumptions, and MTAR [238] uses a Principal Components (PC)-based test. BMASS [239], on the other hand, is a Bayesian multivariate method, whereas TWT [240], MTAFS [241] and EBMMT [242], which are among the newer tools, perform a Cauchy Combination Test (CCT) to handle the correlation structure and obtain accurate p-values. SHAHER [243] constructs a linear combination of traits by maximizing the proportion of its genetic variance explained by the shared variants, allowing both shared and unshared variants to be analyzed effectively, and HIPO [244] performs heritability-informed power optimization for multi-trait association analysis. HOPS [245] computes a horizontal pleiotropy score by removing correlations between traits caused by vertical pleiotropy and normalizing effect sizes across all traits, and PDR [246] performs a pleiotropic decomposition regression to identify shared components and their underlying genetic variants.
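The power gain available from modeling the cross-trait correlation can be seen in a minimal omnibus test: under the MVN null, the quadratic form z'R⁻¹z is chi-square distributed, so two individually borderline signals can become jointly significant. The z-scores and the correlation matrix below are invented for illustration; real tools estimate R from the data or from published studies.

```python
import math
import numpy as np

# Toy example (illustrative): one SNP tested against two negatively
# correlated traits measured on the same cohort; neither z-score is
# significant on its own.
z = np.array([1.9, 1.8])
R = np.array([[1.0, -0.4],   # assumed cross-trait correlation of the z's
              [-0.4, 1.0]])

# Omnibus test: under the null, z ~ MVN(0, R), so the quadratic form
# z' R^{-1} z follows a chi-square with 2 degrees of freedom, whose
# survival function is simply exp(-x/2).
stat = float(z @ np.linalg.inv(R) @ z)
p_joint = math.exp(-stat / 2)

# Compare with the two univariate two-sided p-values.
p_uni = [math.erfc(abs(v) / math.sqrt(2)) for v in z]
print(f"joint p ≈ {p_joint:.4f}, univariate p ≈ {p_uni[0]:.3f}, {p_uni[1]:.3f}")
```

With same-direction effects and negative cross-trait correlation, the joint test is markedly more significant than either univariate test, which is exactly the scenario sketched in the text.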
We also need to mention methods like MTAG [247] and PLEIO [248], which use LDSC and, apart from sample overlap, also allow data from multiple studies (something that can be considered meta-analysis), and methods like MSKAT [249], multiSKAT [250], MGAS [251], MAIUP [252] and MTAR (multi-trait analysis of rare variants) [253], which are gene-based methods specialized for multiple traits. Finally, methods like iMAP [254] and graphGPA2 [255] use graphical models and are capable of analyzing a large number of traits.

On the other hand, there are several methods that assume independence of the studied samples. Most of them are designed for larger analyses of many traits from multiple studies, for instance PolarMorphism [256], JASS [257], gwas-pw [258], FactorGo [259], sumDAG [260], combGWAS [261] and the GCPBayes pipeline [262]. The GCPBayes pipeline uses the functionality of GCPBayes to perform cross-phenotype gene-set analysis between two traits. gwas-pw is used for the joint analysis of two GWAS in order to identify variants influencing both traits. PolarMorphism is based on a transform from Cartesian to polar coordinates and reports a per-variant degree of 'sharedness' across traits, whereas FactorGo provides a scalable variational factor analysis model that is computationally efficient for a large number of traits. JASS provides interactive exploration and visualization of the results of the comparison of many traits through a web interface (Fig. 8 A-C), sumDAG goes one step further and constructs phenotype networks by using a Gaussian linear model and a directed acyclic graph, and combGWAS identifies susceptibility variants for comorbid disorders and calculates genetic correlations. EPS [263] and GPA [264] differ in that they integrate pleiotropy with functional annotation from eQTL data.

Fig. 8
figure 8

Analysis of multiple traits. A JASS analysis for Type 2 Diabetes Mellitus (T2DM), Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP), indicating the pairwise genetic correlations between the traits. B Manhattan Plot from JASS for the combined analysis of the three traits. C Pairwise analysis of the SNPs identified as significant in the univariate analysis and in the combined analysis. D Two-sample Mendelian Randomization analysis for the association of SBP and T2DM obtained by MR-BASE

Genetic correlation

Genetic correlation is related to pleiotropy and describes the relationship between two traits, that is, the extent to which the genetic variants influencing one trait overlap with those associated with the other. It can thus quantify overall genetic similarity and provide insights into the polygenic architecture of complex traits [23]. As we already saw, simultaneously analyzing multiple traits may increase power in the case of horizontal pleiotropy; an additional potential application is to use the estimated correlation to establish causality between traits in the case of vertical pleiotropy (see also the next sections). Since heritability is the proportion of phenotypic variance explained by genotypic variation, it is of no surprise that genetic correlation (or the genetic covariance) is related to the traits’ heritabilities. Thus, several of the methods for estimating heritability discussed earlier, like HESS and SumHer, can also calculate the correlation between traits. The most commonly used method for calculating genetic correlation, however, is LDSC. The method was originally developed for distinguishing polygenicity from bias by examining the relationship between test statistics and LD scores, but it is also used for estimating heritability and genetic correlation [133]. LDSC is also available through the LD Hub server. PCGC-s [265] is an adaptation of stratified LDSC for case–control studies and can also estimate genetic heritability, genetic correlation, and functional enrichment. Another popular tool is GNOVA [266], which calculates annotation-stratified covariance using the method of moments and allows for sample overlap. Its extension, SUPERGNOVA [267], identifies global and local genetic correlations that could provide new insights into the shared genetic basis of many phenotypes. Local correlations, among others, can also be computed using LAVA [268].
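The cross-trait extension of the LD score regression idea can be sketched with simulated summary statistics: the slope of z₁z₂ on the LD score estimates the genetic covariance, and dividing by the square root of the product of the two heritability slopes gives the genetic correlation. All parameters below are invented and sample overlap is assumed away; real analyses use the ldsc software with appropriate weights and a block jackknife.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated z-scores for two traits under a bivariate LDSC-style model
# with no sample overlap (an assumption of this sketch): per SNP j,
#   E[z1_j^2] = 1 + c_j*h2_1,  E[z2_j^2] = 1 + c_j*h2_2,
#   E[z1_j*z2_j] = c_j*rho_g,  where c_j = N*l_j/M.
M, N = 50_000, 100_000
h2_1, h2_2, rho_g = 0.30, 0.20, 0.12       # true genetic (co)variances
l = rng.uniform(1, 200, size=M)            # LD scores
c = N * l / M

# Draw (z1, z2) from the implied 2x2 covariance via its Cholesky factor.
a, d, b = 1 + c * h2_1, 1 + c * h2_2, c * rho_g
e1, e2 = rng.standard_normal(M), rng.standard_normal(M)
z1 = np.sqrt(a) * e1
z2 = b / np.sqrt(a) * e1 + np.sqrt(d - b**2 / a) * e2

# Regress z1^2, z2^2 and z1*z2 on the LD score; each slope times M/N
# estimates h2_1, h2_2 and rho_g, respectively.
X = np.column_stack([np.ones(M), l])

def slope(y):
    return np.linalg.lstsq(X, y, rcond=None)[0][1] * M / N

h1_hat, h2_hat, rho_hat = slope(z1**2), slope(z2**2), slope(z1 * z2)
r_g = rho_hat / np.sqrt(h1_hat * h2_hat)
print(f"estimated genetic correlation ≈ {r_g:.2f}")  # true r_g ≈ 0.49
```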
HDL [269] is a likelihood-based method which produces more precise estimates. A recent comparison found that LDSC and GNOVA are more similar to each other and more robust to LD and sample overlap than HDL: HDL provided biased estimates of the genetic covariance in most cases and could not distinguish genetic from non-genetic correlation. Moreover, HDL restricts users to the built-in reference panel, and it performs poorly when the number of SNPs shared between the reference panel and the GWAS is small [24]. Other tools provide somewhat different types of analyses. For instance, Popcorn [270] estimates trans-ethnic genetic correlation, GECKO [271] estimates both genetic and environmental covariances, PhenoSD [272] uses LDSC for estimating phenotypic correlations and then performs correction for multiple testing using the spectral decomposition of matrices, whereas LPM [273] is a latent probit model, scalable to hundreds of annotations and phenotypes, that integrates functional annotations. ccGWAS [274] is a tool for comparing two different disorders with small genetic correlation, providing a case-case association test, and RHOGE [275] estimates the genetic correlation between two traits as a function of predicted gene expression effects. LOGOdetect [276] uses scan statistics with an LD score-weighted inner product of local z-scores to identify small segments that harbor local genetic correlation between two traits. DONUTS [277] is a unique method in that it operates on summary statistics from families.

Mendelian randomization

Mendelian Randomization (MR) is a method suggested in the pre-GWAS era to investigate causal relationships between two traits, usually a phenotype and a disease [278], using genotype–trait associations to make inferences about environmentally modifiable causes of the traits. In technical terms, MR uses genetic variants as instrumental variables [279] to mimic the random assignment of exposures in a randomized controlled trial, similar to the way Mendel's laws of inheritance dictate the random assortment of alleles during gamete formation. By exploiting the natural randomization of genetic inheritance, MR aims to minimize the biases introduced by confounding factors that usually affect observational studies of the association between two traits. Usually, we are interested in a disease and some intermediate phenotype, or another disease; for instance, the MR approach may address the relationship between hypertension and BMI, or between hypertension and diabetes. Traditionally, MR was performed with one sample (1SMR) using a single variant (usually referred to as IPD methods), and subsequently multivariate methods for MR meta-analysis were developed [280]. With the emergence of GWAS, these methods evolved into the now most commonly used two-sample MR (2SMR) methods, which utilize summary estimates of the genotype–phenotype and genotype–disease associations of several variants, obtained from different samples [26, 281]. To connect with the previous sections, MR seeks to analyze correlated traits [282] and to provide evidence for causation, in other words to distinguish vertical from horizontal pleiotropy.

Several standard methods for MR with GWAS summary data have become available in recent years: the inverse-variance weighted (IVW) method, various types of median estimators (simple or weighted), and the MR-Egger regression approach. IVW gives consistent estimates only if all the genetic variants in the analysis are valid instruments. The median estimator is consistent even when up to 50% of the information comes from invalid instrumental variables, whereas MR-Egger performs equally well but provides somewhat less precise estimates [283]. These methods are readily available in standard packages like TwoSampleMR [284] and MR [285]. The functionalities of TwoSampleMR are also offered, at least partially, through the webserver of MRBASE [284], which is the only method available as such (see Fig. 8, D). BWMR [286] is a tool that performs MR in a Bayesian framework. Besides the important issue of weak instruments, most modern methods also aim to perform the MR analysis accounting or correcting for horizontal pleiotropy. For instance, pIVW [287] is an extension of IVW that accounts simultaneously for weak instruments and balanced horizontal pleiotropy, and MRmix [288] uses a mixture approach allowing a fraction of the instruments to have pleiotropic effects on the outcome. Similarly, MRcML [289], MR-LDP [290], MR-Corr2 [291] and MR-PRESSO [292] provide functionalities to account for horizontal pleiotropy, whereas IMRP [293] takes a different approach and searches iteratively for horizontally pleiotropic variants and causal effects. MR-APSS [294] differs in that it performs MR accounting for both pleiotropy and sample structure, another important confounder (which includes population stratification, cryptic relatedness, and sample overlap), whereas MRlap [295] considers both weak-instrument bias and winner's curse while accounting for sample overlap.
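The core IVW and MR-Egger estimators reduce to weighted regressions of the SNP-outcome effects on the SNP-exposure effects, as in the following sketch with simulated instruments (a toy setup with a true causal effect of 0.25 and no pleiotropy; real analyses would use packages such as TwoSampleMR, which also handle allele harmonization and instrument selection):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy two-sample MR data (illustrative): 20 independent instruments,
# true causal effect 0.25, no horizontal pleiotropy.
m, theta = 20, 0.25
bx = rng.uniform(0.05, 0.20, size=m)       # SNP-exposure effects
se_by = np.full(m, 0.01)                   # SEs of SNP-outcome effects
by = theta * bx + rng.normal(0, se_by)     # SNP-outcome effects

w = 1.0 / se_by**2

# IVW: weighted regression of by on bx through the origin, equivalently
# the inverse-variance-weighted average of the Wald ratios by/bx.
ivw = np.sum(w * bx * by) / np.sum(w * bx**2)

# MR-Egger: the same weighted regression with a free intercept; an
# intercept far from zero signals directional horizontal pleiotropy.
X = np.column_stack([np.ones(m), bx])
W = np.diag(w)
intercept, egger = np.linalg.solve(X.T @ W @ X, X.T @ W @ by)

print(f"IVW ≈ {ivw:.3f}, Egger slope ≈ {egger:.3f}, intercept ≈ {intercept:.4f}")
```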
MR.CUE [296] and TS_LMM [297] offer additional functionality for handling the variability of the estimates. LCV [298] is a method that estimates causal associations between traits while avoiding confounding by genetic correlation, whereas OMR [299] uses information from all GWAS SNPs for causal inference and JAM-MR [300] performs variable selection and causal effect estimation in MR. CS [301], BiDirectCausal [302], MRCI [303] and LHC-MR [304] constitute another important class of methods since they can identify bidirectional causal effects. Another important extension is offered by methods like MR2 [305], MV-MR [306], MRBEE [307], MVMR-cML [308] and adOMICs [309], which extend the MR framework to the multivariate setting, allowing more than one exposure or outcome, as well as MR-BMA [310], which goes one step further by performing multivariate MR in a Bayesian framework. Finally, other methods like hJAM [311], MR.RAPS [312] and MRPEA [313] offer more advanced options. hJAM unifies the frameworks of MR and TWAS and can be applied to correlated instruments and multiple intermediates, MR.RAPS uses a three-sample genome-wide design with many independent genetic instruments across the genome to handle many weak genetic instruments and pleiotropy, whereas MRPEA applies a pathway-association MR analysis using data on environmental exposures.

Colocalization and TWAS

As we already described, the MR approach involves the combination of two types of data: a genotype–disease association and a genotype–phenotype association. If the phenotype involves gene expression, that is, if it comes from an eQTL study, then we have two distinct but fundamentally related approaches, the transcriptome-wide association study (TWAS) and colocalization (Fig. 9). TWAS is based on the idea that genetic variants can influence gene expression, which subsequently can affect complex traits or diseases [27]. Thus, the approach uses information from eQTL studies to identify associations between predicted gene expression levels and complex traits/diseases [314]. Even though there are several different methods, the resemblance to MR is obvious; in fact, several methods, like SMR, which uses a single variant [315], GSMR, which uses multiple variants [310], and PMR [316], which can account for correlated instruments and horizontal pleiotropy and can accommodate both single traits and multiple correlated outcomes, all use the term MR, whereas the authors of TScML [317], which uses two-stage constrained maximum likelihood (an extension of two-stage least squares, 2SLS), explicitly state that it can be used for both MR and TWAS analyses. FUSION and S-PrediXcan are the oldest and most widely known methods. FUSION is the current implementation of the first TWAS method [318], whereas S-PrediXcan [319] is the summary-data version of PrediXcan. Xu et al. [320] noted that PrediXcan and TWAS can be viewed as special cases of general association testing with multiple SNPs in a GLM and proposed the so-called sum of powered score (SPU) test implemented in aSPU-TWAS [320]. A subsequent evaluation showed that the original TWAS statistic is equivalent to an LD-aware version of standard MR [321]. iFunMed [322] and sMIST [323] formulate the problem within the framework of mediation analysis, and similarly PTWAS [324] applies principles from instrumental variables analysis.
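The summary-based TWAS statistic used by methods in the FUSION/S-PrediXcan family is compact enough to sketch: the GWAS z-scores are combined with eQTL-derived expression weights and standardized by the variance of the predicted expression implied by reference-panel LD. All numbers below are invented for illustration.

```python
import numpy as np

# Toy locus (illustrative): 4 SNPs with GWAS z-scores, eQTL-derived
# expression weights for one gene, and an LD (correlation) matrix
# taken from a reference panel.
z = np.array([2.5, 3.1, -0.4, 1.2])        # GWAS z-scores
w = np.array([0.8, 0.5, 0.0, -0.1])        # expression prediction weights
R = np.array([[1.0, 0.6, 0.2, 0.1],
              [0.6, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])

# Summary-based TWAS association statistic: the weighted z-score,
# standardized by the LD-implied variance of the predicted expression.
z_twas = w @ z / np.sqrt(w @ R @ w)
print(f"TWAS z ≈ {z_twas:.2f}")
```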
Comm-S* [325] uses a variational Bayesian EM algorithm and a likelihood-ratio test to assess the expression-trait association. Its extension, Tiss-Comm [326], leverages the co-regulation of genetic variation across different tissues explicitly via a unified probabilistic model and also detects the tissue-specific role of candidate target genes in complex traits. Similar multi-tissue approaches are followed by fQTL [327], sCCA [328] and UTMOST [329]. Primo [330] and OPERA [331] extend the integration further by allowing different types of xQTL data (eQTL, pQTL, mQTL etc.) to enable estimation under different conditions, whereas SUMMIT [332] uses a large eQTL summary-level dataset, penalized regression and the Cauchy combination test, and HMAT [333] aggregates TWAS association tests obtained across multiple gene expression prediction models using the harmonic mean p-value (HMP) combination. BGW [334] and ARCHIE [335] are two methods that utilize trans-regulated eQTLs. Other tools use combinations of methods, like TIGAR [336], which combines DPR and PrediXcan, whereas others, like JEPEGMIX2‐P [337] and FOCUS [338], perform TWAS using pathway information or use LD to fine-map the gene–trait association signals obtained from TWAS, respectively. Even though the various methods discussed here have different modeling assumptions and many were initially developed to answer different biological questions, a recent technical review of TWAS methods showed that all can be viewed as versions of the two-sample MR analysis [339]. Indeed, several recent tools like MRLocus [340], TWMR [341], and Mr.MtRobin [342] make explicit use of MR methodology and jargon in order to perform a sophisticated TWAS. MRLocus first performs a colocalization step for each nearly-LD-independent eQTL, and then performs an MR analysis step across eQTLs. TWMR performs a multi-gene, multi-instrument MR approach to identify genes whose expression influences the phenotype.
Finally, Mr.MtRobin uses multi-tissue eQTL data and a reverse-regression random-slope mixed model to infer whether a gene is associated with a complex trait. As already noted, webTWAS, apart from the database, also offers a webserver for running S-PrediXcan, SMR and UTMOST with user-supplied datasets.
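As an illustration of the aggregation step used by HMAT, the following is a minimal sketch of the harmonic mean P-value combination (Wilson's HMP). The function name and numbers are ours; calibrating the HMP into an exact p-value additionally involves the Landau distribution and is omitted here:

```python
def harmonic_mean_p(pvals, weights=None):
    """(Weighted) harmonic mean of p-values; weights must sum to 1.
    The raw HMP is anti-conservative as a p-value itself and, in the
    full procedure, is calibrated via the Landau distribution."""
    if weights is None:
        weights = [1.0 / len(pvals)] * len(pvals)  # unweighted case
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(weights) / sum(w / p for w, p in zip(weights, pvals))

# Invented p-values from several expression prediction models of one gene
hmp = harmonic_mean_p([0.001, 0.2, 0.5, 0.04])
```

A useful property for TWAS aggregation is that the HMP is dominated by the smallest p-values, so a gene significant under at least one prediction model is not washed out by the non-significant models.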

Fig. 9

Incorporation of eQTL data. A Overview of the gene-expression patterns in T2DM obtained by PCGA. B Top associated tissues and cells for T2DM (PCGA). C An example of colocalization output performed by LocusFocus. D TSEA-DB view of the analysis of significant SNPs involved in T2DM. E Heat-map of the tissues involved in T2DM significant hits obtained by COLOC. F Plots of the genome-wide significant hits obtained from GWAS and eQTL (COLOC). G Heat-map of the tissues involved in T2DM (TSEA-DB). H Example of fine-mapping for a SNP implicated in T2DM obtained by PICS2

Another method that uses GWAS results along with eQTL data is colocalization. Colocalization approaches assess whether two different traits or diseases share a common causal genetic variant, or set of variants, at a specific genomic locus [13]. Colocalization analysis identifies genetic variants that show significant association in both GWAS and eQTL studies. However, unlike TWAS, it does not perform gene expression prediction and gene-trait association tests but focuses on the colocalized SNPs [28]. TWAS and colocalization are related but not identical approaches: it has been shown that they may give different results under certain conditions (for instance, in the presence of horizontal pleiotropy), and it has thus been suggested that they be used in a complementary manner [28, 343]. COLOC was one of the first methods for colocalization and has seen several improvements [344, 345] (see also Fig. 9). The latest version uses SuSiE and allows evidence for association at multiple causal variants to be evaluated simultaneously, while separating the statistical support for each variant conditional on the causal signal being considered. MOLOC [346] is a multiple-trait version of COLOC, operating in a Bayesian framework that integrates GWAS summary data with multiple xQTL data to identify regulatory effects; HyPrColoc [347] is a deterministic Bayesian method that detects colocalization across large numbers of traits; and SS2 [348] operates across any number of gene-tissue pairs while allowing for sample overlap. LLR [349] colocalizes genetic risk variants across multiple GWAS and phenotypes, whereas POEMcoloc [350] is an approximation to the COLOC method that can be applied when limited data are available. SparkINFERNO [351], PwCoCo [352] and ColocQuiaL [353] are pipelines offering additional functionalities, all built on COLOC.
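The COLOC framework enumerates five hypotheses for a locus with shared SNPs: no association (H0), association with only one of the two traits (H1, H2), two distinct causal variants (H3), and one shared causal variant (H4). Below is a simplified re-derivation of this enumeration, assuming a single causal variant per trait and Wakefield's approximate Bayes factors; the priors follow the commonly cited defaults, and the data are invented. This is a sketch of the logic, not the coloc package itself:

```python
import math

def log_abf(beta, se, W=0.04):
    # Wakefield's approximate log Bayes factor for one SNP,
    # with prior effect-size variance W (0.2^2 here).
    V = se ** 2
    r = W / (W + V)
    return 0.5 * (math.log(1 - r) + r * (beta / se) ** 2)

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_diff(a, b):
    # log(exp(a) - exp(b)) for a > b
    return a + math.log1p(-math.exp(b - a))

def coloc_pp(stats1, stats2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0-H4 for one locus.
    stats1, stats2: per-SNP (beta, se) for the two traits, same SNP order."""
    l1 = [log_abf(b, s) for b, s in stats1]
    l2 = [log_abf(b, s) for b, s in stats2]
    diag = logsumexp([a + b for a, b in zip(l1, l2)])  # same causal SNP
    full = logsumexp([a + b for a in l1 for b in l2])  # any pair of SNPs
    lh = [
        0.0,                                                 # H0
        math.log(p1) + logsumexp(l1),                        # H1: trait 1 only
        math.log(p2) + logsumexp(l2),                        # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(full, diag),  # H3: distinct variants
        math.log(p12) + diag,                                # H4: shared variant
    ]
    denom = logsumexp(lh)
    return [math.exp(x - denom) for x in lh]

# Invented locus: both traits driven by the first of three SNPs
pp = coloc_pp([(0.50, 0.05), (0.01, 0.05), (0.00, 0.05)],
              [(0.30, 0.04), (0.00, 0.04), (0.01, 0.04)])
```

In this toy example the posterior mass concentrates on H4 (a shared causal variant). With real summary statistics the same enumeration runs over all SNPs in the region, which is why colocalization needs only betas and standard errors rather than IPD.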
eCAVIAR [354] is another popular method, using a probabilistic model that accounts for more than one causal variant at a given locus. MSG [355] increases power using a spliced-gene approach, and SharePro [356] integrates LD modeling and colocalization assessment to account for multiple causal variants in colocalization analysis. PESCA [357] uses ancestry-matched estimates of LD to infer the proportions of population-specific and shared causal variants in two populations. These estimates are then used as priors in an empirical Bayes framework for colocalization and to test for enrichment of these causal variants in loci of interest. Lastly, we should mention the methods that operate as webservers, offering ease of use. Sherlock [358], one of the oldest methods, uses a database of eQTL associations from different tissues to identify genetic signatures that match those of specific genes. Unlike other methods, it incorporates information from both cis- and trans-eQTL SNPs. LocusFocus [359] is a web-based colocalization tool that applies the Simple Sum method to identify relevant genes and tissues for a particular GWAS locus in the presence of high linkage disequilibrium and/or allelic heterogeneity. Regarding the analysis of eQTL data, ezQTL [360] is a webserver performing various tasks such as data quality control for variants matched between different datasets, LD visualization, and colocalization analysis using eCAVIAR and HyPrColoc, whereas BAGEA [361] uses a variational Bayes framework to model cis-eQTLs with directed and undirected genomic annotations.

Conclusions

Summary statistics offer protection of privacy over IPD, as well as significant advantages in computational cost, which does not scale with the number of individuals in the study [11]. Naturally, in the post-GWAS era a large number of methods have been developed to perform analyses using the summary results of GWAS [11]. These methods, integrating data from multiple sources such as LD, gene expression and biological pathways, aim to provide biological insight and improve our understanding of the functional role of the identified variants [12,13,14,15]. We should emphasize that GWAS summary statistics are not mere replacements for IPD. Some types of analysis, such as meta-analysis, heritability analysis and fine-mapping, can be applied using either summary data or IPD; in such cases the summary-data methods greatly enhance applicability and ease of use, overcoming the limitations of IPD mentioned earlier. However, methods for other types of analysis, particularly those that use multiple datasets, such as TWAS, colocalization or Mendelian randomization, were designed with summary data and the integration of data from multiple sources in mind. This is exactly the spirit of the so-called post-GWAS analysis that brought bioinformatics into a central role in genetics research [11]. Most of the "success stories" in GWAS during the last years can be attributed to the development and application of such methods in identifying new variants, in functional annotation, in causal discovery, and even in medical applications [2, 12, 362].

In this work we conducted, for the first time in the literature, a systematic review to identify software tools and databases dedicated to GWAS summary-data analysis. We categorized the tools and databases by their functionality, in categories related to data, single-trait analysis, and multiple-trait analysis, along with their sub-categories, which we analyzed and reviewed. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a wide range of tools, each with unique strengths and limitations. We provided descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements, and we discussed the overall usability and applicability of each tool for different research scenarios. We identified families of related tools performing different or complementary tasks, for instance the CAVIAR tools (CAVIAR, CAVIARBF, msCAVIAR, eCAVIAR), the EpiXcan tools (S-MultiXcan, S-PrediXcan), the LDAK programs (SumHer, GBAT), the MAGMA tools (nMAGMA, H-MAGMA, eMAGMA) and so on. We should emphasize that in many cases a tool originally developed for IPD is later adapted to handle summary data, whereas in other cases a tool is succeeded by a newer version with added capabilities. For instance, the original PrediXcan method uses only IPD and is now considered deprecated; S-PrediXcan and S-MultiXcan are later versions designed for use with summary data. The same holds for SKAT: the original method uses only IPD, but later implementations such as metaSKAT or SKAT-O allow for summary data as well. At the same time, it is important to note that several tools combine different functionalities.
For instance, there are tools that can perform meta-analysis and GSA (MAGENTA), gene-based methods that also offer functionalities for conditional analysis (GCTA), methods for the analysis of multiple traits with gene-based tests (multiSKAT, MSKAT), methods that can be seen either as methods for multiple traits or as meta-analysis (PLEIO, PASCAL), and methods that perform both GSA and gene-based tests (aSPU, snpGeneSets, PascalX, PASCAL, MAGMA, FUMA). Of course, there are several single-purpose methods that use and combine different statistical tests or methods (OWC, MCA, TWT, EBMMT, COMBAT, sumFREGAT, MKATR), nor should we forget methods like LDSC and its variants, originally developed for distinguishing polygenicity from bias, but also used for estimating heritability and genetic correlation and integrated into many other tools and pipelines.

As already mentioned, the tools and databases included in the study were those with a functioning URL. In many publications identified through the literature search the URL was not working. In some situations we recovered a valid link by performing Google searches or by identifying the authors' websites, but in many cases this was not enough. Similarly, several tools deposited in CRAN had been removed or archived. This kind of problem has been known in the scientific community for years [363,364,365]. However, there is more to it. Even for the tools included in the review we could not verify, without proper testing, that they all work seamlessly, especially the older ones [366]. Operating systems evolve, programming languages change, and with these the dependencies of each piece of software also change. Even though best practices are available [367], it is not always realistic to expect complex software to work forever without maintenance. Even for some of the tools with valid URLs, for instance those deposited on GitHub or on personal web pages, we found statements by the authors indicating that the software is no longer maintained and that technical support cannot easily be provided. It is clear that more advanced solutions should be pursued. For instance, among the tools we identified, the majority are written in R and Python, but only a handful are available as webservers: ten of the tools for GSA, three tools for colocalization, two tools for meta-analysis, and one each for pleiotropy analysis, MR and fine-mapping. Of course, several of the secondary databases we identified also provide the functionality of performing the analysis using data provided by the user (webTWAS, TSEA-DB, PCGA), but even counting these, the proportion of web-tools is rather low (< 10%).
Web servers and web services have become highly relevant to the field of bioinformatics during the last 20 years [368], so the number of such webservers is expected to increase in the near future, as tools are available to facilitate the incorporation of existing applications [369,370,371,372]. On the other hand, some tools may be too computationally demanding, so other solutions must be found. Container-based applications [373, 374] such as Docker can simplify maintenance procedures and add to the reproducibility of research [375]. Community efforts such as udocker [376] may promote the usability of complex software tools by non-experts in multi-user environments.

As data accumulate, it is unavoidable that analyses will move to an even larger scale. Traditionally, the large-scale analysis of many gene-disease associations is modeled by the so-called diseasome [377, 378] using graph-theoretic methods [379, 380]. The gene-disease network is composed of pairwise associations obtained from public databases and is a bipartite network [379], consisting of two separate sets of nodes and the interactions between nodes belonging to the different sets. Projecting onto one set or the other leads to the gene–gene or the disease-disease projected networks, which inform us about the associations between members of the same set (for instance, two diseases are connected if they share common genes, and so on). Such methods have been available for years, but they treat the associations as fixed inputs to the graph. As data accumulate and ever more complex statistical methods are developed that allow cross-trait comparisons and combined analyses of multiple traits, along with the integration of different types of data such as xQTL, it is tempting to speculate that a fusion of these two traditions may come, in which the statistical formalism of the tools presented in this review will merge with the graph-theoretic approaches developed in the systems biology literature. For instance, we may see network approaches leading to causal analyses (similar to MR) that consider simultaneously all the diseases and traits for which we have GWAS summary data, or similar approaches that integrate xQTL data of various types, different tissues, and so on.
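The bipartite structure and its projection can be sketched in a few lines; the gene and disease names below are invented placeholders, not curated associations:

```python
from collections import defaultdict
from itertools import combinations

# Invented gene-disease associations (the bipartite edge list)
edges = [
    ("GENE_A", "Disease1"), ("GENE_B", "Disease1"), ("GENE_B", "Disease2"),
    ("GENE_C", "Disease3"), ("GENE_C", "Disease4"),
]

def disease_projection(edges):
    """Project the bipartite gene-disease network onto diseases:
    two diseases are linked, weighted by the number of genes they share."""
    diseases_of = defaultdict(set)
    for gene, disease in edges:
        diseases_of[gene].add(disease)
    weights = defaultdict(int)
    for diseases in diseases_of.values():
        for d1, d2 in combinations(sorted(diseases), 2):
            weights[(d1, d2)] += 1  # one shared gene adds one unit of weight
    return dict(weights)

proj = disease_projection(edges)
# GENE_B links Disease1-Disease2; GENE_C links Disease3-Disease4
```

The statistical fusion envisioned above would replace the fixed 0/1 edges of such a projection with quantities estimated from GWAS summary data, for instance genetic correlations or MR effect estimates.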

We hope that this comprehensive review will serve as a valuable resource for researchers interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases, as well as for methodologists who develop and test relevant methods. We provided a detailed overview of the available tools and databases, and we hope that this work will facilitate informed tool selection and maximize the effectiveness of using GWAS summary statistics.