Omics-based construction of regulatory variants can be applied to help decipher pig liver-related traits

Ling, Ziqi; Li, Jing; Jiang, Tao; Zhang, Zhen; Zhu, Yaling; Zhou, Zhimin; Yang, Jiawen; Tong, Xinkai; Yang, Bin; Huang, Lusheng

doi:10.1038/s42003-024-06050-7

Article
Open access
Published: 29 March 2024

Volume 7, article number 381, (2024)

Communications Biology

Abstract

Genetic variants can influence complex traits by altering gene expression through changes to regulatory elements. However, the genetic variants that affect the activity of regulatory elements in pigs are largely unknown, and the extent to which these variants influence gene expression and contribute to the understanding of complex phenotypes remains unclear. Here, we annotate 90,991 high-quality regulatory elements using acetylation of histone H3 on lysine 27 (H3K27ac) ChIP-seq of 292 pig livers. Combined with genome resequencing and RNA-seq data, we identify 28,425 H3K27ac quantitative trait loci (acQTLs) and 12,250 expression quantitative trait loci (eQTLs). Through the allelic imbalance analysis, we validate two causative acQTL variants in independent datasets. We observe substantial sharing of genetic controls between gene expression and H3K27ac, particularly within promoters. We infer that 46% of H3K27ac exhibit a concomitant rather than causative relationship with gene expression. By integrating GWAS, eQTLs, acQTLs, and transcription factor binding prediction, we further demonstrate their application, through metabolites dulcitol, phosphatidylcholine (PC) (16:0/16:0) and published phenotypes, in identifying likely causal variants and genes, and discovering sub-threshold GWAS loci. We provide insight into the relationship between regulatory elements and gene expression, and the genetic foundation for dissecting the molecular mechanism of phenotypes.

Introduction

Pigs, as long-domesticated animals, have become one of the primary meat sources. In 2020, pork maintained the second-highest average per capita consumption among meat products. To address the increasing demand for pork and improve meat quality, it is crucial to efficiently raise pig populations whose performance matches specific requirements, such as high growth rate, low fat, and good adaptation to particular environments. The liver is a key metabolic and heat-producing organ that participates in the processing and utilization of energy sources such as glucose, fatty acids, and amino acids¹. It plays a vital role in growth, fat metabolism, cold adaptation, and other economic traits in agricultural animals. For example, proteins related to antioxidant enzymes and ribosomal proteins can affect the cold adaptation of pigs at high altitudes². Liver vitamin A metabolism affects feed efficiency in pigs³. Liver glucose metabolism correlates strongly with protein and lactose concentrations in the milk of dairy cows^4,5. Therefore, understanding the genetic basis of liver-related traits would benefit pig production.

Genome-wide association studies (GWAS) have identified thousands of genetic variants responsible for important economic traits⁶. Still, the majority of GWAS loci are located in non-coding regions of the genome⁷, hampering the identification of causal variants. In humans, likely causal variants that alter gene expression through changes to regulatory elements were prioritized by the integration of eQTLs and H3K27ac QTLs⁸, yet comparable efforts in pigs remain lacking. H3K27ac is one of the most widely studied histone modifications due to its predominant deposition in active promoters and enhancers^9,10, and is highly correlated with gene expression. Recent studies have employed H3K27ac to annotate many regulatory elements in pigs, narrowing down the genome regions containing candidate variants associated with complex traits identified by GWAS^11,12,13. However, the causal variants hidden among these candidates and governing the activity of regulatory elements remain to be pinpointed. Moreover, genetic variants can influence complex traits by modulating gene expression through changes in regulatory element activity, but the extent to which genetic effects on gene expression are mediated by such changes remains incompletely characterized genome-wide. Thus, it is necessary to identify genetic variants affecting the activity of regulatory elements genome-wide and elucidate their impact on gene expression, which is valuable for exploring the molecular mechanisms underlying complex traits in pigs.

In this study, we profile H3K27ac activity in up to 292 liver samples from a heterogeneous population managed under the same external environment, thus reducing the variance of environmental factors and amplifying the genetic effect. We identify high-quality H3K27ac peaks and super-enhancers, providing abundant regulatory elements in pig liver. Based on the large sample size, we further characterize inter-individual variation in regulatory element activity, facilitating subsequent acQTL mapping. Combined with DNA and RNA-seq data from these individuals, we detect expressed genes, acQTLs, and eQTLs for sharing analyses, colocalization, and GWAS fine-mapping. We validate two putative causal variants contributing to H3K27ac signals in independent datasets through allelic imbalance analyses. Notably, both variance decomposition and causal inference analyses support a pleiotropic mode, i.e., in the majority of cases, H3K27ac exhibits a concomitant rather than causative relationship with gene transcription. Furthermore, we demonstrate the utility of H3K27ac, acQTLs, and eQTLs in identifying the likely functional gene AKR1A1, regulatory element, and causal variant 6_165830307 responsible for liver dulcitol levels, and in unveiling sub-threshold GWAS variants for liver PC(16:0/16:0) levels. To further interpret GWAS loci for phenotypes that may act through the liver, we intersect our datasets with published variants associated with pig diseases and traits, prioritizing liver-related phenotypes by gene-peak pairs. Overall, we provide a unique resource to disentangle the genetic regulation of H3K27ac states and gene expression, which will facilitate the applicability of GWAS in pig breeding.

Results

Data description and annotation of regulatory elements

We obtained H3K27ac profiles from 292 pig livers through chromatin immunoprecipitation and sequencing (ChIP-seq). The Sscrofa 11.1 genome was used as the reference for mapping, resulting in an average of 27.4 million uniquely mapped reads per sample (88.7% mapping rate, Supplementary Fig. 1a). After peak calling and filtering procedures, we identified an average of 77,947 H3K27ac peaks per sample, with the fraction of reads in peaks averaging 16% (Supplementary Fig. 1b, Supplementary Data 1). The average peak width across all samples was 691 bp, and the frequency distribution of peaks was highly right-skewed (Fig. 1a). All peaks were further merged into 90,991 consensus peaks that occurred in at least three samples. We defined consensus peaks within 1 kb of a transcription start site (TSS) as promoters and the others as enhancers, yielding 16,544 promoters and 74,447 enhancers. In addition, 41% of the H3K27ac peaks were present in introns, nearly one-third of which were in first introns (Fig. 1b). Twenty-three percent and eighteen percent of peaks were distributed in distal intergenic and promoter regions, respectively. Nearly half of the peaks were within 10 to 100 kb of the TSS of the nearest gene (Supplementary Fig. 1c). To verify the reliability of these peaks, we overlapped the H3K27ac peaks in this study with those from a previous study¹³. The results showed that 30,169 H3K27ac peaks from this study covered 98.6% (74,865 out of 75,905) of the peaks reported by Kern et al.¹³. The remaining 60,822 peaks contained 56,408 (93%) enhancers, and 41,795 (69%) resided in regulatory regions identified by six epigenetic marks in previous studies^13,14, supporting the reliability of the H3K27ac peaks. We then correlated the occurrence percentage of each consensus peak with its abundance measured by FPM (fragments per million). The peak occurrence percentage was significantly positively associated with its abundance (Spearman’s correlation, ρ = 0.809, P-value < 2.2 × 10⁻¹⁶, Fig. 1c). The top 5000 H3K27ac peaks ranked by FPM were overlapped with promoters and enhancers (Supplementary Fig. 1d), showing that H3K27ac activities are generally higher in promoters (75% overlapped) than in enhancers. Besides, promoters had a greater likelihood of being shared across individuals than enhancers (Fig. 1d).
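
The promoter/enhancer split described above can be illustrated with a minimal Python sketch. It assumes that "within 1 kb of a TSS" means that some TSS lies no more than 1 kb from the peak interval; the exact rule used in the study is given in its Methods and may differ. The function name `classify_peaks` and the toy coordinates are hypothetical.

```python
from bisect import bisect_left

def classify_peaks(peaks, tss_by_chrom, window=1000):
    """Label each (chrom, start, end) peak as 'promoter' or 'enhancer'.

    Assumption: a peak is a promoter if any TSS falls inside the peak
    interval extended by `window` bp on both sides.
    `tss_by_chrom` maps chromosome -> sorted list of TSS positions.
    """
    labels = []
    for chrom, start, end in peaks:
        tss = tss_by_chrom.get(chrom, [])
        # Index of the first TSS at or after (start - window).
        i = bisect_left(tss, start - window)
        is_promoter = i < len(tss) and tss[i] <= end + window
        labels.append("promoter" if is_promoter else "enhancer")
    return labels

# Toy data (hypothetical coordinates, not from the study).
peaks = [("chr1", 900, 1500), ("chr1", 5000, 5600), ("chr2", 100, 300)]
tss_by_chrom = {"chr1": [1200, 9000], "chr2": [2000]}
labels = classify_peaks(peaks, tss_by_chrom)
print(labels)
```

Keeping the TSS positions sorted per chromosome lets each peak be classified with a single binary search, which matters when tens of thousands of consensus peaks are processed.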

**Fig. 1: Comprehensive profiling of H3K27ac.**

Super-enhancers are essential in controlling genes that could determine cell and tissue identity¹⁵. Herein, we identified an average of 1090 super-enhancers per sample, covering an average of 47.2 kb in width (Supplementary Fig. 1e). These super-enhancers are subsequently merged into 2463 consensus super-enhancers that were found in at least three samples (Supplementary Data 2). The biological coefficients of variation (mean = 0.22) for the peaks within consensus super-enhancers were lower (two-sided T-test, P-value = 1.12 × 10⁻¹⁰²) than those of regular peaks (mean = 0.23), indicating that the activity of super-enhancer peaks was more stable across individuals (Supplementary Fig. 1f). Among the 2463 consensus super-enhancers, 43 were active in at least 99% of individuals and covered 237 genes. Analysis of gene enrichment revealed their involvement in liver-related pathways, such as cellular response to lipid (LDLR, GPBAR1, SCARB1) and folate metabolism (MAT1A, SHMT2), highlighting important roles of these cross-individual shared super-enhancers in maintaining the function of the liver (Fig. 1e, Supplementary Fig. 1g). Taken together, we generated a unique H3K27ac profile of pig liver at the population scale.

To determine the effect of H3K27ac on the transcriptome, we obtained an average of 40 million uniquely mapped RNA-seq reads from the same population, with a 98% mapping ratio to the Sscrofa 11.1 genome (Supplementary Fig. 1a). A total of 15,509 genes expressed in at least 20% of individuals were identified, 2667 (17%) of which lacked H3K27ac signals in their promoter regions, indicating asynchronous alteration between H3K27ac and gene expression. An analogous phenomenon was also observed in human livers using a chromatin accessibility dataset¹⁶. Enhancer RNAs (eRNAs), transcribed from enhancer regions, play crucial roles in development and disease^17,18. To determine the putative polyadenylated eRNAs in livers, we selected enhancers from distal intergenic peaks and within 300 bp downstream of genes to assemble new transcripts, yielding 276 potential eRNAs overlapping with 378 H3K27ac peaks. Notably, 187 (49%) of the 378 peaks were located in super-enhancers (Hypergeometric test P-value = 1 × 10⁻²²), indicating that super-enhancer regions may serve as chromatin niches that facilitate the expression of polyadenylated eRNAs. Compared with ordinary mRNAs, the putative eRNAs had fewer exons and were shorter in length (Fig. 1f and Supplementary Fig. 1h). Fifty-one percent of eRNAs were not spliced (Supplementary Data 3), which is consistent with the known characteristics of eRNAs¹⁹.

Identification of genetic variants associated with liver H3K27ac

Insights into the inter-individual variation of peak activity and its heritability (h²) can help in understanding pathways from DNA to H3K27ac. We estimated the heritability of peak activity for 88,926 peaks located on autosomes using 278 individuals. The genomes of these individuals were sequenced to an average depth of 7.8×^20,21. The activity of regulatory elements is controlled by both cis-QTL and trans-QTL²². Thus, we estimated for each peak h²_cis (the variance explained by genetic variants located within ±1 Mb from the peak) and h²_trans (the variance explained by genetic variants located beyond ±5 Mb from the peak). Among 88,926 peaks, 10% of peaks had h²_cis greater than 0.2, and 5% of peaks had h²_trans greater than 0.2 (Supplementary Fig. 2a, b). Mean h²_cis was 0.064, which was significantly higher than h²_trans (mean = 0.029, two-sided T-test, P-value < 2.2 × 10⁻¹⁶). We further grouped h² into promoters-h² and enhancers-h². Estimates of promoters-h² were significantly higher than those of enhancers-h², regardless of the cis or trans patterns (Supplementary Fig. 2c, d). Besides, estimates of h²_cis remained higher than h²_trans in both groups (Supplementary Fig. 2e, f).

To further understand the genetic basis underlying the H3K27ac (Supplementary Fig. 3a), we performed H3K27ac quantitative trait loci (acQTLs) analysis based on 30,244,904 single-nucleotide polymorphisms (SNPs) and insertion-deletions (Indels). We represented a cis-acQTL using the lead variant within 1 Mb, and a trans-acQTL as the lead variant >5 Mb from the peak. The results showed that 27% of the 90,991 consensus peaks were affected by 24,836 cis-acQTLs. Bayesian fine-mapping analyses further revealed that 6651 (27%) cis-acQTLs were narrowed to <200 kb with 95% confidence intervals and 5782 (23%) harbored <20 candidate causal variants (Supplementary Fig. 3b, c). Additionally, we identified 3589 trans-acQTLs responsible for 3395 (3.7%) peaks, 312 of which were associated with trans-chromosome peaks (Supplementary Fig. 3d, Supplementary Data 4 and 5).
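
The cis/trans definition above can be written down as a small Python sketch. It assumes that distance is measured from the variant to the nearest peak boundary (zero if the variant lies inside the peak), that variants on a different chromosome count as trans, and that pairs between 1 Mb and 5 Mb are left unclassified; these details are illustrative assumptions, and the function name `classify_pair` is hypothetical.

```python
def classify_pair(var_chrom, var_pos, peak_chrom, peak_start, peak_end,
                  cis_max=1_000_000, trans_min=5_000_000):
    """Classify a variant-peak pair as 'cis', 'trans', or 'unclassified'.

    Assumptions (not taken from the study's Methods):
    - distance is measured to the nearest peak boundary;
    - a variant on another chromosome is always trans;
    - pairs with cis_max < distance <= trans_min fall in neither class.
    """
    if var_chrom != peak_chrom:
        return "trans"
    if peak_start <= var_pos <= peak_end:
        dist = 0
    else:
        dist = min(abs(var_pos - peak_start), abs(var_pos - peak_end))
    if dist <= cis_max:
        return "cis"
    if dist > trans_min:
        return "trans"
    return "unclassified"

# Toy examples (hypothetical coordinates).
print(classify_pair("1", 150, "1", 100, 200))        # inside the peak
print(classify_pair("1", 3_000_000, "1", 100, 200))  # between 1 Mb and 5 Mb
print(classify_pair("1", 7_000_000, "1", 100, 200))  # beyond 5 Mb
print(classify_pair("2", 150, "1", 100, 200))        # different chromosome
```

Leaving a gap between the cis and trans thresholds keeps distant same-chromosome signals from being confounded with long-range LD around the peak.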

To assess the extent of the pleiotropic effects of acQTLs, we conducted pairwise colocalization analysis among lead acQTL variants within a 500 kb distance. The result showed that the majority of acQTLs were responsible for one peak, which was consistent with findings in humans²³ (Fig. 2a). Eighty-four percent of the 27,397 genetic variants included in the 28,425 acQTLs were associated with one peak. Notably, we also observed that acQTLs 9_118156481 and 14_106813309 were each linked to nine peaks on the same chromosome. The nine target peaks for acQTL 14_106813309 are located in the intron of CYP2C42, a gene involved in pig liver NADPH-dependent electron transport. These peaks showed highly consistent effect directions (Supplementary Data 4). The majority of acQTLs are cis-acQTLs (87%), which is similar to the result for chromatin accessibility QTLs in mice²². We then investigated the genomic features of these cis-acQTLs. Among 24,836 cis-acQTLs, 5171 and 19,665 were associated with promoter and enhancer peaks, respectively. We found that the cis-acQTLs have a higher enrichment in enhancer peaks than in other genomic features, which is comparable to the result for chromatin accessibility QTLs in human liver¹⁶. The degree of enrichment of the cis-acQTLs in enhancer peaks increased with their posterior probabilities (PPs) obtained from the Bayesian fine-mapping analysis (Fig. 2b). Cis-acQTLs tend to be distributed symmetrically within 50 kb of their target peaks, and acQTLs closer to their target peaks have higher association significance (Supplementary Fig. 3d–f), suggesting that genetic variants within and adjacent to the H3K27ac peaks are more likely to regulate the corresponding peaks.

**Fig. 2: Characterization and transcription factor binding analysis of acQTLs.**

Exploring the regulatory mechanism of acQTLs

The causal variants determining H3K27ac can contribute to the allelic imbalance of histone marks in heterozygotes^24,25. To validate the causality of these lead acQTL variants, we examined allelic imbalance of H3K27ac activity for 21 lead acQTL variants that met the following criteria: (1) reads covering the variants have no mapping bias²⁶; (2) a sufficient number of heterozygous samples for the statistical test; (3) the variants are located inside H3K27ac peaks; (4) the PPs of variants exceeding 0.9. We observed that 14 lead acQTL variants exhibited consistency between acQTL analysis and allelic imbalance analysis in terms of effect allele direction, eight of which showed significant differences between reads coverage of reference alleles and alternative alleles (Supplementary Data 6). To confirm these 8 lead acQTL variants further, we retrieved 24 H3K27ac data of pig livers from three independent studies^11,12,13. Five out of 8 lead acQTL variants had sufficient heterozygous individuals for the statistical test. Four lead acQTL variants showed a consistent tendency, and 2 were successfully verified (Supplementary Data 6, Fig. 2c, d, Supplementary Fig. 3g, h), supporting the reliability of our identified lead acQTL variants.
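
The idea behind an allelic imbalance check can be sketched in Python as follows. This is a minimal illustration that assumes an exact two-sided binomial test of pooled reference versus alternative read counts against a 50:50 expectation; the statistical test actually used in the study is described in its Methods and may differ. The function name and the read counts are hypothetical.

```python
from math import comb

def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial P-value for H0: ref and alt reads are 50:50.

    Assumption: read counts at a heterozygous site are pooled and tested
    with a plain binomial model (no overdispersion or mapping-bias terms).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    # Under p = 0.5 the distribution is symmetric, so the two-sided tail
    # contains every outcome at least as far from n/2 as the observed one.
    observed_dev = abs(2 * ref_reads - n)
    tail = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= observed_dev)
    return tail / 2 ** n

# Toy read counts (hypothetical).
print(allelic_imbalance_p(5, 5))   # perfectly balanced
print(allelic_imbalance_p(8, 2))   # mild imbalance
print(allelic_imbalance_p(10, 0))  # strong imbalance
```

The tail is accumulated with exact integers and divided once at the end, so the result stays accurate even for read depths where floating-point binomial terms would underflow.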

Genetic variants can regulate histone modification with the assistance of transcription factors (TFs)^24,27. We selected 3228 lead acQTL variants inside their target peaks to examine the binding ability of TFs. The results showed that 1288 (39.9%) loci harboring lead acQTL variants could bind TFs, 722 of which were inferred to gain, lose, or alter the binding of 67 TFs when alternative alleles substituted reference alleles (Fig. 2e, Supplementary Data 7 and 8). The most frequently affected TFs included IRF1, STAT1, ZBTB16, HMGA1, and VEZF1. For example, the T allele at 70,309,544 on chromosome 4 (4_70309544), associated with greater peak activity of chr4:70308066-70310904, was inferred to have enhanced DNA binding strength with TF ZBTB16 (Fig. 2f–h). The analysis provided a list of candidate TFs regulating H3K27ac peak activity by binding at lead acQTL variants and partly clarified the regulatory mechanism of H3K27ac.

A pioneering study shows that genetic variants could control distal H3K27ac through spatial interaction in lymphoblastoid cell lines²³. Herein, we employed Hi-C data from pig liver from another independent research to explore the mechanism of genetic variants affecting H3K27ac of pig liver²⁸. Combined with the H3K27ac data of this study, we found that 25,612 (90%) of 28,425 acQTLs contact with their target peaks inferred from the Hi-C data (Supplementary Data 9). Compared with all interaction pairs with matched distance, cis-acQTLs (Hypergeometric test P-value = 1.72 × 10⁻⁴⁰) and trans-acQTLs (Hypergeometric test P-value = 2.43 × 10⁻³⁰²) were significantly enriched in the contact regions encompassing their target peaks (Fig. 2i). Notably, 15% (48 out of 312) inter-chromosome acQTLs contacted genomic regions encompassing their target peaks, supported by an average of 45 Hi-C reads (Supplementary Data 9). Consequently, the Hi-C data not only strengthened the confidence of acQTLs but also agreed with the report where inter-chromosome coordination between regulatory elements had been identified using H3K27ac and Hi-C data in human lymphoblastoid and fibroblast cell lines²⁹.

Identification and characterization of eQTLs

H3K27ac is usually linked to the upregulation of gene expression. To explore the relationship between the genetic regulation of gene expression and that of H3K27ac activity, we identified liver eQTLs in 256 individuals using the same strategies as those used for acQTLs (Supplementary Fig. 4a). A total of 10,078 (65%) of the 15,509 expressed genes (eGenes) were associated with 12,250 eQTLs, including 10,042 cis-eQTLs and 2208 trans-eQTLs (Supplementary Data 10 and 11). The majority of eQTLs (92%) were found to be associated with one gene (Fig. 3a). Bayesian fine-mapping revealed that 2766 (27%) cis-eQTLs were narrowed to <200 kb with 95% confidence intervals and 2659 (26%) harbored <20 candidate causal variants (Supplementary Fig. 4b). Similar to acQTLs, the majority of eQTLs were located in introns (53%) and distal intergenic regions (34%; Supplementary Fig. 4c). Moreover, the numbers of eQTLs and acQTLs across chromosomes were positively correlated (Supplementary Fig. 4d). Comparing the distribution of eQTLs and acQTLs within 2 Mb windows across the entire genome, we discovered a low similarity (Pearson’s R² = 0.32, P-value < 2.2 × 10⁻¹⁶) between the genomic distributions of eQTLs and acQTLs (Supplementary Fig. 5). We further focused on acQTLs associated with promoter peaks (promoter-acQTLs) and discovered a medium similarity (Pearson’s R² = 0.61, P-value < 2.2 × 10⁻¹⁶) between eQTLs and promoter-acQTLs.
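
The window-based comparison of QTL distributions can be illustrated with a short Python sketch: count QTLs per fixed-size window, then compute the squared Pearson correlation between the two count vectors. The 2 Mb window size comes from the text; the tiling scheme, function names, chromosome lengths, and positions below are hypothetical.

```python
def window_counts(positions, chrom_lengths, window=2_000_000):
    """Count (chrom, pos) records per non-overlapping window.

    Every window of every chromosome is included, so windows without any
    QTL contribute a zero instead of being silently dropped.
    """
    counts = {(chrom, i): 0
              for chrom, length in chrom_lengths.items()
              for i in range(length // window + 1)}
    for chrom, pos in positions:
        counts[(chrom, pos // window)] += 1
    return counts

def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy * sxy / (sxx * syy)

# Toy data (hypothetical chromosome lengths and QTL positions).
chrom_lengths = {"1": 5_000_000, "2": 5_000_000}
eqtls = [("1", 100), ("1", 1_500_000), ("1", 2_500_000), ("2", 10)]
acqtls = [("1", 200), ("1", 300), ("1", 3_000_000), ("2", 4_100_000)]

e = window_counts(eqtls, chrom_lengths)
a = window_counts(acqtls, chrom_lengths)
keys = list(e)  # same window order for both vectors
r2 = pearson_r2([e[k] for k in keys], [a[k] for k in keys])
print(round(r2, 2))
```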

**Fig. 3: Genetic sharing and causal relationship between H3K27ac and gene expression.**

Next, we investigated the genomic features of the eQTLs. Among 10,042 cis-eQTLs, 2967 (29.5%) were located within H3K27ac peaks, preferentially in promoter peaks (Fig. 3b). For example, 1304 (44.0%) of the 2967 cis-eQTLs were located in promoter peaks, corresponding to a fold enrichment of 4.02 (Hypergeometric test P-value = 1 × 10⁻³⁸⁸), and the fold enrichment increased with PPs. In addition, higher PPs were associated with a higher frequency of cis-eQTLs in the promoter peaks of their target genes (Supplementary Fig. 4e and Data 12). Analogous to acQTLs, eQTLs tend to localize within a genomic distance of 50 kb from the TSS (Supplementary Fig. 4f, g), and their association significance decreased with increasing distance from the TSS of target genes (Supplementary Fig. 4h). The results indicated that the genomic proximity between eQTLs and the TSS of target genes is a critical determinant of the likelihood of variants exerting an eQTL effect.
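
Fold enrichment and a hypergeometric tail probability of the kind quoted above can be computed as in the following Python sketch. The background set used in the study is not stated in this section, so the numbers below are toy values, and the function names are hypothetical.

```python
from math import comb

def fold_enrichment(k, n, K, N):
    """Observed fraction (k of n) divided by background fraction (K of N)."""
    return (k / n) / (K / N)

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n items without replacement from N items,
    of which K are 'successes' (e.g. variants inside promoter peaks)."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / total

# Toy numbers (hypothetical): 20 variants in the background, 5 of them in
# promoter peaks; 4 QTL variants are drawn and 3 of them are in promoter peaks.
N, K, n, k = 20, 5, 4, 3
fe = fold_enrichment(k, n, K, N)
p = hypergeom_upper_tail(k, n, K, N)
print(fe, p)
```

Extremely small P-values such as those reported for genome-wide sets are below the range of double-precision floats, so a production analysis would typically work with log-probabilities instead of the direct ratio used here.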

In summary, these analyses revealed similar genomic features of the acQTLs and eQTLs, motivating further exploration of the shared genetic controls on H3K27ac activity and gene expression.

Joint analyses of H3K27ac and transcriptome

Connecting H3K27ac peaks to their target genes is challenging, but promoters are highly associated with gene expression. We examined the sharing between the loci for H3K27ac promoter peaks and those for corresponding gene expression using the π₁ value (Methods). The majority (π₁ = 0.84) of promoter-acQTLs were preserved in eQTLs regulating gene expression, which was in line with the result in humans³⁰. Approximately half (π₁ = 0.54) of eQTLs were replicated in promoter-acQTLs, suggesting a substantial sharing of the genetic controls between gene expression and promoter H3K27ac activity (Fig. 3c). Besides, QTL significance for both H3K27ac and gene expression led to an increase in the sharing value (Fig. 3c). To determine all shared loci, we then conducted a colocalization analysis. The lead eQTL variants and the lead acQTL variants were considered colocalized when they were in high linkage disequilibrium (LD, r² > 0.8) and <500 kb apart. The results showed that 2336 acQTLs were colocalized with 1770 eQTLs, leading to 2682 peak-gene pairs (Supplementary Data 13). To reduce the false discovery rate, we kept peak-gene pairs in which the peak is the promoter peak of this gene, or the peak activity is significantly associated with gene expression by the Spearman’s correlation method after multiple testing correction of Benjamini-Hochberg (adjusted P-value < 0.05). Besides, we employed the Bayesian test to perform colocalization and intersected the above results³¹. In total, we identified 1183 target genes for 1616 peaks, comprising 1818 unique peak-gene connections. Most (90%) of the peaks were linked to one gene. But we also identified peaks that were linked to multiple genes. For example, the H3K27ac peak at chr6:54331130-54345331, classified as a promoter of gene HRC, was linked to 4 genes (HRC, CPT1C, EMC10, FCGRT). We further checked their physical contact using Hi-C data and found that the HRC promoter contacts with gene CPT1C, EMC10, and FCGRT, supported by 23, 4, and 24 Hi-C reads respectively, suggesting promoters can function as enhancers for target genes they interacted with³² (Supplementary Data 13). The 1616 peaks were significantly enriched at the promoter with a fold enrichment of 1.8 (Hypergeometric test P-value = 9.71 × 10⁻⁵⁸, Fig. 3d). We showed an example of how an acQTL affects the promoter H3K27ac activity and the expression of gene FLRT3 (Supplementary Fig. 6a–f). Notably, 606 peaks outside the promoter or gene body of target genes appeared to be enriched in the promoters of other genes (Fold enrichment = 1.2, hypergeometric test P-value = 5.65 × 10⁻⁴, Fig. 3d).
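
The LD-and-distance colocalization rule stated above (r² > 0.8 and less than 500 kb apart) can be sketched in Python. The sketch assumes that r² is approximated by the squared Pearson correlation of genotype dosages (0/1/2), which is a common estimate but not necessarily the one used in the study; function names and genotypes are hypothetical.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors.

    Assumption: dosages coded 0/1/2 for the same individuals in the same
    order; this approximates LD r² without phasing.
    """
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    s11 = sum((a - m1) ** 2 for a in g1)
    s22 = sum((b - m2) ** 2 for b in g2)
    if s11 == 0 or s22 == 0:
        return 0.0  # monomorphic variant: LD is undefined, treat as none
    return s12 * s12 / (s11 * s22)

def is_colocalized(chrom_a, pos_a, geno_a, chrom_e, pos_e, geno_e,
                   min_r2=0.8, max_dist=500_000):
    """Apply the rule: same chromosome, < max_dist apart, r² > min_r2."""
    if chrom_a != chrom_e or abs(pos_a - pos_e) >= max_dist:
        return False
    return genotype_r2(geno_a, geno_e) > min_r2

# Toy genotypes for six individuals (hypothetical).
lead_acqtl = [0, 1, 2, 1, 0, 2]
lead_eqtl_linked = [0, 1, 2, 1, 0, 2]
lead_eqtl_unlinked = [0, 2, 0, 2, 0, 2]

print(is_colocalized("6", 100, lead_acqtl, "6", 200_000, lead_eqtl_linked))
print(is_colocalized("6", 100, lead_acqtl, "6", 200_000, lead_eqtl_unlinked))
print(is_colocalized("6", 100, lead_acqtl, "6", 700_000, lead_eqtl_linked))
```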

To explore the causal hierarchies between H3K27ac and gene expression, we first applied a variance decomposition model to estimate the contribution of genetic and H3K27ac factors to transcriptional variance. H3K27ac explained a lower proportion of transcriptome variance in models where epigenetic elements were adjusted for proximal genetic effects than in the corresponding unadjusted models, indicating that the correlations between H3K27ac activity and gene expression can for the most part be attributed to genetic variation (Fig. 3e). Secondly, using QTLs as instruments, we inferred the causal relationships between H3K27ac and target gene expression with the Intersection-Union Test³³. It discriminated four causal scenarios: (1) type0, the causal relationship could not be resolved; (2) type1, QTLs act on gene expression through H3K27ac; (3) type2, QTLs act on H3K27ac through gene expression; (4) type3, QTLs act on H3K27ac and gene expression independently. Among 1900 QTL-peak-target gene trios, we identified 791 type0, 78 type1, 147 type2, and 884 type3 scenarios (Fig. 3f), suggesting that a large proportion of H3K27ac signals accompany rather than determine gene expression^34,35.
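
As a quick arithmetic check, the scenario counts reported above can be converted to shares of the 1900 trios. The short Python sketch below uses only the counts given in the text; the type3 share corresponds to the 46% concomitant relationship quoted in the Abstract.

```python
# Counts of QTL-peak-target gene trios per causal scenario, as reported in the text.
counts = {"type0": 791, "type1": 78, "type2": 147, "type3": 884}

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)   # number of trios
print(shares)  # percentage of trios per scenario
```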

Identification of functional regulatory elements, genes, and putative causal variants for metabolism-related molecular phenotype and published GWAS loci

Integrating the significant GWAS signals with H3K27ac, gene expression, acQTLs, and eQTLs could aid in identifying functional genes, regulatory elements, and causal variants responsible for traits or diseases of interest^16,36. Dulcitol is a type of sugar alcohol produced by the reduction of galactose, the excessive accumulation of which could lead to the development of galactosemic cataracts, neurological impairment, and renal dysfunction^37,38,39. We first performed GWAS for the liver dulcitol levels using 321 individuals from the same population used for the H3K27ac ChIP-seq, which revealed a notable signal on chromosome 6 with the top SNP 6_165846829 located within an intronic region (Fig. 4a). To identify the regulatory elements regulating dulcitol, we found 19 H3K27ac peaks associated with acQTLs within the flanking 500 kb region centered at SNP 6_165846829. Moreover, peak chr6:165828531-165836912, which was associated with the acQTL 6_165829987, is the nearest peak to the SNP among the peak-SNP pairs in which the acQTL was colocalized (r² = 0.93, PP4 = 0.935) with the SNP 6_165846829. We then colocalized acQTL 6_165829987 with eQTLs to link functional genes. The result showed that acQTL 6_165829987 was colocalized with eQTL 6_166100952 (r² = 0.96, PP4 = 0.99) responsible for the genes AKR1A1 (Aldo-Keto Reductase Family 1 Member A1) and PRDX1 (Peroxiredoxin 1). Besides, the eQTL 6_166100952 was colocalized (r² = 0.91, PP4 = 0.96) with the top SNP 6_165846829 from GWAS (Fig. 4b). Notably, AKR1A1 encodes an aldo-keto reductase that is involved in the metabolism of dulcitol⁴⁰. Both the activity of peak chr6:165828531-165836912 and the expression of AKR1A1 were significantly associated with dulcitol levels (Fig. 4c). Thus, we considered AKR1A1 and chr6:165828531-165836912 to be a strong candidate gene and regulatory element, respectively, in controlling dulcitol. To identify likely causal variants, candidate variants were defined as those within the 95% confidence intervals of the GWAS, the acQTL for peak chr6:165828531-165836912, and the eQTL for gene AKR1A1 (Fig. 4d). Six variants were collected, among which Indel 6_165830307 not only showed high LD with the lead variants for the above three traits but also was located within the peak chr6:165828531-165836912 (Fig. 4d, e). Furthermore, we predicted the alteration of TF binding when the alternative allele at Indel 6_165830307 substituted the reference allele. The binding ability of the TFs TP73, TP63, and TP53 was inferred to change significantly (Fig. 4f). We therefore highlighted Indel 6_165830307 as a likely causal variant for dulcitol.

**Fig. 4: Identification of candidate causal variants, regulatory elements, and target genes for dulcitol.**

Utilizing epigenomic data allows for sub-threshold loci detection in GWAS^41,42. We first performed GWAS for PC(16:0/16:0) using 321 individuals from the same population used for the H3K27ac ChIP-seq and identified two variants (13_160375595, 13_160375587) exceeding the empirical threshold (P-value < 5 × 10⁻⁸; Supplementary Fig. 7a). To discover more loci, we collected 4065 nominal variants by employing a weaker threshold with a P-value of 1 × 10⁻⁴. These variants were significantly enriched within H3K27ac peak regions (Hypergeometric test P-value = 8.7 × 10⁻⁴), implying their potential function and motivating further investigation. We grouped these 4065 nominal variants using LD (minimum r² = 0.2) to identify 602 independent sub-threshold loci. Among the 602 loci, 194 overlapped with H3K27ac peaks, and variants within these loci exhibited significantly stronger P-values than those in the remaining 408 loci (two-sided T-test, P-value = 3.0 × 10⁻⁸; Supplementary Fig. 7b). We linked H3K27ac peaks with these 194 loci by colocalizing them with acQTLs through high LD (r² > 0.8), resulting in 13 loci associated with 13 peaks. In particular, a locus located at 79.5 Mb on chromosome 13 (locus A) was associated with peaks chr13:110306820-110312485 and chr13:110312491-110318835 (Supplementary Fig. 7c–e). Unexpectedly, another independent locus located at 110.2 Mb on chromosome 13 (locus B) was also linked to these two peaks (Supplementary Fig. 7c, e). To identify putative target genes for these two peaks, we searched the previously generated list of peak-gene pairs. The two peaks were found to be associated with the gene PLD1 (Phospholipase D1), which encodes a PC-specific phospholipase involved in PC metabolism⁴³, and were located within the first intron of this gene (Supplementary Fig. 7c, e). On locus B, the colocalization between signals of PC(16:0/16:0) and QTLs of three molecular phenotypes was further assessed, i.e., chr13:110306820-110312485 (PP4 = 0.84), chr13:110312491-110318835 (PP4 = 0.84) and PLD1 (PP4 = 0.84). On locus A, only chr13:110306820-110312485 (PP4 = 0.86) and PLD1 (PP4 = 0.84) were confirmed. In addition, PC(16:0/16:0) was significantly associated with peak chr13:110306820-110312485 and PLD1 expression (Supplementary Fig. 7f). Therefore, we proposed PLD1 as the functional gene and peak chr13:110306820-110312485 as a regulatory element in modulating PC(16:0/16:0).
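
Grouping nominal variants into independent loci by LD can be illustrated with a greedy clumping sketch in Python. The text only states that variants were grouped using LD with a minimum r² of 0.2; the greedy lead-variant strategy, the function name `clump`, and the toy LD table below are assumptions for illustration and are not the study's actual procedure.

```python
def clump(variants, r2, min_r2=0.2):
    """Greedy LD clumping (an assumed strategy, for illustration only).

    variants: list of (variant_id, p_value)
    r2: function (id_a, id_b) -> LD r² between the two variants
    Returns a list of loci; each locus is a list of variant ids whose
    first element is the lead (smallest P-value) variant.
    """
    remaining = sorted(variants, key=lambda v: v[1])
    loci = []
    while remaining:
        lead_id, _ = remaining.pop(0)
        members, unassigned = [lead_id], []
        for var_id, p in remaining:
            if r2(lead_id, var_id) >= min_r2:
                members.append(var_id)      # joins the lead variant's locus
            else:
                unassigned.append((var_id, p))
        remaining = unassigned
        loci.append(members)
    return loci

# Toy input (hypothetical variant ids, P-values, and pairwise LD).
variants = [("a", 1e-6), ("b", 5e-5), ("c", 2e-5), ("d", 9e-5)]
ld_table = {frozenset(("a", "b")): 0.9, frozenset(("c", "d")): 0.5}

def toy_r2(x, y):
    return ld_table.get(frozenset((x, y)), 0.0)

loci = clump(variants, toy_r2)
print(loci)
```

A real analysis would also restrict clumping to variants within a physical distance window, which this sketch omits for brevity.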

To further interpret GWAS loci for phenotypes that may act through the liver, we obtained GWAS variants pertaining to 11 categories of pig traits from the ISwine database⁴⁴. We linked 55 phenotypes to 111 gene-peak pairs via LD (r² > 0.8), resulting in 167 candidate variants (Fig. 5a, Supplementary Data 14 and 15). The liver is the primary organ responsible for promoting rapid erythrocyte elimination and iron recycling^45,46. Correspondingly, we found that the variant 7_32054693 for mean corpuscular hemoglobin concentration (MCHC) was linked with the gene BNIP5 (BCL2 Interacting Protein 5) and its promoter peak chr7:32045442-32062909, which overlapped with the super-enhancer chr7:32045738-32067174 and was positively associated with the expression of BNIP5 (Fig. 5b, c). Notably, the gene BNIP5 was documented in the MGI (Mouse Genome Informatics) dataset as influencing the hematopoietic system. Thus, we proposed that the gene BNIP5 and the peak chr7:32045442-32062909 may have a functional role in the modulation of MCHC. Besides, variant 7_32054693 was prioritized as a candidate functional variant due to its location within the peak region and inclusion in the 95% credible sets of both the acQTL and the eQTL (Fig. 5d). Similarly, another prioritized candidate functional variant, 14_107869191, for hematocrit and red blood cell count resided within the peak chr14:107868236-107869603 and was the lead acQTL variant for this peak as well as the lead eQTL variant for the gene TLL2 (Supplementary Fig. 8a–c). In addition, we prioritized genes associated with pig growth traits, such as average daily gain, for which 5 genes were collected. One of these (ENSSSCG00000022032) was documented in the MGI dataset as a mouse growth-related gene (Supplementary Data 14).

**Fig. 5: Prioritizing functional variants, genes, and regulatory elements for published GWAS of MCHC.**

Overall, our analyses demonstrated that the acQTLs and eQTLs generated in this study are valuable instruments for dissecting the molecular mechanisms of phenotypic GWAS loci.

Discussion

Epigenetic data, as an important regulatory layer, combined with gene expression data, have become a powerful tool for dissecting the molecular mechanisms underlying phenotypic variation. Although many regulatory elements have been annotated by epigenetic marks in pigs^11,12,13, the effects of genetic variants on epigenetic marks have yet to be comprehensively characterized. This study identified an extensive set of H3K27ac peaks corresponding to active promoters, enhancers, and super-enhancers. Genetic variants associated with H3K27ac and gene expression were mapped to investigate their relationships and were successfully used to help identify likely causal genetic variants for liver-related phenotypes.

The high overlap between epigenetic marks from previous studies and the H3K27ac peaks in this work supports the credibility of our data. The distribution of liver H3K27ac with respect to genomic features exhibits a similar pattern across pigs, humans, and mice^8,47. Previous studies demonstrated that enhancers are less conserved across species and tissues than promoters^13,14,48,49,50. Our results further showed that enhancers are less likely than promoters to be shared across individuals. Comparative regulatory genomic analysis in 20 mammalian species has revealed the rapid evolution of enhancers, and the rate of divergence for enhancers is estimated to be 3 times faster than for promoters⁴⁸. Newly evolved enhancers showed high inter-individual variability and tended to be less integrated in transcriptional networks⁵¹. Indeed, many enhancers that do not contact promoters do not regulate gene expression⁴⁹. In addition, many enhancers are functionally redundant or have modest effects on target gene expression^52,53. Consequently, the high variation of enhancers across species, tissues, and individuals might be better tolerated.

Our results indicated that peaks with a strong signal tend to occur in more individuals. There are two possible reasons for this: (1) strong peak signals are easier to detect, which in turn leads to their high occurrence; (2) regulatory elements with strong peak signals exert an essential function in pig liver, reflected by their frequent occurrence. Similarly, promoters and enhancers that are active in both humans and mice have stronger H3K27ac signals than species-specific regulatory elements⁵⁴. We further obtained the chromatin states defined by Pan et al.¹⁴ and found that promoters/enhancers shared across all tissues have higher activity than liver-specific promoters/enhancers (Supplementary Fig. 9). These findings suggested that regulatory elements with high H3K27ac signals tend to be stable across individuals, tissues, and species.

Highly shared super-enhancers cover genes involved in liver-related pathways, implying their importance in the maintenance of liver function⁵⁵. Combined with RNA-seq data, a significant enrichment of polyadenylated eRNAs within super-enhancer regions corroborates that super-enhancers induce higher levels of eRNAs than typical enhancers¹⁷. Of the 15,509 expressed genes, 17% showed no H3K27ac signal in their promoter regions. This phenomenon has been observed in several studies of accessible chromatin^16,56. It is possibly a consequence of the limitations of H3K27ac as a mark for completely capturing the accessibility of regulatory elements^57,58. A previous study also showed that an average of 4.79% of the accessible pig genome was not marked by any of the four epigenetic marks examined, including H3K27ac¹⁴. Another explanation is potential asynchrony, in which H3K27ac disappears before the mRNA is degraded³⁵.

Using Bayesian fine-mapping analyses, more than 5000 cis-acQTLs with small confidence intervals (<200 kb) and few candidate causal variants (<20) were identified, which could greatly assist in identifying causal variants responsible for traits of interest. The heritability estimates showed that genetic variants located within cis windows are more likely to affect peak activities than trans variants, consistent with the result that most acQTLs were cis-acQTLs. Nonetheless, we found 3589 trans-acQTLs, which involved 312 trans-chromosome peaks. The enrichment of trans-acQTLs and their target peaks in Hi-C contact regions supports 3D genomic interactions between them, consistent with previous studies^23,29. In addition, acQTLs could be associated with multiple peaks, indicating their pleiotropic effects on H3K27ac. The higher cis-acQTL enrichment in enhancer peaks than in promoter peaks could also reflect the rapid evolution of enhancers, possibly because enhancers harbor more genetic variants. Whether acQTLs are causal mutations can be effectively examined by analyzing the allelic imbalance of H3K27ac peaks within individuals, because the paternal and maternal alleles function as within-sample controls⁵⁹. To explore the regulatory mechanism by which acQTLs affect H3K27ac, the allelic imbalance of H3K27ac activity was validated for several lead acQTL variants, supporting the reliability of the other lead acQTL variants. Furthermore, 1288 lead acQTL variants overlap predicted TF binding sites, which not only strengthens the credibility of the acQTLs but also points to their regulatory mechanism.

Our results show that gene expression (or H3K27ac abundance) is more likely to be affected by genetic variants closer to the TSS (or peak midpoint), which also suggests that variants located near the TSS or the H3K27ac peak midpoint have a high probability of being causal. High sharing (π₁) from promoter-acQTLs to eQTLs indicates that gene expression and promoter activity are largely under the same genetic regulation. The lower π₁ value in the direction from eQTLs to promoter-acQTLs suggests that RNA regulation may originate from multiple mechanisms, such as RNA processing. This sharing pattern is conserved across species³⁰.

In mouse neurons, massively parallel reporter assays demonstrated that promoters are sufficient to initiate transcription independently, whereas enhancers stimulate transcriptional initiation in a promoter-dependent manner⁶⁰. Analogously, our study revealed that putative regulatory elements controlling gene expression tend to be promoters, as shown by the high enrichment of eQTLs in promoters, the similar distributions of eQTLs and promoter-acQTLs, and the significant colocalization between promoter-acQTLs and eQTLs.

The variance decomposition showed that genetic factors are the primary determinants of gene expression, in contrast to the view that histone marks play a causal role in transcription. In mouse embryonic stem cells, mutations in the genes encoding H3.3 that convert lysine 27 to arginine prevent H3.3K27 from being acetylated³⁴. Despite the dramatically reduced H3K27ac signals in enhancers, enhancer activity remains unchanged and gene expression is barely affected. In another study on K562 cells, rapid loss of H3K27ac was observed after blocking transcription initiation, indicating that H3K27ac serves as a supportive mark in transcription³⁵. In the causal inference between H3K27ac and gene expression, 46% of H3K27ac peak-gene pairs that share common QTLs were inferred to be independent, while only 4% supported a causal role of H3K27ac on gene expression. This result supports the view that a large proportion of H3K27ac signal is a proxy for regulatory elements rather than a driver.

We identified a likely causal variant, 6_165830307, for dulcitol using the resources generated in this study. The variant 6_165830307 inside peak chr6:165828531-165836912 is significantly associated with dulcitol level and colocalizes with the acQTL for peak chr6:165828531-165836912 and the eQTL for gene AKR1A1. When the allele of variant 6_165830307 was altered from the reference allele to the alternative allele, three TFs (TP73, TP63, and TP53) were predicted to bind at the GWAS locus to initiate transcription. However, the lack of a sufficient number of heterozygous individuals for variant 6_165830307 hampers further validation through the allelic imbalance of H3K27ac. Taken together, the alternative allele of variant 6_165830307 may increase the activity of regulatory element chr6:165828531-165836912 by enhancing the binding ability of TFs, resulting in elevated expression of gene AKR1A1 and a high dulcitol level. Our dataset can also help identify additional GWAS loci with modest effect sizes. In the GWAS for PC(16:0/16:0), two independent GWAS loci were linked to the same peaks by acQTLs. PLD1, which harbors the peaks, was highlighted as a strong candidate causal gene affecting PC(16:0/16:0). PLD1 encodes a PC-specific phospholipase involved in PC metabolism⁴³, indicating a direct biological relationship with the PC(16:0/16:0) phenotype. Our dataset can also aid in the prioritization of variants and genes for published GWAS signals, such as hematological and growth-related traits, which reinforces the utility of our resource. For example, our results suggest variant 7_32054693 as a promising candidate for MCHC, likely functioning through the gene BNIP5 and its promoter peak chr7:32045442-32062909.

Collectively, this study expands the H3K27ac atlas, acQTLs, and eQTLs dataset in pig liver, shedding light on the impact of genetic variants on both H3K27ac and gene expression. This resource will aid in dissecting molecular mechanisms underlying liver-related traits, thus facilitating GWAS fine-mapping.

Methods

Ethics statement

All procedures involving animals followed the guidelines for the care and use of experimental animals established by the Ministry of Agriculture of China. The ethics committee of Jiangxi Agricultural University specifically approved this study.

Samples

All liver samples were derived from sixth-generation (F6) pigs of a heterogeneous population generated by crossing eight founder breeds, including four aboriginal Chinese breeds (Erhualian, Laiwu, Bama Xiang, and Tibetan) and four highly selected international commercial breeds (Duroc, Large White, Landrace, and Pietrain). The population was maintained through a rotation mating scheme to acquire an equal mixture of genetic material from the eight founder breeds. Feeding conditions were the same for all F6 pigs. Male individuals were castrated on day 90. All pigs were slaughtered at 240 ± 10 days of age. Samples were immediately collected and transferred to liquid nitrogen, and then stored at −80 °C until use. Liver samples were obtained from the left lobe of the liver.

DNA extraction and genotyping

Genomic DNA was obtained from frozen muscle tissue using a phenol-chloroform-based DNA extraction protocol. DNA quality control was performed according to DNA concentration and length using a Nanodrop-1000 and agarose (0.8%) gel electrophoresis. Next, DNA was fragmented into 300-400 bp pieces. After adenylation and indexed ligation, the DNA library was amplified by PCR using Phusion High-Fidelity DNA polymerase (NEB, USA). Sequencing was completed on Illumina X-10 instruments (Illumina Inc., San Diego, CA) with a 2 × 150 bp paired-end strategy. Low-quality raw reads were removed according to the following criteria: (1) N base content >10%; (2) >50% of bases with a quality score ≤20. After removing low-quality and short reads from the raw DNA fastq files, clean fastq files were aligned to the Sscrofa11.1 reference genome using BWA (v0.7.17)⁶¹. Subsequently, BAM files were sorted and indexed using Samtools (v1.9)⁶². Individual genotypes were called using Platypus (v0.8.1)⁶³. VCF files from each sample were merged into a single VCF file using PLINK (v1.9)⁶⁴. Next, missing genotypes were imputed with Beagle (v0.40)⁶⁵.
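The two read-level criteria listed in this subsection can be written as a small predicate. The following is a minimal sketch, assuming Phred scores have already been decoded to integers; the function name `is_low_quality` and the per-read (rather than per-pair) handling are illustrative assumptions and not the pipeline actually used.

```python
def is_low_quality(sequence, qualities, max_n_frac=0.10,
                   q_cutoff=20, max_lowq_frac=0.50):
    """Return True if a read fails either filter described in the text.

    sequence  : read bases as a string
    qualities : per-base Phred scores (integers), same length as sequence
    """
    n = len(sequence)
    if n == 0:
        return True  # assumption: empty reads are discarded
    # Criterion (1): fraction of N bases above 10%
    n_frac = sequence.upper().count("N") / n
    # Criterion (2): more than 50% of bases with quality score <= 20
    lowq_frac = sum(1 for q in qualities if q <= q_cutoff) / n
    return n_frac > max_n_frac or lowq_frac > max_lowq_frac
```

For paired-end data, a pair would typically be dropped when either mate fails, although the text does not state how mates were handled.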

mRNA extraction and sequencing

Total RNA was isolated using TRIzol™ (Invitrogen, USA) from 256 pig livers, including 146 females and 110 males. The integrity and purity of RNA were tested with a NanoPhotometer® spectrophotometer (IMPLEN, USA) and a Bioanalyzer 2100 system (Agilent Technologies, USA). Next, mRNA was enriched with poly-T oligo-attached magnetic beads using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina (NEB, USA). Poly(A)+ mRNA was then fragmented and used for strand-specific cDNA library construction. The cDNA was subjected to purification, end repair, A-tailing, adapter ligation, and size selection using AMPure XP beads. Sequencing was performed on the Illumina Novaseq 6000 platform using a 150-bp paired-end strategy. Low-quality raw reads were removed if the percentage of N bases was >10% or the percentage of bases with Q ≤ 5 was >50%.

ChIP-Seq experiments

Chromatin immunoprecipitation followed by sequencing was performed using the SimpleChIP Plus Enzymatic Chromatin IP Kit (Magnetic Beads) (CST, USA). Pig liver tissue from 172 females and 120 males was collected. In brief, ~200 mg of liver tissue was minced in 1 mL of PBS and cross-linked with 1% formaldehyde for 10 min, followed by quenching with glycine and lysis in the buffer. The cross-linked chromatin was sonicated to produce fragments of 100-300 bp, with 10 μL of the solution reserved as input. The remaining chromatin was then immunoprecipitated with an H3K27ac antibody (Active Motif, 39133), purified using magnetic beads and a column, and subjected to DNA sequencing with the corresponding input samples on an Illumina HiSeq 2500 in single-end mode. The raw reads were filtered to remove reads containing the following: (i) contaminating adapter sequences; (ii) more than half the bases with Phred quality scores below 19; and (iii) >5% ambiguous or undetermined (N) bases.

RNA-seq data processing

Clean reads were aligned to the Sscrofa11.1 reference genome using STAR (v2.7.1a)⁶⁶. We kept reads with a MAPQ value of 255 using Samtools (v1.9). Stringtie (v1.3.6)⁶⁷ was used to assemble transcripts with the version 1.98 pig GTF file from the Ensembl database using the -e parameter, and the GTF files from each sample were merged into a non-redundant set of transcripts. Quantification of genes was performed using FeatureCounts (v1.5.3)⁶⁸.

ChIP-seq data processing

Clean reads were mapped to the Sscrofa11.1 reference genome using BWA (v0.7.17)⁶⁹. Uniquely mapped reads were obtained using Sambamba (v0.8.1)⁷⁰, and duplicates were removed using Picard (v1.119, https://broadinstitute.github.io/picard). To assess library complexity based on ENCODE ChIP-seq Standards, PCR Bottlenecking Coefficient 1 (PBC1), PCR Bottlenecking Coefficient 2 (PBC2), and Non-Redundant Fraction (NRF) were calculated and summarized in Supplementary Data 1. Samples meeting the criteria were used to call peaks with MACS2 (v2.1.1)⁷¹, using input data as the control. The fraction of reads in peaks was calculated for each sample, and a threshold of 1% was used to refine the sample set. The DiffBind package implemented in R was used to identify consensus peaks, defined as peaks present in at least 3 samples. Read coverage was calculated using Bedtools (v2.27.0)⁷², and peaks were retained if the log2 reads per million (log2RPM) was >0 in at least 3 samples, yielding 91,011 raw peaks. To ensure consistency between acQTL and eQTL mapping analyses, FPM (fragments per million, similar to transcripts per million for RNA) was used to represent the activity of H3K27ac peaks, and 90,991 consensus peaks satisfied the further filtering criteria from the GTEx project. The list of 90,991 peaks is provided in Supplementary Data 16.
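For readers unfamiliar with the library complexity metrics named above, the sketch below computes NRF, PBC1, and PBC2 following the ENCODE definitions (NRF = distinct positions / total reads; PBC1 = M1 / M_distinct; PBC2 = M1 / M2, where M1 and M2 are the numbers of positions covered by exactly one and exactly two reads). It assumes each uniquely mapped read has been reduced to a hashable position key such as `(chrom, start, strand)`; the function name is hypothetical.

```python
from collections import Counter

def library_complexity(read_positions):
    """Return (NRF, PBC1, PBC2) for an iterable of read position keys."""
    counts = Counter(read_positions)
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no reads supplied")
    distinct = len(counts)                             # M_distinct
    m1 = sum(1 for c in counts.values() if c == 1)     # positions with one read
    m2 = sum(1 for c in counts.values() if c == 2)     # positions with two reads
    nrf = distinct / total_reads
    pbc1 = m1 / distinct
    pbc2 = m1 / m2 if m2 else float("inf")             # undefined when M2 is zero
    return nrf, pbc1, pbc2
```

The metrics must be computed before duplicate removal, since deduplication forces every position to carry a single read.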

Quality control for samples

We used verifyBamID (v2.0.1)⁷³ with BAM files and the corresponding genotype files as input to verify sample identity, and removed samples that were predicted to be swapped or contaminated. To eliminate the impact of RNA degradation, RSeQC (v2.6.4)⁷⁴ was used to show the coverage profile along the gene body from BAM files. Samples that displayed a large bias toward the 3’ end were removed after manual inspection, leaving 256 RNA samples for subsequent analysis.

Regulatory elements identified by H3K27ac used for overlapping with those of this study

The predefined regulatory elements based on pig liver H3K27ac ChIP-seq data were generated by Kern et al.¹³. Samples were obtained from two castrated, sexually mature, adult male Yorkshire littermate pigs. The data are available at http://farm.cse.ucdavis.edu/~ckern/Nature_Communications_2020/.

Chromatin states used for overlapping with regulatory elements of this study

The predefined chromatin states of pigs were obtained from two published independent studies. One was conducted by Kern et al.¹³, and they defined 14 distinct chromatin states by utilizing five epigenetic marks (H3K4me3, H3K27ac, H3K4me1, CTCF, H3K27me3) in pig livers. The data are available at http://farm.cse.ucdavis.edu/~ckern/Nature_Communications_2020/. Another was conducted by Pan et al.¹⁴, and they employed five epigenetic marks (H3K4me3, H3K27ac, H3K4me1, ATAC, H3K27me3) to characterize 15 distinct chromatin states in pig livers. The data was available at https://figshare.com/articles/dataset/6_type_of_regulator_hg19_zip/13480425?file=25875270⁷⁵.

Raw sequencing reads for H3K27ac ChIP-seq used for validating the allelic imbalance of H3K27ac

Raw sequencing data for H3K27ac ChIP-seq from 24 pig liver samples were obtained from three independent studies. Two datasets were from Kern et al.¹³ and deposited in the Gene Expression Omnibus (GEO) under accession GSE158430. Eight datasets were from Zhao et al.¹² and deposited in the NCBI database under accession number PRJNA597497. Fourteen datasets were from our previous study¹¹.

Hi-C matrix

Pig liver Hi-C contact matrix data with 40 kb resolution were generated by Foissac et al.²⁸, using samples from two male and two female Large White pigs. The data is accessible at the Functional Annotation of Animal Genomes (FAANG) data portal (https://www.fragencode.org/results.html).

Transcription factors motifs

Position weight matrices of transcription factor binding motifs were collected from the MEME Suite motif database⁷⁶, including human, mouse, and mammalian from various sources. The motif matrix data are available at https://meme-suite.org/meme/db/motifs.

Characterizing H3K27ac peaks

To classify peak types, the ChIPseeker (v1.12.1)⁷⁷ package implemented in R was used with the version 1.98 pig GTF file from the Ensembl database. Specifically, peaks located within 1 kilobase (kb) of a transcription start site (TSS) were identified as promoters, while those located outside this range were defined as enhancers. Super-enhancers were identified using ROSE (v1.3.1)⁷⁸, and consensus super-enhancers were generated if they were present in at least 3 samples using the DiffBind (v2.10.0)⁷⁹ package implemented in R. The biological coefficient of variation (BCV) calculated by the edgeR (v2.2.6)⁸⁰ package implemented in R was used to determine the variation of the peaks across samples. The tagwise dispersion for each peak was calculated.
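The promoter/enhancer rule described above reduces to a distance check against annotated TSSs. The following is a simplified sketch of that rule only, assuming TSS coordinates for the same chromosome are supplied as a list; ChIPseeker's actual annotation logic is more detailed, and the function name is hypothetical.

```python
def classify_peak(peak_start, peak_end, tss_positions, window=1000):
    """Label a peak 'promoter' if any TSS lies within `window` bp of it.

    peak_start, peak_end : peak coordinates on one chromosome
    tss_positions        : TSS coordinates on the same chromosome
    """
    for tss in tss_positions:
        # A TSS inside the peak, or within `window` bp of either edge, counts
        if peak_start - window <= tss <= peak_end + window:
            return "promoter"
    return "enhancer"
```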

Identifying polyadenylated eRNA

Stringtie (v1.3.6)⁶⁷ was employed to identify genes without the -e parameter to produce a GTF file without annotated transcripts. Subsequently, genes identified overlapping with distal intergenic and downstream peaks were regarded as candidate enhancer RNAs (eRNAs). To confirm the identity of eRNAs, we performed permutation tests on the annotated genes, based on the number of eRNAs, generating 1000 permutations. Our analysis revealed a P-value of 9.99 × 10⁻⁴ for both the exon number and length of the eRNAs.
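The reported P-value of 9.99 × 10⁻⁴ equals 1/1001, which is the value an add-one empirical estimator returns when none of 1000 permutations is as extreme as the observed statistic. The sketch below illustrates that estimator; that the authors used exactly this formula is an assumption inferred from the number, not something stated in the text.

```python
def empirical_p_value(observed, permuted, greater=True):
    """Add-one permutation P-value: (r + 1) / (n + 1).

    r is the number of permuted statistics at least as extreme as the
    observed one; n is the number of permutations.
    """
    if greater:
        r = sum(1 for x in permuted if x >= observed)
    else:
        r = sum(1 for x in permuted if x <= observed)
    return (r + 1) / (len(permuted) + 1)
```

With 1000 permutations, 1/1001 is the smallest P-value this estimator can return, so it should be read as an upper bound rather than a point estimate.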

Heritability estimates for H3K27ac peaks

We employed the GREML-LDMS-I method^81,82 for heritability estimation. For each peak, we estimated cis-heritability using variants within ±1 Mb, while trans-heritability was computed using variants located beyond ±5 Mb. To summarize, we first calculated the segment-based LD score with a 200 Kb window size. SNPs were then stratified into four groups according to LD score quartiles. Following this, GRMs were generated for each SNP group and utilized to estimate heritability.

Expression QTL (eQTL) mapping

The cis-eQTL analysis was conducted following the Genotype-Tissue Expression (GTEx) project version 8 protocol⁸³ using the wrapper script. A raw counts matrix and transcripts per million (TPM) values were prepared, and genes with low expression were filtered using the default parameters “--tpm_threshold 0.1 --count_threshold 6 --sample_frac_threshold 0.2”. After filtering, 15,509 genes were subjected to trimmed mean of M-values (TMM) normalization and inverse normal transformation. PEER (probabilistic estimation of expression residuals) factors represent unmeasured and unknown confounders in eQTL mapping and can be estimated with the PEER software (v1.3)⁸⁴. The number of PEER factors was set to 45 as recommended (as detailed at https://github.com/broadinstitute/gtex-pipeline/tree/master/qtl). Covariates such as slaughterAge, transportBatch, RIN_value, gender, uniquely mapped reads, three principal components from genotypes, and 45 PEER factors were adjusted using the limma removeBatchEffect function (v3.38.3)⁸⁵. The modified FastQTL (v6p)⁸³ provided by the GTEx project was used with the parameters “--window 1e6 --permute 1000 --maf_threshold 0.01 --ma_sample_threshold 10” to scan the variants within 1 Mb of the TSS and generate a genome-wide empirical P-value threshold for each gene. The empirical P-values were adjusted for multiple testing, and a false discovery rate (FDR) threshold of 0.05 was used to produce a nominal threshold for each gene. The cis-eQTL for each gene was then determined by fine-mapping analysis.
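The expression filter and the inverse normal transformation mentioned above can be sketched in a few lines. This is a minimal illustration, assuming that a gene is kept when both thresholds are met in at least the stated fraction of samples (our reading of the three parameters) and that ranks are mapped through r / (n + 1); the function names are hypothetical and ties are broken by input order rather than averaged.

```python
from statistics import NormalDist

def passes_expression_filter(tpm, counts, tpm_threshold=0.1,
                             count_threshold=6, sample_frac_threshold=0.2):
    """Keep a gene if TPM and raw counts both reach their thresholds
    in at least `sample_frac_threshold` of the samples."""
    n = len(tpm)
    min_samples = sample_frac_threshold * n
    n_tpm = sum(1 for v in tpm if v >= tpm_threshold)
    n_cnt = sum(1 for v in counts if v >= count_threshold)
    return n_tpm >= min_samples and n_cnt >= min_samples

def inverse_normal_transform(values):
    """Rank-based inverse normal transform of one gene across samples."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    nd = NormalDist()
    # Map rank r to the standard normal quantile at r / (n + 1)
    return [nd.inv_cdf(r / (n + 1)) for r in ranks]
```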

Trans-eQTL analysis was performed using QTLtools (v1.3.1)⁸⁶ for variants located >5 Mb apart. Permutation was applied with the parameters “--sample 1000” and the false discovery rate (FDR) was set to 0.05 to adjust for multiple testing. The trans-eQTL with the highest level of significance was selected for each gene in each chromosome. To determine the reliability of the trans-eQTL mapping results, we employed a mixed linear model to identify trans-eQTLs with the fastGWA tool⁸⁷ in GCTA (v1.9.0) software. The results showed that 99% of trans-eQTLs reach the empirical significance threshold of 5 × 10⁻⁸, indicating the robustness of trans-eQTLs identification. To eliminate the potential impact of the sex chromosomes, only eQTLs from autosomes were retained.

H3K27ac quantitative trait loci (acQTLs) mapping

The main analysis was similar to that of eQTL mapping, and the 90,991 consensus peaks were used as molecular phenotypes. A raw counts matrix and fragments per million (FPM, similar to TPM for RNA) values were prepared, and 90,991 peaks were retained with the parameters “--tpm_threshold 0.1 --count_threshold 6 --sample_frac_threshold 0.2”. Similar to eQTL mapping, PEER factors were estimated based on H3K27ac signals and the number of factors was also set to 45. Subsequently, covariates such as slaughterAge, transportBatch, gender, uniquely mapped reads, three principal components from genotypes, and 45 PEER factors were adjusted. Cis-acQTL analysis was performed using FastQTL, and all variants within 1 Mb of the first base of each peak were used. For trans-acQTL analysis, QTLtools was used and all variants beyond a 5 Mb distance were included. Again, only autosomal acQTLs were kept. The fastGWA tool further confirmed 99% of trans-acQTLs.

Fine-mapping analysis

The CAVIAR (v2.2)⁸⁸ software was employed for fine-mapping of association signals. CAVIAR uses both the association statistics and the LD information to model and infer the posterior probability (PP) that a variant is causal. The variants were ranked by their PPs from CAVIAR in descending order, and the variant sets with cumulative PPs no larger than 0.95 were considered credible variants. The variant with the highest PP was chosen to represent each cis-QTL.
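The ranking-and-accumulation step can be illustrated as follows. This is a minimal sketch that assumes the conventional reading of a 95% credible set, in which variants are added in descending PP order until the cumulative PP reaches the coverage level; the exact boundary handling used in the study is not specified, and the function name is hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Return variants in descending PP order until cumulative PP >= coverage.

    posteriors : dict mapping variant ID to posterior probability
    The first element of the result is the lead variant.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, pp in ranked:
        selected.append(variant)
        cumulative += pp
        if cumulative >= coverage:
            break
    return selected
```

A small credible set (few variants) indicates that the association signal is well resolved, which is the property used in the Discussion to highlight cis-acQTLs with fewer than 20 candidate variants.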

Allelic imbalance analysis of acQTLs

Only acQTLs that were located within the target peaks and had PPs exceeding 0.9 were included. For raw H3K27ac ChIP-seq data from other studies, Platypus was used to genotype variants. To mitigate the effects of read mapping bias, we employed the WASP (v0.3.4) software developed by van de Geijn et al.²⁶ to remove reads exhibiting allele-biased mapping. Briefly, WASP was used to flag reads that should be remapped because they overlap genetic variants. The alleles of the variants were then computationally swapped, and the reads were remapped to determine whether they would still align to the original location. After allele swapping, reads with unchanged genomic mapping positions were retained. Heterozygous individuals were selected for each acQTL, and the ASEReadCounter⁸⁹ function from the Genome Analysis Toolkit (GATK, v.4.2) was employed to quantify allele coverage. Further comparison between the reference and alternative alleles was conducted only on acQTLs that had a minimum of 8 supporting reads and were observed in at least 3 heterozygous individuals.
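The final filtering and comparison step might look like the sketch below. Several details are assumptions made for illustration: the 8-read minimum is applied to the total (reference plus alternative) coverage per heterozygous individual, and the comparison is a paired t statistic on the per-individual count differences. The text specifies only that t-tests were used, so the pairing and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def allelic_imbalance(ref_counts, alt_counts, min_reads=8, min_hets=3):
    """Summarise allelic imbalance at one variant across heterozygotes.

    Returns None if fewer than `min_hets` individuals pass the coverage
    filter; otherwise a dict with the number of individuals, the mean
    alternative-allele fraction, and a paired t statistic (alt - ref).
    """
    pairs = [(r, a) for r, a in zip(ref_counts, alt_counts)
             if r + a >= min_reads]
    if len(pairs) < min_hets:
        return None
    diffs = [a - r for r, a in pairs]
    sd = stdev(diffs)
    # The t statistic is undefined when all differences are identical
    t_stat = mean(diffs) / (sd / sqrt(len(diffs))) if sd > 0 else None
    alt_frac = mean(a / (r + a) for r, a in pairs)
    return {"n_hets": len(pairs), "mean_alt_fraction": alt_frac,
            "t_stat": t_stat}
```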

Distribution of QTLs

The genomic distribution of all genome variants was analyzed using ChIPseeker package. The variants were classified into seven types based on their location, including 3’UTR, 5’UTR, distal intergenic, exon, intron, promoter peak, and enhancer peak. Cis-QTLs were grouped according to different posterior probability intervals. Cis-acQTLs were further subdivided into two groups, based on their association with either the promoter or enhancer H3K27ac signals, referred to as promoter-acQTLs and enhancer-acQTLs. The ratio of the seven variant types was calculated for each group, and the fold change values were calculated as the ratio of cis-QTLs divided by the ratio of all variants.
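The fold change described above is the proportion of QTLs in an annotation category divided by the proportion of all variants in that category. A minimal sketch, assuming each variant has already been assigned one of the seven annotation labels (the function name is hypothetical):

```python
from collections import Counter

def fold_enrichment(qtl_annotations, all_annotations):
    """Per-category ratio of (QTL proportion) / (background proportion).

    qtl_annotations : annotation label for each QTL variant
    all_annotations : annotation label for every genome variant
    """
    qtl_counts = Counter(qtl_annotations)
    all_counts = Counter(all_annotations)
    n_qtl, n_all = len(qtl_annotations), len(all_annotations)
    result = {}
    for category, background in all_counts.items():
        ratio_qtl = qtl_counts[category] / n_qtl
        ratio_all = background / n_all
        result[category] = ratio_qtl / ratio_all
    return result
```

A value above 1 indicates that QTLs fall in a category more often than expected from the background distribution of variants.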

Physical contact region enrichment analysis

We obtained an interaction matrix from Hi-C with a 40 Kb window size from the pig liver²⁸ and filtered the interaction bins to include only those with at least 3 supporting contact reads. To avoid confounding effects arising from low contact between different chromosomes, we restricted our analysis to include only acQTLs and target peaks residing on the same chromosome. We calculated the number of contacted bins encompassing acQTLs and target peaks. We then separated all the contact bins into two groups based on distance, i.e., ≤1 Mb and ≥5 Mb, and treated them as control pairs. Finally, we performed a hypergeometric test for both groups.
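The enrichment test can be illustrated with an exact upper-tail hypergeometric probability. In this sketch the mapping of the four quantities onto the Hi-C data (N = all bin pairs in a distance group, K = bin pairs in contact, n = acQTL-peak pairs, k = acQTL-peak pairs in contact) is an assumption made for illustration, since the text does not spell out the contingency layout.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from a
    population of N items containing K successes."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper index,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / total
```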

Transcription factor prediction

Only acQTLs that were located within target peaks were considered. The primary methodology was derived from a prior study⁹⁰. The process involved extracting each acQTL with its surrounding 25 bp sequences and using FIMO (v4.11.2)⁹¹ with the parameters “--bfile --uniform-- --norc --max-strand” to predict TFBSs (transcription factor binding sites) on both the reference and alternative sequences. Motif PWMs of TFs from human and mouse were downloaded from the MEME suite website⁹². The impact of acQTLs on TF binding was categorized as either perturbation in binding affinity or loss/gain of binding.
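Building the paired reference and alternative sequences for motif scanning can be sketched as below. The example assumes that "surrounding 25 bp" means 25 bp on each side of the variant and that the chromosome sequence is available as a Python string; both the function name and these choices are illustrative.

```python
def allele_windows(chrom_seq, pos, ref, alt, flank=25):
    """Return (ref_window, alt_window) around a variant.

    chrom_seq : chromosome sequence as a string
    pos       : 1-based position of the first reference base
    """
    idx = pos - 1
    if chrom_seq[idx:idx + len(ref)].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence")
    left = chrom_seq[max(0, idx - flank):idx]
    right = chrom_seq[idx + len(ref):idx + len(ref) + flank]
    # Same flanks, different allele: the two windows differ only at the variant
    return left + ref + right, left + alt + right
```

The two windows can then be written to a FASTA file and scanned with the same motif set, so that any difference in predicted binding is attributable to the allele.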

Estimating sharing of QTLs

To estimate how QTLs are shared between the H3K27ac levels at the promoter and corresponding gene expression, we utilized the π₁ statistic (qvalue)⁹³. In essence, we selected genes with H3K27ac signals in the promoter region that had corresponding acQTLs, calculated the association P-value between the promoter-acQTL and gene expression, and estimated the enrichment of low P-value via π₁ estimation. We repeated the same procedure in reverse.
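The π₁ statistic is one minus the estimated proportion of true null hypotheses (π₀). The sketch below uses Storey's estimator at a single fixed tuning value λ, which is a simplification: the qvalue package by default smooths π₀ over a grid of λ values, so the numbers would not match exactly. The function name and the default λ = 0.5 are assumptions.

```python
def pi1_estimate(p_values, lam=0.5):
    """Estimate pi1 = 1 - pi0 with Storey's fixed-lambda estimator.

    pi0 is the fraction of P-values above `lam`, rescaled by the width
    of the interval (lam, 1], on which null P-values are uniform.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("no P-values supplied")
    pi0 = sum(1 for p in p_values if p > lam) / (m * (1 - lam))
    pi0 = min(1.0, pi0)  # the estimate can exceed 1 by chance
    return 1.0 - pi0
```

Here the input would be the P-values of promoter-acQTL lead variants tested against expression of the matched gene (or the reverse), so a high π₁ means most lead variants in one dataset also show signal in the other.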

Variance decomposition of gene expression

We employed a linear mixed model from LIMIX (v2.0.4) to investigate the contributions of genome variation and H3K27ac to gene expression variability^94,95. The model is as follows:

$$\mathrm{y}\sim N\left(1\mu ,\ {\sigma }_{l}^{2}{K}_{l}+{\sigma }_{g}^{2}{K}_{g}+{\sigma }_{h}^{2}{K}_{h}+{\sigma }_{e}^{2}I\right)$$

where y represents the gene expression levels across all samples, $1\mu$ represents an offset term, ${K}_{l}$ is the relatedness matrix built from cis genetic variants or H3K27ac signals, ${K}_{g}$ represents a relationship matrix considering all variants, and ${\sigma }_{e}^{2}I$ is the noise term. ${K}_{h}$ represents expression heterogeneity and was calculated as ${K}_{h}=(1/G)Z{Z}^{T}$, in which Z is the N × G gene expression matrix for N samples and G genes. Subsequently, the proportion of gene expression variability explained by genetic variants or H3K27ac signals was calculated as follows:

$$h=\frac{{\sigma }_{l}^{2}}{{\sigma }_{l}^{2}+{\sigma }_{g}^{2}+{\sigma }_{h}^{2}+{\sigma }_{e}^{2}}$$

In brief, the main variance component considered genetic variants and H3K27ac in our study. Firstly, only the genome variants or H3K27ac within 1 Mb from the gene body were considered independently in the model and the proportion of expression variance explained by them was computed. Next, a joint model across genome variants within 100 kb of H3K27ac peaks and H3K27ac was performed to account for the impact of variants, and the variance explained by H3K27ac alone was calculated.
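Two of the quantities defined above are simple to compute once the variance components have been estimated. The sketch below shows the expression heterogeneity kernel K_h = (1/G) Z Zᵀ and the proportion h; it does not fit the mixed model itself, which was done with LIMIX, and the function names are hypothetical.

```python
def expression_heterogeneity_kernel(Z):
    """K_h = (1/G) * Z * Z^T for an N x G expression matrix Z
    given as a list of N rows, each with G values."""
    G = len(Z[0])
    return [[sum(a * b for a, b in zip(row_i, row_j)) / G for row_j in Z]
            for row_i in Z]

def variance_explained(sigma_l2, sigma_g2, sigma_h2, sigma_e2):
    """Proportion of expression variance attributed to the local (cis
    variant or H3K27ac) component, as in the formula for h."""
    total = sigma_l2 + sigma_g2 + sigma_h2 + sigma_e2
    return sigma_l2 / total
```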

Causal inference for H3K27ac and gene expression

We employed the Intersection-Union Test³³ to infer the causal relationship between H3K27ac and gene expression, taking into account genetic information. The causal inference test (CIT) is a mediation-based method introduced by Millstein et al.³³, which examines the hypothesis that a potential causal mediator (G, such as H3K27ac signal) mediates a causal association between a genetic locus (L) and a quantitative trait (T, such as gene expression). Causality (from genetic variants to the mediator to the trait) can be inferred if four conditions are met:

(1) L and G are associated
(2) L and T are associated
(3) L is associated with G, given T
(4) L is independent of T, given G

A total of 1900 candidate L/G/T trios meeting the first two conditions, obtained from peak-gene colocalization analysis, were used for the CIT, which tests the strength of a chain of mathematical conditions that, as a set, are consistent with causal mediation. The Intersection-Union Test framework³³ is used to compute an omnibus P-value for the suite of conditions that make up the CIT. For each trio with genotype and gene/H3K27ac levels, the CIT outputs omnibus P-values for a causal model (genetic variants → H3K27ac signals → gene expression; pCausalCIT) and a reactive model (genetic variants → gene expression → H3K27ac signals; pReactiveCIT), each of which is the highest P-value (i.e., minimal significance) among the four component tests. The CIT predicted a causal direction when pCausalCIT <0.05 and pReactiveCIT >0.05 (Type1), and a reactive direction when pCausalCIT >0.05 and pReactiveCIT <0.05 (Type2). Trios with pCausalCIT >0.05 and pReactiveCIT >0.05 were considered independent (Type3). The CIT makes no call if pCausalCIT <0.05 and pReactiveCIT <0.05 (Type0).
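The decision rule above can be summarised in a short function. This sketch covers only the omnibus step and the four-way classification, not the component tests themselves; the function names are hypothetical, and a P-value exactly equal to 0.05, which the text leaves unspecified, is treated here as non-significant.

```python
def omnibus_p(component_p_values):
    """Intersection-union omnibus P-value: the largest component P-value."""
    return max(component_p_values)

def classify_cit(p_causal, p_reactive, alpha=0.05):
    """Assign a trio to Type1 (causal), Type2 (reactive),
    Type3 (independent), or Type0 (no call)."""
    causal = p_causal < alpha
    reactive = p_reactive < alpha
    if causal and not reactive:
        return "Type1"
    if reactive and not causal:
        return "Type2"
    if not causal and not reactive:
        return "Type3"
    return "Type0"
```

Because the omnibus P-value is a maximum, a model is supported only when every one of its component tests is significant.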

Genome-wide association study

The pig liver metabolite data were derived from existing databases of our laboratory. In general, metabolite levels were determined with ultra-performance liquid chromatography (UPLC) and analyzed with Analyst 1.6.3 software. Covariates for metabolites, such as slaughterAge, transportBatch, and gender, were adjusted using the lm() function implemented in R. The simple linear mixed model from Genome-wide Efficient Mixed-Model Association (GEMMA, v.0.97)⁹⁶ was employed for further genetic association analyses.

Colocalization analyses

To search for pleiotropic effects of acQTLs, we calculated the pairwise linkage disequilibrium (LD) score (r²) between lead variants within 500 kb using PLINK (v1.9)⁶⁴. An acQTL is considered to regulate multiple peaks if its LD score with another acQTL is >0.8. The eQTLs were examined identically.

To identify the peak-gene pairs, we calculated the pairwise LD score between acQTLs and eQTLs within a 500 kb range. We obtained peak-gene pairs if the LD score was >0.8. We employed the Bayesian test implemented in COLOC software (v5) to assess colocalization³¹. All variants within a ± 1 Mb from the lead variants of eQTLs and acQTLs were intersected and used for colocalization. The threshold of posterior probabilities of H4 (Association with H3K27ac peaks and gene expression, one shared genetic variant) (PP4) was set to 0.8.
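The LD score (r²) used as the linking criterion in both steps above is the squared correlation between the allele counts of two variants. A minimal sketch, assuming genotypes are coded as 0/1/2 alternative-allele counts without missing values (the function name is hypothetical; PLINK itself handles missing data and other details not shown here):

```python
def r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return cov * cov / (var_a * var_b)
```

Two lead variants with r² > 0.8 are treated as tagging the same underlying signal, which is how an acQTL is linked to an eQTL or to a second peak.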

Integration analysis utilizing published GWAS variants from the ISwine dataset

All GWAS variants for pig phenotypes were obtained from the ISwine dataset⁴⁴ (http://iswine.iomics.pro/). The eleven categories of phenotypes were as follows: behavioral, blood, disease, exterior, fat, growth, meat, muscle, physiochemical, reproduction, and slaughter. We removed variants located on sex chromosomes, resulting in 15,622 GWAS variants for 498 phenotypes. In addition, LD scores were calculated between GWAS variants and eQTLs/acQTLs of peak-gene pairs derived from colocalization analysis. Phenotypes were linked to genes or peaks with an LD score threshold of 0.8, resulting in 297 GWAS variants for 64 phenotypes.

Statistics and reproducibility

Thorough descriptions of the statistical analyses applied in this study are provided in the respective sections of the Methods. We used 292 samples for peak calling, 256 samples for gene identification, and 321 samples for GWAS. To determine the statistical significance of QTLs associated with peak activity or gene expression, we performed linear regression tests followed by permutation testing using the QTLtools/FastQTL software suite. H3K27ac ChIP-seq data from 24 pig liver samples were obtained from three independent studies to perform allelic imbalance analysis. To validate acQTLs, t-tests comparing read coverage between alternative and reference alleles were performed, with the resulting P-values used to assess statistical significance. Colocalization among different molecular phenotypes was determined by the linkage disequilibrium score and by the posterior probability of H4 from COLOC software. Spearman’s correlation was used to test the correlation among different molecular phenotypes.
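The idea behind "linear regression followed by permutation testing" can be sketched as follows. For a single-variant linear regression, ranking variants by regression significance is equivalent to ranking them by absolute Pearson correlation, so the sketch takes the best absolute correlation in a window and compares it with the best value obtained after shuffling the phenotype. This is a plain empirical permutation P-value written for illustration only: the function names and data are hypothetical, and the sketch does not reproduce the beta approximation that QTLtools and FastQTL apply to the permutation null.

```python
import random
from math import fsum, sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = fsum(x) / n, fsum(y) / n
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = fsum((a - mx) ** 2 for a in x)
    syy = fsum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)


def permutation_p(genotypes, phenotype, n_perm=999, seed=0):
    """Empirical P-value for the best variant among the given variants."""
    rng = random.Random(seed)
    observed = max(abs_corr(g, phenotype) for g in genotypes)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        best = max(abs_corr(g, shuffled) for g in genotypes)
        if best >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Invented data: the first variant tracks the phenotype closely.
genos = [
    [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
]
pheno = [0.1, -0.1, 0.05, 0.0, 1.1, 0.9, 1.0, 1.05, 2.1, 1.9, 2.0, 1.95]
p_value = permutation_p(genos, pheno, n_perm=999, seed=1)
```

Taking the maximum over all variants inside each permutation accounts for the number of variants tested per peak or gene, which is the reason a permutation scheme is used instead of the nominal regression P-value.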

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All RNA-seq and ChIP-seq data are publicly available in the GSA database under accession numbers CRA014924, CRA014923 and CRA014930. All genotype data are publicly available at the GVM²⁰. Source data for graphs and charts are available at Figshare (https://doi.org/10.6084/m9.figshare.25239307.v1)⁹⁷. GWAS results for the metabolites phosphatidylcholine (PC) (16:0/16:0) and dulcitol are available at Figshare (https://doi.org/10.6084/m9.figshare.25264963.v1)⁹⁸.

Code availability

The code for ChIP-seq analysis, RNA-seq analysis, peak calling, QTL mapping, WASP mapping, variance decomposition and causal inference is available from the GitHub repository (https://github.com/lingziqi8278/pig-omics-project/)⁹⁹.

References

Rui, L. Energy metabolism in the liver. Compr. Physiol. 4, 177 (2014).
Chang, X. et al. Quantitative proteomic analysis of Yorkshire pig liver reveals its response to high altitude. J. Agric. Food Chem. 71, 7618–7629 (2023).
Zhao, Y. et al. Transcriptome analysis reveals that vitamin A metabolism in the liver affects feed efficiency in pigs. G3 6, 3615–3624 (2016).
Grum, D., Drackley, J. & Clark, J. Fatty acid metabolism in liver of dairy cows fed supplemental fat and nicotinic acid during an entire lactation. J. Dairy Sci. 85, 3026–3034 (2002).
Weber, C. et al. Hepatic gene expression involved in glucose and lipid metabolism in transition cows: effects of fat mobilization during early lactation in relation to milk performance and metabolic changes. J. Dairy Sci. 96, 5670–5681 (2013).
Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Prim. 1, 59 (2021).
Gallagher, M. D. & Chen-Plotkin, A. S. The post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730 (2018).
Caliskan, M. et al. Genetic and epigenetic fine mapping of complex trait associated loci in the human liver. Am. J. Hum. Genet. 105, 89–107 (2019).
Andersson, R. & Sandelin, A. Determinants of enhancer and promoter activities of regulatory elements. Nat. Rev. Genet. 21, 71–87 (2020).
Creyghton, M. P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl Acad. Sci. USA 107, 21931–21936 (2010).
Zhu, Y. et al. Mapping and analysis of a spatiotemporal H3K27ac and gene expression spectrum in pigs. Sci. China Life Sci. 65, 1517–1534 (2022).
Zhao, Y. et al. A compendium and comparative epigenomics analysis of cis-regulatory elements in the pig genome. Nat. Commun. 12, 2217 (2021).
Kern, C. et al. Functional annotations of three domestic animal genomes provide vital resources for comparative and agricultural research. Nat. Commun. 12, 1821 (2021).
Pan, Z. et al. Pig genome functional annotation enhances the biological interpretation of complex traits and human disease. Nat. Commun. 12, 5848 (2021).
Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013).
Currin, K. W. et al. Genetic effects on liver chromatin accessibility identify disease regulatory variants. Am. J. Hum. Genet. 108, 1169–1189 (2021).
Hah, N. et al. Inflammation-sensitive super enhancers form domains of coordinately regulated enhancer RNAs. Proc. Natl Acad. Sci. USA 112, E297–E302 (2015).
Sartorelli, V. & Lauberth, S. M. Enhancer RNAs are an important regulatory layer of the epigenome. Nat. Struct. Mol. Biol. 27, 521–528 (2020).
Han, Z. & Li, W. Enhancer RNA: what we know and what we can achieve. Cell Prolif. 55, e13202 (2022).
Yang, H. et al. ABO genotype alters the gut microbiota by regulating GalNAc levels in pigs. Nature 606, 358–367 (2022).
Zhang, Y. et al. Subcutaneous and intramuscular fat transcriptomes show large differences in network organization and associations with adipose traits in pigs. Sci. China Life Sci. 64, 1732–1746 (2021).
Keele, G. R. et al. Integrative QTL analysis of gene expression and chromatin accessibility identifies multi-tissue patterns of genetic regulation. PLoS Genet 16, e1008537 (2020).
Grubert, F. et al. Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell 162, 1051–1065 (2015).
McVicker, G. et al. Identification of genetic variants that affect histone modifications in human cells. Science 342, 747–749 (2013).
Pelikan, R. C. et al. Enhancer histone-QTLs are enriched on autoimmune risk haplotypes and influence gene expression within chromatin networks. Nat. Commun. 9, 2905 (2018).
Van De Geijn, B., McVicker, G., Gilad, Y. & Pritchard, J. K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).
Heinz, S. et al. Effect of natural genetic variation on enhancer selection and function. Nature 503, 487–492 (2013).
Foissac, S. et al. Multi-species annotation of transcriptome and chromatin structure in domesticated animals. BMC Biol. 17, 108 (2019).
Delaneau, O. et al. Chromatin three-dimensional interactions mediate genetic effects on gene expression. Science 364, eaat8266 (2019).
Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet 10, e1004383 (2014).
Chandra, V. et al. Promoter-interacting expression quantitative trait loci are enriched for functional genetic variants. Nat. Genet. 53, 110–119 (2021).
Millstein, J., Zhang, B., Zhu, J. & Schadt, E. E. Disentangling molecular relationships with a causal inference test. BMC Genet. 10, 23 (2009).
Zhang, T., Zhang, Z., Dong, Q., Xiong, J. & Zhu, B. Histone H3K27 acetylation is dispensable for enhancer activity in mouse embryonic stem cells. Genome Biol. 21, 45 (2020).
Wang, Z. et al. Prediction of histone post-translational modification patterns based on nascent transcription data. Nat. Genet. 54, 295–305 (2022).
Selvarajan, I. et al. Integrative analysis of liver-specific non-coding regulatory SNPs associated with the risk of coronary artery disease. Am. J. Hum. Genet. 108, 411–430 (2021).
Unakar, N. J., Tsui, J. Y. & Johnson, M. J. Effect of aldose reductase inhibitors on lenticular dulcitol level in galactose fed rats. J. Ocul. Pharm. Ther. 8, 199–212 (1992).
Koch, T. K., Schmidt, K. A., Wagstaff, J. E., Ng, W. G. & Packman, S. Neurologic complications in galactosemia. Pediatr. Neurol. 8, 217–220 (1992).
Schadewaldt, P. et al. Renal excretion of galactose and galactitol in patients with classical galactosaemia, obligate heterozygous parents and healthy subjects. J. Inherit. Metab. Dis. 26, 459–479 (2003).
Stambolian, D. Galactose and cataract. Surv. Ophthalmol. 32, 333–349 (1988).
Wang, X. et al. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife 5, e10557 (2016).
Baca, S. C. et al. Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation. Nat. Genet. 54, 1364–1375 (2022).
Bowling, F. Z. et al. Crystal structure of human PLD1 provides insight into activation by PI(4,5)P2 and RhoA. Nat. Chem. Biol. 16, 400–407 (2020).
Fu, Y. et al. A gene prioritization method based on a swine multi-omics knowledgebase and a deep learning model. Commun. Biol. 3, 502 (2020).
Theurl, I. et al. On-demand erythrocyte disposal and iron recycling requires transient macrophages in the liver. Nat. Med. 22, 945–951 (2016).
Anderson, E. R. & Shah, Y. M. Iron homeostasis in the liver. Compr. Physiol. 3, 315 (2013).
Lopez-Perez, A., Remeseiro, S. & Hornblad, A. Diet-induced rewiring of the Wnt gene regulatory network connects aberrant splicing to fatty liver and liver cancer in DIAMOND mice. Sci. Rep. 13, 18666 (2023).
Villar, D. et al. Enhancer evolution across 20 mammalian species. Cell 160, 554–566 (2015).
Vangala, P. et al. High-resolution mapping of multiway enhancer-promoter interactions regulating pathogen detection. Mol. Cell 80, 359–373. e358 (2020).
Hariprakash, J. M. & Ferrari, F. Computational biology solutions to identify enhancers-target gene pairs. Comput. Struct. Biotechnol. J. 17, 821–831 (2019).
Castelijns, B. et al. Recently evolved enhancers emerge with high interindividual variability and less frequently associate with disease. Cell Rep. 31, 107799 (2020).
Choi, J. et al. Evidence for additive and synergistic action of mammalian enhancers during cell fate determination. eLife 10, e65381 (2021).
Osterwalder, M. et al. Enhancer redundancy provides phenotypic robustness in mammalian development. Nature 554, 239–243 (2018).
Donnard, E. et al. Comparative analysis of immune cells reveals a conserved regulatory lexicon. Cell Syst. 6, 381–394.e387 (2018).
Pott, S. & Lieb, J. D. What are super-enhancers? Nat. Genet. 47, 8–12 (2015).
Reske, J. J., Wilson, M. R. & Chandler, R. L. ATAC-seq normalization method can significantly affect differential accessibility analysis and interpretation. Epigenetics Chromatin 13, 1–17 (2020).
Miao, L. et al. The landscape of pioneer factor activity reveals the mechanisms of chromatin reprogramming and genome activation. Mol. Cell 82, 986–1002.e1009 (2022).
Gorkin, D. U. et al. An atlas of dynamic chromatin landscapes in mouse fetal development. Nature 583, 744–751 (2020).
Light, N. et al. Interrogation of allelic chromatin states in human cells by high-density ChIP-genotyping. Epigenetics 9, 1238–1251 (2014).
Nguyen, T. A. et al. High-throughput functional comparison of promoter and enhancer activities. Genome Res. 26, 1023–1033 (2016).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Rimmer, A. et al. Integrating mapping-, assembly-and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Browning, B. L. & Browning, S. R. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am. J. Hum. Genet. 84, 210–223 (2009).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Abuín, J. M., Pichel, J. C., Pena, T. F. & Amigo, J. BigBWA: approaching the Burrows–Wheeler aligner to big data technologies. Bioinformatics 31, 4003–4005 (2015).
Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, 1–9 (2008).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839–848 (2012).
Wang, L., Wang, S. & Li, W. J. B. RSeQC: quality control of RNA-seq experiments. Bioinformatics 28, 2184–2185 (2012).
Pan, Z. Processed data for the article: 'Pig genome functional annotation enhances the biological interpretation of complex traits and human disease'. Figshare https://doi.org/10.6084/m9.figshare.13480425 (2020).
Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).
Yu, G., Wang, L.-G. & He, Q.-Y. ChIPseeker: an R/bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383 (2015).
Lovén, J. et al. Selective inhibition of tumor oncogenes by disruption of super-enhancers. Cell 153, 320–334 (2013).
Stark, R. & Brown, G. DiffBind: differential binding analysis of ChIP-Seq peak data. R package version 100, 4–3 (2011).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Evans, L. M. et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nat. Genet. 50, 737–745 (2018).
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47–e47 (2015).
Delaneau, O. et al. A complete tool set for molecular QTL discovery and analysis. Nat. Commun. 8, 15452 (2017).
Jiang, L. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet. 51, 1749–1755 (2019).
Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Klees, S., Heinrich, F., Schmitt, A. O. & Gültas, M. AgReg-SNPdb: A database of regulatory SNPs for agricultural animal species. Biology 10, 790 (2021).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME suite. Nucleic Acids Res. 43, W39–W49 (2015).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Casale, F. P., Rakitsch, B., Lippert, C. & Stegle, O. Efficient set tests for the genetic analysis of correlated traits. Nat. Methods 12, 755–758 (2015).
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414.e1324 (2016).
Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
Ling, Z. Source data for the article: 'Omics-based construction of regulatory variants can be applied to help decipher pig liver-related traits'. Figshare https://doi.org/10.6084/m9.figshare.25239307.v1 (2024).
Ling, Z. GWAS results for the article: 'Omics-based construction of regulatory variants can be applied to help decipher pig liver-related traits'. Figshare https://doi.org/10.6084/m9.figshare.25264963.v1 (2024).
Ling, Z. All scripts for the article: 'Omics-based construction of regulatory variants can be applied to help decipher pig liver-related traits'. GitHub https://doi.org/10.5281/zenodo.10674065 (2023).


Acknowledgements

B.Y. is supported by the National Key Research and Development Program of China (2021YFF1000601). L.H. is supported by the National Natural Science Foundation of China (31790410) and the National Pig Industry Technology System (CARS-35).

Author information

These authors contributed equally: Ziqi Ling, Jing Li.

Authors and Affiliations

National Key Laboratory for Swine Genetic Improvement and Production Technology, Ministry of Science and Technology of China, Jiangxi Agricultural University, Nanchang, Jiangxi Province, P.R. China
Ziqi Ling, Jing Li, Tao Jiang, Zhen Zhang, Yaling Zhu, Zhimin Zhou, Jiawen Yang, Xinkai Tong, Bin Yang & Lusheng Huang

Authors

Ziqi Ling
Jing Li
Tao Jiang
Zhen Zhang
Yaling Zhu
Zhimin Zhou
Jiawen Yang
Xinkai Tong
Bin Yang
Lusheng Huang

Contributions

Lusheng Huang: Project administration, Supervision, Funding acquisition, Resources, Writing—review and editing. Bin Yang: Supervision, Data curation, Methodology, Writing—review and editing. Ziqi Ling: Supervision, Formal Analysis, Methodology, Visualization, Writing—original draft. Jing Li: Formal Analysis, Visualization and Methodology. Tao Jiang and Zhen Zhang: Formal Analysis and Methodology. Yaling Zhu and Zhimin Zhou: Methodology. Jiawen Yang and Xinkai Tong: Writing—review and editing.

Corresponding authors

Correspondence to Ziqi Ling, Bin Yang or Lusheng Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Lingzhao Fang, Jianhai Chen, Martien Groenen for their contribution to the peer review of this work. Primary Handling Editors: John Mulley and George Inglis. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer review file

Supplementary Information

Description of additional supplementary files

Supplementary data 1-16

Reporting summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Ling, Z., Li, J., Jiang, T. et al. Omics-based construction of regulatory variants can be applied to help decipher pig liver-related traits. Commun Biol 7, 381 (2024). https://doi.org/10.1038/s42003-024-06050-7


Received: 09 September 2023
Accepted: 14 March 2024
Published: 29 March 2024
DOI: https://doi.org/10.1038/s42003-024-06050-7
Springer Nature Limited
