Visualization of GTEx Data

Overview of this study. (A) The study workflow. The GTEx v6 data set was used and annotated based on GENCODE v19, GRCh37.p13 (Dec 2013). (B-C) Anatomically close brain subregions cluster together in a principal component analysis of gene expression for (B) all brain samples and (C) all brain samples excluding cerebellum (Brain-1). (D) Demographics of the 549 study subjects. The “Others” class includes 2 American Indian or Alaska Native subjects and 2 subjects of unknown race. P-values were calculated using a t-test (for age and BMI) or a chi-square test (for race).

Visualization code available here.

A network-based approach to eQTL interpretation and SNP functional characterization

Expression quantitative trait locus (eQTL) analysis is commonly used to identify genetic variants affecting gene regulation. While such studies provide insight into genetic associations, they generally consider only cis-acting SNPs and fail to address tissue-specific effects that might influence eQTL associations. Here we applied a meta-analysis approach that jointly analyzes cis- and trans-eQTLs. Using data from twelve tissue types from the Genotype-Tissue Expression (GTEx) project V6, we identified hundreds of thousands of significant eQTLs in each tissue (FDR < 10%), among which 12% to 42% were trans-eQTLs. By representing the associations between SNPs and genes as links in a bipartite graph, we discovered that these eQTL networks organize into dense, highly modular communities driven by biological processes. While some communities are enriched for processes relevant to the tissue of origin, other communities can be retrieved in several tissues and are enriched for general functions. We found that SNPs with a high degree of centrality, representing global hubs in the networks, were completely depleted in GWAS associations with traits and diseases. In contrast to the global network hubs, local, community-specific network hubs are enriched for associations with traits and diseases. These “core-SNPs” are also preferentially located in active chromatin regions, such as DNase hypersensitive regions with active transcription factor binding sites, further implicating a regulatory role. Our results lead to new hypotheses about how large numbers of weak-effect SNPs may work together to alter function and phenotype.
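
The bipartite representation described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the SNP and gene identifiers are made up, and degree (the number of links per node) is used here as the simplest stand-in for the centrality of a global hub.

```python
from collections import defaultdict

# Hypothetical eQTL associations as (SNP, gene) pairs; all identifiers are made up.
eqtls = [
    ("rs1", "GENE_A"),
    ("rs1", "GENE_B"),
    ("rs1", "GENE_C"),
    ("rs2", "GENE_A"),
    ("rs3", "GENE_C"),
]


def build_bipartite(edges):
    """Return two adjacency maps: SNP -> set of genes, gene -> set of SNPs."""
    snp_to_genes = defaultdict(set)
    gene_to_snps = defaultdict(set)
    for snp, gene in edges:
        snp_to_genes[snp].add(gene)
        gene_to_snps[gene].add(snp)
    return dict(snp_to_genes), dict(gene_to_snps)


def degrees(adjacency):
    """Degree of each node: the number of distinct neighbours it links to."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


snp_to_genes, gene_to_snps = build_bipartite(eqtls)
snp_degree = degrees(snp_to_genes)
gene_degree = degrees(gene_to_snps)

# The SNP with the most links is the global hub of this toy network.
top_snp = max(snp_degree, key=snp_degree.get)
print(top_snp, snp_degree[top_snp])
```

Because links only ever join a SNP to a gene, two adjacency maps are enough to describe the whole graph. Detecting communities on top of it requires a dedicated modularity method that this sketch does not attempt.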

Please cite:

Maud Fagny, Joseph N. Paulson, Marieke L. Kuijjer, Abhijeet R. Sonawane, Cho-Yi Chen, Camila M. Lopes-Ramos, Kimberly Glass, John Quackenbush, and John Platig (2017). Exploring regulation in tissues with eQTL networks. PNAS published ahead of print August 29, 2017, doi:10.1073/pnas.1707375114

eQTL Community Analysis

Download

QTL Associations for given Gene or SNP in community:

GO terms for the community

Genes in the community

SNPs in the community

eQTL Community Analysis Legend

QTL Associations

  • RSID - SNP ID in dbSNP 142
  • SNP.CHR - SNP location in GRCh37.p13 (chromosome)
  • SNP.POSITION - SNP location in GRCh37.p13 (position)
  • ENSEMBL - ENSEMBL gene ID
  • HGNC - HGNC gene symbol
  • GENE.CHR - Gene position in GRCh37.p13 (chromosome)
  • BETA - eQTL coefficient (calculated using R lm() function)
  • T.STAT - Value of the t statistic
  • PVALUE - eQTL nominal p-value
  • P.ADJ - eQTL p-value corrected for multiple testing using Benjamini-Hochberg method
  • CIS.TRANS - Whether the eQTL has been found in cis or in trans.
  • QIK - SNP modularity
  • REGULOMEDB - SNP RegulomeDB annotation [1]. As all our SNPs are eQTLs, we reclassified them (see below).
  • CHROMSTATE - Chromatin state at the SNP location, obtained using the Roadmap Epigenomics core 15-state model data from the corresponding tissue [2].
  • REGULOMEDB reclassification

    Category | RegulomeDB annotation | Description
    A        | 1a or 2a              | TF binding + matched TF motif + matched DNase footprint + DNase peak
    B        | 1b or 2b              | TF binding + any motif + DNase footprint + DNase peak
    C        | 1c or 2c              | TF binding + matched TF motif + DNase peak
    D        | 1d or 3a              | TF binding + any motif + DNase peak
    E        | 1e or 3b              | TF binding + matched TF motif
    F        | 1f or 5               | TF binding + DNase peak
    G        | 4                     | TF binding or DNase peak
    H        | 6                     | Motif hit
    I        | 7                     | No information
  • CHROMSTATE tissue mapping

    Tissue               | Epigenomic Roadmap Tissue ID
    Adipose subcutaneous | E063
    Artery aorta         | E065
    Fibroblast cell line | E126
    Esophagus mucosa     | E079
    Heart left ventricle | E095
    Lung                 | E096
    Skeletal muscle      | E107
    Whole blood          | E062
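
The RegulomeDB reclassification listed above is a plain lookup, and a sketch of it may help when re-annotating downloaded tables. The dictionary contents are copied from the legend; the function name `reclassify` and its error handling are illustrative assumptions, not part of the original resource.

```python
# RegulomeDB score -> reclassified category, copied from the legend table.
REGULOMEDB_CATEGORY = {
    "1a": "A", "2a": "A",
    "1b": "B", "2b": "B",
    "1c": "C", "2c": "C",
    "1d": "D", "3a": "D",
    "1e": "E", "3b": "E",
    "1f": "F", "5": "F",
    "4": "G",
    "6": "H",
    "7": "I",
}


def reclassify(score):
    """Map a RegulomeDB score (e.g. '2b' or 7) to its category letter.

    Hypothetical helper: it normalises case and whitespace, and raises
    ValueError for scores that are absent from the legend.
    """
    key = str(score).strip().lower()
    if key not in REGULOMEDB_CATEGORY:
        raise ValueError(f"unknown RegulomeDB score: {score!r}")
    return REGULOMEDB_CATEGORY[key]


print(reclassify("2b"))
print(reclassify(7))
```

Scores outside the legend raise an error instead of defaulting to "I", so that unexpected annotations are noticed, not silently treated as "No information".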

Gene Ontology

  • COMMUNITY - Community ID
  • GOID - Gene Ontology ID
  • TERM - Gene Ontology Term
  • ONTOLOGY - Ontology (BP = biological process, CC = cellular component, MF = molecular function)
  • PVALUE - Enrichment nominal p-value obtained using the Bioconductor R GOstats package [3].
  • P.ADJ - Enrichment p-value corrected for multiple testing using Benjamini-Hochberg method
  • ODDS.RATIO - Enrichment odds ratios
  • EXPECTED - Number of expected genes for the GO term
  • OBSERVED - Number of observed genes for the GO term
  • SIZE - Total number of genes for the GO term
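
The P.ADJ columns are described as Benjamini-Hochberg corrected p-values. As a reminder of what that correction does, here is a minimal pure-Python sketch of the standard procedure. It is a generic illustration with made-up p-values, not the code that produced the tables.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up nominal p-values for illustration.
pvals = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvals)
print(padj)
```

In R, the same adjustment is available as `p.adjust(p, method = "BH")`.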

Genes

  • ENSEMBL - ENSEMBL gene ID
  • HGNC - HGNC gene symbol
  • COMMUNITY - Gene community in eQTL network
  • GENE.CHR - Gene position in GRCh37.p13 (chromosome)
  • GENE.START - Gene position in GRCh37.p13 (gene start)
  • GENE.END - Gene position in GRCh37.p13 (gene end)

SNPs

  • RSID - SNP ID in dbSNP 142
  • COMMUNITY - SNP community in eQTL network
  • SNP.CHR - SNP location in GRCh37.p13 (chromosome)
  • SNP.POSITION - SNP location in GRCh37.p13 (position)

Bibliography

[1] Boyle, A. P., Hong, E. L., Hariharan, M., Cheng, Y., Schaub, M. A. and Kasowski, M. (2012). Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 22.

[2] Roadmap Epigenomics Consortium (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317-330.

[3] Maud Fagny, Joseph N. Paulson, Marieke L. Kuijjer, Abhijeet R. Sonawane, Cho-Yi Chen, Camila M. Lopes-Ramos, Kimberly Glass, John Quackenbush, and John Platig (2017). Exploring regulation in tissues with eQTL networks. PNAS published ahead of print August 29, 2017, doi:10.1073/pnas.1707375114

Understanding tissue-specific gene regulation

Although all human tissues carry out common processes, tissues are distinguished by gene expression patterns, implying that distinct regulatory programs control tissue-specificity. In this study, we investigate gene expression and regulation across 38 tissues profiled in the Genotype-Tissue Expression project. We find that network edges (transcription factor to target gene connections) have higher tissue-specificity than network nodes (genes) and that regulating nodes (transcription factors) are less likely to be expressed in a tissue-specific manner as compared to their targets (genes). Gene set enrichment analysis of network targeting also indicates that regulation of tissue-specific function is largely independent of transcription factor expression. In addition, tissue-specific genes are not highly targeted in their corresponding tissue-network. However, they assume bottleneck positions due to changes in transcription factor targeting and the influence of non-canonical regulatory interactions. These results suggest that tissue-specificity is driven by the creation of new regulatory paths, providing transcriptional control of tissue-specific processes.

PANDA Networks

Download

List of edge weights:

  • Target Gene HGNC - Target gene ID (HGNC ID)
  • Target Gene ENSEMBL - Target gene ID (ENSEMBL ID)
  • TF - Transcription factor ID (HGNC ID)
  • In Prior - 1 if the edge was in the prior network, 0 otherwise
  • Other columns - Edge weights for each tissue
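
The column layout above can be read with a short sketch like the following. The tab delimiter, the exact header spellings, the tissue column names, and all values in the sample are assumptions made for illustration; the real file should be checked before reuse.

```python
import csv
import io

# Hypothetical excerpt: delimiter, header spellings, tissue column names,
# and every value below are made up for illustration.
SAMPLE = (
    "TF\tTarget Gene HGNC\tTarget Gene ENSEMBL\tIn Prior\tTissue_1\tTissue_2\n"
    "TF_X\tGENE_A\tENSG_A\t1\t2.5\t-0.3\n"
    "TF_X\tGENE_B\tENSG_B\t0\t0.1\t1.7\n"
    "TF_Y\tGENE_A\tENSG_A\t1\t-1.2\t0.4\n"
)

# Columns that describe the edge itself; every other column is a tissue.
FIXED_COLUMNS = {"TF", "Target Gene HGNC", "Target Gene ENSEMBL", "In Prior"}


def read_edges(handle):
    """Parse the edge table into (tissue names, list of edge records)."""
    reader = csv.DictReader(handle, delimiter="\t")
    tissues = [c for c in reader.fieldnames if c not in FIXED_COLUMNS]
    edges = []
    for row in reader:
        edges.append({
            "tf": row["TF"],
            "target": row["Target Gene HGNC"],
            "in_prior": row["In Prior"] == "1",
            "weights": {t: float(row[t]) for t in tissues},
        })
    return tissues, edges


tissues, edges = read_edges(io.StringIO(SAMPLE))

# Keep only the edges that were present in the prior network.
prior_edges = [e for e in edges if e["in_prior"]]
print(tissues, len(prior_edges))
```

Treating every non-fixed column as a tissue keeps the reader independent of how many tissues the table contains.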

The full tissue networks are provided below as an R rds file.*

Download

*What is an rds file?