Here we show how we applied the network analysis tolkit to scggm-net, a network that is described in Sections 4.2 and S.3.2 of the supplement to the spacemap publication. This document shares virtually identical analytical steps as the network analysis of spaceMap’s prot-net and spaceMap’s RNA-net.

Input

Load the gene coordinate annotations for protein expressions and genomic CNA.

library(Biobase)
yinfo <- pData(featureData(readRDS(file = "data/prot-expression-set.rds")))
xinfo <- pData(featureData(readRDS(file = "data/cna-expression-set.rds")))

Load the Boot.Vote CNA-protein network.

library(Matrix)
scggm <- readRDS(file = "model-fits/scggm_boot.rds")
net <- scggm[c("yy","xy")]
rownames(net$xy) <- xinfo$id; colnames(net$xy) <- yinfo$id;
rownames(net$yy) <- yinfo$id; colnames(net$yy) <- yinfo$id;

Load the degree distributions for the ensemble of bootstrapped networks.

bdeg <- scggm[c("deg_xy", "deg_yy")]
names(bdeg) <- c("xy", "yy")
colnames(bdeg$xy) <- xinfo$id; colnames(bdeg$yy) <- yinfo$id;

Load the Gene Ontology mappings for enrichment analysis.

go2eg <- readRDS("data/prot-go-bp-to-entrez-gene-list.rds")
library(AnnotationDbi)
#human readable 
process_alias <- Term(names(go2eg))

Map Annotations

Convert the Boot.Vote network into an igraph object and map the annotations onto the network.

library(spacemap)
ig <- spacemap::adj2igraph(yy = net$yy, xy = net$xy, yinfo = yinfo, xinfo = xinfo)

If we query the attribute names of the nodes in the graph, we notice that the columns of xinfo and yinfo have been applied.

vertex_attr_names(graph = ig)
## [1] "name"        "alias"       "chr"         "start"       "end"        
## [6] "strand"      "description" "levels"

The igraph package has a number of ways to access the annotation information. For example, if we wish to confirm the chromosome location of GRB7, we can easily query:

vertex_attr(graph = ig, name = "chr", index = V(ig)[alias %in% "GRB7"])
## [1] "chr17"

Hub Analysis

We first prioritize the CNA- and protein- hubs. If the bdeg argument is specified, then we rank the hubs according to the average degree rank. Accordingly, the most highly ranked hubs will have the most consistently high degree across network ensemble.

To rank the protein nodes, use the rankHub command and simply specify the level = "y" argument.

ig <- rankHub(ig = ig, bdeg = bdeg$yy, level = "y")

To rank the CNA nodes, specify the level = "x" argument.

ig <- rankHub(ig = ig, bdeg = bdeg$xy, level = "x")

Alternatively, if the bdeg argument is not available, the ranking is according to degree in the final network (coded by “ig”).

tmp <- rankHub(ig = ig,  level = "x")

Identify cis and trans

Next label \(x-y\) edges as being regulated in cis or in trans. The GenomicRanges R package and the genomic coordinates chr,start,end columns of xinfo and yinfo are required for this step.

Now we can label the \(x-y\) edges with either “cis” or “trans” in the cis_trans edge attribute of ig.

ig <- cisTrans(ig = ig, level = "x-y", cw = 2e6)

Report

We then report a well-organized table, as seen in Table 3 of the spaceMap publication. The top argument allows us to control how many hubs are displayed.

xhubs <- reportHubs(ig, top = 40, level = "x")
vidx <- V(ig)[match(rownames(xhubs), V(ig)$name)]
m <- vertex_attr(graph = ig, name = "mean_rank_hub", index = vidx)
s <- vertex_attr(graph = ig, name = "sd_rank_hub", index = vidx)
st <- vertex_attr(graph = ig, name = "start", index = vidx)
end <- vertex_attr(graph = ig, name = "end", index = vidx)
rnk <- paste0(round(m, 2), " (", round(s,2), ")")
xhubs$`cis genes`
tosupp <- cbind(xhubs$hub, st, end, xhubs$`# cis/ # trans`, xhubs$`Potential # cis`, xhubs$`cis genes`, rnk)
library(xtable)
print(xtable(tosupp), include.rownames = FALSE)
hub # cis/ # trans Potential # cis cis genes
16q22.1-22.2 (68-71 Mb) 0 / 17 13
17q12 (38-38 Mb) 4 / 6 29 ERBB2, GRB7, PNMT, MIEN1
16q21-22.1 (64-68 Mb) 0 / 4 8
10p15.1-15.3 (0.42-4 Mb) 0 / 12 10
5p12-5q11.1 (45-50 Mb) 0 / 13 4
16q23.1-24.3 (78-89 Mb) 0 / 6 11
17p11.2 (20-20 Mb) 1 / 5 16 SHMT1
5q35.3 (180-180 Mb) 0 / 6 6
16p12.1-12.3 (19-27 Mb) 1 / 5 19 DCTPP1
17p11.2 (20-21 Mb) 0 / 10 16
16q12.1 (51-53 Mb) 0 / 1 6
15q13.1-15.1 (29-43 Mb) 1 / 1 20 IVD
1q21.2-23.1 (150-160 Mb) 0 / 1 47
17p11.2-12 (14-19 Mb) 0 / 5 17
16q22.1 (68-68 Mb) 0 / 8 8
19p13.3 (1.9-2.4 Mb) 0 / 2 16
16q22.2-22.3 (72-73 Mb) 0 / 1 10
16p11.2-12.1 (27-32 Mb) 1 / 0 18 DCTPP1
8q21.2-22.1 (87-96 Mb) 0 / 1 13
5q14.3 (88-89 Mb) 0 / 6 2
5q35.2-35.3 (180-180 Mb) 0 / 3 8
5q34 (160-170 Mb) 0 / 21 0
16q21 (63-64 Mb) 0 / 1 4
16q12.1 (49-51 Mb) 0 / 3 5
17q11.2 (28-29 Mb) 3 / 0 14 KRT23, BLMH, NSRP1
8q24.3 (140-150 Mb) 1 / 2 7 THEM6
8p11.23 (38-38 Mb) 0 / 1 9
11q22.2-22.3 (100-100 Mb) 0 / 4 4
17p11.2 (19-19 Mb) 0 / 4 16
17q21.32 (46-46 Mb) 2 / 2 15 MAPT, MAPT
17p11.2 (19-20 Mb) 0 / 1 16
11q22.1 (100-100 Mb) 0 / 2 3
9p13.3 (33-34 Mb) 0 / 1 9
11q14.3 (89-91 Mb) 0 / 1 5
17q12 (38-38 Mb) 2 / 3 29 GRB7, MIEN1
22q13.1-13.2 (40-41 Mb) 0 / 2 7
5q35.2 (170-180 Mb) 0 / 3 8
9p13.3 (34-36 Mb) 0 / 4 9
5q34 (170-170 Mb) 0 / 1 0
11p15.2 (13-14 Mb) 0 / 3 1

Similarly, we can report the top 10 protein hubs, their degrees in the final network, and a description of each hub, if the description column was specified in yinfo.

yhubs <- reportHubs(ig, top = 10, level = "y")
hub degree description
PLIN1 13 perilipin-1
ADH1B 10 alcohol dehydrogenase 1B
PDLIM7 8 PDZ and LIM domain protein 7 isoform 4
GPD1 13 glycerol-3-phosphate dehydrogenase
PDLIM7 11 PDZ and LIM domain protein 7 isoform 1
EHD2 23 EH domain-containing protein 2
KNG1 28 kininogen-1 isoform 1 precursor
KNG1 26 kininogen-1 isoform 2 precursor
ITIH4 19 inter-alpha-trypsin inhibitor heavy chain H4 isoform 2 precursor
ITIH4 17 inter-alpha-trypsin inhibitor heavy chain H4 isoform 1 precursor

GO-neighbor percentage

A CNA neighborhood comprises all protein nodes that are directly connected to a CNA hub by an edge. CNA neighborhoods represent direct perturbations to the proteome by amplifications or deletions in the DNA. To quantify their functional relevance, we compute a score called the GO-neighbor percentage. Two protein nodes are called GO-neighbors if they share a common GO term in the same CNA neighborhood. We postulate that a high percentage of GO-neighbors within a CNA neighborhood associates the CNA hub with more functional meaning. These scores can be generated with a GO mapping to the proteins as follows.

hgp <- xHubEnrich(ig = ig, go2eg = go2eg)

Module Analysis

There are many criteria to define modules of a network. This toolkit allows users to import the module membership information by themselves (see mods argument in modEnrich).

In the spaceMap publication, we use the edge-betweenness algorithm in igraph.

library(igraph)
mods <- cluster_edge_betweenness(ig)

The main goal of module analysis is identifying modules that are functionally enriched. The modEnrich function will test for significantly over-represented GO-terms (or any other valid functional grouping) within a module using hyper-geometric tests.

In the current example, only the protein nodes have functional mapping and we specify this through the levels = "y" argument. If both predictors and responses have functional mapping in the go2eg argument, then we can specify levels = c("y","x"). Other arguments are available to control the enrichment testing; see modEnrich for more details.

outmod <- modEnrich(ig = ig, mods = mods, levels = "y", go2eg = go2eg, process_alias = process_alias, prefix = "C")

The output of the module analysis is a list of 3 objects.

names(outmod)
## [1] "ig"   "etab" "eraw"
  • ig is the input igraph network object updated with a “process_id” attribute for nodes affiliated with a significant GO-term. The “process_id” and “module” attributes together are useful for visualizing which nodes are enriched for a specific biological function.

  • etab is the polished module enrichment table to report significant GO terms, the representation of the GO term in the module relative to the size of the GO term, and which CNA hubs belong to the module. The top ten hits as in Table S.9 of the spaceMap publication’s supplementary materials are as follows:

outmod$etab[1:20,]
Module (size) GO Category GO Obs./ Total X-hubs (hits)
C10 (21) extracellular matrix organization 11/125
glycosaminoglycan metabolic process 5/15
collagen catabolic process 6/29
extracellular matrix disassembly 7/57
cell adhesion 8/105
carbohydrate metabolic process 5/43
axon guidance 5/58
cytokine-mediated signaling pathway 5/59
C4 (17) ER to Golgi vesicle-mediated transport 5/20
protein N-linked glycosylation via asparagine 5/20
membrane organization 5/23
post-translational protein modification 5/43
axon guidance 5/58
cellular protein metabolic process 6/99
C5 (34) muscle organ development 5/17
cell-matrix adhesion 5/35
C6 (68) transcription from RNA polymerase II promoter 8/20 5q34 (20); 10p15.1-15.3 (12)
positive regulation of transcription from RNA polymerase II promoter 9/52
C7 (47) muscle contraction 13/26 6q25.3 (5)
movement of cell or subcellular component 7/15
  • eraw contains details for each (module, GO-term) pair that was subjected to the hyper-geometric test. This output gives the user more control by reporting all tests, the relative over-representation of a GO-term in that module, the raw P-value, and the adjusted P-value.
outmod$eraw[1:5,]
Module process_id process_alias GO Obs./Total P_value Adjusted_P_value
C1 GO:0007596 blood coagulation 5/105 0.0019854 0.0675412
C1 GO:0002576 platelet degranulation 3/35 0.0036786 0.1089927
C1 GO:0001889 liver development 2/16 0.0091840 0.2219862
C1 GO:0030168 platelet activation 3/51 0.0106856 0.2467104
C1 GO:0051260 protein homooligomerization 2/20 0.0142285 0.2774912

Export for Visualization

There are many tools for network visualization. In the publication of spaceMap, we exported the annotated igraph network object ig to the “graphml” format, which maintained all the attributes associated with nodes when read into Cytoscape for visualization.

filename <- "~/repos/neta-bcpls/neta/spacemap-scggm-boot-vote.graphml"
#delete nodes without edges from the graph object
vis <- delete_vertices(graph = outmod$ig, v = V(outmod$ig)[igraph::degree(outmod$ig) == 0])
igraph::write_graph(graph = vis, file = filename, format = "graphml")

Here we list all the attributes associated with the nodes that can be used in tandem with Cytoscape’s filtering functions to identify specific nodes of interest.

vertex_attr_names(outmod$ig)
##  [1] "name"             "alias"            "chr"             
##  [4] "start"            "end"              "strand"          
##  [7] "description"      "levels"           "rank_hub"        
## [10] "mean_rank_hub"    "sd_rank_hub"      "ncis"            
## [13] "ntrans"           "regulates_in_cis" "potential_cis"   
## [16] "module"           "C"

We describe some of the most useful attributes for visualization:

  • ‘name’: the unique node ID
  • ‘alias’: the node alias (e.g. gene symbol ERBB2)
  • ‘chr,start,end,strand’: the gene coordinates of nodes
  • ‘description’: any note (e.g. breast cancer oncogene)
  • ‘levels’: indicates whether the node belongs to predictors “x” or responses “y”
  • ‘rank_hub’: the rank of the hub within its level (e.g. a value of “1” for a node of level “x” means that it is the most consistently high degree \(x\) node in the network. )
  • ‘regulates_in_cis’: list of genes regulated in cis
  • ‘module’: the module ID that the node belongs to.
  • ‘process_id’: the significant GO-term(s) associated with the node.

Also the edge attributes are exported to ‘graphml’ format.

edge_attr_names(outmod$ig)
## [1] "levels"    "cis_trans"
  • ‘levels’ indicates whether an edge is \(x-y\) or \(y-y\).
  • ‘cis_trans’ indicates whether an edge is regulated in cis or in trans.

Summary

This toolkit is useful for analyzing and summarizing the output of the spaceMap model fit in the context of integrative genomics. The biological context mapped onto the network object can easily be exported to existing network visualization tools like Cytoscape.

Copyright © 2017 Regents of the University of California. All rights reserved.