The goal of this vignette is to illustrate the network analysis toolkit in the spaceMap analysis pipeline. We use the fitted spaceMap networks to the bcpls data as an example.
Before conducting the network analysis, you should have completed the model fitting step such that you have a network ensemble or a final network to analyze (e.g. output from bootVote). To fully realize the capability of the toolkit, you should also have annotations for the nodes.
Load the bcpls
list.
library(spacemap)
data("bcpls")
#Attach "bcpls" data objects to the R Global Environment
attach(bcpls)
Next we list the contents of bcpls
.
names(bcpls)
## [1] "net" "yinfo" "xinfo" "bdeg" "go2eg"
The following bullets go into detail of how each item of bcpls
is used by the toolkit.
net: a list of two adjacency matrices that is output from bootVote. The protein–protein edges are encoded with 1’s in net$yy
. Likewise, the CNA–protein edges are encoded with 1’s in net$xy
.
data.frame
of annotations, whose rows should correspond to the rows of and net$yy
. We organize the columns by level of necessity for the toolkit. Columns with missing values are denoted with ‘NA’.
See the contents of yinfo
for the BCPLS data set below.
head(yinfo,5)
id | alias | chr | start | end | strand | description |
---|---|---|---|---|---|---|
NP_112598 | EPPK1 | chr8 | 144939491 | 144952632 | - | epiplakin |
NP_001714 | DST | chr6 | 56322784 | 56819426 | - | bullous pemphigoid antigen 1 isoform 1e precursor |
NP_002465 | MYH11 | chr16 | 15796991 | 15950887 | - | myosin-11 isoform SM1A |
NP_001035202 | MYH11 | chr16 | 15796991 | 15950887 | - | myosin-11 isoform SM2B |
NP_001070654 | MYH14 | chr19 | 50706884 | 50813801 | + | myosin-14 isoform 1 |
yinfo
but the optional variables differ. See the contents of xinfo
for the BCPLS data set below.head(xinfo,5)
id | alias | chr | start | end |
---|---|---|---|---|
1_3218610_6972307_0 | 1p36.31-36.32 | chr1 | 3218610 | 6972307 |
1_6990271_14007849_0 | 1p36.21-36.31 | chr1 | 6990271 | 14007849 |
1_14028633_28692959_0 | 1p35.3-36.21 | chr1 | 14028633 | 28692959 |
1_28695597_30387702_0 | 1p35.2-35.3 | chr1 | 28695597 | 30387702 |
1_30392887_31908168_0 | 1p35.2 | chr1 | 30392887 | 31908168 |
go2eg
will be used for conducting functional enrichment analysis, but is otherwise optional.head(go2eg,2)
## $`GO:0000122`
## [1] "NP_009330" "NP_644671" "NP_001120" "NP_056082"
## [5] "NP_001041666" "NP_005321" "NP_055398" "NP_036233"
## [9] "NP_001441" "NP_001257972" "NP_001177666" "NP_002492"
## [13] "NP_005587" "NP_061119" "NP_002022" "NP_060733"
## [17] "NP_005310" "NP_005312" "NP_005311" "NP_005313"
## [21] "NP_006051" "NP_001002295" "NP_116027" "NP_005987"
## [25] "NP_005985" "NP_066285" "NP_001138627" "NP_073576"
## [29] "NP_001001548" "NP_004022" "NP_000336" "NP_001963"
## [33] "NP_005323" "NP_004771" "NP_001744" "NP_006262"
## [37] "NP_620590"
##
## $`GO:0000165`
## [1] "NP_001020029" "NP_000338" "NP_003117" "NP_006149"
## [5] "NP_004439" "NP_005219" "NP_005226" "NP_003740"
## [9] "NP_001356" "NP_005535" "NP_057726" "NP_001553"
## [13] "NP_000213" "NP_001744" "NP_000791" "NP_620590"
We can also create a biologically informative alias for each GO ID, which will be useful for reporting enriched hubs/modules later.
library(AnnotationDbi)
library(GO.db)
process_alias <- AnnotationDbi::Term(names(go2eg))
head(process_alias,3)
## GO:0000122
## "negative regulation of transcription from RNA polymerase II promoter"
## GO:0000165
## "MAPK cascade"
## GO:0000278
## "mitotic cell cycle"
bdeg
is provided in the function rankHub
, then hubs will be prioritized according to their average degree rank, so that highly ranked hubs would consistently have a large degree across the network ensemble. The contents of bdeg
are two matrices yy
and xy
where the bth row is:
yy[b,]
an integer vector representing the degree distribution of the proteins in the network fitted on the \(b\)th bootstrap replicate.xy[b,]
an integer vector representing the degree distribution of the CNAs.Convert the final network into an igraph object and map the annotations onto the network.
library(igraph)
ig <- spacemap::adj2igraph(yy = net$yy, xy = net$xy, yinfo = yinfo, xinfo = xinfo)
If we query the attribute names of the nodes in the graph, we notice that the columns of xinfo
and yinfo
have been applied.
vertex_attr_names(graph = ig)
## [1] "name" "alias" "chr" "start" "end"
## [6] "strand" "description" "levels"
The igraph package has a number of ways to access the annotation information. For example, if we wish to confirm the chromosome location of ERBB2, we can easily query:
vertex_attr(graph = ig, name = "chr", index = V(ig)[alias %in% "ERBB2"])
## [1] "chr17"
We first prioritize the CNA- and protein- hubs. If the bdeg
argument is specified, then we rank the hubs according to the average degree rank. Accordingly, the most highly ranked hubs will have the most consistently high degree across network ensemble.
To rank the protein nodes, use the rankHub
command and simply specify the level = "y"
argument.
ig <- rankHub(ig = ig, bdeg = bdeg$yy, level = "y")
To rank the CNA nodes, specify the level = "x"
argument.
ig <- rankHub(ig = ig, bdeg = bdeg$xy, level = "x")
Alternatively, if the bdeg
argument is not available, the ranking is according to degree in the final network (coded by “ig”).
tmp <- rankHub(ig = ig, level = "x")
Next label \(x-y\) edges as being regulated in cis or in trans. The GenomicRanges package and the genomic coordinates chr,start,end
columns of xinfo
and yinfo
are required for this step. If you don’t have the package GenomicRanges installed yet, try:
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("GenomicRanges")
library(GenomicRanges)
Now we can label the \(x-y\) edges with either “cis” or “trans” in the cis_trans
edge attribute of ig
.
ig <- cisTrans(ig = ig, level = "x-y")
We then report a well-organized table, as seen in Table 3 of the spaceMap publication. The top
argument allows us to control how many hubs are displayed.
xhubs <- reportHubs(ig, top = 50, level = "x")
hub | # cis/ # trans | Potential # cis | cis genes |
---|---|---|---|
17q21.32 (46-46 Mb) | 1 / 98 | 14 | LRRC59 |
5q34 (160-170 Mb) | 0 / 129 | 0 | – |
17q12 (38-38 Mb) | 5 / 47 | 34 | ERBB2, GRB7, KAT2A, PNMT, MIEN1 |
10p15.1-15.3 (0.42-4 Mb) | 0 / 86 | 10 | – |
15q13.1-15.1 (29-43 Mb) | 2 / 34 | 20 | IVD, JMJD7 |
16q22.1-22.2 (68-71 Mb) | 2 / 70 | 13 | CDH1, CYB5B |
11q13.5-14.1 (76-78 Mb) | 4 / 7 | 7 | RSF1, PGM2L1, GAB2, AAMDC |
17q23.1-23.2 (58-59 Mb) | 2 / 19 | 8 | HEATR6, PTRH2 |
16p12.1-12.3 (19-27 Mb) | 2 / 19 | 17 | KNOP1, DCTPP1 |
5p12-5q11.1 (45-50 Mb) | 1 / 24 | 4 | NNT |
16q22.1 (68-68 Mb) | 0 / 16 | 8 | – |
17p11.2 (20-21 Mb) | 0 / 5 | 11 | – |
17p11.2 (20-20 Mb) | 1 / 6 | 11 | SHMT1 |
12q21.1 (74-74 Mb) | 0 / 4 | 2 | – |
8q21.2-22.1 (87-96 Mb) | 1 / 0 | 10 | RIDA |
Similarly, we can report the top 10 protein hubs, their degrees in the final network, and a description of each hub, if the description
column was specified in yinfo
.
yhubs <- reportHubs(ig, top = 10, level = "y")
hub | degree | description |
---|---|---|
ARGLU1 | 5 | arginine and glutamate-rich protein 1 |
SERPINA1 | 8 | alpha-1-antitrypsin precursor |
KLKB1 | 5 | plasma kallikrein preproprotein |
MMP2 | 5 | 72 kDa type IV collagenase isoform a preproprotein |
C6 | 4 | complement component C6 precursor |
APOA1 | 6 | apolipoprotein A-I preproprotein |
HPX | 4 | hemopexin precursor |
ALB | 5 | serum albumin preproprotein |
C5 | 3 | complement C5 preproprotein |
CFI | 3 | complement factor I preproprotein |
A CNA neighborhood comprises all protein nodes that are directly connected to a CNA hub by an edge. CNA neighborhoods represent direct perturbations to the proteome by amplifications or deletions in the DNA. To quantify their functional relevance, we compute a score called the GO-neighbor percentage. Two protein nodes are called GO-neighbors if they share a common GO term in the same CNA neighborhood. We postulate that a high percentage of GO-neighbors within a CNA neighborhood associates the CNA hub with more functional meaning. These scores, as presented in Figure 5 of the publication, can be generated with a GO mapping to the proteins as follows.
hgp <- xHubEnrich(ig = ig, go2eg = go2eg)
hub | degree | neighbor_percentage |
---|---|---|
10p15.1-15.3 | 86 | 77.90698 |
11q13.5-14.1 | 11 | 45.45455 |
12q21.1 | 4 | 50.00000 |
15q13.1-15.1 | 36 | 63.88889 |
16p12.1-12.3 | 21 | 71.42857 |
16q22.1 | 16 | 43.75000 |
16q22.1-22.2 | 72 | 79.16667 |
17p11.2 | 7 | 71.42857 |
17p11.2 | 5 | 60.00000 |
17q12 | 52 | 78.84615 |
17q21.32 | 99 | 72.72727 |
17q23.1-23.2 | 21 | 33.33333 |
5q34 | 129 | 77.51938 |
5p12-5q11.1 | 25 | 60.00000 |
There are many criteria to define modules of a network. This toolkit allows users to import the module membership information by themselves (see mods
argument in modEnrich).
In the spaceMap publication, we use the edge-betweenness algorithm in igraph.
library(igraph)
mods <- cluster_edge_betweenness(ig)
The main goal of module analysis is identifying modules that are functionally enriched. The modEnrich
function will test for significantly over-represented GO-terms (or any other valid functional grouping) within a module using hyper-geometric tests.
In the current example, only the protein nodes have functional mapping and we specify this through the levels = "y"
argument. If both predictors and responses have functional mapping in the go2eg
argument, then we can specify levels = c("y","x")
. Other arguments are available to control the enrichment testing; see modEnrich for more details.
outmod <- modEnrich(ig = ig, mods = mods, levels = "y",
go2eg = go2eg,
prefix = "P", process_alias = process_alias)
The output of the module analysis is a list of 3 objects.
names(outmod)
## [1] "ig" "etab" "eraw"
ig
is the input igraph network object updated with a “process_id” attribute for nodes affiliated with a significant GO-term. The “process_id” and “module” attributes together are useful for visualizing which nodes are enriched for a specific biological function.
etab
is the polished module enrichment table to report significant GO terms, the representation of the GO term in the module relative to the size of the GO term, and which CNA hubs belong to the module. The top ten hits as in Table S.5 of the spaceMap publication’s supplementary materials are as follows:
outmod$etab[1:10,]
Module (size) | GO Category | GO Obs./ Total | X-hubs (hits) |
---|---|---|---|
P1 (244) | small molecule metabolic process | 61/280 | 5q34 (111); 10p15.1-15.3 (79) |
oxidation-reduction process | 28/106 | ||
transcription from RNA polymerase II promoter | 9/20 | ||
protein homotetramerization | 9/20 | ||
P10 (28) | transcription, DNA-templated | 11/77 | |
regulation of transcription, DNA-templated | 7/41 | ||
P2 (25) | negative regulation of cell growth | 6/22 | |
muscle organ development | 5/17 | ||
P3 (57) | cell proliferation | 6/37 | 17q12 (50) |
P4 (97) | ion transmembrane transport | 10/21 | 17q21.32 (79) |
eraw
contains details for each (module, GO-term) pair that was subjected to the hyper-geometric test. This output gives the user more control by reporting all tests, the relative over-representation of a GO-term in that module, the raw P-value, and the adjusted P-value.outmod$eraw[1:5,]
Module | process_id | process_alias | GO Obs./Total | P_value | Adjusted_P_value |
---|---|---|---|---|---|
P1 | GO:0044281 | small molecule metabolic process | 61/280 | 0.0002922 | 0.0139429 |
P1 | GO:0055114 | oxidation-reduction process | 28/106 | 0.0008225 | 0.0327023 |
P1 | GO:0006366 | transcription from RNA polymerase II promoter | 9/20 | 0.0010725 | 0.0398029 |
P1 | GO:0051289 | protein homotetramerization | 9/20 | 0.0010725 | 0.0398029 |
P1 | GO:0034641 | cellular nitrogen compound metabolic process | 17/61 | 0.0050060 | 0.1248049 |
There are many tools for network visualization. In the publication of spaceMap, we exported the annotated igraph network object ig
to the “graphml” format, which maintained all the attributes associated with nodes when read into Cytoscape for visualization.
filename <- "~/repos/neta-bcpls/neta/spacemap-prot-boot-vote.graphml"
#only keep vertices that have edges for visualization
vis <- delete_vertices(graph = outmod$ig, v = V(outmod$ig)[igraph::degree(outmod$ig) == 0])
igraph::write_graph(graph = vis, file = filename, format = "graphml")
Here we list all the attributes associated with the nodes that can be used in tandem with Cytoscape’s filtering functions to identify specific nodes of interest.
vertex_attr_names(outmod$ig)
## [1] "name" "alias" "chr"
## [4] "start" "end" "strand"
## [7] "description" "levels" "rank_hub"
## [10] "mean_rank_hub" "sd_rank_hub" "ncis"
## [13] "ntrans" "regulates_in_cis" "potential_cis"
## [16] "module" "process_id"
We describe some of the most useful attributes for visualization:
Also the edge attributes are exported to ‘graphml’ format.
edge_attr_names(outmod$ig)
## [1] "levels" "cis_trans"