Model Fitting

Fitting the spacemap model to 1662 genomic copy number alternations (CNA) and 1595 protein expressions across 77 heterogeneous breast cancer tumor samples requires a process that guards against overfitting. We used CV.Vote for tuning parameter selection and Boot.Vote for building an ensemble network that finds real biological signal amidst millions of possible interactions in the data set. This model fitting process demands a powerful computational framework that leverages parallelism.

Note: The following code is for llustration purposes only and it is not recommended to evaluate the CV.Vote (see cvVote) or Boot.Vote (see bootEnsemble and bootVote) steps on a simple laptop computer¹.

Setup

Load the ExpressionSet data objects containing protein expressions and genomic copy number, which was generated in the previous step.

suppressPackageStartupMessages(library(Biobase))
cnaset <- readRDS("data/cna-expression-set.rds") 
cna <- t(Biobase::exprs(cnaset))
protset <- readRDS("data/prot-expression-set.rds")
pexp <- t(Biobase::exprs(protset))
#standardize
pexp <- scale(pexp); cna <- scale(cna);

Load the cross validation test sets, which have balanced molecular subtypes across the test sets.

testSetIds <- readRDS(file = "data/prot_cv_10k_test_sets77.rds")
#create training sets.
allSampleIds <- seq_len(nrow(pexp))
trainSetIds <- lapply(testSetIds, function(testSetId) setdiff(allSampleIds, testSetId))

Load the last iteration of the tuning grid used in the CV.Vote step.

tmap <- readRDS(file = "data/prot_last_tune_grid.rds")

Establish a parallel backend to evaluate model fits efficiently.

library(doParallel)
library(parallel)
ncores <- detectCores()  - 1
cl <- makeCluster(ncores)
registerDoParallel(cl)

Create a directory to save all the model fits from cvVote.

rp <- "~/scratch-data/neta-bcpls/mfits/"
system(paste("mkdir -p ", rp))

CV.Vote

The CV.Vote step will estimate optimal tuning parameters for the BCPLS application, which are denoted as \(\hat \lambda^*_1, \hat \lambda^*_2, \hat \lambda^*_3\).

library(spacemap)
spmapcv <- cvVote(Y = pexp, X = cna, 
                  trainIds = trainSetIds, testIds = testSetIds, 
                  method = "spacemap", tuneGrid = tmap,
                  resPath = rp, aszero = 1e-4,
                  tol = 1e-4, cd_iter = 60e7)

The estimated optimal tuning parameters (previously computed) are:

##         lam1    lam2     lam3
## 976 86.76054 18.8506 56.80362

Boot.Vote

Fit the spaceMap model on bootstrapped data replicates subject to \(\hat \lambda^*_1, \hat \lambda^*_2, \hat \lambda^*_3\), which will create a bootstrap ensemble of networks. Then, construct a final network through majority vote on the bootstrap ensemble of networks.

ens <- bootEnsemble(Y = pexp, X = cna, tune = spmapcv$minTune, method = "spacemap", B = 1000, 
                    aszero = 1e-4, tol = 1e-4, seed = 55139L)
ensbv <- bootVote(ens)

Next Step in Analysis

Please see the Annotation article for the next step in the analysis.

Actual model fitting employed 124 workers on a computer cluster. The last tuning grid iteration took cvVote() 34 hours.↩

Model Fitting

Christopher Conley, Pei Wang, Jie Peng

2017-09-24

Setup

CV.Vote

Boot.Vote

Next Step in Analysis