“Harpists spend 90 percent of their lives tuning their harps and 10 percent playing out of tune.” Igor Stravinsky
The spaceMap model learns networks in the high-dimension-low-sample-size regime by imposing a sparsity assumption on the network topology. Three parameters are employed in spaceMap to control the amount of sparsity on different types of interactions, which were briefly introduced in the previous vignette Model Fitting Basics. These parameters ought to be tuned to find the appropriate amount of sparsity for each data set.
The computation effort spent on parameter tuning needs to be appropriately managed. This vignette illustrates a strategy that helps balance between computation time and network learning performance.
We begin with describing the purpose of each tuning parameter. The first tuning parameter \(\lambda_1\) controls the overall sparsity among response variable interactions \(y-y\) and the second tuning parameter \(\lambda_2\) controls that among the predictor-response interactions \(x-y\). Lastly, increasing the tuning parameter \(\lambda_3\) will result in networks with greater representation of hub predictor nodes.
Exploring a large grid of tuning parameters can be computationally demanding. Here, we illustrate a strategy for finding a good grid of tuning parameters:
Step 1: Cross validation of SPACE model1, which is a special case of the spaceMap model when only \(\bf Y\) input data is specified and \(\lambda_2=\lambda_3=0\). Hence, tune the SPACE model across a one-dimensional grid and identify the best performing neighborhood for \(\lambda_1\).
Step 2: Cross validation of spaceMap model with input data \(\bf X\) and \(\bf Y\) across a three-dimensional grid of \(\lambda_1, \lambda_2, \lambda_3\). The neighborhood of \(\lambda_1\) is specified from step 1. Explore a broad range of values for \(\lambda_2, \lambda_3\) (arguments lam2
and lam3
).
Step 3: Repeat step 2 while zooming into a smaller neighborhood of the grid if further refinement of the tuning parameters is needed.
We illustrate the above strategy with an example from simulation sim1.
library(spacemap)
data(sim1)
Obtain the response data \(\bf Y\), a \(150 \times 171\) matrix..
Y <- sim1$Y
Obtain the predictor data \(\bf X\), a \(150 \times 14\) matrix.
X <- sim1$X
Store the dimensions and sample size.
N <- nrow(X)
P <- ncol(X)
Q <- ncol(Y)
Extract the true network where \(x-y\) edges are stored in the truth$xy
adjacency matrix as 1’s and \(y-y\) edges are stored in the truth$yy
adjacency matrix.
truth <- sim1$truth
Tuning will be much faster if parallel computation is leveraged. If you choose to set up a parallel back-end (in this case for a multicore machine), it will use all available cores minus 1.
#if dopar==true, then model tuning done in parallel
dopar <- FALSE
if (dopar) {
library(doParallel)
library(parallel)
ncores <- detectCores() - 1
cl <- makeCluster(ncores)
registerDoParallel(cl)
}
lam1
Tune lam1
by fitting the SPACE model2 to \(Y\) over a one-dimensional tuning grid. We use a result from Meinshausen and Buhlmann (2006)3 to initialize \(\lambda_{1}\)4.
#initialize lam1 according to Meinshausen and Buhlmann
lam1start <- function(n, q, alpha) {
sqrt(n) * qnorm(1 - (alpha/ (2*q^2)))
}
#value of alpha is meant to control the false discovery rate
#in our experience alpha should be set very conservatively
#to obtain an initial value closer to the CV-selected lam1.
lam0 <- lam1start(n = floor(N - N*.10), q = Q, alpha = 1e-5)
lam0
## [1] 72.94889
Take the initial grid search for lam1
to range from [80% of lam0
, 120% of lam0
].
#initial grid size.
ngrid <- 30
#80% of lam0
eps1 <- 0.8
#grid should be a data.frame
tsp <- expand.grid(lam1 = seq(lam0*eps1, lam0*(1 + (1 - eps1)), length = ngrid))
summary(tsp)
## lam1
## Min. :58.36
## 1st Qu.:65.65
## Median :72.95
## Mean :72.95
## 3rd Qu.:80.24
## Max. :87.54
In preparation of cross validation, we encourage the user to determine the split of the data by themselves since the data may have some special underlying population structure that needs to be balanced across the hold-out sets. Below we illustrate one way of splitting the data through the caret
R package.
#for generating cross-validation folds
library(caret)
#number of folds
K <- 10L
set.seed(265616L)
#no special population structure, but create randomized dummy structure of A and B
testSets <- createFolds(y = sample(x = c("A", "B"), size = N, replace = TRUE), k = K)
trainSets <- lapply(testSets, function(s) setdiff(seq_len(N), s))
Conduct the cross-validation through the cvVote function by specifying method = "space"
. Also input the data \(\bf Y\), the lists of test and training sample splits, and a grid for the tuning parameter.
cvspace <- cvVote(Y = Y,
trainIds = trainSets, testIds = testSets,
method = "space", tuneGrid = tsp)
The CV-selected \(\lambda^*_1\) is reported as follows:
minLam1 <- cvspace$minTune$lam1
minLam1
## [1] 65.40245
The \(y-y\) edges are stored in the adjacency matrix cvspace$cvVote$yy
. The number of \(y-y\) edges is reported as:
nyy <- nonZeroUpper(cvspace$cvVote$yy,0)
nyy
## [1] 162
The tuneVis function performs diagnostics stored as a list:
#the size of each test set
nsplits <- sapply(testSets, length)
#required for plots
library(ggplot2)
cvVis <- tuneVis(cvOut = cvspace,
testSetLen = nsplits,
tuneParam1 = tsp$lam1,
tuneParam1Name = "lam1")
Below the plot on the left shows the log(CV score) versus lam1
where the vertical line denotes the minimizer. On the right is the average number of \(y-y\) edges across CV training splits which is decreasing with increasing lam1
. The intersecting lines indicate that the optimal lambda produces a CV.vote model of 162 \(y-y\) edges, which is about 15 edges less than the average no. of edges due to the stabilizing effect of the CV.vote procedure.
#for combining plots
library(gridExtra)
#geom_vline/hline for vertical/horizontal lines
grid.arrange(cvVis[[1]] +
geom_vline(xintercept = minLam1),
cvVis[[3]] +
geom_hline(yintercept = nyy) +
geom_vline(xintercept = minLam1)
, ncol = 2)
Since the true network is known, we can evaluate the learned SPACE network against the truth.
spacePerf <- cvPerf(cvOut = cvspace, truth = truth, method = "space")[1:3]
spacePerf
## power fdr mcc
## 0.7820513 0.2469136 0.7648620
The “mcc” measure combines the power and FDR. In this case, the power (or sensitivity) is reasonable, but the FDR (or specificity) is a little high. Next, we will see the effect of conditioning on \(\bf X\) through the spaceMap model.
lam2
and lam3
Now shifting our attention to tuning all three parameters.
Here we make use of the information from step 1 to reduce the grid search time. Input a neighborhood of \(65 \leq \lambda_1 \leq 75\) into the 3-D grid since we know that produces a reasonable output for the \(y-y\) part of the network from step 1.
The initial neighborhood for \(\lambda_2, \lambda_3\) is guided by practical considerations such as the maximum number of \(x-y\) edges one might expect. In this case, since there are only 14 \(x\) variables, we do not expect there to be more than say, 200 \(x-y\) edges, but no less than 50 \(x-y\) edges in a sparse network. When \(P \leq Q\) we have noticed in our experience that the optimal \(\lambda^*_1\) from step 1 serves as a good upper bound for initializing \(\lambda_2,\lambda_3\), although this may not always be the case. Also, our experience suggests that the network size is more sensitive to \(\lambda_2\) than to \(\lambda_3\); therefore, finding an appropriate neighborhood for \(\lambda_2\) is more critical.
In the following, take \(15 \leq \lambda_2 \leq 60 (< \lambda^*_1)\) and fix \(\lambda_3=15 > 0\):
#define grid
tmap1 <- expand.grid(lam1 = minLam1,
lam2 = seq(15, 60, by = 5),
lam3 = 15)
We can obtain an approximate lower bound for \(\lambda_2\) by applying the initFit function to get a sense of the number of \(x-y\) edges (or \(y-y\) edges) for a specific tuning set. This function does not do any cross-validation, but simply fits the model once for each tuning parameter combination in the grid.
ntmap1 <- initFit(Y = Y, X = X, method = "spacemap", tuneGrid = tmap1)
library(ggplot2)
qplot(x = tmap1$lam2, y = ntmap1$nxy,
ylab = "No. of X->Y edges", xlab = "lam2") +
theme_bw()
The above plot suggests restricting our attention to \(20\leq \lambda_2 \leq 35\) to limit the number of \(x-y\) edges to the range that we are targeting.
Now that we have reasonable ranges for \(\lambda_1,\lambda_2\), we also need to find an appropriate range for \(\lambda_3\). We could use the initFit function to guide this process; however, since we expect that \(x\)-hubs are likely to exist in the network, the lower bound for \(\lambda_3\) should be sufficiently far away from 0.
Now define the 3-D tuning grid:
tmap2 <- expand.grid(lam1 = seq(65, 75, length = 5),
lam2 = seq(21, 35, length = 5),
lam3 = seq(10, 40, length = 5))
Apply cross validation with method = "spacemap"
to learn the optimal network in the tuning grid. For this data set, this will take about 4 minutes on a single processor.
cvsmap <- cvVote(Y = Y, X = X,
trainIds = trainSets, testIds = testSets,
method = "spacemap", tuneGrid = tmap2)
The CV.vote spaceMap network is encoded into adjacency matrices cvsmap$cvVote[c("xy", "yy")]
. The optimal tuning parameters from the tuning grid are:
cvsmap$minTune
## $lam1
## [1] 70
##
## $lam2
## [1] 28
##
## $lam3
## [1] 17.5
Next we diagnose the suitability of the tuning grid based on the output through tuneVis
.
cvVis1 <- spacemap::tuneVis(cvOut = cvsmap, testSetLen = nsplits,
tuneParam1 = tmap2$lam1, tuneParam1Name = "lam1")
cvVis2 <- spacemap::tuneVis(cvOut = cvsmap, testSetLen = nsplits,
tuneParam1 = tmap2$lam2, tuneParam1Name = "lam2")
cvVis3 <- spacemap::tuneVis(cvOut = cvsmap, testSetLen = nsplits,
tuneParam1 = tmap2$lam3, tuneParam1Name = "lam3")
Visualizing the CV scores across the 3-D grid shows that the selected tuning parameters are away from their respective boundaries, which suggests the grid is suitable. If a selected parameter is on or near the boundary, it suggests that we may increase (or decrease) the upper (or lower) bound for that parameter on the grid.
library(gridExtra)
grid.arrange(cvVis1[[1]], cvVis2[[1]],cvVis3[[1]], ncol = 2)
This whole tuning process took about 6 minutes on a single processor. The performance of the final network is listed below
smapPerf <- cvPerf(cvOut = cvsmap, truth = truth, method = "spacemap")[1:8]
smapPerf
## power fdr mcc powerXY fdrXY powerYY
## 0.76666667 0.07471264 0.84049251 0.87037037 0.07843137 0.73076923
## fdrYY mccYY
## 0.07317073 0.82133452
Note that, the spaceMap model dramatically lowers the \(y-y\) FDR when compared with the SPACE model with only a small drop in power.
This vignette discusses how to choose tuning parameters for the spaceMap model. We recommend looking at the next vignette which illustrates how to further control FDR through an ensemble of networks learned on bootstrap replicates of the data.
Peng, Wang, Zhou and Zhu (2009). Partial Correlation Estimation by Joint Sparse Regression Models, Journal of the American Statistical Association , Vol. 104, No. 486, 735-746↩
Peng, Wang, Zhou and Zhu (2009). Partial Correlation Estimation by Joint Sparse Regression Models, Journal of the American Statistical Association , Vol. 104, No. 486, 735-746↩
Meinshausen, Nicolai; Bühlmann, Peter. High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 (2006), no. 3, 1436–1462. doi:10.1214/009053606000000281↩
applies when \(\bf Y\) has been standardized to have unit variance for all \(Q\) variables↩