Parallelism for Model Training (using the caret package)

In a previous post we discussed how to achieve parallel programming in R using the doParallel package. This post applies that parallelism to training a model with the caret package. We use repeated cross-validation to train a support vector machine, first with a parallel backend and then sequentially, timing both runs. (The iris data set is small and doesn't require such exhaustive training; it is used merely for illustration.)
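To get a feel for why this workload benefits from parallelism, here is a rough sketch of how many model fits repeated cross-validation implies. The fold and repeat counts mirror the settings used in this post. The variable names are made up for illustration, and the tuning grid size of 1 is an assumption: the real number depends on the method and on any tuneLength or tuneGrid you pass to train().

```r
# Rough count of model fits for repeated cross-validation.
n_folds      <- 100  # 'number' in trainControl
n_repeats    <- 100  # 'repeats' in trainControl
n_candidates <- 1    # ASSUMPTION: one candidate in the tuning grid

# One fit per fold, per repeat, per tuning candidate ...
resample_fits <- n_folds * n_repeats * n_candidates

# ... plus one final fit on the full data set.
total_fits <- resample_fits + 1
total_fits
```

The resampling fits are independent of one another, which is what lets caret hand them out to the workers of a registered foreach backend.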

 


#PACKAGES
library(caret)
library(kernlab)
library(e1071)
library(doParallel)

#DATA
data <- iris

#FUNCTION
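# Clear the foreach package's internal registry of parallel backends.
# Note: this reaches into foreach's unexported internals via ':::'.
unregister <- function() {
  env <- foreach:::.foreachGlobals
  rm(list = ls(name = env), pos = env)
}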

#MAIN

# 100-fold cross-validation repeated 100 times (deliberately excessive, for illustration)
cvcontrol <- trainControl(method='repeatedcv', number=100, repeats=100)

# Parallel run: one worker per detected core
ptm <- proc.time()
cl <- makeCluster(detectCores())
registerDoParallel(cl)
model <- train(data[,c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width')], data[,"Species"], method='svmLinear', trControl=cvcontrol)
stopCluster(cl)
registerDoSEQ()
unregister()  # call the helper; without parentheses R only prints the function
proc.time() - ptm

# Sequential run of the same model, for comparison
ptm <- proc.time()
model <- train(data[,c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width')], data[,"Species"], method='svmLinear', trControl=cvcontrol)
proc.time() - ptm
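
Each proc.time() difference above prints user, system and elapsed times. When comparing the parallel and sequential runs, the elapsed (wall-clock) value is usually the one to look at. Below is a minimal sketch of a helper that keeps only that value. The name elapsed_seconds is hypothetical, and Sys.sleep() is just a stand-in for the call to train().

```r
# Time an expression and return only the wall-clock (elapsed) seconds.
# 'expr' is evaluated lazily, so the work happens inside system.time().
elapsed_seconds <- function(expr) {
  unname(system.time(expr)["elapsed"])
}

# Stand-in workload; replace with the train() call to time a real fit.
t_demo <- elapsed_seconds(Sys.sleep(0.2))
t_demo
```

Dividing the sequential elapsed time by the parallel elapsed time then gives a simple speedup figure for your own machine.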