I’ve never been entirely clear on random number generation in parallel within R. So I’m just making a quick post on this so I don’t forget, and in case others find it useful.
Using the R package parallel
, we may wish to generate random numbers on each worker, and moreover, require it to be reproducible. For example, We might be running simulations and/or generating data. When others wish to use our work, we want them to generate the same data!
The standard approach to generating reproducible random numbers is to use set.seed()
; for example:
## Set the seed
set.seed(42)
## Generate a random uniform number
runif(1)
## [1] 0.915
## Set the seed
set.seed(42)
## Generate a random uniform number
runif(1)
## [1] 0.915
which you can clearly see that we get the same result in both cases.
Unfortunately this approach doesn’t work when using parallel
:
## Load the parallel library
library(parallel)
## Let parallel know it can use half the available cores
options(mc.cores = parallel::detectCores()/2)
## Set the seed
set.seed(42)
## Generate 4 random numbers using mclapply
mclapply(1:4, function(i) runif(1))
## [[1]]
## [1] 0.429
##
## [[2]]
## [1] 0.951
##
## [[3]]
## [1] 0.693
##
## [[4]]
## [1] 0.0254
## Set the seed
set.seed(42)
## Generate 4 random numbers using mclapply
mclapply(1:4, function(i) runif(1))
## [[1]]
## [1] 0.763
##
## [[2]]
## [1] 0.759
##
## [[3]]
## [1] 0.26
##
## [[4]]
## [1] 0.102
The generated numbers are clearly different! This is not too good for our reproducibility! What we need is a seed that allows us to reproduce results across all cores. The parallel
package has that ability baked in; we just need to set the seed using the "L'Ecuyer"
option (see the ?RNGkind
help file for details):
## Set the seed
set.seed(42, "L'Ecuyer")
## Generate 4 random numbers using mclapply
mclapply(1:4, function(i) runif(1))
## [[1]]
## [1] 0.868
##
## [[2]]
## [1] 0.417
##
## [[3]]
## [1] 0.102
##
## [[4]]
## [1] 0.889
## Set the seed
set.seed(42, "L'Ecuyer")
## Generate 4 random numbers using mclapply
mclapply(1:4, function(i) runif(1))
## [[1]]
## [1] 0.868
##
## [[2]]
## [1] 0.417
##
## [[3]]
## [1] 0.102
##
## [[4]]
## [1] 0.889
There we go! Reproducible random numbers in parallel. There’s a slight caveat though: by default, mclapply
calls mc.reset.stream
by default after all workers/streams have finished. This means that if you run mclapply
again without setting a new seed (or generating some other random numbers in-between), you’ll get the same results:
## Generate 4 random numbers using mclapply (no seed set)
mclapply(1:4, function(i) runif(1))
## [[1]]
## [1] 0.868
##
## [[2]]
## [1] 0.417
##
## [[3]]
## [1] 0.102
##
## [[4]]
## [1] 0.889
If you need to generate some new random numbers, you’ll need to set a new seed (if you want them reproducible); if you don’t want them to be reproducible, reset your random number generator kind back to R’s default: RNGkind("Mersenne")
.