Suppose we want to simulate the outcome of a single continuous random
variable
with known distribution
,
to study how the outcome of a
model
depends on the distribution of
.
We could do this by using
simple random sampling, i.e. randomly drawing values from
.
However,
if
is expensive to evaluate, other sampling strategies may be more
efficient: it is for instance well known that stratified sampling can
characterise the population equally well as simple random sampling with
a smaller sample size. Stratified sampling works as follows: (i) the
distribution of
is divided into
segments, (ii) the distribution of
samples over these segments is proportional to the probabilities of
falling in the segments, and (iii) each sample is drawn from its
segment by simple random sampling. Maximal stratification takes place
when the number of segments (strata)
equals the number of data
,
and when
has probability
of falling in each segment, and
this is the most efficient.
When the model depends on two variables, like
,
then
values of both
and
can be drawn by simple random sampling or
by stratified random sampling. The most efficient sample is maximally
stratified for both
and
simultaneously. This means that a
sample of size
of pairs
is marginally
stratified
for both
and
.
Such a sample is called a Latin hypercube sample
[16].
When, in a spatial context, multiple Gaussian simulations are created,
the simulations are independent in the sense that they are a random
sample from the ensemble of all realisations that were possible under the
model specified: at a specific location
the subsequently realised
values of the variable
,
are a simple
random sample from the distribution of
.
(The values of
and
,
may be spatially dependent
when both simulations were conditioned on a common data set.) When
we want a stratified sample of
,
then this could simply be
drawn when
were known. However, this is less trivial when
and
,
still have to obey the prespecified
spatial correlation. A procedure for obtaining a Latin hypercube sample
in this context (multiple, spatially correlated variables) is given in
[23] and this procedure has been implemented in gstat.
The procedure requires the marginal distribution of
.
In case of
unconditional simulations, this distribution will be constant everywhere,
and when simple point kriging is used to obtain the Gaussian simulations,
this distribution is
with
the
simple kriging mean (sk_mean) and
the sill
of the variogram used. If for instance 3 variables are defined in
a command file, and they have marginal distributions of respectively
,
and
,
then this is
defined in a command file by the command
marginals: 4, 3, 2, 5, 7, 2.5;
If no such command is given, marginal distributions for variable
will
be taken as
,
a normal distribution with
mean
and variance
(the sill of the variogram for
variable
). If simulations are conditional to known point data, then
the marginal distributions are location specific, and are obtained from
simple kriging (prediction and prediction variance) of the conditioning
data. If, for 2 variables, such marginals reside in the maps pr1,
var1, pr2 and var2, denoting marginal distributions
(as maps)
pr1,var1
and
pr2,
var2
,
then this is defined in a command file as
marginals: 'pr1', 'var1', 'pr2', 'var2';
In case of stratified simulation, only the two maps with the (stratified) marginals have to be specified in the marginals command. Note that the marginals construct limits the definition of location specific marginal distributions (for conditional simulation with Latin hypercube sampling) to gridded simulation.
A Latin hypercube sample is constructed from a simple random sample if the command
set lhs=1;
is set. The size of the (Latin hypercube) sample is set to 1000 by
set nsim=1000;
In order to obtain correct results, the sample size (the number of simulations) must be large: in small samples the spatial correlation may be disturbed by the Latin hypercube procedure [21].