
Re: backlog



It's been a bit quiet on the list, so I'll get back to Yetta's
questions of a few weeks ago:

Yetta Jager wrote (Jul 1):
> 
> Gstat users,
> 
> We're having very good success using Gstat to generate alternative
> realizations that have similar structure to an original map.  Now,
> we're trying to understand how to control the error rates and what
> the lower bounds on generated map errors are.

Depending on exactly how you define the lower bound, this may require
quite a few simulations.

> 
> This question resurfaced during a meeting yesterday.  During conditional
> simulation, we subsample a grid or mask to provide conditioning data.
> Gstat won't continue when there is overlap between the input data sites
> and the output mask sites.  We have been tricking Gstat into giving us
> predictions by scooting them over slightly.

This will (hopefully) be corrected in the next version. The difficult
point is always how to judge whether a prediction variance is zero
(we're at a data location) or almost zero (we're close to a data
location). In any case, you can now `trick' gstat by setting the zero
`threshold' to a larger value, e.g.

set zero = 1.0e-10; # or larger, or smaller!

instead of moving your conditioning data (which will not be reproduced
once they are moved away from simulation grid locations!).

> 
> Is there any reason, in principle, why we shouldn't be able to
> get predictions at zero distance?  We've also been discussing
> whether to expect "honoring of the data" at these locations when
> there is a nugget in simulations -- I suppose the kriging variance is
> still zero?
> 
There's a very good discussion in Cressie's book, Statistics for
Spatial Data (2nd edition, 1993). The issue here is: should we conceive
of the nugget variance as true short-distance spatial variation, as
measurement error, or as a mixture of the two? In the case of spatial
variation, we should reproduce the data values; in the case of
measurement error, prediction will usually aim at the
measurement-error-free process (and no longer reproduce the measured
value exactly). So it's a matter of model decision, and some knowledge
of the measurement errors helps.

Gstat (2.0g and up) can deal with all three cases:

variogram(xx): 1 err() + 1 exp(300); # won't reproduce data values
variogram(yy): 1 nug() + 1 exp(300); # this one will
variogram(xy): 0.5 nug() + 0.5 err() + 1 exp(300); # half-way in between!
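
To make the distinction concrete, here is a minimal numpy sketch (not
gstat's own code) of prediction at a data location under the three
models above. It assumes simple kriging with a known mean, a 1-D
transect, and made-up coordinates and values; the function name
krige_at_datum and all its parameters are hypothetical. The only
difference between the cases is whether the discontinuity at distance
zero is counted as part of the process being predicted (nug) or left
out of it (err).

```python
import numpy as np

def krige_at_datum(x, z, i, c_nug, c_err, sill=1.0, a=300.0, mean=1.0):
    """Simple kriging prediction and variance at data location x[i].

    c_nug: microscale variance, part of the process we predict.
    c_err: measurement error variance, not part of that process.
    Assumes a known mean and an exponential model sill*exp(-h/a).
    """
    d = np.abs(x[:, None] - x[None, :])
    signal = sill * np.exp(-d / a)
    # Observations carry both nugget components on the diagonal.
    K = signal + (c_nug + c_err) * np.eye(len(x))
    # Covariance between the target and the data: only the microscale
    # part is shared with the observation at the same location.
    k0 = signal[:, i].copy()
    k0[i] += c_nug
    w = np.linalg.solve(K, k0)
    pred = mean + w @ (z - mean)
    var = (sill + c_nug) - w @ k0
    return pred, var

# Hypothetical data, for illustration only.
x = np.array([0.0, 100.0, 250.0, 400.0, 700.0])
z = np.array([1.2, 0.4, 2.1, 1.7, 0.9])
i = 2  # predict at the third data location

pred_err, var_err = krige_at_datum(x, z, i, c_nug=0.0, c_err=1.0)
pred_nug, var_nug = krige_at_datum(x, z, i, c_nug=1.0, c_err=0.0)
pred_mix, var_mix = krige_at_datum(x, z, i, c_nug=0.5, c_err=0.5)

print("observed:", z[i])
print("err only:", pred_err, var_err)
print("nug only:", pred_nug, var_nug)
print("mixture: ", pred_mix, var_mix)
```

Under these assumptions the nug-only model returns the observed value
with (numerically) zero variance, the err-only model smooths towards
the neighbours and keeps a positive variance, and the mixture's
prediction lies midway between the two.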

Hope this helps.
--
Edzer