
Re: distribution transformations and Gstat



First a bit of theory and practice that might be new to a few people on the
list.... then everything I know about the GSLIB approach to the problem, which
isn't much! 8)

Gaussian simulation methods assume that the random field model is
Multigaussian; see Goovaerts (1997, pp. 265-284) for more on the Multigaussian
model and its application.

A normal score transform ensures that the transformed data reproduce a
standard normal distribution exactly. It accomplishes this by ranking the data
and assigning each observation the standard normal quantile that matches its
cumulative probability in the ranked sample.
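
For anyone who wants to see the idea in code, here is a minimal Python sketch
of a rank-based normal score transform. It is not GSLIB's nscore: the function
name normal_scores and the plotting position (rank + 0.5) / n are my own
assumptions, ties are simply broken by input order, and no declustering
weights are used, so it will not reproduce the nscore output exactly.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank-based normal score transform (illustrative sketch only).

    Assumption: cumulative probability is taken as (rank + 0.5) / n,
    with rank starting at 0. Ties are broken by input order.
    """
    n = len(values)
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n  # strictly between 0 and 1
        scores[i] = std_normal.inv_cdf(p)
    return scores


if __name__ == "__main__":
    primary = [0.06, 2.17, 0.46, 9.08, 0.90]
    for value, score in zip(primary, normal_scores(primary)):
        print(f"{value:8.2f} {score:8.3f}")
```

The scores come back in the original data order, so they can be written out
as an extra column next to the coordinates and the primary attribute.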

I don't know too much about GSLIB's approach, but here's a bit that may help.

Running GSLIB's nscore program calculates the normal scores for a data set and
produces a file that includes an additional column with the score.

See the following example:
**** the top few lines of the output from nscore ****
Normal Score Transform:Clustered 140 primary data
 4
Xlocation
Ylocation
Primary
Normal Score value
39.5 18.5   .06   -2.525
 5.5  1.5   .06   -2.112

*****
The original dataset consisted of 3 columns: Xloc, Yloc, and an attribute. The
new 4th column is a score -- essentially the number of standard deviations
from the mean. As this example shows, scores are signed. A practical note: if
there are only a few sample observations, the risk of underestimating the
length of the tails, or simply misspecifying the population distribution, is
quite large. There are ways to try to account for this, but it's important to
be aware of it; see Deutsch & Journel, 1998, pp. 224-226.

Variography is then performed on the transformed data (column 4 in the
example), and sequential Gaussian simulation follows, using the transformed
data as well as the covariance model fitted to them. The simulated values
are then backtransformed; see Deutsch & Journel, 1998, pp. 226-227 for a very
brief discussion of this. GSLIB's program backtr does this using a lookup
table it created when initially transforming the data.
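
To make the lookup-table idea concrete, here is a rough Python sketch of a
back-transform. It is not the backtr code: the names build_table and
back_transform are hypothetical, the table uses the assumed plotting position
(rank + 0.5) / n, values between table entries are linearly interpolated, and
simulated scores beyond the table are simply clamped to the observed minimum
and maximum. That clamping is a simplification on my part; how the tails are
handled is exactly the issue discussed in Deutsch & Journel.

```python
from bisect import bisect_left
from statistics import NormalDist


def build_table(values):
    """Return (scores, sorted_values): a score-to-value lookup table.

    Assumption: plotting position (rank + 0.5) / n, rank starting at 0.
    """
    n = len(values)
    std_normal = NormalDist()
    sorted_values = sorted(values)
    scores = [std_normal.inv_cdf((k + 0.5) / n) for k in range(n)]
    return scores, sorted_values


def back_transform(y, table):
    """Map a normal score y back to data units by linear interpolation."""
    scores, vals = table
    # Simplification: clamp to the observed range instead of modelling tails.
    if y <= scores[0]:
        return vals[0]
    if y >= scores[-1]:
        return vals[-1]
    j = bisect_left(scores, y)  # scores[j - 1] < y <= scores[j]
    t = (y - scores[j - 1]) / (scores[j] - scores[j - 1])
    return vals[j - 1] + t * (vals[j] - vals[j - 1])


if __name__ == "__main__":
    table = build_table([0.06, 0.5, 2.0, 9.0])
    for y in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(f"{y:6.2f} -> {back_transform(y, table):6.3f}")
```

With clamping, no simulated value can fall outside the observed data range,
which is the behaviour to watch for when the sample is small.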

I hope this helps either in the development of new code or at least in some
users' understanding of the model assumptions we make when we use
geostatistics. From a Gstat development perspective, I understand a line has
to be walked between usability and multifunctionality. The user has few (if
any) decisions to make for the transform. The GSLIB code itself is pretty
short, and mostly consists of data input/output. It is Fortran, though! I'd
think implementation would be pretty straightforward.

Ashton Shortridge

"Edzer J. Pebesma" wrote:

> Ashton Shortridge wrote:
> > performing gaussian simulation using Gstat is remarkably straightforward
> > - it's a great resource for spatial modeling! However often the
> > underlying datasets are not normally distributed. A workaround is to
> > transform the data, do the simulation, and backtransform, as can be done
> > in GSLIB. I'm writing to see if anyone has developed this approach for
> > Gstat, and if so, what additional data processing tools have been
> > employed.
>
> There is no such thing automatic in gstat, and the only transformations
> provided are log and indicator transform. However, results are not
> transformed back, so it's only pre-processing that's been done by
> gstat.
>
> I don't know how this is done by GSLIB -- do they use a parametric
> transform, or do they use rank order transforms? In the latter case,
> how do they transform back values outside the observed range? (I have
> the GSLIB 2 book here, but you may know it straight away?)
>
> Is this something we could automate easily, or would that take
> too much responsibility away from the user? Any opinions?
> --
> Edzer