next up previous contents index
Next: Prediction modes and change Up: Getting started Previous: Simulation   Contents   Index

Subsections


Linear models in gstat

Defining a model

In ordinary and simple kriging each observation $z(s_i)$ (the value of variable $z$ at location $s_i$) is represented by the model

\begin{displaymath}
Z(s_i) = m + e(s_i)
\end{displaymath} (2.1)

with, in case of ordinary kriging $Z(s)$ intrinsically stationary and $m$ an unknown (locally) constant trend, or, in case of simple kriging, $Z(s)$ a second order stationary and $m$ a known, constant trend.

A wider class of models is obtained when the observation $z(s_i)$ is modelled as the sum of a spatially non-constant (i.e. non-stationary) trend $m(s_i)$ and an intrinsically stationary error $e(s_i)$:

\begin{displaymath}
Z(s_i)=m(s_i)+e(s_i)
\end{displaymath} (2.2)

In the universal kriging model such a trend is modelled as a linear function in $p$ known base functions $f_j (s) = (f_j (s_1),...,f_j(s_n))'$ and $p$ unknown constants $\beta_i$, which yields, for the observation at $s_i$

\begin{displaymath}Z(s_i)=\sum_{j=1}^{p} f_j (s_i) \beta_j + e(s_i) \end{displaymath}

and for all observations

\begin{displaymath}Z(s)=\sum_{j=1}^{p} f_j(s)\beta_j + e(s)\end{displaymath}

which can be written in matrix notation as
\begin{displaymath}
Z(s)= F \beta + e(s)
\end{displaymath} (2.3)

with $F = (f_1 (s),...,f_p (s))$ and $\beta = (\beta_1 ,...,
\beta_p)'$. Ordinary kriging (2.1) is the special case where this model has only an intercept ($p=1$, $f_1 (s) = 1, \forall x$ and $\beta_1 = m$).

Gstat calculates prediction under the multivariable universal kriging model [26] when base functions $f_i(s)$ and variogram(s) for $e(s)$ are specified (see appendix A.2 for the prediction equations). An intercept (the constant value as in the ordinary kriging model) in (2.1) is assumed for each variable by default, and only non-intercept base functions need to be specified. Base functions can be polynomials of the coordinates (e.g. $x$, $x^2$, $xy$ etc.) or user-defined.


Coordinate polynomial base functions

For a data variable in two dimensions, a first order linear trend in the coordinates is defined by

data(x): 'file', x=1, y=2, v=3, X=x&y;

or, as a shorthand for this, the coordinate polynomial trend order degree can be specified:

data(x): 'file', x=1, y=2, v=3, d=1;

(Note that d=1 is equivalent to X=x for one-dimensional, X=x&y to for two-dimensional and to X=x&y&z for three-dimensional data.) Values of coordinate polynomial base functions at observation and prediction locations are obtained from the (standardised) location coordinates $s_i$ (see also example 17).


User-defined base functions

Non-coordinate polynomial, user-defined functions can also be specified as base functions. Because they are not known, they should be defined as column numbers in a data file (example 13), like

data(x): 'file', x=1, y=2, v=3, X=4&5;

User-defined and coordinate polynomial base functions may be intermixed.

When binary (e.g., 0/1) variables are used as base functions, and the sum of these functions coincides with an intercept (i.e., summed row-wise, the columns equal a column with a constant), the default intercept has to be overridden. This is done by specifying -1 as the first column number of the base functions (example 14).

Specification of the user-defined base function values at prediction locations is necessary, since they are needed in the prediction. For the prediction locations they are needed too, and for map prediction locations they are defined as a list of mask maps containing the base functions. For the data() prediction locations they are defined as the X column numbers in the corresponding file. In both cases the number of base functions thus specified and the order in which they appear should match the order in which the (non-intercept and non-coordinate polynomial) X columns appear in subsequent data( id) commands.

If more than one variable is defined and only direct variograms are specified, multiple universal kriging is done. If in addition to direct variograms cross variograms are specified, multivariable universal kriging is done [26] (universal cokriging, section 3.3).


Ordinary and weighted least squares trend prediction

If base functions are specified but no variograms are specified, the default prediction method is the (multiple) regression prediction, (ordinary least squares, OLS) assuming that the $e(s)$ are independently identically distributed (IID). In this case the prediction variance is the classical regression prediction variance for a single observation (example 15 and example 17), or for the mean value when the block size is non-zero (section 3.5).

If the errors are assumed to be independent with different variances, $v_i \sigma^2$ then specifying the constants $v_i$ will result in weighted least squares (WLS) prediction. The values $v_i$ are not variances but merely relate an individual residuals variance to $\sigma^2$. For instance, if an observation is an average of $n_i$ measurements, then, assuming the variance of individual measurements is constant, $v_i$ can be set to $n^{-1}_i$. In the unweighted case, all $v_i$ are 1, and for prediction variances to make sense, the $v_i$ should be related to this unity value ($\sigma^2$ should be ``the'' residual variance).


Generalised least squares trend prediction

If at prediction locations $s_0$, for some reason not the kriging prediction but the generalised least squares estimate (or BLUE, best linear unbiased estimate) of the trend $f(s_0 ) \hat \beta$ and its estimation variance are needed, then this is obtained by overriding the default method (ordinary or universal kriging) using

method: trend;

Setting $f_i (s_0)$ to 1 and all other $f(s_0)$ to 0 yields the generalised least squares estimate (BLUE) of $\hat{\beta}_i$. See appendix A.2 for details on weighted or combined weighted and generalised least squares prediction.


Residual variograms

If base functions are specified but no prediction locations are specified, then the sample direct or cross variogram and covariogram is calculated from ordinary least squares (OLS) residuals, as obtained from the linear model with IID errors. If generalised least squares residuals are preferred to OLS residuals, the (initial) variograms should be set, and gls should be set to 1 (section 4.4).


next up previous contents index
Next: Prediction modes and change Up: Getting started Previous: Simulation   Contents   Index
Edzer Pebesma
1999-08-31