next up previous contents index
Next: Prediction equations Up: Equations Previous: Equations   Contents   Index

Subsections


Spatial dependence


Sample variogram and covariogram

All variograms and covariograms are calculated from predicted residuals $\hat e (s_i ) = z(s_i ) - \hat m (s_i )$, with $\hat m (s_i)$ the ordinary least square estimates of $m(s_i)$, fitted globally (all data of the variable are used in a linear model assuming IID errors), unless one of dX, noresidual or gls is set. The sample variogram is calculated from residuals from a single realization $z$ for regular distance intervals $[h_j ,h_j + \delta]$. by:

\begin{displaymath}
\hat \gamma (\bar{h}_j ) = \frac{1}{2 N_j}
\sum_{i=1}^{N_j} ...
...\hspace{.5cm}\forall (s_i ,s_i + h): h \in [h_j ,h_j +
\delta]
\end{displaymath}

with $\bar{h}_j$ the average of all $N_j$ $h$'s. A covariogram is modeled by fitting a model to the sample covariogram $\hat C (h)$ calculated by:

\begin{displaymath}
\hat C ( \bar{h}_j ) = \frac{1}{N_j}
\sum_{i=1}^{N_j}
\hat ...
...\hspace{.5cm}\forall (s_i ,s_i + h): h \in [h_j ,h_j + \delta]
\end{displaymath}

The sample cross variogram $\hat{\gamma}_{kl} (h)$ is calculated from sample data by:

\begin{displaymath}
\hat{\gamma}_{kl} ( \bar{h}_j ) = \frac{1}{2 N_j}
\sum_{i=1...
...hat{e}_k (s_i + h))
( \hat{e}_l (s_i )- \hat{e}_l (s_i + h)),
\end{displaymath}


\begin{displaymath}
\forall (s_i ,s_i + h): h \in [h_j ,h_j + \delta]
\end{displaymath}

The sample pseudo cross variogram $g_{kl}(h)$ is calculated from sample data by

\begin{displaymath}
\hat{g}_{kl} ( \bar{h}_j ) = \frac{1}{2 N_j}
\sum_{i=1}^{N_...
...hspace{.5cm}\forall (s_i ,s_i + h): h \in [h_j ,h_j + \delta].
\end{displaymath}

The sample cross covariogram $\hat{C}_{kl} (h)$ is calculated by the sample cross covariance


\begin{displaymath}
\hat{C}_{kl} ( \bar{h}_j ) = \frac{1}{N_j}
\sum_{i=1}^{N_j}...
...\hspace{.5cm}\forall (s_i ,s_i + h): h \in [h_j ,h_j + \delta]
\end{displaymath}

Gstat provides calculation of sample variogram, covariogram, cross variogram, pseudo cross variogram and cross covariogram, where width $\delta$, number of intervals $j$ and direction of $h$ can be controlled. When some cross variogram is requested, gstat decides which one should be calculated: the first, `classic' cross variogram is calculated when the two variables have the same number of observations and identical coordinates and order, in any other case the pseudo cross variogram is calculated, this information is written to the second line of the file with the sample variogram.

Estimation of variogram model parameters

Gstat provides several methods for estimating variogram model parameters. Fitting of a variogram model to the sample variogram is done by iteratively reweighted least squares (WLS, [5]), minimizing

\begin{displaymath}
\sum_{j=1}^{n}
w_j ( \hat \gamma (\bar{h}_j ) - \gamma(\bar{h}_j))^2
\end{displaymath}

with $w_j$ either equal to $N_j$ or to $\gamma(\bar{h}_j)^{-2}N_j$.

Fixing parameters in a weighted least squares fit can be done by putting an @ before the range or the sill parameter to fix: e.g. fitting the variogram 1 Sph(@ 0.2) will only fit the sill parameter (1), fitting @1 Sph(0.2) will only fit the range parameter (0.2).

Within gstat, iterative fitting stops when the number of steps exceeds 50 (or the value set by iter) or when the fit has converged. A fit is considered as `converged' when the change in the weighted sum of squares of differences between variogram model and sample variogram becomes less then $10^6 \times$ the last value of this sum of squares (this number is controlled with fit_limit).

Both gstat and gnuplot fix the fitting weights during iterations. For this reason, when the fitted model strongly differs from the initial (starting) model, another fitting round may result in a substantially different end model, due to the different weights. Reiterated weighted fitting may very well result in never converging cycles. Gstat uses Gauss-Newton fitting with (mostly) analytical derivative functions; gnuplot uses Levenberg-Marquardt fitting with numerical derivatives.

Finally, gstat provides REML (restricted maximum likelyhood) estimation of partial sill parameters [15,2] from sample data: this method is equivalent to a full weighted least squares fitting of the variogram (sill parameters) to the pairwise products of the observations (the covariogram cloud). REML estimation is shown to be equivalent to iterated MIVQUE (minimum variance quadratic unbiased estimation), and to iterated MINQUE (minimum norm quadratic unbiased estimation) with an Euclidean norm [2]. REML may be slow for moderate to large data sets (more than 100 observations).


next up previous contents index
Next: Prediction equations Up: Equations Previous: Equations   Contents   Index
Edzer Pebesma
1999-08-31