
Re: assorted questions



Yetta Jager wrote:
> providing his software and support, but thought it would be
> better to use the listserver to give everyone access to the
> answers in case others have the same questions.  Here goes:

Thanks, this is exactly the intent of the list.

> 
> 1. This is a statistical question.  We are using truly categorical
> data and I'm not clear whether constructing semivariograms for
> cumulative thresholds is correct (order=4).  It seems to me that we want
> to avoid having mis-classified locations tend to be in a category
> with an adjacent index.  We found a place in the code where the
> inequality could be replaced with an equality, but we don't know if
> there are other implications of such a change.

We had a discussion on this a few years ago, during the PhD work of a
former colleague, Marc Bierkens. The advantage of using cumulative indicators
for an ordinal variable is that you get better variograms for the middle
classes. The disadvantage is that the results depend on the order you choose,
which for truly categorical data is arbitrary. For this reason I would prefer
to use categorical indicators (order=2 or 3, depending on whether the set is
closed).
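To make the difference concrete, here is a small sketch, not gstat code.
The helper names are hypothetical and the class labels are made up. A
cumulative indicator is I(rank(z) <= k) under some chosen class order. A
categorical indicator is I(z == c_k). If you change the order, the
cumulative indicators become different data sets. The categorical
indicators only get their columns permuted.

```python
def cumulative_indicators(z, order):
    """I(rank(z) <= k) for each threshold k; depends on the chosen class order."""
    rank = {c: i for i, c in enumerate(order)}
    return [[1 if rank[v] <= k else 0 for k in range(len(order))] for v in z]


def categorical_indicators(z, order):
    """I(z == c_k) for each class c_k; exactly one 1 per observation."""
    return [[1 if v == c else 0 for c in order] for v in z]


z = ["sand", "clay", "peat"]          # hypothetical observed classes
order_a = ["clay", "peat", "sand"]
order_b = ["peat", "clay", "sand"]

# Cumulative: the first threshold column is [0, 1, 0] under order_a but
# [0, 0, 1] under order_b, so its variogram changes with the order.
print(cumulative_indicators(z, order_a))
print(cumulative_indicators(z, order_b))

# Categorical: same indicator columns, only listed in a different order.
print(categorical_indicators(z, order_a))
print(categorical_indicators(z, order_b))
```

Replacing the inequality with an equality, as you suggest, corresponds to
going from the first function to the second.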

> 2. We are also interested in cross-validating and tried Xvalid=1.  We're
> not sure what the output is.  It gave an additional predicted indicator
> variable that seems to correspond to the first category.  We were kind of
> hoping for a summary of mean square errors and average kriging variance for
> each category predicted one at a time.  Is this already available or should we
> write a program to read the original data and kriged probabilities to
> do the comparison?

Cross validation is only done for the first variable in a command file. When
all cross variograms are specified, the secondary variables are used as
co-variables (co-kriging). At each cross validation location, the
observations of all variables there are left out of the prediction. So, to
get cross validation results for each variable, put each data(xi) variable
first in turn in a command file and run gstat once per file.
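You asked for a per-category summary of mean square error and average
kriging variance. Until something like that is built in, a short script can
compute it from each run. Below is a minimal sketch with the following
assumptions:

* You have already read, from the output of each run, the observed
  indicator, its cross validation prediction and its kriging variance into
  three lists. The output file layout is not covered here.
* The function name `xvalid_summary` is hypothetical.

```python
def xvalid_summary(observed, predicted, kriging_var):
    """Summarise leave-one-out results for one indicator variable.

    observed:    observed indicator values (0/1) at the data locations
    predicted:   cross validation predictions (kriged probabilities)
    kriging_var: kriging variances belonging to those predictions
    """
    n = len(observed)
    if n == 0 or not (n == len(predicted) == len(kriging_var)):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_error = sum(residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_kvar = sum(kriging_var) / n
    # A ratio near 1 suggests the kriging variances match the actual
    # squared errors on average; below 1 means they are too pessimistic.
    return {"me": mean_error, "mse": mse, "mean_kvar": mean_kvar,
            "ratio": mse / mean_kvar if mean_kvar > 0 else float("nan")}


# Hypothetical numbers for one category:
print(xvalid_summary([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.1], [0.1, 0.2, 0.15, 0.05]))
```

Run it once per category, on the output of the run where that category was
the first variable.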

I believe that cross validation is one of the most poorly documented parts of
gstat. I have thought about it a little. Adding an `xvalid' field to the
data(xx) definition, to indicate that this variable should (also) be used for
cross validation, would probably be an easy extension, programming-wise.

> 3. Apparently the program doesn't like being asked to predict at a location
> exactly matched by one in the input mask.  Is there a switch to set that will
> handle this situation?

What is an `input mask'? At some stage (a year ago?) I added an option that
allows simulation locations to coincide with data locations (I'm guessing
you refer to this situation). Does
 set zero = 1e-7;
resolve the problem?
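If you want to check which prediction locations actually coincide with data
locations before rerunning, a brute-force check like the one below will list
them. It is only a sketch. It assumes that "coincide" means lying within a
small distance, here called `zero` after the gstat setting. I have not shown
that gstat compares distances exactly this way. The function name is
hypothetical.

```python
import math


def coincident_pairs(pred_locs, data_locs, zero=1e-7):
    """Return (i, j) pairs where prediction location i lies within distance
    `zero` of data location j. Brute force, O(n * m); fine for small sets."""
    pairs = []
    for i, (px, py) in enumerate(pred_locs):
        for j, (dx, dy) in enumerate(data_locs):
            if math.hypot(px - dx, py - dy) <= zero:
                pairs.append((i, j))
    return pairs


# Hypothetical coordinates: prediction points 1 and 2 sit on data points.
print(coincident_pairs([(0, 0), (1, 1), (2, 2.00000001)], [(1, 1), (2, 2)]))
```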

How did you notice gstat doesn't like this?

Best regards,
--
Edzer