[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Kriging the data with geological faults
From: Konstantin Malakhanov <kosta@iwwnt.iwwl.rwth-aachen.de>
>I don't know if are on the gstat mailing list. So I repeat my question
>and the answer from Edzer J. Pebesma:
>So would it be possible to get yout code? to merge it into gstat?
>Otherwise I will have to code it.
Hi Konstantin,
I was digging through my old E-Mails to Edzer about this 'kriging with
boundaries' and discovered that Indeed I did promise to integrate it into
GSTAT. At the end I got distracted and started working on quick search
routines. It's a shame (but typical for me!), since much has changed since
GSTAT1.9c where I had the original mods. It didn't happen because I had
built-in dependencies on a programming library with a remote sensing
software called ER-Mapper. That library was a nightmare to link together
with GSTAT at the time. I intended to remove the ER-Mapper dependencies
before I turned over the code (It is used for handling the vector file I/O
and internal structures).
I never used the POLYGON routines that are hidden in GSTAT. In the end I
wrote something myself that was quite similar I think, but maybe a bit
messy.
I think the best spot for this modification is to put a single function call
in select.c AFTER the selection is done by distance and BEFORE the check on
the minimum number of points. The function should first find all the
boundaries in the vicinity of the estimation point (within the selection
radius), then for each boundary, check the line of sight between the
estimation point and all the data points. Reject a data point of the line
of sight crosses any boundary an odd number of times. You also need a way
to specify the data file containing the boundaries and read them in.
I can give you the code I used for this. All of my boundary-specific
routines were in a separate source file edges.c. My code still has the
ER-Mapper dependencies, but it is reasonable straightforward to see how they
store the vector data.
Unfortunately I don't have the time to make the proper mods now. If I were
to do it again, I would use the POLYGON structure already in data.c and
probably use Edzer's point-in-polygon routine instead of my own
edge-crossing code.
I should say too that we have a guy here called Jörgen Wallerman that is
still working on this problem. He is now taking a look at variogram
estimation and how the models look in the presence of these boundaries.
He's not using GSTAT for that though, but he is using my edge-crossing code.
I'd like to see a hard-core statistician do a proper theoretical analysis of
this problem. I think it is very interesting and relevant in many fields.
The results should fit naturally somewhere between stratified kriging and
ordinary kriging.
I can attach some early dialogue between myself and Edzer on this topic so
you can see how it progressed. I'll talk to you later about the code if you
want it. I also put a blurb on the web a long time ago at
http://www.resgeom.slu.se/~fjasj/iCC-97.htm
Cheers,
SPJ
Steve Joyce, mailto:steve.joyce@resgeom.slu.se
Dept. of Forest Resource Management and Geomatics,
Swedish University of Agricultural Sciences,
S-901 83, Umeå, SWEDEN
phone:+46 90 7866957 fax:+46 90 141915
-----------------------------------------------------------------------
Steve Joyce wrote:
I've been thinking a lot about kriging in the presence of known
boundaries and want to make some software changes to address this
problem. In our case, we are using kriging to estimate forest
parameters on a raster grid based on a number of sample plots. We know
there are natural and man-made boundaries in the landscape, e.g. edges
of clear-cuts, agricultural fields, roads, lakes, etc. These boundaries
ane not necessarily closed polygons so stratified kriging isn't really
appropriate. What I would like to do is add the capability to
accept/reject candidate sample points for kriging based on the spatial
arrangement of the estimation point and the known boundaries. i.e don't
use samples for kriging that are on the opposite side of a known sharp
boundary. I've already got the boundary detection figured out and can
produce a map of one pixel wide labelled edges with a known magnitude
and direction. So my question is what is the best way to integrate this
idea into gstat? Basically, after the candidate points for kriging are
selected based on the max search radius, I would like to check each one
based on the spatial arrangement with the known edges. Then you can go
on and select maxpoints from the valid ones.
If you have any other ideas about this I would love to hear about it.
One thing that comes to mind immediately is that if we use these edges
for selecting points in kriging, we should also use them in the
variogram modelling stage. This may not be trivial to implement.
SPJ
------------------
>
> About the boundaries in interpolation: this is an interesting problem,
> I had never given it a serious thought what to do when boundaries are
> known and do not form closed polygons.
>
> I can imagine a simple implementation where, after the distance selection,
> data points get thrown out when the line between prediction location
> and data location intersects with one of the boundary segments.
This is the same first thought I had as well, but in practice there are
lots of cases where obstruction from a straight line-of-sight should not
necessarily be rejected. With complex boundary shapes, you end up with
lots of prediction locations that have no data points within sight.
Sometimes a slight deviation from a straight line will yield a clear
path from prediction to data location. But how far do you allow it to
deviate? Then of course you might want to use the indirect path as your
distance in the variogram model. It gets messy.
----------------------------------
Hi Edzer,
I Just wanted to report I have made some good progress with this
"Kriging with Boundaries" problem, and have hacked away at your nice
gstat code and turned it into an unrecognizable mess.
On kriging with boundaries: After a lot of head-scratching, I ended up
coming back around to this line-of-sight criteria with a bit of a twist
that ends up being quite similar to your point-in-polygon routine. If I
check a line of sight between an estimation point and a data point, I
can reject the data point if it crosses the same boundary an odd number
of times (easier said than done, but I don't need to tell you that!).
So now I convert the output of my raster edge-detection routines to a
vector "chain-code", then do some smoothing and remove redundant
points. The resulting set of polylines and polygons I feed into gstat.
This has the advantage of allowing you to do some editing on the vector
coverage or add vectors from other sources.
I defined a new keyword in the command file called "edges:" which
expects a single string containing the name of a file to read edge data
as a mix of polygons and polylines. I decided to do it this way because
this datasource is really different from points or masks, and maybe I
will use them in variogram calculation as well sometime later.
In select.c: after points are selected based on the search radius
(d->sel), and if an edge coverage was read, I call a routine to cull
some of the candidate points based on the edges and the estimation point
location. In this routine, I change only d->sel and d->n_sel. Then the
selection of maxpoints from d->sel continues as usual.
In my edge checking routine, I first create a list of all the edges that
are within d->sel_rad of the estimation point. Then for each edge, I go
through each target-data pair and check the number of crossings. Even
numbers are ok, odd numbers the data point is culled (and you don't have
to check the rest of the edges).
The results are quite striking, and make sense intuitively. I can show
you some samples if you want to see them.
I think other people would be interested in this, especially
soil-mapping types and other geologists. If you want to integrate this
properly into Gstat, I suggest we get together and do it. What do you
think?
Cheers,
Steve.
------------------------
Edzer Pebesma FGB wrote:
>
> I saw that you've started from 1.9c, from july last year.
It took me that long to build it with Microsoft C!
>
> So, we should merge our code somehow. Since you know what's going on, I
> suggest you do this, and I'll thoroughly check it afterwards (that is,
> the edges part, I don't have ER-Mapper).
As with most things like this it is only after I have it working that I
realize how to do it properly. So I think I will change the
implementation a bit in the next round. Right now it is a bit tied to
ER-Mapper vector data structures, but it doesn't really need to be. I
will get the newest version of GStat when its available and put the
changes in in a sensible way.
One thing you may have already caught, we are really using the wrong
variogram models in our kriging-with-boundaries. Sample variograms were
calculated without regard to the boundaries so our autocorrelation is
probably underestimated. I would like to experiment with using the
boundaries in variogram modelling as well. I expect it won't be a big
problem because I can use the same edge checking code.
I have run some co-kriging with the boundaries, using satellite image
pixel values as a covariate. The results look quite sensible, although
we haven't had time to do a real statistical evaluation of whether it
improves point estimation accuracy. For co-kriging, I'm just doing the
same boundary check for the covariate with its own search radius.
I really need a faster neighbourhood search routine for raster input
variables. Do you have one, or do you intend to implement one? If I
can't interest you in it, then I will probably do it myself anyway. I
made one for co-kriging with GSLIB so maybe I can use parts of it.
P.S: .....and that was the end of it (SPJ)