Spatial Interpolation – IDW

 

In order to perform IDW, many parameters need to be set. Most importantly, these are the “Neighbor” and “Power” settings. The neighbor parameters defines how many neighboring points will taken into account for each sector while the power setting defines the strength of the influence that known values have on unknown values based on their distance. For the power parameter, this means that weights will be assigned proportionally inverse to the distance taken to the specific power (*).

 

Since these parameters have to be set on a case by case basis, the suitability and thus the validity of the parameters need to be established. For this purpose, ArcGIS provides the cross-validation method in the Geospatial Wizard in the Inverse Distance Weighting section.

 

This method works by removing one of the points from the dataset and consequently interpolating the value for this point as if it was unknown. Repeating this for every single datapoint results in the mean error of the predictions. Graphing the measured value for each point against the predicted values results in a graph where under perfect conditions, the points should be on a perfect line. Since some errors are certainly made, the actual points are not along that line but a bit off. Creating a regression line, the error can be made visible by comparing the deviation of the regression line with the perfect 1:1 line.

Image 1: Screenshot of the cross-validation graph as created by the Geospatial Wizard, showing the measured values against the predicted values as well as the regression line.

 

Practical application of IDW

In this exercise, a point layer with temperature measurements spanning most of Europe as well as some neighboring regions is given. Based on a choice of AOI, the temperatures is interpolated for the whole AOI using the IDW method in ArcGIS Pro.

As the area of interest, the Fenoscandian nations of Sweden, Norway and Finland were chosen. Mainly because of their large north-south extent as well as the climate differences near the Atlantic cost compared to the near-continental climate of eastern Finland, quite large differences in temperature are to be expected. Additionally, in some areas the measurements are quite sparse, which might lead to the clearer manifestation of differences in IDW parameter settings.

Additionally, a 25 km buffer was drawn around the borders of the nations, since many measurement points are just outside of the vector boundary on small islands. Of course the IDW implementation in ArcGIS Pro creates the interpolation values and then clips the interpolated area to the boundary, but it is certainly of interest to enlarge the AOI a few km into the sea in order to visualize how the algorithm deals with the linear placement of the measurement points.

Image 2: On the left, the points outside of the boundary are visible. In order to also see the interpolation results between those points along the coast, a buffer was created in order to encompass also the space between them (right).

Based on the points and the boundary layer, the IDW was performed multiple times with changing parameters. The interpolation was performed 4 times with varying Neighbor and 4 times with varying Power parameters:

Table 1: Different IDW parameters, 4x varying power values and 4x varying neighbor values.

 

For the visualization, a single color ramp made out of shades of green was chosen. Multiple colors make it harder to visualize the continuety, since putting all colors on a scale of warmer to colder does not arrive naturally to every user. These shades of green seemed to be easily distinguishable as well as easing the comprehension that darker equals wormer. The classes were defined in 3° steps starting from 0 up to 15, which is just above the maximum value.

Image 3: GIF cycling through different Neighbor settings, displaying the resulting differences in temperature areas.
Image 4: GIF cycling through different Power settings, displaying the resulting differences in temperature areas.

 

 
Next up, the Interpolation results are exported as tiffs and histograms created via Python & matplotlib with the following code.
import numpy as np
#import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
Image.MAX_IMAGE_PIXELS = None

paths=[
"tiffs/IDW_Neighbors_12_Power_0,5.tif",
"tiffs/IDW_Neighbors_12_Power_2.tif",
"tiffs/IDW_Neighbors_12_Power_3.tif",
"tiffs/IDW_Neighbors_14_Power_2.tif",
"tiffs/IDW-Neighbors_12_Power_1.tif",
"tiffs/IDW-Neighbors_16_Power_2.tif",
"tiffs/IDW-Neighbors_18_Power_2.tif"]

def extract_title(path):
    name = path[6:-4]
    name = name.replace("_"," ")
    return(name)
    

def to_np(path):
    im = Image.open(path)
    im_array = np.array(im)
    im_array = im_array[~np.isnan(im_array)]
    im_array[im_array == 0] = np.nan
    return im_array


def graph(im_array,name):
    """
    set bins to either same bins as in Arc or per degree
    """
    bins = [0.0001,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
    #bins = [0.0001,3,6,9,12,15]
    bins = np.arange(0,16,0.2)
    # draw histogram
    n, bins, patches = plt.hist(im_array[~np.isnan(im_array)], bins=bins,color="red", alpha=0.75, rwidth=0.85)
    #set y grid
    plt.grid(axis='y', alpha=0.75)
    # calculate mean
    im_mean = im_array[~np.isnan(im_array)].mean()
    # add mean line
    plt.axvline(im_mean, color='k', linestyle='dashed', linewidth=1)
    # Set ticks, labels and title
    plt.xlabel('Temperature (deg. C)')
    #plt.ylabel('Frequency')
    plt.ylim(top=2700000)
    plt.title('Histogram of '+name+"\nMean: "+str(round(im_mean,4)))
    plt.xticks([0,3,6,9,12,15])
    #set output name to sth computer readable and save
    output_name = name.replace(" ","_")+"_1degree_bins.png"
    plt.savefig(output_name,dpi=200)
    print("saved: ",output_name)
    # clear figure after saving
    plt.clf()
    plt.close("all")



for path in paths:
    title = extract_title(path)
    graph(to_np(path),title) 

 

Regarding the Neighbors parameter, the differences between 12, 14, 16 and 18 are not very big. Apparently, extending the “radius” to more neighboring points does not change the resulting interpolated map on this scale. This is also confirmed by inspecting the histograms, which show very similar mean values and distributions.

In this area of interest, changing the Power parameter on the other hand has a high influence the interpolation result. With a lower value, the areas exhibit a larger clustering and a larger spatial auto-correlation than with higher values (to test that claim, the Moran’s I value will also be calculated at the end of the exercise). Since raising the power value strengthens influence of spatially nearer known points, the map becomes much more fragmented. To visualize this phenomenon better, the Geospatial Wizard was used to quickly visualize the interpolation areas with extremely high and extremely low power.

Image 5: Very low power values lead to smoother edges (left), while very high values lead to sharp edges, almost resembling the “nearest neighbor” map (right) because the spatial distance is that heavily weighted.

IDW quality assessment of AOI

Now it is time to look at the quality assessment for our Area of Interest. Using the Geospatial Wizard and the above-described method, we look at the mean averages for our most extreme cases for the power parameter, since that shows the biggest difference.

Image 6: Geospatial Wizard cross-validation result for the power setting of 3, showing a mean error of 0,05.
Image 7: Geospatial Wizard cross-validation result for the power setting of 1, showing a mean error of only 0,02. This indicates to a high degree of accuracy of the IDW for this parameter.

On closer inspection of the quality assesment methods, it is questionable how valid this method is. The regression line is closer to the expected line for the power 1 setting, but the mean error worse. This might be because both “upwards” and “downwards” errors are taken to calculate the mean value. Therefore, a higher standard deviation equally distributed both upwards and downwards might be evened out and lower error rate shown than is actually warranted.

Out of interest, I also performed the following steps which are not part of the assignment submission

 

Moran’s I applied to IDW results

Trying to verify the previous assumption that a higher power value more sharply fragments the result, the Moran’s I spatial auto-correlation values were calculated for the power parameters 1 and 20. Since I previously worked with this method and already created the code a while ago, it seems interesting to try it out on this dataset. Unfortunately, I was not able to draw any conclusions from this method. It returns a value very close to +1 for every tiff I throw at it, even ones where I would say the spatial autocorrelation should be lower, more towards 0 and thus “random” distribution. I would be happy to hear some feedback on the method, which is described in detail below.

 

Firstly, the raster images were reclassified.

 Image 8 & 9: Reclassified tiffs in 3°C steps. the visualization is not important in this case because the pixel values themselves, which represent the classes, are loaded into the python method.

 

The Moran’s I value is calculated with the following code, which also creates the charts:

import pysal
from skimage.io import imread
from libpysal.weights import lat2W
import numpy as np
from esda.moran import Moran
from skimage.color import rgb2gray
from splot.esda import moran_scatterplot
import matplotlib.pyplot as plt

"""Loading images as numpy arays, removin nan values"""
im_array1 = imread(r'data/Reclass_P0_5.tif')
im_array1 = np.nan_to_num(im_array1)
im_array2 = imread(r'data/Reclass_P20.tif')
im_array2 = np.nan_to_num(im_array2)
print("image/s loaded")



def Morans_I(data,out_name):
    data_gray = data
    col,row = data_gray.shape[:2]
    WeightMatrix = lat2W(row,col)
    #WeightMatrix = np.nan_to_num(WeightMatrix)
    WeightMatrix = lat2W(data_gray.shape[0],data_gray.shape[1])
    MoranM= Moran(data_gray,WeightMatrix)
    fig, ax = moran_scatterplot(MoranM, aspect_equal=True)
    print(MoranM)

    print("Raster Dimensions:\t" + str(data_gray.shape))
    print("Moran's I Value:\t" +str(round(MoranM.I,4)))
    plt.savefig(out_name,dpi=200)
    plt.show()
    
    

Morans_I(im_array1,"MI_P0_5.png")
Morans_I(im_array2,"MI_P20.png") 

The graphs and values returned from the method are as follows:

Images 10 & 11: IDW Power 2 (left) and IDW Power 20 (right) create almost identical results. The method was also recreated for extremely highly fragmented tiffs as well as tiffs with few classes that are spatially clustered, but the results did not change.

I am quite sure that the implementation of the Moran’s I algorithm works properly, as seen in my GitHub repository which runs the caculation on some sample images:

What needs to be changed in order for this to work? Or do I have a misunderstanding about how the Moran’s I works?

My ideas are:

  • playing around with the weight matrix
  • If big areas of confirmity with all their pixels within are counted as high positive spatial autocorreclation, then the next steps would be:
    • increase the number of classes during reclassification in order change how many neighboring pixels are on average of a different class
    • changing the algorithm to vector-based analysis to eliminate the pixel-based problem all together


I’m looking forward to your feedback! 

Further Reading
Recent Updates