Training data collection

When classifying land cover or land use, it’s helpful to have training data that can be used as input to statistical classification algorithms, or simply as a reference for yourself. These points can be collected on the ground by visiting different locations and recording GPS points, which is a time-, energy- and money-consuming business (unless your study area is in your neighbour’s backyard), especially since you should also collect validation points to evaluate your classification with.
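Whichever way the points are collected, part of them should be held back for validation. A minimal sketch of a stratified random split, assuming each point is stored as a dict with a hypothetical `label` key (plus coordinates), might look like this:

```python
import random

def split_points(points, validation_fraction=0.3, seed=42):
    """Randomly split labelled points into training and validation sets.

    The split is done per class (stratified), so each land-cover class keeps
    roughly the same proportion in both sets. Each point is assumed to be a
    dict with a 'label' key (a hypothetical layout, not a fixed format).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for p in points:
        by_class.setdefault(p["label"], []).append(p)

    training, validation = [], []
    for group in by_class.values():
        group = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(group)
        n_val = round(len(group) * validation_fraction)
        validation.extend(group[:n_val])
        training.extend(group[n_val:])
    return training, validation
```

With 10 points in each of two classes and the default 30 % validation fraction, 3 points per class (6 in total) end up in the validation set and 14 in the training set.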

Another option for collecting training (and maybe also validation) data is to use Google Earth or Bing Maps for visual interpretation. Find a good example of an agricultural field, a city, or a patch of bare soil and go nuts.

I’m sure this could be done in Google Earth by creating points and exporting them as KML files, but I’ve used QuantumGIS and its wonderful OpenLayers plugin, which lets me add layers from Google satellite, Bing and OpenStreetMap as background data (this might actually be the main reason I’m using QGIS). I could also add my ground truth points to compare my visual interpretation with, as well as the Landsat data that I’m planning to use in the classification, to see what, for example, orchards look like at 15 or 30 m resolution.
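For the Google Earth route, the exported file is just XML. As a rough sketch of what such a KML file contains, the snippet below writes interpreted points to a minimal KML document with Python’s standard library; the `name`, `lon` and `lat` keys are hypothetical field names, and the coordinates in the usage example are illustrative only.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def points_to_kml(points):
    """Return a minimal KML document (as a string) with one Placemark per point.

    Each point is assumed to be a dict with 'name', 'lon' and 'lat'
    (WGS84 decimal degrees), which is the coordinate system KML uses.
    """
    ET.register_namespace("", KML_NS)  # serialise as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for p in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = p["name"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML lists longitude first, then latitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{p['lon']},{p['lat']}"
    return ET.tostring(kml, encoding="unicode")

print(points_to_kml([{"name": "orchard_1", "lon": 43.0, "lat": 36.87}]))
```

Note the longitude-first order in `<coordinates>`, which is easy to get backwards when copying coordinates by hand.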

Collecting 260 ground truth points in Duhok Governorate took a bit less than a week, whereas collecting 300 points from Google Earth or Bing aerial imagery could be done in a few hours.

But this is not a problem-free approach, and some would argue that its scientific accuracy is questionable, although it has been used and discussed by several researchers (see for example Clark & Aide 2011, Goodchild 2007, and Knorn 2009). The first problem I came across was that the background data from GE shifted position relative to my shapefile data when I zoomed or panned. It could be related to the “on the fly” projection that needs to be activated when using OpenLayers, or it could be a rendering problem. In any case, the layer seemed to settle in the right place after some more panning around, but when creating points one should make sure that the background data isn’t shifted.
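One reason on-the-fly projection matters here: Google and Bing tiles are served in Web Mercator (EPSG:3857), while shapefiles and GPS points are typically stored in WGS84 longitude/latitude (EPSG:4326) or in UTM, so every vector point has to be reprojected before it can line up with the tiles. Whether this caused the shift described above is only a guess; the sketch below just shows the standard spherical Web Mercator formulas, for illustration.

```python
import math

# Web Mercator (EPSG:3857) uses a sphere with the WGS84 semi-major axis
EARTH_RADIUS_M = 6378137.0

def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# An illustrative point near Duhok City (approximate coordinates)
print(wgs84_to_web_mercator(43.0, 36.87))
```

If the vector data were drawn with raw degree values on top of metre-based tiles, or the other way round, the layers would not overlap at all, which is why the reprojection has to happen somewhere in the pipeline.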

A more general problem is that we don’t know much about the image dates, or about the processing that has been applied to the images we use. Different images are shown at different zoom levels and in different areas, and I have yet to figure out whether there’s a way to know exactly which date the image I’m using is from. A simple method I tried was to compare urban areas, such as Duhok City, with my Landsat data. The extent was almost exactly the same (apart from a few buildings), so I concluded that this image was from 2012-2013. However, the images are likely mosaicked and therefore probably not from the same date everywhere.
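“Almost exactly the same” extent can be put into a number. A minimal sketch, assuming the urban area has been digitised from the background imagery and classified from Landsat as two binary masks on the same grid (a hypothetical setup, not something described above), is the intersection over union of the two masks:

```python
def extent_agreement(mask_a, mask_b):
    """Intersection over union (IoU) of two equally sized binary masks.

    Masks are lists of rows of 0/1 values on the same pixel grid, for example
    urban pixels digitised from high-resolution imagery and urban pixels
    derived from Landsat. Returns 1.0 if neither mask contains any urban pixel.
    """
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    return intersection / union if union else 1.0
```

A value close to 1 suggests the two sources show a similar urban extent, and hence possibly a similar date, but it can only support, not prove, a date estimate for one part of a mosaic.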

So I used this approach to increase my sample of training points, and I made sure to double-check against the Landsat panchromatic band and a Landsat-based NDVI image, to see whether what I saw in the high-resolution image matched the data I was going to base my classification on. Relying solely on Google Earth and Bing imagery, without knowing what the features look like at Landsat resolution, might be a bad idea, but using the two together allowed me to find features that I could never have found by looking at the Landsat data alone.
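The NDVI image used for that cross-check follows the standard formula NDVI = (NIR - Red) / (NIR + Red). As a sketch, assuming the red and near-infrared bands are already loaded as equally shaped grids of reflectance values (for Landsat 8 OLI these are bands 4 and 5, for Landsat 7 ETM+ bands 3 and 4), it can be computed per pixel like this:

```python
def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    red and nir are equally shaped grids (lists of rows) of reflectance values.
    Pixels where NIR + Red == 0 (e.g. nodata fill) are returned as None
    instead of raising ZeroDivisionError.
    """
    result = []
    for red_row, nir_row in zip(red, nir):
        row = []
        for r, n in zip(red_row, nir_row):
            total = n + r
            row.append((n - r) / total if total != 0 else None)
        result.append(row)
    return result

# Vegetated pixel, pixel with equal red/NIR, and a nodata pixel
print(ndvi([[0.1, 0.2, 0.0]], [[0.5, 0.2, 0.0]]))
```

Vegetation such as orchards or irrigated fields gives clearly positive values, while bare soil and built-up areas stay close to zero, which is what makes NDVI useful for checking points picked from the high-resolution imagery.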

I would say that this is an interesting area that needs more attention. One step towards making it more valid would be if more metadata were provided in the OpenLayers plugin (maybe it is, and I just haven’t found it?). Furthermore, research is needed to determine the validity of this approach compared to using only ground truth data. However, in inaccessible areas with mountains, land mines or other obstacles to collecting ground data, this approach could be a good alternative.
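One way to start testing that validity is to interpret the imagery at locations where ground truth already exists and measure how often the two agree. A minimal sketch, using overall accuracy and Cohen’s kappa (two common agreement measures, not necessarily the right ones for every study), assuming both sets of labels are plain lists in the same point order:

```python
from collections import Counter

def agreement(reference, interpreted):
    """Overall accuracy and Cohen's kappa for two label lists.

    reference: ground truth class labels; interpreted: labels from visual
    interpretation of Google/Bing imagery at the same locations, same order.
    """
    if len(reference) != len(interpreted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(reference)
    observed = sum(r == i for r, i in zip(reference, interpreted)) / n
    ref_counts = Counter(reference)
    int_counts = Counter(interpreted)
    # Agreement expected by chance, from the marginal class frequencies
    expected = sum(ref_counts[c] * int_counts[c] for c in ref_counts) / (n * n)
    # If chance agreement is 1, both lists hold a single identical class
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa

print(agreement(["urban", "urban", "orchard", "orchard"],
                ["urban", "orchard", "orchard", "orchard"]))  # (0.75, 0.5)
```

Kappa discounts the agreement one would get by chance from the class proportions alone, so it is lower than overall accuracy whenever one class dominates the sample.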

Any thoughts on this are welcome!


