Your guide to the “data jungle”: A summary of the data I have used or reviewed in my research

Data can be one of the biggest challenges for a researcher, especially if you’re junior or entering a new field or subject area. During my PhD I often wondered why the remote sensing and GIS courses I took as an undergrad never taught us how and where to find data. When I was working on my paper about drought, I had no idea where to find data, or even what data were available. There were so many different abbreviations and versions that I didn’t know where to start. Asking my colleagues gave me some ideas, but I realized that few people could give me a good overview of the most common data products. Looking at what other people use is a good way to find out which datasets are preferred, and while writing the paper on scale in environmental migration research, I learned quite a lot about some of the most-used datasets. Based on the data review in that paper, plus a few more interesting datasets, I want to share an overview of the data that I’ve worked with so far.

| Dataset | Temporal coverage (version) | Temporal resolution | Spatial coverage | Spatial resolution | Comments |
|---|---|---|---|---|---|
| Climate Research Unit Time Series (CRU TS), versions 1.0 – 3.2 | 1901-1995 (1.0); 1901-2000 (2.0); 1901-2002 (2.1); 1901-2012 (3.21) | Monthly | Global (land areas) | 0.5° | Precipitation and temperature based on historical weather stations. The number of stations used varies from year to year. Iraq had almost no stations used for the drought year 2008. Refs: [1-4] |
| WorldClim | 1950-2000 | Monthly averages for the whole period | Global | 1 km | Monthly precipitation and temperature averages, i.e. not a time series. Generated through interpolation of climate data from weather stations. Refs: [5, 6] |
| IPCC | 1961-1990 | Monthly | Global (289 countries) | Country | Precipitation based on CRU, aggregated to countries (weighted mean). Refs: [7] |
| Tropical Rainfall Measuring Mission (TRMM) | 1998-2015 | 3 hours | Latitudes 60°N-S | 0.25° | Satellite-based precipitation data. Refs: [8] |
| Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) | 1981-present | Daily, pentadal, and monthly | 50°S-50°N (and all longitudes) | 0.05° | Satellite-based precipitation combined with rain gauge data. Refs: [9] |
| Global Inventory Modeling and Mapping Studies (GIMMS) | 1981-2006 | Bi-monthly | Global | 8 km | NDVI based on data from the Advanced Very High Resolution Radiometer (AVHRR). Refs: [10] |
| Moderate Resolution Imaging Spectroradiometer (MODIS) | 2000-present | 8-day | Global | 250 m | NDVI; widely used, and also offers the Enhanced Vegetation Index (EVI) and spectral bands. Refs: [11] |
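The spatial resolutions in the table above mix degrees and kilometres, which makes them hard to compare directly. The sketch below is a rough conversion, assuming a spherical Earth with a mean radius of 6371 km. The function name `cell_size_km` and the example latitude are my own choices for illustration and do not come from any of the dataset documentation.

```python
import math

# Mean Earth radius in km (spherical approximation, an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0


def cell_size_km(resolution_deg, latitude_deg=0.0):
    """Return the approximate (north-south, east-west) size in km of a grid cell."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0  # about 111.2 km
    north_south = resolution_deg * km_per_degree
    # Meridians converge towards the poles, so cells get narrower east-west.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Resolutions in degrees, taken from the table above.
resolutions = {"CRU TS": 0.5, "TRMM": 0.25, "CHIRPS": 0.05}

for name, res in resolutions.items():
    # 33°N is roughly the latitude of Baghdad.
    ns, ew = cell_size_km(res, latitude_deg=33.0)
    print(f"{name}: {res}° is about {ns:.1f} km north-south and {ew:.1f} km east-west")
```

By this approximation a 0.5° CRU TS cell is a little over 55 km from north to south, so it is far coarser than the 1 km or 250 m products in the same table.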

Be critical about data

Some people use data far too carelessly. They seem to think that just because a dataset exists, and is being used, it has no flaws, but all data have their problems. Scale issues, the focus of the scale paper, cover quite a lot of them. You need to know and understand how a dataset was developed in order to judge whether it is usable for your specific purpose. Take for example the CRU dataset, which is based on interpolated station data. It has an impressive coverage, both temporally and spatially, and a fairly good spatial resolution. We might want to use it to understand rainfall patterns in the Middle East, since it can be difficult to obtain meteorological data there. But if we look at how many stations were used in the interpolation, and where they are located, we can see that there are large gaps in the Middle East, especially in Iraq and the Arabian Peninsula. The image below is from May 2008, but it can be considered representative of other dates as well. The difficult question is how this affects the validity of the interpolated data. Is this dataset OK to use for the Middle East or Iraq?


Number of stations per gridcell used in the CRU precipitation dataset (TS 3.1). Blue = 0, green = 1 and pink = 2. Image created using the IRI Data Library
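The same check can be done numerically instead of by eye. The sketch below assumes you already have the per-cell station counts as a plain grid together with the cell-centre coordinates. The function `station_coverage`, the bounding box and the numbers are all made up for illustration; they are not real CRU values.

```python
def station_coverage(counts, lats, lons, south, north, west, east):
    """Summarise station counts for grid cells whose centres fall inside a box.

    counts[i][j] is the number of stations for the cell centred at (lats[i], lons[j]).
    """
    selected = [
        counts[i][j]
        for i, lat in enumerate(lats)
        for j, lon in enumerate(lons)
        if south <= lat <= north and west <= lon <= east
    ]
    if not selected:
        raise ValueError("no grid cells fall inside the bounding box")
    empty = sum(1 for c in selected if c == 0)
    return {
        "cells": len(selected),
        "empty_cells": empty,
        "empty_fraction": empty / len(selected),
        "max_stations": max(selected),
    }


# Made-up 0.5° grid, for illustration only; these are NOT real CRU values.
lats = [29.25, 29.75, 30.25, 30.75]
lons = [43.25, 43.75, 44.25, 44.75]
counts = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
]

summary = station_coverage(counts, lats, lons, south=29.0, north=31.0, west=43.0, east=45.0)
print(summary)
```

A high share of empty cells does not by itself make the interpolated values wrong, but it tells you that the values in that region are driven mostly by distant stations.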

Take-home message: Understanding the data is key to understanding the limitations of your research and finding ways to overcome those limitations.

Feel free to share, comment and question! Oh, and speaking of free, all of the data are freely available!


  1. New, M., M. Hulme, and P. Jones, Representing twentieth-century space-time climate variability. Part II: Development of 1901-96 monthly grids of terrestrial surface climate. Journal of Climate, 2000. 13(13): p. 2217-2238.
  2. Mitchell, T.D., et al., A comprehensive set of high-resolution grids of monthly climate for Europe and the globe: the observed record (1901–2000) and 16 scenarios (2001–2100). Tyndall Centre for Climate Change Research Working Paper, 2004. 55: p. 25.
  3. Mitchell, T.D. and P.D. Jones, An improved method of constructing a database of monthly climate observations and associated high-resolution grids. International Journal of Climatology, 2005. 25(6): p. 693-712.
  4. Harris, I., et al., Updated high‐resolution grids of monthly climatic observations–the CRU TS3.10 Dataset. International Journal of Climatology, 2013.
  5. Hijmans, R.J., et al., Worldclim – Global Climate Data. 2005.
  6. Hijmans, R.J., et al., Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 2005. 25(15): p. 1965-1978.
  7. Mitchell, T.D., M. Hulme, and M. New, Climate Data for Political Areas. Area, 2002. 34(1): p. 109-112.
  8. Kummerow, C., et al., The Tropical Rainfall Measuring Mission (TRMM) Sensor Package. Journal of Atmospheric and Oceanic Technology, 1998. 15(3): p. 809-817.
  9. Funk, C., et al., The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes. Scientific data, 2015. 2.
  10. Tucker, C., J. Pinzon, and M. Brown, Global inventory modeling and mapping studies. Global Land Cover Facility, University of Maryland, College Park, Rep. NA94apr15b, 2004(11).
  11. Solano, R., et al., MODIS Vegetation Index User’s Guide (MOD13 Series). 2010, Vegetation Index and Phenology Lab.
