For some of the last few evenings, I have been learning about MS Access’s data import functionality in order to interrogate the data of the Global Historical Climatology Network-Monthly dataset. This dataset holds records of temperature readings dating back to the beginning of the eighteenth century from more than seven thousand stations around the globe.
The data are in text files with a fixed width format. It’s very straightforward to set up the import format (although there are many columns!) and save the import specification so that future datasets can be imported as they are released with ease. The largest data files have almost half a million rows, yet Access can import the data in a few seconds. The resulting table of readings of average, minimum and maximum temperatures for each month for each station has more than one million rows. I have not experimented with adding indexes yet but, in spite of this, I can run queries that are not painfully slow.
Now that I have the data into an Access database, I hope to start analysing the data. I have produced a couple of charts for a single station but am yet to run any serious calculations. I have read of a similar project at:
The algorithm that caerbannog (the blogger) uses in his C++ program to smooth the data and calculate averages is fairly simple. Something similar should be possible (or even easy) using the MS Office tools.
Ultimately, my aim for this is to make this the basis of an ICT lesson or project. As caerbannog notes “Never in history has science been more accessible to the general public than it is now.” The quantities of data that can be accessed for free are as enormous as the power of the computers that are now cheaply available. I hope that my students will be inspired to look deeply into the issue and this will help develop their sense of empirical curiosity.