Wednesday, March 3, 2010

Playing with Anomalies

When you see temperature plots, you will mostly see them plotted as anomalies. This means a base line is obtained from the dataset, and that base line is subtracted from every value in the record. The choice of the base line period is arbitrary. It is the average over a range between two years.

The UK Met office explains anomaly as:

Absolute temperatures are not used directly to calculate the global-average temperature. They are first converted into ‘anomalies’, which are the difference in temperature from the ‘normal’ level. The normal level is calculated for each observation location by taking the long-term average for that area over a base period. For HadCRUT3, this is 1961–1990.

For example, if the 1961–1990 average September temperature for Edinburgh in Scotland is 12 °C and the recorded average temperature for that month in 2009 is 13 °C, the difference of 1 °C is the anomaly and this would be used in the calculation of the global average.

One of the main reasons for using anomalies is that they remain fairly constant over large areas. So, for example, an anomaly in Edinburgh is likely to be the same as the anomaly further north in Fort William and at the top of Ben Nevis, the UK’s highest mountain. This is even though there may be large differences in absolute temperature at each of these locations.

The anomaly method also helps to avoid biases. For example, if actual temperatures were used and information from an Arctic observation station was missing for that month, it would mean the global temperature record would seem warmer. Using anomalies means missing data such as this will not bias the temperature record.

So they calculate a base line for each station for each month. The claim is that this levels out the absolute temps.
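A minimal sketch of that per-station, per-month calculation, assuming the station data is a list of (year, month, mean temperature) tuples. The function names and the September values are made up for illustration, loosely following the Edinburgh example in the quote.

```python
from collections import defaultdict

def monthly_baseline(records, start=1961, end=1990):
    """Average temperature for each calendar month over the base period.

    records: iterable of (year, month, mean_temp) for ONE station.
    Returns {month: baseline_temp}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, month, temp in records:
        if start <= year <= end:
            sums[month] += temp
            counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}

def monthly_anomalies(records, baseline):
    """Subtract the matching month's baseline from every record."""
    return [(y, m, t - baseline[m]) for y, m, t in records if m in baseline]

# Hypothetical September means, not real station data
records = [(1961, 9, 11.0), (1975, 9, 12.0), (1990, 9, 13.0), (2009, 9, 13.0)]
base = monthly_baseline(records)          # September base line is 12.0
anoms = monthly_anomalies(records, base)  # 2009 anomaly is 13.0 - 12.0 = 1.0
```

Each station gets its own base line, so a cold station and a warm station that both warm by 1C produce the same anomaly.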

Here is what the range of temps was for July 1980 for Southern Ontario. This year was picked because all but 3 stations have data for that year, the highest of all the years.

You can see that there is a 5C difference around the province and those differences tend to be geographic, well, somewhat. This then could be considered the "type" temperature regime for the region for July. The yellow dot is the average of all those stations and represents the balance point for the region. Were these weights and not temps, you could balance the region on the end of your finger on that point. (one would have to check every month to see if it follows the same pattern, wanna bet it doesn't?). But this is only the "type", that location is only the balance point, because 1980 happens to be only 1 of 3 years, out of more than 100 years, that has this many stations represented.

In this example Harrow is the hottest location; however, that station only has a small number of years of data. This means that for years in which Harrow is not represented, the balance point will have moved. The fewer the stations, the more skewed the balance point can be.

So to counter this problem, the anomaly is calculated for each region. So let's see how that works.

For Ottawa (stn 4333) I calculated the "normal" base line using their date range (1961-1990) for each month. I then linked the raw average data for that station to this recordset and got the temperature monthly anomaly above. The moving average is 120 months (10 years).

Then I averaged the monthly baseline to give a baseline for the entire years 1961-1990, and plotted that with the average of the yearly means to give this anomaly. The moving average is 10 years.
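Here is a rough sketch of the yearly version, assuming the yearly means are held in a dict keyed by year. It takes the base line as the plain average of the yearly means inside the base period, which matches averaging the twelve monthly base lines only when every month is present. The moving average is assumed to be a trailing one, spreadsheet style; the post does not say which kind was used.

```python
def yearly_anomalies(yearly_means, start=1961, end=1990):
    """yearly_means: {year: mean_temp}. Returns {year: anomaly}."""
    base_vals = [t for y, t in yearly_means.items() if start <= y <= end]
    baseline = sum(base_vals) / len(base_vals)
    return {y: t - baseline for y, t in sorted(yearly_means.items())}

def moving_average(values, window=10):
    """Trailing moving average; the first window-1 points have no value."""
    out = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        out.append(sum(chunk) / window)
    return out

# Tiny made-up example with a two-year base period and a two-year window
means = {1961: 5.0, 1962: 6.0, 1963: 7.0}
anoms = yearly_anomalies(means, start=1961, end=1962)
smooth = moving_average(list(anoms.values()), window=2)
```

For the monthly plot the same moving_average would be called with window=120.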

Hold on a second. How can the slopes be different? The yearly anomaly trend regression line slope is TEN TIMES larger than the monthly regression line. However, look at the two on top of each other.

The reason for the slope difference is the number of points on the x-axis. The monthly plot has 12 * 109 points, whereas the yearly plot has just 109, so the same change in temperature is spread over twelve times as many x steps. So we can use either of these methods to generate the anomaly graphs. For simplicity we will use the yearly normal plots.
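The x-axis effect can be checked with a synthetic series. This sketch assumes a perfectly linear warming of 0.01C per year (a made-up number) and fits a least-squares line once against the month index and once against the year index. Under that assumption the ratio of the two slopes comes out at 12; a real station's numbers will not be this clean.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

years = 109
# Synthetic monthly series: 0.01 C per year, sampled every month
monthly_t = [0.01 * (i / 12) for i in range(12 * years)]
slope_per_month = ols_slope(list(range(12 * years)), monthly_t)

# Yearly means of the very same series
yearly_t = [sum(monthly_t[12 * y : 12 * y + 12]) / 12 for y in range(years)]
slope_per_year = ols_slope(list(range(years)), yearly_t)

ratio = slope_per_year / slope_per_month  # about 12 for this linear series
```

The two lines describe the same warming rate; only the unit of x (per month versus per year) differs.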

As we have seen in other posts, what is important is not the trend of the average of the yearly means, but the trends of the hottest and coldest days of the year. Those are two converging trends that must be part of a cycle. Let's see anomalies of those:

This is the anomaly for the max temps for each year:

This is the average of the actual max temps for each year:

They match up pretty closely, especially their slopes and R2. I would expect this, except for one thing. Both the average and the anomaly claim that the max temps of the year are increasing. But for this location, they are not. The summer temps are decreasing. This average of the max temps is for the entire year, and we know that the winters are warming, so that is what will increase the average of the max temps. In other words, this is not a depiction of range, but the same old problem of AVERAGES being influenced by changes in the ranges within each year.

So we can check that. Let's see what these anomalies look like using the highest of the maximum temperatures. This is the anomaly of each year's highest maximum (summer) temps.

It shows that the hottest temperatures of the year are increasing, getting hotter. But when you plot the ACTUAL highest maximum temperatures in each year you get this:

It's decreasing!! Thus the anomalies, because they are a comparison to a "base line" of years, can produce a trend that in reality is not there. Thus anomalies cannot show us what is physically going on within the extreme ranges of the years. And it is those extreme ranges that AGW proponents claim are changing for the worse. They are not. Cooler summers (not as many heat waves) and not as cold winters (fewer deep freeze days) is not bad at all, but quite good.

Let's see what happens if we choose a different base line, say the entire range of data (why they choose 1961-1990 is a mystery).

Pretty close to the average mean anomaly in the second graph from the top. The slopes are close, and so is the R2. So why restrict the base line to a subset of the records, and not the entire set, if they give the same results? Let's try a really short range for the base, say the period where temps were increasing (the range they choose of 1961-1990 is inside the drop in temps during the 1945-1975 period). So we will choose the range 1980-2000, the so-called hottest period.

Notice no change in the slope, the shape of the graph and the R2. All that has changed is the B part of the equation, which is expected as all you are doing by changing the base years is shifting the graph up or down.
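A quick way to confirm this is to fit a line to the same series under two different base periods. The sketch below uses a made-up yearly series (a small trend plus an alternating wiggle), not the Ottawa data, and assumes a single yearly base line per run.

```python
def ols_fit(xs, ys):
    """Least-squares line through (xs, ys). Returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def anomalies(years, temps, start, end):
    """Subtract the mean of the base period from every value."""
    base = [t for y, t in zip(years, temps) if start <= y <= end]
    baseline = sum(base) / len(base)
    return [t - baseline for t in temps]

# Hypothetical yearly means: 0.01 C/year trend plus an alternating wiggle
years = list(range(1900, 2010))
temps = [5.0 + 0.01 * (y - 1900) + (0.3 if y % 2 else -0.3) for y in years]

fit_a = ols_fit(years, anomalies(years, temps, 1961, 1990))
fit_b = ols_fit(years, anomalies(years, temps, 1980, 2000))
# Slopes agree; only the intercept (the B in y = Mx + B) differs
```

Subtracting a constant from every point cannot change the slope or the R2, which is why only B moves.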

So there is no mathematical reason to choose a specific range of years. If you want to see the anomaly for any given station to compare to other stations in different regions, then one should use the entire range of years to get the base line.

Part two of this will show how the anomalies differ from region to region, to see if their premise of anomalies as a tool to see the overall change is true.

About Me

jrwakefield (at) mcswiz (dot) com