Sunday, November 28, 2010

Anomalies Revisited

I posted a link to this blog on another blog (which, BTW, is excellent): http://judithcurry.com/2010/11/26/skeptics-make-your-best-case


Someone objected that I neither use nor explain anomalies, stating:


Let me explain for you what anomaly means and how it is calculated. To define a “normal,” an analyst selects a period of time, typically 30 years, but it could be the entire length of the time series. I can tell you that it doesn't matter which period you pick; the answer you get is the same. Given your time period, you then create averages for that time period: you average all Jan, all Feb, etc. Then you can create an anomaly, or deviation from that normal. This mathematical operation does not change the shape of the curve. It doesn't change slopes or the magnitudes of difference between points. It merely “scales” the curve by subtracting a constant. Why do we do this? Well, you noted that some Canadian records are short and others are long, and you worried about the weighting. Well, anomalies help us with that problem. If you want to actually know how an anomaly works and why you should use them, just write.

As we recall, the number of stations reporting peaked in the 1980s. Before the 1950s there are very few stations. So if I wanted the trend for, say, Southern Ontario as a whole rather than for individual stations, this person claims anomalies will somehow compensate for the 80% of records missing before 1980.

My position is that missing records bias the data toward the remaining stations and skew the results into an unrealistic view of what happened.

Well, whether missing records make a difference is easy enough to test.

First, we know that even within the region of Southern Ontario, temperatures on the same day vary widely from place to place, by as much as 5 degrees or more. Harrow, for example, is near the southernmost point in Canada, close to Windsor and right on Lake Erie. Ottawa is about a seven-hour drive east and somewhat north. If you tried to average the anomalies of these two locations, and Harrow is missing more than half the years, how can the data NOT be biased toward Ottawa?


So I decided to test this with 15 stations from all across the country that have data back to 1900 (ensuring a wide range of temperatures on the same day). To keep the dataset manageable, I used only July TMax temperatures.

The goal is to compute the anomalies for these 15 stations with their full, intact records and plot the trend. Then, for the half of the stations with the lowest temperatures, remove all records prior to 1980 and replot the anomaly trend to see whether the missing data biases the result.
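For anyone who wants to repeat the experiment, here is a minimal Python sketch of the "remove early records for the cooler half" step. It assumes each record is a `(station, year, day, tmax)` tuple and that "lowest temperatures" means the lowest mean July TMax over the full record; the function name, record layout and the toy stations "A" and "B" are all hypothetical, not taken from the original Access database.

```python
from collections import defaultdict
from statistics import mean

# Assumed record layout: (station, year, day, tmax). Station names and
# values used with this function are placeholders, not real data.
Record = tuple[str, int, int, float]


def drop_early_years_for_coolest_half(
    records: list[Record], cutoff_year: int = 1980
) -> list[Record]:
    """Remove every record before cutoff_year for the coolest half of the stations.

    "Coolest" is assumed to mean the lowest mean July TMax over the full
    record. With an odd station count (e.g. 15) this rounds down (7 stations).
    """
    temps: dict[str, list[float]] = defaultdict(list)
    for station, _year, _day, tmax in records:
        temps[station].append(tmax)

    # Rank stations from coolest to warmest by mean TMax.
    ranked = sorted(temps, key=lambda s: mean(temps[s]))
    coolest = set(ranked[: len(ranked) // 2])

    # Keep a record if its station is not in the coolest half,
    # or if it falls on or after the cutoff year.
    return [r for r in records if r[0] not in coolest or r[1] >= cutoff_year]


# Toy example with two hypothetical stations, "A" (warmer) and "B" (cooler).
sample = [
    ("A", 1950, 1, 29.0), ("A", 1990, 1, 30.0),
    ("B", 1950, 1, 26.0), ("B", 1990, 1, 27.0),
]
kept = drop_early_years_for_coolest_half(sample)
# B's 1950 record is dropped; A keeps both years.
```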


Just to make sure we are on the same page: each station's full-period average was produced with one query (Access SQL). This became the baseline for each station. I then subtracted each station's baseline from its daily TMax values to get the daily anomalies.
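The baseline-and-subtract step above is simple arithmetic. Here is a minimal Python sketch of it (not the Access query itself), assuming `(station, year, day, tmax)` tuples; `daily_anomalies` is a hypothetical name. Note that the baseline is computed from whatever records are passed in, so running it on truncated data would recompute each baseline from the remaining years. Whether the second run in this post did the same is an assumption.

```python
from collections import defaultdict
from statistics import mean


def daily_anomalies(records):
    """Return (station, year, day, anomaly) with anomaly = TMax - station baseline.

    records: iterable of (station, year, day, tmax) tuples (an assumed layout).
    The baseline is the mean of every TMax value the station has, i.e. a
    full-period average, computed from whatever records are passed in.
    """
    records = list(records)
    by_station = defaultdict(list)
    for station, _year, _day, tmax in records:
        by_station[station].append(tmax)

    baseline = {station: mean(values) for station, values in by_station.items()}

    # Subtract the station's own baseline from each daily TMax.
    return [(s, y, d, tmax - baseline[s]) for s, y, d, tmax in records]


# Hypothetical station "A" with three July days; its baseline is 30.0.
example = daily_anomalies([("A", 1950, 1, 28.0), ("A", 1950, 2, 30.0), ("A", 1951, 1, 32.0)])
# example == [("A", 1950, 1, -2.0), ("A", 1950, 2, 0.0), ("A", 1951, 1, 2.0)]
```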

From there, I aggregated the maximum, average and minimum of the stations' daily anomalies for each year to produce this yearly plot for all 15 stations:



The first thing you will notice is that there isn't one line. Every anomaly graph presented by climate scientists is a single line, so how come I get three? Because I subtracted each station's full-period average TMax from each day's TMax, then kept the yearly maximum, average and minimum of those daily anomalies. TMax swings widely from day to day (recall this is JUST July).
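The three lines come from collapsing each year's daily anomalies into a maximum, an average and a minimum. A minimal Python sketch of that aggregation, assuming `(station, year, day, anomaly)` tuples and pooling every station's daily values for a year (one possible reading of "aggregated together"); `yearly_summary` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def yearly_summary(anomalies):
    """Pool all stations' daily anomalies by year and return
    {year: (max, mean, min)}.

    anomalies: iterable of (station, year, day, anomaly) tuples (assumed layout).
    Pooling every daily value is one reading of "aggregated together"; it
    weights each station by how many days it reported.
    """
    by_year = defaultdict(list)
    for _station, year, _day, anomaly in anomalies:
        by_year[year].append(anomaly)
    return {year: (max(v), mean(v), min(v)) for year, v in sorted(by_year.items())}


summary = yearly_summary([
    ("A", 1950, 1, 2.0), ("A", 1950, 2, -1.0), ("B", 1950, 1, 0.5),
    ("A", 1951, 1, 1.0),
])
# summary == {1950: (2.0, 0.5, -1.0), 1951: (1.0, 1.0, 1.0)}
```

A single-line anomaly graph corresponds to plotting only the middle element of each tuple.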

Here is an example: the July TMax for just one of those stations, showing the highest, average and lowest TMax for each year. In other words, the range of July TMax temperatures for each year:

Here is the same data baselined to zero (as anomalies). You will see that the shape of the lines is identical. Importantly, the range of the data is also the same. When climate scientists use anomalies, they use only the AVERAGED line (the center one)! So, contrary to the claim that anomalies don't lose detail, they do lose detail if only the average is used. Think of the range they throw out as error bars: their anomaly is really supposed to carry "error bars" showing the full range of anomalies for each year.

This removal of data from their graphs is greatly troubling, as is their carelessness with detail. If engineers were as careless with detail, and threw out crucial data the way climate scientists do, we would have bridges and buildings collapsing.
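The point about shape and range can be checked with a few numbers. This Python sketch uses made-up July TMax values and a made-up baseline (not station data): subtracting the baseline shifts the max, mean and min by the same constant, so the range is unchanged, while keeping only the mean leaves a single number per year.

```python
from statistics import mean


def summarize(values):
    """Return (max, mean, min, range) for one year's July TMax values."""
    hi, lo = max(values), min(values)
    return hi, mean(values), lo, hi - lo


# Made-up July TMax values for one station and one year, and a made-up baseline.
july = [24.0, 27.5, 31.0, 29.0, 22.5]
baseline = 26.0

anomalies = [t - baseline for t in july]

raw = summarize(july)        # (31.0, 26.8, 22.5, 8.5)
anom = summarize(anomalies)  # (5.0, 0.8, -3.5, 8.5)

# Subtracting the baseline shifts max, mean and min by the same constant...
shifts = [r - a for r, a in zip(raw[:3], anom[:3])]
# ...so the range (max - min) is unchanged. Keeping only anom[1], the mean,
# discards that 8.5-degree spread.
```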

In fact, that is exactly what is happening. Their carelessness with detail is driving government policy built on sloppy data, which is contributing to the economic problems we face.

Only in climate science are scientists allowed to get away with throwing away data.

This brings us back to what this site is all about: showing ALL the data, not just the average.

So what about the other claim, that anomalies can "fix" missing data?

After deleting the 1900 to 1979 data for half of the stations and replotting the anomalies, I get this graph:



If you look carefully, you will see it's different. In fact, it is VERY different: the max anomaly, which was dropping, is now rising.

This difference is easiest to see by subtracting one max-anomaly line from the other (full data versus missing data):

The bias is obvious.
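Subtracting the two max-anomaly series is a year-by-year difference. A minimal Python sketch, assuming each series is a dict mapping year to max anomaly; the sign convention (full minus truncated), the function name and the numbers are assumptions for illustration only.

```python
def series_difference(full, truncated):
    """Year-by-year difference between two yearly series.

    full, truncated: dicts mapping year -> value, e.g. the yearly max anomaly
    computed with all records and with the early records removed.
    Only years present in both are compared; the sign convention
    (full minus truncated) is an assumption.
    """
    common = sorted(full.keys() & truncated.keys())
    return {year: full[year] - truncated[year] for year in common}


# Hypothetical yearly max anomalies (made-up numbers).
diff = series_difference({1950: 3.0, 1990: 4.0}, {1950: 2.5, 1990: 4.0, 2000: 1.0})
# diff == {1950: 0.5, 1990: 0.0}
```

If removing records made no difference, every value in the result would be zero.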
Now let's see how that bias affects the only number they use: the average.

The top blue graph is the average anomaly for the full dataset, with no missing records. The red graph is the average anomaly with the records removed. Notice how much the slope (trend) changes when data is missing.
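The post does not say how the trend lines were fitted. A common choice, and the one assumed here, is an ordinary least-squares line through the yearly average anomalies. This Python sketch computes that slope so the full and truncated runs can be compared as numbers; `ols_slope` and the sample series are hypothetical. (Python 3.10+ also offers `statistics.linear_regression`, which returns the same slope.)

```python
def ols_slope(series):
    """Ordinary least-squares slope of value versus year (units per year).

    series: dict mapping year -> yearly average anomaly. Needs at least two
    distinct years.
    """
    years = list(series)
    values = [series[y] for y in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Made-up series rising 0.5 degrees per decade.
slope = ols_slope({1950: 0.0, 1960: 0.5, 1970: 1.0})
per_century = slope * 100  # about 5 degrees per century
```

Running it once on the full-data averages and once on the missing-data averages turns "changes the slope very much" into two directly comparable numbers.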

This test clearly shows two things about using anomalies.

1) Using only the averaged anomalies loses crucial data showing trends that are not visible in the average, just as we see with the raw temperatures. The trends of TMax and TMin are vital to understanding what is going on, and the same goes for anomalies: dropping the highest and lowest anomalies loses those same trends. Throwing out data is scientifically criminal.

2) Anomalies cannot fill in missing data. The fewer stations there are going back in time, the more the result is biased toward those stations. There is no getting away from this. Worse, climate scientists have no way of knowing how large the bias is. They can't go back 100 years, build the missing stations and rerun the climate.
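The weighting effect in point 2 can be made concrete. In a yearly average that gives every reporting station equal weight (an assumption for illustration), each station's share is 1 divided by the number of stations reporting that year, so fewer stations means each one counts for more. A minimal Python sketch with hypothetical stations:

```python
from collections import defaultdict


def station_weights_by_year(records):
    """For each year, each reporting station's share of an equal-weight
    average across stations: 1 / (number of stations reporting that year).

    records: iterable of (station, year, ...) tuples (assumed layout).
    """
    reporting = defaultdict(set)
    for station, year, *_rest in records:
        reporting[year].add(station)
    return {
        year: {s: 1 / len(stations) for s in sorted(stations)}
        for year, stations in sorted(reporting.items())
    }


# Hypothetical: four stations report in 1990, only two in 1950.
weights = station_weights_by_year([
    ("A", 1950), ("B", 1950),
    ("A", 1990), ("B", 1990), ("C", 1990), ("D", 1990),
])
# weights[1950] == {"A": 0.5, "B": 0.5}; weights[1990] gives each station 0.25.
```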

This brings us back to what this site has been doing all along: showing ALL the data, one station at a time. It's the only way to see what has physically happened.

You cannot trust averages to give you a trend. You cannot trust anomalies to give you a trend. The only way to get a true picture of what's going on is to look at the full range of each individual station. NO massaging of data, NO "cleaning" of the data, NO "tricks" will change that.

























About Me

jrwakefield (at) mcswiz (dot) com