Some issues in NOAA figures


HansR
Posts: 5871
Joined: Sat 20 Oct 2012 6:53 am
Weather Station: GW1100 (WS80/WH40)
Operating System: Raspberry OS/Bullseye
Location: Wagenborgen (NL)
Contact:

Some issues in NOAA figures

Post by HansR »

While experimenting, I calculated averages for Temp, Max Temp and Min Temp using the standard C# method for list handling:

Code: Select all

        float average = yearlist.Select(x => x.AverageTempThisDay).Average();
        float mintemp = yearlist.Select(x => x.MinTemp).Average();
        float maxtemp = yearlist.Select(x => x.MaxTemp).Average();
For the year 2020, I compared this to the 2020 NOAA report (status as of 7 March, so data up to 6 March).
The results are:

Code: Select all

                  Temperature (°C), Heat Base: 18,3  Cool Base: 18,3
                          Dep.  Heat  Cool                       Max  Max  Min  Min
        Mean  Mean        From  Deg   Deg                        >=   <=   <=   <=
 YR MO  Max   Min   Mean  Norm  Days  Days  Hi  Date  Low  Date 25,0 30,0  0,0  -5,0
------------------------------------------------------------------------------------
 20  1   7,8   3,2   5,6   2,5   391     0  12,2   9  -0,9    1    0   31    3    0
 20  2   8,8   3,7   6,3   3,0   349     0  14,1  16  -1,1    5    0   29    1    0
 20  3   8,4   1,7   4,8  -1,4    81     0   9,2   3  -1,0    5    0    6    1    0
------------------------------------------------------------------------------------
         8,3   2,9   5,6   1,4   820     0  14,1feb.  -1,1 feb.    0   66    5    0
And my own results are:

Code: Select all

[2020, Average: 5.82, Min Av: 3.29, Max Av: 8.31],
Both use dayfile.txt as source. I assume these numbers should be the same, but they are not. The average is 0.2 degrees off and the minimum is 0.4 degrees off. The maximum is OK. I can't find the cause when I look in the code (NOAA.cs, CreateYearlyReport), apart from the fact that CumulusMX does all the arithmetic itself while I use standard methods. Also, CumulusMX uses doubles and I use floats. I can't imagine that making a (rounding) difference in the first decimal.
0.4 off is a lot, so the question is how to interpret this. It's not about the validity of the measurements, it's about the validity of the math by CumulusMX or by me (not only in NOAA).
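
As a rough check on the float-versus-double point, here is a minimal sketch. The daily means are invented (66 values, roughly the length of the period in question) and are not taken from any real dayfile.txt; the variable names are hypothetical too.

```csharp
using System;
using System.Linq;

// Hypothetical daily mean temperatures (°C, one decimal). Invented values.
double[] dailyMeansDouble = Enumerable.Range(0, 66)
    .Select(i => Math.Round(5.0 + 3.0 * Math.Sin(i / 5.0), 1))
    .ToArray();

// The same values stored as floats, as in the LINQ snippet above.
float[] dailyMeansFloat = dailyMeansDouble.Select(v => (float)v).ToArray();

double averageDouble = dailyMeansDouble.Average();
float averageFloat = dailyMeansFloat.Average();

// Absolute gap between the two averages.
double gap = Math.Abs(averageDouble - averageFloat);
Console.WriteLine($"double: {averageDouble:F6}  float: {averageFloat:F6}  gap: {gap:E2}");
```

With values of this size the gap should sit far below 0.1 degrees, which is consistent with the suspicion that a 0.2 to 0.4 degree difference is algorithmic rather than a rounding effect.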

Any similar observations?
Any opinions?
Hans

https://meteo-wagenborgen.nl
CMX build 4017+ ● RPi 3B+ ● Raspbian Linux 6.1.21-v7+ armv7l ● dotnet 8.0.3
sfws
Posts: 1183
Joined: Fri 27 Jul 2012 11:29 am
Weather Station: Chas O, Maplin N96FY, N25FR
Operating System: rPi 3B+ with Buster (full)

Re: Some issues in NOAA figures

Post by sfws »

I know your posting is in the Cumulus MX sub-forum, and I don't use that. However, I did notice some first-decimal-place rounding differences between web tags and dayfile.txt output for some statistics that I checked during my roughly a decade of using Cumulus 1. This applied to both individual-day and monthly periods, although I don't claim to have studied this or to have done thorough testing. Given that I do not expect my Fine Offset to yield perfect measurements, I have not worried when I see some discrepancies in calculated statistics!
I know that MX initially ported a lot from Cumulus 1 as machine code routines, so it may be that some rounding errors were perpetuated, but I would be interested to see any replies to the original post from anyone who does know how the calculations are done. I have (again in the past, when I was learning coding) attempted to replicate a lot of the Cumulus calculations that convert temperature to other statistics with the help of wind speed and humidity, so it is not just averaging that I am interested in.

EDIT: I typed "web tags" above; of course I meant "NOAA reports". As Mark's posting earlier today (8 March) confirms, what I meant was that the same problem occurs in Cumulus 1.

Just as an aside, I wonder whether it would make sense to reorganise the sub-forums now that Cumulus MX is dominant. Should it not have sub-forums of its own to make it easier to find related postings? And in my opinion the CM announcements could be split between those by Steve Loft and those by Mark Crossley, to match the Wiki download page's treatment of MX before and after Sandaysoft. Given that Mark has now added many of the features that Steve had in Cumulus 1 but not in his MX, and MX has moved out of beta, I think it could be confusing to read the early Steve MX announcements.
Last edited by sfws on Sun 08 Mar 2020 4:19 pm, edited 1 time in total.
mcrossley
Posts: 12695
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Bullseye Lite rPi
Location: Wilmslow, Cheshire, UK
Contact:

Re: Some issues in NOAA figures

Post by mcrossley »

Hi Hans,

I just imported my dayfile into Excel and created a pivot table to analyse it. All the values for the first three months of 2020 match exactly what the NOAA report shows, but the year figures do not...

Code: Select all

                  Temperature (°C), Heat Base: 15.5  Cool Base: 18.3
                          Dep.  Heat  Cool                       Max  Max  Min  Min
        Mean  Mean        From  Deg   Deg                        >=   <=   <=   <=
 YR MO  Max   Min   Mean  Norm  Days  Days  Hi  Date  Low  Date 27.0  0.0  0.0 -18.0
------------------------------------------------------------------------------------
 20  1   9.2   4.0   6.6   2.4   277     0  13.4   7  -2.0   19    0    0    3    0
 20  2   9.3   2.7   5.9   1.4   280     0  12.4  24  -1.8    6    0    0    2    0
 20  3   8.9  -0.3   3.9  -2.4    70     0  10.7   6  -3.3    6    0    0    4    0
 20  4
 20  5
 20  6
 20  7
 20  8
 20  9
 20 10
 20 11
 20 12
------------------------------------------------------------------------------------
         9.1   2.1   5.4   0.4   626     0  13.4 Jan  -3.3  Mar    0    0    9    0

Excel...
Capture.PNG
So, some digging required....
Re: Some issues in NOAA figures

Post by mcrossley »

I see the problem, it is summing the monthly values then just dividing by the number of months. It takes no account of the number of days that went into each monthly value. So March is having a disproportionately large effect.

I'll look at fixing it; who knows, I may even convert it to use some LINQ ;)
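
To make the effect concrete, here is a minimal sketch, not the actual NOAA.cs code, comparing the two approaches on invented data: full January and February plus only six days of March. The tuple fields and variable names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical daily records (month, day, mean temperature in °C). Invented values.
var days = Enumerable.Range(1, 31).Select(d => (Month: 1, Day: d, Mean: 6.0))
    .Concat(Enumerable.Range(1, 29).Select(d => (Month: 2, Day: d, Mean: 6.0)))
    .Concat(Enumerable.Range(1, 6).Select(d => (Month: 3, Day: d, Mean: 3.0)))
    .ToList();

// Approach described above: average the monthly means, ignoring how many
// days went into each month.
double meanOfMonthlyMeans = days
    .GroupBy(x => x.Month)
    .Select(g => g.Average(x => x.Mean))
    .Average();

// Day-based approach: every day carries the same weight.
double meanOfDays = days.Average(x => x.Mean);

Console.WriteLine($"Mean of monthly means: {meanOfMonthlyMeans:F2}");
Console.WriteLine($"Mean of all days:      {meanOfDays:F2}");
```

With these numbers the mean of monthly means is 5.0, while the mean over all 66 days is about 5.73, because the six cold March days count as a full third of the first figure.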
Re: Some issues in NOAA figures

Post by HansR »

@sfws: With respect to measurement statistics, I normally do worry about discrepancies (and rightly so, if I read Mark's answer) if the problem is algorithmic. Those calculations should be correct, as errors here will propagate through other calculations and it is hard to know the effect.

I don't regard a calculation such as apparent temperature as a statistic, but as a kind of meteorological derivative (not in the mathematical sense), a nice-to-have. Those calculations can be (vastly) impacted by errors in measurement statistics. But I do think we both agree that the numbers should be correct. I don't think rounding errors should or can be visible in an application like this (using either doubles or floats), so if errors are noticed, suspecting an algorithmic error is logical.

With respect to your aside that the forum could use a review, I agree. Even some moderators cannot be contacted anymore. Would you be willing to take this up with me (in communication with Mark, who apparently is the main moderator now)? If so, contact me by PM, CC Mark, and we could take a look at where and how to go. As another aside, which Mark launched recently, with respect to documentation for the Wiki: I think native English speakers are required.

@mcrossley:
See also my reply to @sfws above.
mcrossley wrote: Sat 07 Mar 2020 9:32 pm I see the problem, it is summing the monthly values then just dividing by the number of months. It takes no account of the number of days that went into each monthly value. So March is having a disproportionately large effect.
Thanks for looking into it, I suspected something like this.
mcrossley wrote: Sat 07 Mar 2020 9:32 pm I'll look at fixing it; who knows, I may even convert it to use some LINQ ;)
I am not sure it is worth the effort for v3; I assume the v4 guys will use the proper techniques to avoid these pitfalls. I noted the error so that parts of the code would not be copied blindly; reviewing is a requirement. V4 is still going strong, isn't it? Btw, using LINQ also has some things to take care of (I sometimes fall into the trap of multiple full table scans), but it is definitely worth using. Powerful technique.
Hans

Re: Some issues in NOAA figures

Post by mcrossley »

You are right, path of least effort for v3 - I'll take a look at fixing it tonight.

Looking at complete years, the residual error is small, so the figures appear to be correct. This only really manifests itself near the start of a month, and especially the start of a month early in the year. Good spot.
Re: Some issues in NOAA figures

Post by sfws »

mcrossley wrote: Sat 07 Mar 2020 9:32 pm I see the problem, it is summing the monthly values then just dividing by the number of months. It takes no account of the number of days that went into each monthly value. So March is having a disproportionately large effect.

I'll look at fixing it; who knows, I may even convert it to use some LINQ ;)
Now I remember: that was what I guessed might be the cause when (some years ago) I was experiencing the issue. My PHP scripts calculate "statistics"
HansR wrote: Sun 08 Mar 2020 9:05 am I don't regard a calculation such as apparent temperature as a statistic, but as a kind of meteorological derivative (not in the mathematical sense), a nice-to-have. Those calculations can be (vastly) impacted by errors in measurement statistics.
by taking daily summary data and multiplying it by 23, 24 or 25 hours (taking GMT/BST clock changes into account), adding today's data multiplied by the hours so far today, and then dividing that sum by the total number of hours.
So that is how I do the best I can with what my weather station outputs. I have recently revisited the (January 2015 vintage) script that does most of these calculations for me and rewrote it, cutting about a thousand lines of code after recognising that I was doing separate calculations for each non-Cumulus-web-tag meteorological derivative, whilst they fell into just 4 types that could each potentially be handled by a function.
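
The hour-weighted averaging described above can be sketched roughly as follows. This is not the PHP script itself; the values, the tuple layout and the names are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical completed days: a daily mean and the day's length in hours
// (23 or 25 on clock-change days). Invented values.
var completedDays = new (double Mean, double Hours)[]
{
    (7.0, 24), (8.0, 23), (6.0, 24),
};

// Hypothetical partial day: mean so far, with six hours elapsed.
var today = (Mean: 10.0, Hours: 6.0);

var periods = completedDays.Append(today).ToList();

// Weight each mean by the hours it covers, then divide by the total hours.
double weightedSum = periods.Sum(p => p.Mean * p.Hours);
double totalHours = periods.Sum(p => p.Hours);
double hourWeightedMean = weightedSum / totalHours;

// For comparison: treating every period as equally long.
double unweightedMean = periods.Average(p => p.Mean);

Console.WriteLine($"Hour-weighted: {hourWeightedMean:F2}, unweighted: {unweightedMean:F2}");
```

Here the six-hour partial day pulls the unweighted mean up to 7.75, while the hour-weighted mean is about 7.22.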
HansR wrote: Sun 08 Mar 2020 9:05 am With respect to your aside that the forum could use a review, I agree. Even some moderators cannot be contacted anymore. Would you be willing to take this up with me (in communication with Mark, who apparently is the main moderator now)? If so, contact me by PM, CC Mark, and we could take a look at where and how to go. As another aside, which Mark launched recently, with respect to documentation for the Wiki: I think native English speakers are required.
Since I don't yet use MX, I'm not the person to decide the best forum arrangement for inexperienced users, nor am I the person to tackle the documentation backlog. During Steve Loft's time writing the software and answering questions in the forum, I did a massive amount of documenting on the Wiki, based partly on his answers (I was going back to the earliest answers on the forum as well as reading the latest) and partly on my experiments with Cumulus and learning about web pages. I still consult the Wiki from time to time, and try to correct it or add to it so that the bits I edit reflect both Cumulus 1 and Cumulus MX, but these edits are based on recent points made in the forum; I don't have time to look at all of the forum or all of the wiki. I have recently been focussed on unpacking and sorting in my new home, and you have to take breaks from that! Now, with more spring-like weather, I hope to finish the indoor work soon and get out to enjoy my new neighbourhood.
What I will say is that documentation does need to be a collaboration between those who understand the code and those who are inexperienced and know what needs documenting. I have been retired for a long time, but I did for a short period work in analysis, design, and documentation for computer projects during the 1980s. I deliberately try not to remember those days!
Re: Some issues in NOAA figures

Post by mcrossley »

Oh, and for reference, I used that same day file with Cumulus 1 - and unsurprisingly it has the same bug.
Re: Some issues in NOAA figures

Post by sfws »

I have just edited the Wiki entry for "Average Temperature", adding a reference to MX builds in anticipation of Mark's next release! I found that on 1 June 2013 I updated that entry with details of the NOAA report and said (back then) that only the monthly NOAA figures reflected the "integrated means", as I called them (rather confusingly in hindsight?). I did not explicitly say back then that the annual figures could be arithmetically wrong; it was implied by omission!

It was comforting to find that I do document issues I find by experience, and since I am often saying I have a bad memory (and some "friends" suggest dementia is on its way), it was also reassuring that there is more evidence of my previous encounter with this issue. By the way, I have also edited my earlier post in this thread, as I typed "web tags" instead of "NOAA reports"; that, however, is evidence that I don't always write the right words! The other evidence is that in the last hour or so I got out my long-term backup disc and found copies of scripts I used while checking the NOAA reports! I have been deleting lots of old scripts as my short-term backup disc is nearly full, but have not gone back that far yet.

Anyway, I have clarified the position for annual averages with my added text (in bold) at https://cumuluswiki.org/a/Average_tempe ... us_outputs.
Re: Some issues in NOAA figures

Post by HansR »

@mcrossley: at least there's continuity in the system :D
@sfws: I think that covers it.

I don't really get the paragraph on alternatives, but that's probably because I don't have C1. And further in that paragraph: I think the median is not used very often in meteorology, and the average of the min/max temperature is, I think, at best a first-order estimate of the true average and nothing more.
Hans

Re: Some issues in NOAA figures

Post by mcrossley »

OK, I have posted ver 3.4.4 to fix this.

I went back and regenerated all my annual reports since 2010... Many were OK, with no change; for some it made a 0.1 difference to some figures. So, like I said, the error on a full year was small.
alexlc13
Posts: 24
Joined: Thu 22 Aug 2013 8:35 pm
Weather Station: Davis Vantage Pro2
Operating System: Windows 10
Location: Piedmont, SD

Re: Some issues in NOAA figures

Post by alexlc13 »

I work for the National Weather Service in the USA and thought I'd chime in on how calculations are made for US climate stations (in case anyone is curious).

There has been much discussion as to whether to average months or days to get annual temperature averages. The NWS and NCEI (National Centers for Environmental Information) calculate annual temperature averages by averaging the months. It is agreed that averaging days is more accurate, but because averaging months is what has always been done historically, then that's how the calculations will continue to be made.
Re: Some issues in NOAA figures

Post by sfws »

mcrossley wrote: Sun 08 Mar 2020 8:56 pm Many were OK, with no change; for some it made a 0.1 difference to some figures. So, like I said, the error on a full year was small.
Yes, it is during the year that the (small) difference is noted, especially in Spring in a non leap year, but of course some might say our climate is so wild that seasonal patterns are gone for ever?
alexlc13 wrote: Sun 08 Mar 2020 9:41 pm There has been much discussion as to whether to average months or days to get annual temperature averages. The NWS and NCEI (National Centers for Environmental Information) calculate annual temperature averages by averaging the months. It is agreed that averaging days is more accurate, but because averaging months is what has always been done historically, then that's how the calculations will continue to be made.
Thank you for that. I deduce Steve Loft knew that and therefore deliberately made his NOAA style reports work like that, as I know he was trying to imitate the USA report content (hence the naming as "NOAA" style).
I still believe that Mark is right having swapped to the more accurate calculation for the future, it is more consistent with how other parts of Cumulus work. It is also likely to match what happens in most countries outside USA.
HansR wrote: Sun 08 Mar 2020 6:05 pm I don't really get the paragraph on alternatives, but that's probably because I don't have C1. And further in that paragraph: I think the median is not used very often in meteorology, and the average of the min/max temperature is, I think, at best a first-order estimate of the true average and nothing more.
Good to hear your opinions, as they match mine.

The page history tells me I added that paragraph on 30 May 2013. My memory is not good enough to recall the exact rationale, but I believe there was a lengthy discussion comparing two means in one of the sub-forums prior to this, with some expressing a preference for one and some for the other. I presume somebody raised the question about the median; I know some users of Cumulus show median and standard deviation either in a table or on a graph on their websites. I am guessing I tried to record the balance of the views. As I said, I added a lot to the Wiki attempting to summarise forum discussions, as back then lots of people were searching the Wiki for help (Steve at that time had a notice on the forum, "Please read this first", the gist of which was that before asking for help on the forum you should see if your question is answered on the wiki, most simple questions being in the FAQ section of the wiki).

The two means were:
a) For each day, adding that day's maximum and minimum, then dividing the total by twice the number of days. I have seen this quoted in weather station reports, especially from the days of manual readings by an observer, not so much with automated stations.
b) What I then called the integrated mean, in the mathematical sense of combining (summing) all available values; this is what this thread has been about. In terms of Cumulus processing, it is the one represented by the sum and count parameters in today.ini.
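
A minimal sketch of the two means (a) and (b) on invented readings for two days at equal sampling intervals. The data and names are hypothetical, and this is not how Cumulus itself stores or processes its sum and count values.

```csharp
using System;
using System.Linq;

// Hypothetical temperature readings (°C), one array per day. Invented values.
double[][] daysOfReadings =
{
    new[] { 2.0, 1.0, 1.0, 2.0, 6.0, 9.0, 8.0, 3.0 },
    new[] { 4.0, 3.0, 3.0, 5.0, 7.0, 7.0, 6.0, 5.0 },
};

// (a) Sum each day's maximum and minimum, then divide by twice the number of days.
double minMaxMean = daysOfReadings.Sum(d => d.Max() + d.Min())
                    / (2.0 * daysOfReadings.Length);

// (b) "Integrated" mean: sum of all readings divided by the count of readings.
double integratedMean = daysOfReadings.SelectMany(d => d).Average();

Console.WriteLine($"(a) min/max mean: {minMaxMean:F2}, (b) integrated mean: {integratedMean:F2}");
```

For these readings (a) gives 5.0 and (b) gives 4.5: a short warm spell moves the daily maximum a lot but the all-sample mean only a little.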

In searching the forum, I found viewtopic.php?f=4&t=11905&p=95228, dating from 2014 (i.e. after that paragraph went into the Wiki), which has some relevance, although its focus is on monthly rather than annual figures. Unfortunately, the forum won't let me see anything from before 20 May 2013, so I imagine that the lengthy conversation I mentioned is lost. That is around the time that Steve Loft switched the forum onto a new platform, and he observed that some content was lost.
Re: Some issues in NOAA figures

Post by HansR »

mcrossley wrote: Sun 08 Mar 2020 8:56 pm OK, I have posted ver 3.4.4 to fix this.
Thnx for the quick reaction Mark :!:
alexlc13 wrote: Sun 08 Mar 2020 9:41 pm I work for the National Weather Service in the USA and thought I'd chime in on how calculations are made for US climate stations (in case anyone is curious).

There has been much discussion as to whether to average months or days to get annual temperature averages. The NWS and NCEI (National Centers for Environmental Information) calculate annual temperature averages by averaging the months. It is agreed that averaging days is more accurate, but because averaging months is what has always been done historically, then that's how the calculations will continue to be made.
Thanks for the contribution!

I checked with the KNMI (the responsible Dutch meteorological organisation) and they follow the WMO (as I think the USA does), and I do not see any discrepancies. I have difficulty displaying formulas, but for temperature it goes as follows (assuming full data presence):
  1. The day temp average uses hourly measurements (00h, 01h...23h); sum and divide by 24;
  2. The month average is Sum(day average temp)/(nr of days in month);
  3. The year average is Sum(month average temp)/12.
To be honest, I don't see any difference from a calculation with Sum(all day temp average)/(year nr of days).

There shouldn't be because:
  • Sum(all days temp January)/31 + Sum(all days temp February)/28 = Sum(all days Jan and Feb)/(31+28).
Effectively it is the average of the days. It is only if you always round to the first decimal in the different steps that you start making errors, and carrying those errors into subsequent calculations introduces errors in the final value. So the errors are rounding errors, and those we can avoid by using computers.

So, in my opinion, averaging on day value is correct. Where it starts to differ is when you are taking the year average with a current month as sfws described in the Wiki.

The only real issue is the daily average. The WMO described the hour values, meaning you actually take point values and that may not be correct if temperature fluctuates heavily during the day (yes, it happens). However, that is historical and changing that would probably mean a break with existing time series. That is possible, but requires a lot of calculation and above all a lot of explaining as has happened in the Netherlands recently (on a slightly different issue concerning time series).

As we all have automated weather stations with different intervals, it does make sense to actually use all samples in the calculation of the day average and take that as an estimate of the true average day temperature. I think this full sampling is legitimate for PWSs. This would be the only difference; from there on, the month and year averages follow.

Not taken into account is missing data, for instance when the station fails. That is when averages start being really off. There is no method in Cumulus for estimating the averages when that happens. I guess some statistics/estimation would be necessary, but for the current discussion that would be off track.

@sfws: Thnx for updating and deepening the discussion and thinking about the subject of averaging.
Hans

Re: Some issues in NOAA figures

Post by mcrossley »

HansR wrote: Mon 09 Mar 2020 7:54 am
I checked with the KNMI (the responsible Dutch meteorological organisation) and they follow the WMO (as I think the USA does), and I do not see any discrepancies. I have difficulty displaying formulas, but for temperature it goes as follows (assuming full data presence):
  1. The day temp average uses hourly measurements (00h, 01h...23h); sum and divide by 24;
  2. The month average is Sum(day average temp)/(nr of days in month);
  3. The year average is Sum(month average temp)/12.
To be honest, I don't see any difference from a calculation with Sum(all day temp average)/(year nr of days).

There shouldn't be because:
  • Sum(all days temp January)/31 + Sum(all days temp February)/28 = Sum(all days Jan and Feb)/(31+28).
Effectively it is the average of the days. It is only if you always round to the first decimal in the different steps that you start making errors, and carrying those errors into subsequent calculations introduces errors in the final value. So the errors are rounding errors, and those we can avoid by using computers.
There is a difference because the sample size for each month is slightly different: the months have different numbers of days, so each month should be weighted if the result is to exactly match the sum(all days)/count(all days) calculation (again assuming samples for all days).
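
The weighting can be written out as follows. The symbols are introduced here only for illustration: $T_{m,i}$ is the mean of day $i$ in month $m$, $d_m$ the number of days with data in month $m$, $\bar{T}_m$ the monthly mean, and $M$ the number of months.

```latex
\bar{T}_{\text{days}}
  = \frac{\sum_{m} \sum_{i=1}^{d_m} T_{m,i}}{\sum_{m} d_m}
  = \frac{\sum_{m} d_m \, \bar{T}_m}{\sum_{m} d_m},
\qquad
\bar{T}_{\text{months}} = \frac{1}{M} \sum_{m} \bar{T}_m .

% January (31 days) and February (28 days) only:
\bar{T}_{\text{days}} = \frac{31\,\bar{T}_{\text{Jan}} + 28\,\bar{T}_{\text{Feb}}}{59},
\qquad
\bar{T}_{\text{months}} = \frac{\bar{T}_{\text{Jan}} + \bar{T}_{\text{Feb}}}{2} .
```

The two agree only when every $d_m$ is equal (or all monthly means coincide). For complete months the weights 31/59 and 28/59 are close to 1/2, so the gap is small; for a month with only a few days of data it is not.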

With Cumulus we cannot assume that we have a full data set - indeed for the current year we never will, so I think the only sensible approach is to average by day rather than month. Best we can do, and there is so much annual variation I don't think it matters too much in the scheme of things anyway, maybe once we have been running Cumulus for 200 years...!