
Some issues in NOAA figures

From build 3044 the development baton passed to Mark Crossley. Mark has been responsible for all the Builds since. He has made the code available on GitHub. It is Mark's hope that others will join in this development, but at the very least he welcomes your ideas for future developments (see Cumulus MX Development suggestions).

Moderator: mcrossley

HansR
Posts: 5965
Joined: Sat 20 Oct 2012 6:53 am
Weather Station: GW1100 (WS80/WH40)
Operating System: Raspberry OS/Bookworm
Location: Wagenborgen (NL)

Re: Some issues in NOAA figures

Post by HansR »

One more thing, because it keeps going round in my head.
mcrossley wrote: Mon 09 Mar 2020 9:16 am There is a difference because the sample size for each month is slightly different - they have different numbers of days each, therefore each month should be weighted if it is to exactly match the sum(all days)/count(all days) calculation - again assuming samples for all days.
I understand what you are saying, but I disagree and don't think it is true (as I have shown, the final average for the period is the same whichever calculation is used).
It is somewhat confusing, but weighting is not required because we are not sampling the month: we are using all days and all months, with the value for each day. It is only when a month or year is not finished that we get discrepancies. Weighting is required when we sample a population in ratios that differ from those of the full population.

The weight of the month is irrelevant because we are dealing with a daily estimator of the average temperature. Since we have no continuous temperature measurement with a corresponding integration method to arrive at a true mean, we use discrete measurements and take the statistical mean as the estimator for the day, which then becomes a population value for the month. The mean (average) of the month is calculated over a full population of numbers (the daily estimates), irrespective of whether the month has 28 or 31 days. I do not see a role for the weight of a month, or even how a weight would be used in subsequent calculations on a full population. As I have shown, the average of two months is mathematically identical to the average of all the days included. Again, we are not sampling here.
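For reference, here is the algebra behind the two calculations being compared. The notation is introduced only for illustration: $M$ months, month $m$ having $n_m$ daily values $x_{m,1},\dots,x_{m,n_m}$ with monthly mean $\bar{x}_m$, and $N=\sum_m n_m$ days in total.

```latex
\begin{aligned}
\bar{x}_{\mathrm{days}}
  &= \frac{1}{N}\sum_{m=1}^{M}\sum_{i=1}^{n_m} x_{m,i}
   = \sum_{m=1}^{M}\frac{n_m}{N}\,\bar{x}_m, \\
\bar{x}_{\mathrm{months}}
  &= \sum_{m=1}^{M}\frac{1}{M}\,\bar{x}_m, \\
\bar{x}_{\mathrm{months}} - \bar{x}_{\mathrm{days}}
  &= \sum_{m=1}^{M}\left(\frac{1}{M}-\frac{n_m}{N}\right)\bar{x}_m .
\end{aligned}
```

So the average of all days is itself a weighted average of the monthly means, with weight $n_m/N$ per month. If every month has the same number of days, $n_m/N = 1/M$ and the difference is zero for any data; if the month lengths differ, it is zero only when the monthly means happen to balance the differing weights.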
mcrossley wrote: Mon 09 Mar 2020 9:16 am With Cumulus we cannot assume that we have a full data set - indeed for the current year we never will, so I think the only sensible approach is to average by day rather than month. Best we can do, and there is so much annual variation I don't think it matters too much in the scheme of things anyway, maybe once we have been running Cumulus for 200 years...!
Agreed. We have an incremental dataset: we calculate up to yesterday, which is a known population of days. So in the end (at the end of the year) everything is OK.

(But what if I am on holiday, my station fails and I only come back three weeks later: are those 20 days taken into the calculation? :twisted: :? Never mind :mrgreen: )
Hans

https://meteo-wagenborgen.nl
CMX build 4017+ ● RPi 3B+ ● Raspbian Linux 6.1.21-v7+ armv7l ● dotnet 8.0.3
mcrossley
Posts: 12766
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Bullseye Lite rPi
Location: Wilmslow, Cheshire, UK

Re: Some issues in NOAA figures

Post by mcrossley »

Let's reduce it to really simple maths, with just three months and a small number of days per month...

Month1 temps = 1, 1, 1 -> Avg = 1
Month2 temps = 2, 2, 2 -> Avg = 2
Month3 temps = 3, 3, 3 -> Avg = 3

Avg of the months = avg(1, 2, 3) = 2
Avg of days = avg(1,1,1,2,2,2,3,3,3) = 2


Now with months that do not all have the same number of days:

Month1 = 1, 1 -> Avg = 1
Month2 = 2, 2 -> Avg = 2
Month3 = 3, 3, 3 -> Avg = 3

Avg of months = avg(1,2,3) = 2
Avg of days = avg(1,1,2,2,3,3,3) = 2.14
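The two calculations above can be reproduced in a few lines of Python. This is only a sketch for checking the arithmetic; the helper names are illustrative:

```python
from statistics import mean


def mean_of_monthly_means(months):
    """Average the monthly averages, giving every month equal weight."""
    return mean(mean(days) for days in months)


def mean_of_all_days(months):
    """Average every daily value directly, giving every day equal weight."""
    return mean(day for days in months for day in days)


equal_lengths = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
unequal_lengths = [[1, 1], [2, 2], [3, 3, 3]]

for name, months in (("equal", equal_lengths), ("unequal", unequal_lengths)):
    print(name,
          round(mean_of_monthly_means(months), 2),
          round(mean_of_all_days(months), 2))
```

This prints 2 and 2 for the equal-length case, and 2 and 2.14 for the unequal case: the single extra day of value 3 pulls the day-based average up.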

So not the same? Granted, in reality the difference in the number of days per month is a smaller proportion than in this example, so the resulting difference is smaller, which is why the previous CMX method of annual calculation only made a small difference (at 1dp) to full years.

But if you fail to maintain your station and go off on holiday for three weeks, one or more months will have significantly fewer days! :lol: And those days would have a disproportionate effect on the result if averaging by months.
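To put a number on that disproportionate effect, here is a Python sketch with invented data (the temperatures and the 10 surviving days are assumptions for illustration). When averaging by months, each of the 10 remaining days in the gappy month counts about three times as much as a day in a full 31-day month:

```python
from statistics import mean

# Invented daily mean temperatures for three months; the station was down
# for about three weeks in the middle month, leaving only 10 valid days.
months = {
    "month1": [5.0] * 31,
    "month2": [12.0] * 10,
    "month3": [8.0] * 31,
}

n_months = len(months)
n_days = sum(len(days) for days in months.values())

for name, days in months.items():
    # Share of the final result carried by one day of this month.
    per_day_by_month = 1 / (n_months * len(days))
    per_day_by_day = 1 / n_days
    print(f"{name}: {len(days)} days, each worth {per_day_by_month:.4f} "
          f"(averaging months) vs {per_day_by_day:.4f} (averaging days)")

by_month = mean(mean(days) for days in months.values())
by_day = mean(x for days in months.values() for x in days)
print(f"mean of monthly means: {by_month:.2f}")
print(f"mean of all days:      {by_day:.2f}")
```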

Or are we talking at cross purposes here?
billy
Posts: 255
Joined: Mon 30 Nov 2015 10:54 am
Weather Station: WLL / Davis VP2+
Operating System: RPi bullseye
Location: Gooseberry Hill, Western Australia

Re: Some issues in NOAA figures

Post by billy »

I was halfway through preparing a response to HansR almost identical to the one Mark has just posted, but I think it might be worth adding one other comment that *may* help.

Without becoming too technical, and with due respect, in statistical terminology these are all statistics (i.e. estimates taken from samples) and are NOT the unknown parameters we wish we could discover - but never will. Each day, each month and each year are all SAMPLES of the population of ALL the days, months and years. Of course by the time we get a really good estimate, climate change will have made the estimate redundant ;)
HansR

Re: Some issues in NOAA figures

Post by HansR »

I think you make an error here, because you change the population of the month. That does not happen in our case.
You must come up with examples that have the same number of values in both cases:

So:
Month1 temps = 1, 1, 1 -> Avg = 1
Month2 temps = 2, 2 -> Avg = 2
Month3 temps = 3, 3, 3 -> Avg = 3

Avg of the months = avg(1, 2, 3) = 2
Avg of days = avg(1,1,1,2,2,3,3,3) = 2

Agreed. But now you must come up with an example where the values differ but the number of values stays the same; otherwise you change the population size, which in reality does not occur (forget leap years for the sake of argument). Again: the population size is always the same; we are not sampling days.

e.g:

Month1 temps = 1, 2, 3 -> Avg = 2
Month2 temps = 5, 4 -> Avg = 4.5
Month3 temps = 6, 7, 8 -> Avg = 7

Avg of the months = avg(2, 4.5, 7) = 4.5
Avg of days = avg(1,2,3,5,4,6,7,8) = 4.5

I don't think you can find a set of numbers where the average calculated through the averages of the months differs from the average via the days.
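Whether the example above works in general can be checked exactly. The Python sketch below uses `fractions.Fraction` to avoid rounding; the helper names `gap` and `direct_gap` are illustrative. It evaluates the difference between the two methods as the sum over months of (1/M - n_m/N) times the monthly mean, and compares that with a direct calculation:

```python
from fractions import Fraction


def gap(months):
    """(mean of monthly means) - (mean of all days), computed exactly via
    the sum over months of (1/M - n_m/N) * monthly_mean_m."""
    M = len(months)
    N = sum(len(days) for days in months)
    total = Fraction(0)
    for days in months:
        monthly_mean = Fraction(sum(days), len(days))
        total += (Fraction(1, M) - Fraction(len(days), N)) * monthly_mean
    return total


def direct_gap(months):
    """The same difference, computed directly from the two averages."""
    mean_of_means = sum(Fraction(sum(d), len(d)) for d in months) / len(months)
    all_days = [x for d in months for x in d]
    return mean_of_means - Fraction(sum(all_days), len(all_days))


hans = [[1, 2, 3], [5, 4], [6, 7, 8]]
tweaked = [[1, 2, 3], [5, 5], [6, 7, 8]]  # short month's mean moved to 5

for months in (hans, tweaked):
    print(gap(months), direct_gap(months))
```

With month lengths 3, 2 and 3 the weight differences are -1/24, +1/12 and -1/24, so the gap is (2 × 4.5 - 2 - 7)/24. It is zero here only because the short month's mean (4.5) lies exactly halfway between the other two monthly means (2 and 7); changing the short month to 5, 5 gives a gap of 1/24.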
Hans

HansR

Re: Some issues in NOAA figures

Post by HansR »

billy wrote: Mon 09 Mar 2020 11:07 am I was halfway through preparing a response to HansR almost identical to the one Mark has just posted, but I think it might be worth adding one other comment that *may* help.

Without becoming too technical, and with due respect, in statistical terminology these are all statistics (i.e. estimates taken from samples) and are NOT the unknown parameters we wish we could discover - but never will. Each day, each month and each year are all SAMPLES of the population of ALL the days, months and years. Of course by the time we get a really good estimate, climate change will have made the estimate redundant ;)
We make an estimate of temperature, not of days or months; that is exactly the error in thinking.
Hans

billy

Re: Some issues in NOAA figures

Post by billy »

HansR wrote: Mon 09 Mar 2020 11:15 am Month1 temps = 1, 2, 3 -> Avg = 2
Month2 temps = 5, 4 -> Avg = 4.5
Month3 temps = 6, 7, 8 -> Avg = 7

Avg of the months = avg(2, 4.5, 7) = 4.5
Avg of days = avg(1,2,3,5,4,6,7,8) = 4.5
A nice sequence, but try interchanging the 5 and 6 in the original data of your example and see what you get ;)
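The suggested swap can be checked exactly. A minimal Python sketch (helper names are illustrative), using `fractions.Fraction` so the comparison is not affected by floating-point rounding:

```python
from fractions import Fraction


def mean_of_monthly_means(months):
    """Each month's average counts once, whatever its length."""
    return sum(Fraction(sum(d), len(d)) for d in months) / len(months)


def mean_of_all_days(months):
    """Every day counts once."""
    days = [x for d in months for x in d]
    return Fraction(sum(days), len(days))


original = [[1, 2, 3], [5, 4], [6, 7, 8]]
swapped = [[1, 2, 3], [6, 4], [5, 7, 8]]  # 5 and 6 interchanged

for name, months in (("original", original), ("swapped", swapped)):
    a, b = mean_of_monthly_means(months), mean_of_all_days(months)
    print(name, round(float(a), 2), round(float(b), 2), a == b)
```

The original data give 4.5 both ways, while the swapped data give 41/9 (about 4.56) for the mean of monthly means against 4.5 for the mean of all days: the day-based figure cannot change, because the same eight values are present, but the month-based one does.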
freddie
Posts: 2477
Joined: Wed 08 Jun 2011 11:19 am
Weather Station: Davis Vantage Pro 2 + Ecowitt
Operating System: GNU/Linux Ubuntu 22.04 LXC
Location: Alcaston, Shropshire, UK

Re: Some issues in NOAA figures

Post by freddie »

HansR wrote: Mon 09 Mar 2020 7:54 am I checked with the KNMI (the Dutch national meteorological institute) and they follow the WMO (as I think the USA does too) and I do not see any discrepancies. I have difficulty displaying formulas, but for temperature it goes as follows (assuming full data presence):
  1. The daily average temperature uses hourly measurements (00h, 01h ... 23h): sum them and divide by 24;
  2. The month average is Sum(day average temp)/(nr of days in month);
  3. The year average is Sum(month average temp)/12.
The day temp average isn't based on hourly values - it is (24HrMax + 24HrMin)/2
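To see how far the two definitions of a daily mean can differ, here is a small Python comparison on one invented day of hourly readings. The values are made up for illustration, and taking the max and min from only the 24 hourly values is a simplification: a real station's 24-hour extremes come from more frequent readings.

```python
from statistics import mean

# Invented hourly temperatures (deg C) for 00h ... 23h on a single day.
hourly = [8.0, 7.5, 7.0, 6.8, 6.5, 6.4, 7.0, 8.5, 10.0, 12.0, 13.5, 15.0,
          16.0, 16.5, 16.8, 16.5, 15.5, 14.0, 12.5, 11.5, 10.5, 10.0, 9.5, 9.0]
assert len(hourly) == 24

# Method 1: average of the 24 hourly observations.
hourly_mean = mean(hourly)

# Method 2: mean of the daily maximum and minimum.
max_min_mean = (max(hourly) + min(hourly)) / 2

print(f"hourly mean:   {hourly_mean:.2f}")
print(f"(max + min)/2: {max_min_mean:.2f}")
```

On this invented day the two methods differ by about half a degree (11.10 against 11.60). The asymmetric shape of the day, a short warm afternoon and a long cool night, is what pulls them apart.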
Freddie
HansR

Re: Some issues in NOAA figures

Post by HansR »

@freddie: That is certainly not the case, as I just learned from a phone call to KNMI. To get the precise method of calculation I had to submit a formal question, which will take some time to get a response. (24HrMax + 24HrMin)/2 is one possibility allowed by the WMO, but it is at best a first-order estimate of the average.
[Edit:] A quick formal response confirms my initial description of how the daily average is calculated: 24 hourly observations averaged.

@billy:
Month1 temps = 1, 2, 3 -> Avg = 2
Month2 temps = 6, 4 -> Avg = 5
Month3 temps = 5, 7, 8 -> Avg = 6.67

Avg of the months = avg(2, 5, 6.67) = 4.56
Avg of days = avg(1,2,3,6,4,5,7,8) = 4.5

Agreed, my bad. Numbers can be deceptive :(
Hans

freddie

Re: Some issues in NOAA figures

Post by freddie »

HansR wrote: Mon 09 Mar 2020 12:33 pm @freddie: That is certainly not the case, as I just learned from a phone call to KNMI. To get the precise method of calculation I had to submit a formal question, which will take some time to get a response. (24HrMax + 24HrMin)/2 is one possibility allowed by the WMO, but it is at best a first-order estimate of the average.
I have worked in Meteorology and Climatology for 35 years, and the formula I gave hasn't changed in all that time.
Freddie
HansR

Re: Some issues in NOAA figures

Post by HansR »

In the Netherlands?
Hans

freddie

Re: Some issues in NOAA figures

Post by freddie »

HansR wrote: Mon 09 Mar 2020 12:46 pm In the Netherlands?
No - the UK, which follows WMO guidance (as you said in an earlier reply that KNMI does).
Freddie
HansR

Re: Some issues in NOAA figures

Post by HansR »

Yes, but the document I linked does not say that. In fact, it is stated here. Section 4.8.5 says:
There are many methods for calculating an average daily temperature. These include methods that average a daily maximum and daily minimum, 24 hourly observations, [...] The best statistical approximation of an average is based on the integration of continuous observations over a period of time; the higher the frequency of observations, the more accurate the average. [...] For comparative purposes, a standard processing methodology is desirable for all stations worldwide, with the number of stations maximized.

Hence, the recommended methodology for calculating average daily temperature is to take the mean of the daily maximum and minimum temperatures. Even though this method is not the best statistical approximation, its consistent use satisfies the comparative purpose of normals. An NMHS should also calculate daily averages using other methods if these calculations improve the understanding of the climate of the country.
KNMI apparently does the last, which I interpreted as conforming to WMO.

Lost in translation probably... :|
Hans

freddie

Re: Some issues in NOAA figures

Post by freddie »

HansR wrote: Mon 09 Mar 2020 1:19 pm Yes, but the document I linked does not say that. In fact, it is stated here. Section 4.8.5 says:
There are many methods for calculating an average daily temperature. These include methods that average a daily maximum and daily minimum, 24 hourly observations, [...] The best statistical approximation of an average is based on the integration of continuous observations over a period of time; the higher the frequency of observations, the more accurate the average. [...] For comparative purposes, a standard processing methodology is desirable for all stations worldwide, with the number of stations maximized.

Hence, the recommended methodology for calculating average daily temperature is to take the mean of the daily maximum and minimum temperatures. Even though this method is not the best statistical approximation, its consistent use satisfies the comparative purpose of normals. An NMHS should also calculate daily averages using other methods if these calculations improve the understanding of the climate of the country.
KNMI apparently does the last, which I interpreted as conforming to WMO.
Yes, and so does the UK Met Office. But the pertinent part of that passage is... the recommended methodology for calculating average daily temperature is to take the mean of the daily maximum and minimum temperatures. Even though this method is not the best statistical approximation, its consistent use satisfies the comparative purpose of normals. So the meteorologically accepted standard is the formula I gave, despite it not being the best statistical approximation. But it IS the standard formula. NMHSs are encouraged to use the integration of continuous observations in addition to the standard formula. At some point in the future, the amount of data to which the integration method can be applied will exceed the amount for which only the standard formula can be used. When we reach that point, the integration method will probably become the standard method.
HansR wrote: Mon 09 Mar 2020 1:19 pm Lost in translation probably... :|
Your English is far better than my Dutch, so I am grateful that you can speak (and type) English :)
Freddie
HansR

Re: Some issues in NOAA figures

Post by HansR »

Hmm... so far you did not hear me speak ;)
Language sometimes takes strange turns.

OK, enough for today, thanks to all in this discussion on arithmetic with some food for thought.
Hans

mcrossley

Re: Some issues in NOAA figures

Post by mcrossley »

HansR wrote: Mon 09 Mar 2020 11:15 am I think you make an error here, because you change the population of the month. That does not happen in our case.
But that was my point - it does happen in our case. Some months have 30 days, some 31, and one has 28 or 29 - we'll ignore leap seconds ;)

If every month was 30 days long then I agree there would be no difference.

Re the daily average: I thought that for climatic purposes everyone uses (Max+Min)/2, because we do not have historic hourly data for large numbers of locations. So this is the only way of comparing like with like, but it is certainly not the best way for current data.

Anyway - What do I do, leave CMX calculating using daily values? It gets my vote for the reasons I stated above regarding possibly incomplete data, and to evaluate the current incomplete year.
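One way to keep calculating from daily values for the current, incomplete year is to hold a running sum and count of completed days. The Python sketch below is hypothetical: the class name `RunningDailyMean` and the use of `None` for a day with no valid data are illustrative assumptions, not how CMX is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningDailyMean:
    """Hypothetical year-to-date accumulator fed one value per completed day."""
    total: float = 0.0
    days: int = 0

    def add_day(self, daily_mean: Optional[float]) -> None:
        # A day with no valid data (e.g. station down) is skipped, not counted
        # as zero, so the remaining days keep equal weight.
        if daily_mean is None:
            return
        self.total += daily_mean
        self.days += 1

    @property
    def mean(self) -> Optional[float]:
        return self.total / self.days if self.days else None


# Invented daily means from 1 January up to yesterday, with two missing days.
ytd = RunningDailyMean()
for value in [4.2, 5.0, None, None, 6.1]:
    ytd.add_day(value)

print(ytd.days, round(ytd.mean, 2))
```

Because every valid day carries the same weight, the result stays meaningful part-way through a month or year, and a gap in the data shrinks the day count rather than inflating the influence of the days either side of it.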