Page 1 of 1
[b3215] Multiple issues
Posted: Wed 07 Dec 2022 6:34 am
by chrishudson
I currently have a few issues with CMX
- Excess CPU usage by Cumulus
- Daily rollover not happening every day
- Issues with Month.ini and today.ini randomly being set to zero bytes
The daily rollover problem I have previously reported and in that discussion, it was noted that my Pi was running hot [84C]. So I started digging.
I have been keeping an eye on the CPU usage on the pi and have noticed that mono slowly spawns more threads over time. I have a feeling this may be why I have the other issues but I can't be certain yet as there is not enough logging for me to tell when it or why it happens. I do know that the problem with the month.ini being set to a zero byte file has been happening for a few months now as my last 3 monthyyyymm.ini files are all zero bytes.
This is what I am seeing just after starting cumulus
2022-12-05 21_55_35-Pileus (pileus) - VNC Viewer.png
This is what shows the next morning. Notice that one thread is running at 98% CPU
2022-12-06 09_00_54-Pileus (pileus) - VNC Viewer.png
This is how it looks that afternoon of the same day. Now two threads running at close to 100% CPU
2022-12-06 17_35_26-Pileus (pileus) - VNC Viewer.png
This is how it looks after the second night. Notice that there are now 3 threads running close to 100% CPU. Also you will notice that month.ini has dropped to zero bytes when it was 1.6KB the day before.
2022-12-07 09_23_27-Pileus (pileus) - VNC Viewer.png
Also notice that the number of threads/processes running has increased slowly over the two days. There were 16 threads/processes running just 7 mins after starting Cumulus and by day two there are now 27 running.
This Pi is devoted to running Cumulus and has no other applications running.
Re: [b3215] Multiple issues
Posted: Wed 07 Dec 2022 10:48 am
by HansR
One remarkable thing I see is that there are multiple instances of CMX running. No idea where that comes from but I would tick:
Code: Select all
Settings=>Program Settings=>General Options=>Stop Second Instance
Stop CMX and reboot after ticking this and any second (or beyond) instance of CMX will be stopped after this.
Then you can start looking where the multi start of CMX comes from .
Re: [b3215] Multiple issues
Posted: Wed 07 Dec 2022 12:21 pm
by mcrossley
It appears to be something unique to your configuration, the problem becomes tracking the source.
I see you have lots of extra files being processed.
Do you have anything else configured for rollover - external task, MySQL, custom HTTP?
If you had debug logging enabled, could you post the log file that covers the first day rollover?
If not, can you enable it, restart CMX and post the log file after the first rollover.
Re: [b3215] Multiple issues
Posted: Fri 09 Dec 2022 10:33 am
by chrishudson
Have added a couple of MXDiag files that cover a few days. Also added some logs where I tracked changes to the month.ini file and the threads used by CMX.
I've tried stepping through the logs myself, following along in the code, and tried to work out what is going on, but my C# knowledge is very basic [but getting better each time I do this

].
A few things I've noticed.
- The times when the month.ini drops to zero bytes correspond to nothing obvious in the log. It just appears to be a normal write. However once it is zero bytes it still gets "updated". ie the date of the file changes but nothing is written, which makes me think that the values CMX is storing internally are the problem rather than the actual writing to the file.
- The time the ini drops to zero happens at different times. ie I thought it was something to do with the rollover given the backups had the zero-byte files but it happens at other times during the day and the backup just reflects the file it had at the time.
- The month.ini is written to every minute even though nothing is changed. But sometimes there can be gaps of several minutes where no writes occur. For example, on the 8th December between 22:50 and 23:21
- Stopping CMX using Ctrl-C and restarting it sometimes triggers a proper write of the month.ini. For example, on the 9th December at 9:08
- Even though I am running b3215 and now have my station logging interval set to the same 10 mins as CMX, it still reads thousands of archive records from the station at startup. Unless I am misunderstanding the release notes for b3215.
- Some high CPU threads come and go within a few minutes. Others seem to stick around. It's the ones that stick around that appear to be the problem. May I suggest that when a new background worker is started that it writes a debug log entry so we can see when the threads are started?
- The style of logging varies considerably in the code. eg Some log statements include the calling method, while others don't, which makes it hard to track what is going on. Other times what gets written to the debug log is a different message than what gets written to the console even though they follow each other in the code. I'm sure this is on your TODO list to fix.
- A lot of things trigger a write to the month.ini. To narrow down where the issue is, it may be worthwhile adding a log entry in the code just before calling WriteMonthIniFile to show why it was called. See example below
Code: Select all
if (OutdoorHumidity > ThisMonth.HighHumidity.Val)
{
ThisMonth.HighHumidity.Val = OutdoorHumidity;
ThisMonth.HighHumidity.Ts = timestamp;
LogDebugMessage("New High Monthly Humidity value" + OutdoorHumidity)
WriteMonthIniFile();
}
Re: [b3215] Multiple issues
Posted: Fri 09 Dec 2022 10:47 am
by chrishudson
As for why I have so much extra files processing, some of that goes back to the days when I used to use cheap SD cards in my Pi which used to die a lot so I started backing up stuff like the ini files, monthly logs and other stuff to the FTP server to make it easier to recover from a dead Pi. I haven't had that issue for a while now, so maybe I could drop how many files get backed up and how often. Mind you, I have been doing this for 4-5 years or so and it hasn't been an issue.
Have also attached a copy of my cumulus.ini in case it is useful.
Re: [b3215] Multiple issues
Posted: Fri 09 Dec 2022 1:35 pm
by mcrossley
I'll examine this in more detail over time, but here are some initial comments...
A few things I've noticed.
The times when the month.ini drops to zero bytes correspond to nothing obvious in the log. It just appears to be a normal write. However once it is zero bytes it still gets "updated". ie the date of the file changes but nothing is written, which makes me think that the values CMX is storing internally are the problem rather than the actual writing to the file.
That is of note, quite what it means I don't know yet.
I doubt it is the source values, they are core to the functioning of MX and nothing much would work if the main weather data values were corrupt.
The time the ini drops to zero happens at different times. ie I thought it was something to do with the rollover given the backups had the zero-byte files but it happens at other times during the day and the backup just reflects the file it had at the time.
OK, so we know it is not a rollover issue.
Even though I am running b3215 and now have my station logging interval set to the same 10 mins as CMX, it still reads thousands of archive records from the station at startup. Unless I am misunderstanding the release notes for b3215.
A bug - your last data time of 09:05, got rounded to 09:10, but the time was 09:08. As 09:10 did not exist in the station logger, the station sent the whole logger contents.
If the last data time is later than the last log entry, then the download should skip the archive request.
Some high CPU threads come and go within a few minutes. Others seem to stick around. It's the ones that stick around that appear to be the problem. May I suggest that when a new background worker is started that it writes a debug log entry so we can see when the threads are started?
Maybe, it would add a lot of extra logging though.
The style of logging varies considerably in the code. eg Some log statements include the calling method, while others don't, which makes it hard to track what is going on. Other times what gets written to the debug log is a different message than what gets written to the console even though they follow each other in the code. I'm sure this is on your TODO list to fix.
I've gradually been updating the comments, most "functions" now log the function name. Some messages (mainly during start-up) do not, as the source is often self-evident.
The differences in the console messages and the log file entries are on whole intended. They serve difference audiences, different purposes.
A lot of things trigger a write to the month.ini. To narrow down where the issue is, it may be worthwhile adding a log entry in the code just before calling WriteMonthIniFile to show why it was called. See example below
I need to see why you have batches of writes every so often. The code needs optimising and some of the writes consolidating at a higher level - it will help SD disk wear for rPi users too!
Re: [b3215] Multiple issues
Posted: Fri 09 Dec 2022 10:04 pm
by chrishudson
I need to see why you have batches of writes every so often. The code needs optimising and some of the writes consolidating at a higher level - it will help SD disk wear for rPi users too!
I saw those [sometimes up to 4-5 in a row] but had just assumed there were multiple reasons for the file to be updated. For example, a new minimum temp and a new minimum apparent temp at the same time. So there are two separate writes to the month.ini. Maybe while checking the data set a flag to indicate the month.ini needs updating and do a write just once after checking all the values if the flag is set.
For example [excuse my poor code]
Code: Select all
WriteMonthIni = False;
if (OutdoorHumidity > ThisMonth.HighHumidity.Val)
{
ThisMonth.HighHumidity.Val = OutdoorHumidity;
ThisMonth.HighHumidity.Ts = timestamp;
LogDebugMessage("New High Monthly Humidity value" + OutdoorHumidity)
WriteMonthIni = True;
}
... all the other checks along the same lines ...
if WriteMonthIni
{
WriteMonthIniFile();
WriteMonthIni = False
}
Or you could do it even higher up in the code
For example:
Code: Select all
WriteMonthIni = False;
DoApparentTemp(timestamp);
DoFeelsLike(timestamp);
DoHumidex(timestamp);
DoCloudBaseHeatIndex(timestamp);
if WriteMonthIni
{
WriteMonthIniFile();
WriteMonthIni = False
}
Re: [b3215] Multiple issues
Posted: Sat 10 Dec 2022 4:13 pm
by Nossie
for the rpi image you might wish to have log2ram installed - the same might be a good idea for those not using SSDs
Re: [b3215] Multiple issues
Posted: Sat 10 Dec 2022 9:41 pm
by mcrossley
The write consolidation is on the todo list, but the question remains why you have zero length files.
I notice that the times the zero-length file is created the file write takes a loong time - 1 second - the normal writes take a few hundredths of a second. And every write after that takes one second as well.
One thing I would be interested in trying. If you get a zero-length file, try deleting it and not restarting CMX. It may recreate the full file.
We see this on FTP servers sometimes, where they cannot overwrite a zero-length file for some reason.
The question I cannot answer now is why this is happening on your system and no one else's.
Re: [b3215] Multiple issues
Posted: Mon 12 Dec 2022 11:05 am
by chrishudson
One thing I would be interested in trying. If you get a zero-length file, try deleting it and not restarting CMX. It may recreate the full file.
You were right. As soon as I noticed the month.ini was zero-length I deleted it while CMX was still running and within a minute CMX had recreated it with the right content.
Am now playing around with auditd to see if I can tell what is using the file when it goes to zero-bytes.
Re: [b3215] Multiple issues
Posted: Wed 12 Apr 2023 10:56 pm
by chrishudson
Upgraded to CMX 3.24.2 on the 5th of April from 3.23.1. I just wanted to inform you that with the latest version of CMX the issues I've had for the latest few months since upgrading to build 3.22.2 [b3215 - zero-byte files, slowly increasing number of threads, missing overnight rollovers and high CPU ] have gone.
No more logging into the Pi on a daily basis and deleting any zero-byte files and waiting for them to be regenerated by CMX.
No more restarting CMX once I had a complete set of INI files so that I had a backup.
No more 84+°C CPU temps on the Pi.
Something done to the code between 3.23.1 and 3.24.2 has resolved those issues. You may not know what was done, but thank you for whatever it was you did.
Re: [b3215] Multiple issues
Posted: Wed 26 Apr 2023 9:16 am
by mcrossley
Not sure what it was that fixed this, but I'm glad it is resolved.