Stuck with high CPU usage after some days until restarted

Posted: Sun 08 Nov 2020 4:34 pm
by af7567
Hi
I have had problems with Cumulus MX processes suddenly using 100%+ CPU and staying there. It usually happens after it has been running for a few days, and it has happened with every release from the last 6 months or so, since I started using it. Cumulus continues to work normally, and I only notice the problem when I check the CPU usage on the server. The problem can occur multiple times in one run: for example, after a few days the Cumulus MX CPU usage is shown as 100%, but if it is left running, after a few more days it gets stuck at 200%, then 300% (as displayed by htop). When running normally it uses 1-2%.

The server is running Slackware Linux on an i5-4570 with mono 6.10.0.104 currently but it also happened with older versions of mono. The weather station is a USB Fine Offset WH1080. Synchronise Fine Offset reads is enabled in the settings.

I enabled extra logging in Cumulus, but that doesn't show any error messages at the times when the CPU usage jumps up. I also have the CPU usage graphed, so I can see exactly when it happens. There are also no errors shown at those times in dmesg or in syslog, so it's not a USB problem.

I have attached the MXdiags log from the last run where the CPU usage jumped up at the following times:
2020-11-06 14:00 - 100%
2020-11-07 17:50 - 200%
2020-11-08 07:50 - 300%
I can't see any errors in there at those times.
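
To make it easier to check what the log says around each jump, here is a minimal Python sketch that prints the MXdiags lines within a few minutes of the three times listed above. It assumes every log line starts with a timestamp in the form "2020-11-06 01:29:27.736"; the function names and the 10 minute window are only illustrative choices, not anything provided by Cumulus MX.

```python
import sys
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each MXdiags line.
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
STAMP_LENGTH = len("2020-11-06 01:29:27.736")

# The times at which the CPU graph showed a jump (from the post above).
JUMP_TIMES = [
    datetime(2020, 11, 6, 14, 0),
    datetime(2020, 11, 7, 17, 50),
    datetime(2020, 11, 8, 7, 50),
]


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def lines_near(lines, when, minutes=10):
    """Return the lines whose timestamp is within `minutes` of `when`."""
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None and abs(stamp - when) <= window:
            hits.append(line)
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 near_jumps.py <path to MXdiags log>
    with open(sys.argv[1], errors="replace") as handle:
        log_lines = handle.read().splitlines()
    for jump in JUMP_TIMES:
        print(f"--- within 10 minutes of {jump} ---")
        for hit in lines_near(log_lines, jump):
            print(hit)
```

Lines without a leading timestamp (for example continuation lines) are skipped, so this only gives a rough view of what was logged around each jump.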

Re: Stuck with high CPU usage after some days until restarted

Posted: Mon 09 Nov 2020 4:50 am
by jlmr731
So you are seeing the process "mono CumulusMX.exe" show 100%, then go up later to 200% and so on. I see you are logging at 5 minute intervals, but you say no data is being lost?
The first thing I see is "2020-11-06 01:29:27.736 Sensor contact lost; ignoring outdoor data", which repeats after that, although the log still shows data being written (I don't know how to decode the Fine Offset entry). Later you get some "ignoring bad data" entries, until we get to "2020-11-07 22:02:00.147 *** Data input appears to have stopped", which continues until you stop Cumulus. So at that point it stops logging everything.
My first thought is something with your USB connection; maybe try a new cable if possible. I don't think it's software related, but I've been wrong before ... many times.
Mark or someone else will have a more in-depth answer.
Also, you can try running in debug mode; maybe you will get more logging info. It should be something like: sudo mono CumulusMX.exe -debug

Re: Stuck with high CPU usage after some days until restarted

Posted: Mon 09 Nov 2020 5:10 pm
by mcrossley
Agreed, there are issues with the station.

There are very long periods where it is reporting sensor contact lost.

At one point the rain counter from the sensor made a massive jump in the tip count, stayed like that for some time, then reverted back to what it was before.
Could it be picking up another transmitter in your area?

Invalid temperature and wind readings being sent.

Sorry, but it's looking pretty poorly. :(

Re: Stuck with high CPU usage after some days until restarted

Posted: Wed 18 Nov 2020 11:41 am
by af7567
I had thought it was a USB connection problem at first, but there are no USB disconnects or errors in the Linux logs like you would normally see when a USB device misbehaves. I thought the "sensor contact lost" message just meant the USB receiver hadn't received any data from the outside transmitter for a while (it probably needs new batteries), not a serious error that would cause something to get stuck in a loop.
The rain sensor jump is because of a windy day. It's a bit wobbly and we never trusted it anyway :) We only really use the station for the temperatures now.

I will try a different USB cable and new batteries in the transmitter to see if that makes a difference. I was thinking it was a software error, though, because nothing should cause the software to get stuck with high CPU usage. If something goes wrong, it should report an error or time out.

Re: Stuck with high CPU usage after some days until restarted

Posted: Wed 18 Nov 2020 1:11 pm
by mcrossley
You are right in that it shouldn't get stuck with high CPU usage. This is the first case of this symptom that I have heard of though, and the lack of error or crash makes me suspect something lower in the stack hanging up.
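
One way to narrow this down the next time it happens would be to see which threads of the mono process are burning the CPU. The steps of 100%, 200%, 300% would be consistent with one more thread spinning each time, though that is only a guess. Below is a minimal Python sketch, assuming a Linux /proc filesystem, that lists the accumulated CPU time of every thread in a process; the function names are illustrative only and nothing here is specific to Cumulus MX or mono.

```python
import os
import sys


def parse_stat_line(line):
    """Parse one /proc/<pid>/task/<tid>/stat line into (tid, name, cpu_ticks)."""
    # The thread name sits in parentheses and may itself contain spaces or
    # parentheses, so split at the first "(" and the last ")".
    head, _, rest = line.partition("(")
    name, _, tail = rest.rpartition(")")
    fields = tail.split()
    # `tail` starts at field 3 (state); utime and stime are fields 14 and 15,
    # which puts them at indexes 11 and 12 here.
    utime, stime = int(fields[11]), int(fields[12])
    return int(head), name, utime + stime


def thread_cpu_seconds(pid):
    """Return (cpu_seconds, tid, name) per thread, busiest thread first."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    task_dir = f"/proc/{pid}/task"
    result = []
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as handle:
                thread_id, name, ticks = parse_stat_line(handle.read())
        except OSError:
            continue  # the thread exited while we were reading
        result.append((ticks / ticks_per_second, thread_id, name))
    return sorted(result, reverse=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 thread_cpu.py <pid of the mono process>
    for seconds, tid, name in thread_cpu_seconds(int(sys.argv[1])):
        print(f"{seconds:10.1f}s  tid={tid}  {name}")
```

Running it twice about a minute apart and comparing the numbers should show which threads are stuck: a thread spinning at 100% gains roughly one CPU second per second of wall time, while idle threads barely move.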