Mono/Cumulus process getting into a high CPU state and losing comms
Posted: Tue 09 Nov 2021 12:20 am
Hi there
We are experiencing a problem with Cumulus MX, which runs as a service under Mono. The problem started about 2-3 months ago and occurs at random. When it happens, the Mono/CumulusMX process starts consuming all available CPU and at the same time loses comms with the weather station.
Some background info:
- Cumulus MX Version: 3.13.6 build 3152 (configured to run as a service)
- Mono Version: 6.12.0.107
- OS: CentOS 7, kernel 3.10.0-1160.42.2.el7.x86_64
- Weather Station: Davis Vantage Pro2 with a USB Logger
Looking at the various logs and configuration files…
- The USB interface is configured such that it does not go into sleep mode or anything like that.
- The operating system is not reporting any issues with the USB interface and is not reporting it as disconnecting.
- The CPU usage starts climbing shortly before the “No input from station” entries begin appearing in the Service Console Log.
Other Observations
- The issue has occurred across the last few builds, b3148 to b3152.
- Normally, when we stop the CumulusMX service, it stops within a second or two. When the Mono/CumulusMX process is in the high CPU state, stopping the service takes around 30 seconds.
- The CumulusMX service can be immediately restarted and it works perfectly again without having to do anything. It has no problems with communicating with the weather station after a restart.
- The time between the problem happening varies from a couple of hours to a couple of days.
- When the Mono/CumulusMX process gets into a high CPU state, its memory usage doesn't appear to increase, even if left for 12 hours or more.
- A restart of the server doesn’t appear to make any difference.
As mentioned, this problem started a few months ago. Prior to that, CumulusMX was super stable on the same server and would run for months between restarts.
I'm happy to do some more in-depth testing and post/send logs as required to help work out the issue.
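As a starting point for that testing, here is a rough sketch of a script that could show which thread inside the Mono process is burning the CPU. It is only an illustration, not something we have run against the problem yet: it assumes Python 3 is available on the server, it reads the standard Linux /proc/<pid>/task/<tid>/stat files, and the function names and the 5 second sampling window are my own choices.

```python
import os
import sys
import time


def parse_stat_line(line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The name is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    name = line[line.index("(") + 1:line.rindex(")")]
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return name, int(rest[11]) + int(rest[12])


def snapshot(pid):
    """Map thread id -> (name, cpu_ticks) for every thread of the process."""
    threads = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                threads[int(tid)] = parse_stat_line(f.read())
        except OSError:
            pass  # thread exited between listing and reading
    return threads


def busiest(before, after, top=5):
    """Return [(tid, name, ticks_used)], largest CPU consumer first."""
    used = []
    for tid, (name, ticks) in after.items():
        if tid in before:
            used.append((tid, name, ticks - before[tid][1]))
    used.sort(key=lambda t: t[2], reverse=True)
    return used[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])  # PID of the Mono/CumulusMX process
    first = snapshot(pid)
    time.sleep(5)  # sampling window, arbitrary choice
    for tid, name, ticks in busiest(first, snapshot(pid)):
        print(tid, name, ticks)
```

The idea would be to run it with the PID of the Mono process while the CPU is pegged and post the output. The thread IDs it prints should match those shown by `top -H`, which may help narrow down whether the spinning thread is the one talking to the station.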