OutOfMemoryException starting in build 3152

spatula
Posts: 19
Joined: Sun 27 Aug 2017 2:46 pm
Weather Station: Davis Vantage Vue
Operating System: Windows 10

OutOfMemoryException starting in build 3152

Post by spatula »

Starting with build 3152, I've begun having problems with CumulusMX throwing an OutOfMemoryException after running nonstop for about a week each time. When it first starts, it uses only about 40 MB; evidently something is leaking and memory usage creeps up over time.

Once an OutOfMemoryException is thrown, CumulusMX effectively stops working even though it's still running: it can no longer communicate with the station, nor can it send out updates.

I've upgraded to the latest build, and I'm keeping a closer eye on it for now to try to quantify the problem better. Unfortunately I thought I had copied the stack trace before restarting, but I hadn't, and it was lost after the restart (the console log cycled out after I upgraded to the latest). It looks like the exception ends up getting thrown from random places in the code (just whatever happens to run next after it runs out of memory), so it probably wouldn't have been that helpful anyway.

This is running in a Windows 10 environment with a Davis Vantage Vue station, sending data to APRS-IS, Weather Underground, PWSWeather, and Windy.

Happy to grab a memory dump and/or thread dump if it would be helpful and someone can point me to instructions on how to do it.
mcrossley
Posts: 12694
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Bullseye Lite rPi
Location: Wilmslow, Cheshire, UK

Re: OutOfMemoryException starting in build 3152

Post by mcrossley »

OK, that is a new one. I know about a memory leak when running under Mono with the Davis WLL and Ecowitt GW1000 stations, but nothing under Windows.
Let's see if the latest build still has the problem; I do not want to spend any effort looking at old builds.
spatula
Posts: 19
Joined: Sun 27 Aug 2017 2:46 pm
Weather Station: Davis Vantage Vue
Operating System: Windows 10

Re: OutOfMemoryException starting in build 3152

Post by spatula »

Still had the leak in 3154, and just updated to 3160.

Whatever is happening does not appear to be a slow leak (I've been keeping an eye on real memory use and it seems fairly constant), but possibly one that happens quickly after some triggering event. On this most recent occasion, Cumulus MX was started on Dec 11, and blew up on Dec 26.

Just prior to this, this was logged:
2021-12-26 00:24:34.456 SendLoopCommand: Starting - LPS 2 1
2021-12-26 00:24:34.549 SendLoopCommand: Starting - LOOP 50
2021-12-26 00:25:00.078 DoLogFile: Writing log entry for 12/26/2021 12:25:00 AM
2021-12-26 00:25:00.080 DoLogFile: log entry for 12/26/2021 12:25:00 AM written
2021-12-26 00:25:00.080 Writing today.ini, LastUpdateTime = 12/26/2021 12:25:00 AM raindaystart = 7.76 rain counter = 7.76
2021-12-26 00:25:00.082 Updating CWOP
2021-12-26 00:26:00.255 *** Data input appears to have stopped
(then a lot of repeats of the previous line until I noticed and restarted it)

This was the first OOME sent to the console:
2021-12-26 00:01:50.629 Error opening serial port - Exception of type 'System.OutOfMemoryException' was thrown.

So it seems like something had already started going wrong at 00:26, and by 01:50, maybe it had gone wrong enough to blow the heap. (The previous message in the console log was the daily reminder that I wasn't on the most current build, and just that daily message for about the prior two weeks.)

Whatever it is tends to take a week or two before it happens. I'm going to turn on debug logging to see if something more useful will appear next time.
mcrossley
Posts: 12694
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Bullseye Lite rPi
Location: Wilmslow, Cheshire, UK

Re: OutOfMemoryException starting in build 3152

Post by mcrossley »

Could you attach the full log file, please? I'll probably need one with debug logging though. There may be a clue in there, and it will give me information about how your station is configured.
spatula
Posts: 19
Joined: Sun 27 Aug 2017 2:46 pm
Weather Station: Davis Vantage Vue
Operating System: Windows 10

Re: OutOfMemoryException starting in build 3152

Post by spatula »

Caught another one of these this morning. The puzzling thing is that at the time the OutOfMemoryException began, Cumulus was using a whopping 9.7 MB of memory (though I might have missed it growing large and then having a major GC event clearing things up).

When this occurs, the alarm emails do *not* get sent.

This time I got some acceptable debug logs around the time it happened:

2022-01-20 03:11:57.398 LOOP: Data - 50: 4C-4F-4F-14-00-C6-08-63-76-AA-02-3D-B1-01-00-00-E6-00-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-5F-FF-FF-FF-FF-FF-FF-FF-00-00-FF-FF-7F-00-00-FF-FF-00-00-1F-00-B0-03-00-00-00-00-00-00-FF-FF-FF-FF-FF-FF-FF-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-17-03-06-09-D1-02-B8-06-0A-0D-86-B3
2022-01-20 03:11:57.398 LOOP: 50 - Data packet is good
2022-01-20 03:11:57.398 SendLoopCommand: Starting - LPS 2 1
2022-01-20 03:11:57.398 WakeVP: Not required
2022-01-20 03:11:57.398 SendLoopCommand: Sending command LPS 2 1,  attempt 1
2022-01-20 03:11:57.399 SendLoopCommand: Wait for ACK
2022-01-20 03:11:57.399 WaitForACK: Wait for ACK
2022-01-20 03:11:58.400 WaitForAck: (1) Timed out
2022-01-20 03:11:59.413 WaitForACK: (2) Received - 4C
2022-01-20 03:11:59.414 WaitForAck: timed out
2022-01-20 03:11:59.414 SendLoopCommand: Sending command LPS 2 1,  attempt 2
2022-01-20 03:11:59.451 SendLoopCommand: Wait for ACK
2022-01-20 03:11:59.452 WaitForACK: Wait for ACK
2022-01-20 03:11:59.484 WaitForACK: (1) Received - 4F
2022-01-20 03:11:59.500 WaitForACK: (2) Received - 4F
2022-01-20 03:11:59.500 WaitForAck: timed out
2022-01-20 03:11:59.500 SendLoopCommand: Sending command LPS 2 1,  attempt 3
2022-01-20 03:11:59.500 SendLoopCommand: Wait for ACK
2022-01-20 03:11:59.500 WaitForACK: Wait for ACK
2022-01-20 03:11:59.500 WaitForACK: (1) Received - 14
2022-01-20 03:11:59.500 WaitForACK: (2) Received - 00
2022-01-20 03:11:59.500 WaitForAck: timed out
2022-01-20 03:11:59.500 SendLoopCommand: Failed to get a response after 3 attempts, reconnecting the station
2022-01-20 03:11:59.502 InitSerial: Connecting to the station
2022-01-20 03:11:59.529 InitSerial: Error opening port - Exception of type 'System.OutOfMemoryException' was thrown.
2022-01-20 03:11:59.529 InitSerial: Failed to connect to the station, waiting 30 seconds before trying again
(then the last 3 messages repeat forever until I manually restarted Cumulus MX)

I think this could be loosely related to the other problem I've been having with the LOOP2 command failing to read the serial port in the other thread. Years ago someone reported on MSDN that applications leak memory prodigiously if a serial port goes away, at least with .NET versions before 4.5, but I'm not sure whether this is the same or a similar issue.
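
One detail in the debug log above: the bytes that WaitForACK reports across the three attempts (4C, 4F, 4F, 14, 00) are identical to the first five bytes of the LOOP packet logged just before it (4C-4F-4F-14-00...), that is, ASCII "LOO" followed by data rather than an acknowledgement. A minimal Python sketch of that comparison follows. The ACK value 0x06 is taken from the Davis serial protocol as I understand it, and the function and variable names are invented for illustration; none of this is Cumulus MX code.

```python
ACK = 0x06  # assumed Davis acknowledge byte; not taken from the Cumulus MX source


def parse_hex(text):
    """Turn a log-style hex string such as '4C-4F-4F' into a list of ints."""
    return [int(part, 16) for part in text.split("-")]


def classify_ack_bytes(received, last_packet):
    """Describe what the bytes read while waiting for an ACK look like."""
    if received and received[0] == ACK:
        return "ack"
    if received and received == last_packet[:len(received)]:
        return "loop-packet-prefix"
    return "unknown"


# First bytes of the LOOP packet logged at 03:11:57.398
packet_start = parse_hex("4C-4F-4F-14-00-C6-08")
# Bytes logged by WaitForACK across the three attempts
received = parse_hex("4C-4F-4F-14-00")

print(classify_ack_bytes(received, packet_start))
print(bytes(received[:3]).decode("ascii"))
```

For the bytes in this log it prints `loop-packet-prefix` and `LOO`. One possible reading is that LOOP data was still arriving, or still sitting in the receive buffer, when the LPS command was sent, so the reads picked up packet data instead of a reply.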
mcrossley
Posts: 12694
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Bullseye Lite rPi
Location: Wilmslow, Cheshire, UK

Re: OutOfMemoryException starting in build 3152

Post by mcrossley »

Hmm, I have heard before that the serial port code in .Net is supposed to be horrendous.

The InitSerial() code does a .close and .dispose on the port before re-opening it, but it does sound like that is leaking memory.
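
The reconnect behaviour described here and visible in the log (close and dispose the port, try to open it again, wait 30 seconds on failure) can be sketched roughly as below. This is a Python illustration only, not the actual InitSerial() code: `FakePort`, `reconnect`, and the injected `sleep` function are hypothetical names, and the real code uses the .NET serial port class.

```python
import time


class FakePort:
    """Stand-in for a serial port that fails to open a set number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.is_open = False
        self.closed_count = 0

    def open(self):
        if self.failures > 0:
            self.failures -= 1
            raise OSError("simulated open failure")
        self.is_open = True

    def close(self):
        self.is_open = False
        self.closed_count += 1


def reconnect(port, retry_delay=30.0, max_attempts=5, sleep=time.sleep):
    """Close the port, then retry open() until it works or attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        port.close()  # release the old handle before every open attempt
        try:
            port.open()
            return attempt
        except OSError as err:
            print(f"Error opening port - {err}; waiting {retry_delay:g} seconds")
            if attempt < max_attempts:
                sleep(retry_delay)
    return None


# Demo with no real waiting: the fake port fails twice, then opens.
port = FakePort(failures=2)
attempts = reconnect(port, sleep=lambda seconds: None)
print(attempts, port.is_open)
```

The bounded `max_attempts` is a choice made for this sketch; the log in this thread suggests Cumulus MX keeps retrying indefinitely.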
spatula
Posts: 19
Joined: Sun 27 Aug 2017 2:46 pm
Weather Station: Davis Vantage Vue
Operating System: Windows 10

Re: OutOfMemoryException starting in build 3152

Post by spatula »

I've swapped out my mini-USB cable, updated to the latest Silicon Labs drivers, and updated to the latest Cumulus build today, so hopefully some improvement follows. I'll try to set a reminder to follow up if it does in fact help. (Of course it's also vaguely possible that the chipset in the data logger in the unit itself has gone flaky, but I would suspect a bad cable before the data logger.)
spatula
Posts: 19
Joined: Sun 27 Aug 2017 2:46 pm
Weather Station: Davis Vantage Vue
Operating System: Windows 10

Re: OutOfMemoryException starting in build 3152

Post by spatula »

So far so good; no further OOMEs or "Data input appears to have stopped" problems. For the next person to encounter these problems, especially with Davis equipment, here's the summary of what we did:

* New USB cable connected from the station directly to the computer (not through a hub)
* Updated to the latest Silicon Labs CP210x USB to UART Bridge drivers (11.0.0.509 as of right now)
* Ensured that the port using the above drivers has power management disabled
* Removed a really old version of the .Net framework (just in case)
* Updated Cumulus MX to build 3162

It seems the OOME coming from the .Net framework was related to the port being flaky, with .Net serial I/O apparently leaking significant resources when a port behaves badly or unexpectedly. Getting the port more stable and updating to build 3162 seems to have solved the problem so far.
spatula
Posts: 19
Joined: Sun 27 Aug 2017 2:46 pm
Weather Station: Davis Vantage Vue
Operating System: Windows 10

Re: OutOfMemoryException starting in build 3152

Post by spatula »

Well, I spoke too soon for sure, sadly.

2022-01-30 13:31:10.657 LOOP: Data - 50: 4C-4F-4F-EC-00-82-00-00-76-BB-02-38-22-02-04-04-2D-00-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-FF-52-FF-FF-FF-FF-FF-FF-FF-00-00-FF-FF-7F-00-00-FF-FF-00-00-21-00-B2-03-00-00-00-00-00-00-FF-FF-FF-FF-FF-FF-FF-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-16-03-06-09-CA-02-C3-06-0A-0D-D0-42
2022-01-30 13:31:10.657 LOOP: 50 - Data packet is good
2022-01-30 13:31:10.657 SendLoopCommand: Starting - LPS 2 1
2022-01-30 13:31:10.657 WakeVP: Not required
2022-01-30 13:31:10.657 SendLoopCommand: Sending command LPS 2 1,  attempt 1
2022-01-30 13:31:10.657 SendLoopCommand: Wait for ACK
2022-01-30 13:31:10.658 WaitForACK: Wait for ACK
2022-01-30 13:31:11.659 WaitForAck: (1) Timed out
2022-01-30 13:31:12.652 WaitForACK: (2) Received - 4C
2022-01-30 13:31:12.652 WaitForAck: timed out
2022-01-30 13:31:12.652 SendLoopCommand: Sending command LPS 2 1,  attempt 2
2022-01-30 13:31:12.652 SendLoopCommand: Wait for ACK
2022-01-30 13:31:12.652 WaitForACK: Wait for ACK
2022-01-30 13:31:12.652 WaitForACK: (1) Received - 4F
2022-01-30 13:31:12.653 WaitForACK: (2) Received - 4F
2022-01-30 13:31:12.653 WaitForAck: timed out
2022-01-30 13:31:12.653 SendLoopCommand: Sending command LPS 2 1,  attempt 3
2022-01-30 13:31:12.653 SendLoopCommand: Wait for ACK
2022-01-30 13:31:12.653 WaitForACK: Wait for ACK
2022-01-30 13:31:12.653 WaitForACK: (1) Received - 3F
2022-01-30 13:31:12.653 WaitForACK: (2) Received - 00
2022-01-30 13:31:12.653 WaitForAck: timed out
2022-01-30 13:31:12.653 SendLoopCommand: Failed to get a response after 3 attempts, reconnecting the station
2022-01-30 13:31:12.654 InitSerial: Connecting to the station
2022-01-30 13:31:12.682 InitSerial: Error opening port - Exception of type 'System.OutOfMemoryException' was thrown.
2022-01-30 13:31:12.682 InitSerial: Failed to connect to the station, waiting 30 seconds before trying again
2022-01-30 13:31:42.683 InitSerial: Connecting to the station
2022-01-30 13:31:42.700 InitSerial: Error opening port - Exception of type 'System.OutOfMemoryException' was thrown.
2022-01-30 13:31:42.700 InitSerial: Failed to connect to the station, waiting 30 seconds before trying again
2022-01-30 13:32:12.700 InitSerial: Connecting to the station
2022-01-30 13:32:12.717 InitSerial: Error opening port - Exception of type 'System.OutOfMemoryException' was thrown.
2022-01-30 13:32:12.717 InitSerial: Failed to connect to the station, waiting 30 seconds before trying again
2022-01-30 13:32:42.718 InitSerial: Connecting to the station
2022-01-30 13:32:42.732 InitSerial: Error opening port - Exception of type 'System.OutOfMemoryException' was thrown.
2022-01-30 13:32:42.733 InitSerial: Failed to connect to the station, waiting 30 seconds before trying again
2022-01-30 13:33:00.454 SendEmail: Waiting for lock...
2022-01-30 13:33:00.454 SendEmail: Has the lock
2022-01-30 13:33:00.454 SendEmail: Sending email, to [redacted], subject [Cumulus MX Alarm], body ["A Cumulus MX alarm has been triggered.\r\nCumulus has stopped receiving data from y" +
    "our weather station."]...
2022-01-30 13:33:00.455 *** Data input appears to have stopped
2022-01-30 13:33:00.642 SendEmail: Releasing lock...
2022-01-30 13:33:12.734 InitSerial: Connecting to the station
2022-01-30 13:33:12.749 InitSerial: Error opening port - Exception of type 'System.OutOfMemoryException' was thrown.
2022-01-30 13:33:12.749 InitSerial: Failed to connect to the station, waiting 30 seconds before trying again

This time it never recovered, and it was throwing these OutOfMemoryExceptions when things weren't even remotely close to being out of memory. Restarting CumulusMX cured it immediately.
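
For anyone wanting to check their own logs for the same pattern, here is a rough Python sketch that pulls the timestamps of the OutOfMemoryException lines out of a Cumulus MX log and reports the gaps between them. The line format is assumed from the excerpts in this thread (timestamp, then message), and the function names are invented for illustration.

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed from the log excerpts in this thread


def parse_line(line):
    """Split a log line into (timestamp, message); return None if it has no timestamp."""
    try:
        stamp = datetime.strptime(line[:23], STAMP_FORMAT)
    except ValueError:
        return None
    return stamp, line[24:]


def oom_events(lines):
    """Return the timestamps of lines mentioning System.OutOfMemoryException."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and "System.OutOfMemoryException" in parsed[1]:
            events.append(parsed[0])
    return events


def gaps_in_seconds(stamps):
    """Seconds between consecutive timestamps, rounded to the nearest second."""
    return [round((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


sample = [
    "2022-01-30 13:31:12.654 InitSerial: Connecting to the station",
    "2022-01-30 13:31:12.682 InitSerial: Error opening port - Exception of type 'System.OutOfMemoryException' was thrown.",
    "2022-01-30 13:31:42.700 InitSerial: Error opening port - Exception of type 'System.OutOfMemoryException' was thrown.",
    "2022-01-30 13:32:12.717 InitSerial: Error opening port - Exception of type 'System.OutOfMemoryException' was thrown.",
    '    "our weather station."]...',  # continuation line without a timestamp
]

events = oom_events(sample)
print(len(events), gaps_in_seconds(events))
```

On the sample lines, taken from the excerpt above, this reports three events 30 seconds apart, matching the retry delay named in the log messages.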
spatula
Posts: 19
Joined: Sun 27 Aug 2017 2:46 pm
Weather Station: Davis Vantage Vue
Operating System: Windows 10

Re: OutOfMemoryException starting in build 3152

Post by spatula »

Adding this just in case it's helpful to any other Davis Vantage Vue users or to us in continuing to chase this down: I appear to have a pretty old console, relatively speaking. The latest Davis firmware it can handle is 3.00, and the one that was present on the device was 2.14. 3.00 apparently fixes some serial communication problems, so I updated that this morning.

For anyone wishing to check their Davis console firmware version, just hold "done" and tap the "+" (or "up") button. Firmware updates are available on the Davis site, but be sure to get the update appropriate for your manufacturing code.

I also don't have high confidence in this PC and Windows installation at this point. It's all getting kind of long-in-the-tooth and crufty, so the next thing I'm trying is a new PC build (which I was planning to do already) with a clean Windows 11 install. The problems I'm having seem a little too bizarre to pin on Cumulus, especially the OutOfMemoryException when trying to open a serial port when there's plenty of heap memory. That doesn't strike me as something userland code should be able to cause just by opening a serial port, especially since Cumulus tries to close the port before re-opening it when this happens; as far as the port is concerned, that shouldn't look any different from restarting Cumulus.

Stay tuned...