MX stops - nothing strange in MX diags

Posted: Tue 29 Dec 2020 11:20 pm
by Big Daddy
Hi,
I am running b3097, but every so often MX just seems to stop. I don't see anything strange in the MX diags (latest one attached), but when I ran sudo systemctl status cumulusmx from the Pi prompt it returned the following. I restarted MX and all seems to be OK now. This is maybe the second or third time it has happened over the last couple of months.

21:18 seems to tie in with the MX diags. Any ideas please?

cumulusmx.service - CumulusMX service
Loaded: loaded (/etc/systemd/system/cumulusmx.service; enabled; vendor preset: enabled)
Active: failed (Result: signal) since Tue 2020-12-29 21:18:53 GMT; 1h 41min ago
Docs: https://cumuluswiki.org/a/Main_Page
Process: 951 ExecStart=/usr/bin/mono-service -d:/opt/CumulusMX CumulusMX.exe -service (code=exited, status=0/SUCCESS)
Process: 1149 ExecStopPost=/bin/rm /tmp/CumulusMX.exe.lock (code=exited, status=0/SUCCESS)
Main PID: 952 (code=killed, signal=KILL)

Dec 29 21:18:52 Weather-Pi systemd[1]: cumulusmx.service: Main process exited, code=killed, status=9/KILL
Dec 29 21:18:53 Weather-Pi systemd[1]: cumulusmx.service: Failed with result 'signal'.
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

Is this MX or Pi related? I am using the standard pre-built Raspberry Pi image.

Thanks
Andy

Re: MX stops - nothing strange in MX diags

Posted: Wed 30 Dec 2020 1:44 am
by galfert
The status shows Active: failed. This means that your systemd service is not configured correctly, or something is preventing it from auto-starting, and thus it is not starting on boot.

Or maybe it was just never enabled.

Try the following:
> sudo systemctl enable cumulusmx

Then check the status again:
> systemctl status cumulusmx

See whether the status line then shows Active: active.

It might be helpful to see the contents of your cumulusmx.service file:
> cat /etc/systemd/system/cumulusmx.service

Re: MX stops - nothing strange in MX diags

Posted: Wed 30 Dec 2020 10:28 am
by Big Daddy
Hi Galfert,
Have attached the file.

I don't seem to have a problem with the service starting. It's been running fine since I moved to using the image file and started running it as a service. I have rebooted the Pi several times previously and MX always seems to start up with no problem.

For some reason it occasionally just decides to stop. It had been running perfectly fine since I last restarted it on 11th December, until last night. I have included the diags from the last reboot and also the serviceConsole log.

Andy

Re: MX stops - nothing strange in MX diags

Posted: Wed 30 Dec 2020 11:35 am
by mcrossley
It looks like something external killed the process with a SIGKILL?

That cannot be caught by MX. If a SIGTERM is used instead, MX catches and logs it and performs an orderly shutdown.

Dec 29 21:18:52 Weather-Pi systemd[1]: cumulusmx.service: Main process exited, code=killed, status=9/KILL
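
The difference between the two signals can be seen with a throwaway shell child standing in for MX. This is only a sketch: the helper name exit_status_after is made up for illustration, and the child is a plain sh loop, not CumulusMX. The child installs a handler for SIGTERM and exits cleanly, whereas SIGKILL gives it no chance to run anything, and the parent sees exit status 137 (128 + 9), the same signal 9 that systemd reported.

```shell
#!/bin/sh
# Hypothetical helper: start a child that handles SIGTERM, send it the
# given signal, and print the exit status that the parent observes.
exit_status_after() {
    sig=$1
    # The child traps TERM and exits 0. KILL cannot be trapped.
    sh -c 'trap "exit 0" TERM; while :; do sleep 1; done' >/dev/null 2>&1 &
    pid=$!
    sleep 1                      # give the child time to install its trap
    kill -s "$sig" "$pid"
    status=0
    wait "$pid" || status=$?     # collect the child's exit status
    echo "$status"
}

exit_status_after TERM   # handler runs, child exits cleanly with 0
exit_status_after KILL   # no handler runs, status is 128 + 9 = 137
```

A process that dies this way gets no opportunity to write a final log entry, which would fit with the MX diags showing nothing unusual.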

Re: MX stops - nothing strange in MX diags

Posted: Thu 31 Dec 2020 9:32 am
by Big Daddy
Many thanks. Will take a look and see if I can find any logs on the Pi that might indicate what happened.

Appreciate the support.

Andy

Re: MX stops - nothing strange in MX diags

Posted: Fri 08 Jan 2021 11:32 am
by Big Daddy
So I started to notice this as well on my brother's install of Cumulus, again on a Pi. Both his Pi and mine have the latest updates for MX, Buster, packages, mono, etc.
I don't expect this to be fixed here; I am mainly pointing it out to see if anybody has any ideas. To me it seems that mono is possibly hogging or not releasing memory, and eventually the process gets killed because the system runs out of memory.

I am no expert, but on my Pi, when it last failed, I saw this in the kernel log (/var/log/kern.log):

Dec 29 21:18:51 Weather-Pi kernel: [2895475.331180] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=mono,pid=952,uid=0
Dec 29 21:18:51 Weather-Pi kernel: [2895475.331651] Out of memory: Killed process 952 (mono) total-vm:1172748kB, anon-rss:866980kB, file-rss:0kB, shmem-rss:4kB, UID:0 pgtables:1358kB oom_score_adj:0
Dec 29 21:18:51 Weather-Pi kernel: [2895475.522290] oom_reaper: reaped process 952 (mono), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB

And in the same log just prior to this:

Tasks state (memory values in pages):
[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 952] 0 952 293187 216746 1390592 13170 0 mono
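
The "Tasks state" figures are in pages, so they can be cross-checked against the kB values in the "Killed process" line. A quick sketch, assuming a 4 kB page size (the numbers here are consistent with that, and getconf PAGESIZE on the Pi should confirm it); the function name pages_to_kb is just for illustration:

```shell
#!/bin/sh
# Hypothetical helper: convert a page count from the OOM "Tasks state"
# table to kB. The page size in kB defaults to 4 (an assumption).
pages_to_kb() {
    pages=$1
    page_kb=${2:-4}
    echo $(( pages * page_kb ))
}

pages_to_kb 293187   # total_vm column for pid 952
pages_to_kb 216746   # rss column for pid 952
```

The first call prints 1172748, matching total-vm in the "Killed process" line. The second prints 866984, which is anon-rss (866980 kB) plus shmem-rss (4 kB), so mono was holding roughly 847 MiB resident when it was killed.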


On my brother's Pi I see something similar:

Jan 8 03:36:26 Nigels-Pi2 kernel: [206448.280612] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=mono,pid=339,uid=0
Jan 8 03:36:26 Nigels-Pi2 kernel: [206448.280993] Out of memory: Killed process 339 (mono) total-vm:1012064kB, anon-rss:844472kB, file-rss:0kB, shmem-rss:4kB, UID:0 pgtables:1212kB oom_score_adj:0
Jan 8 03:36:26 Nigels-Pi2 kernel: [206448.498711] oom_reaper: reaped process 339 (mono), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB

Tasks state (memory values in pages):
[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 339] 0 339 253016 211119 1241088 1858 0 mono
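
To check whether earlier stops were also OOM kills, the kernel log can be filtered for the same "Out of memory: Killed process" line. A minimal sketch, assuming the log lines have the same layout as the excerpts above; the function name oom_kills and the output format are made up for illustration, and reading /var/log/kern.log may need sudo:

```shell
#!/bin/sh
# Hypothetical helper: summarise OOM kills found in one or more kernel
# logs as "timestamp pid=N name=X anon_rss_kb=N".
# With no arguments it reads /var/log/kern.log.
oom_kills() {
    [ "$#" -gt 0 ] || set -- /var/log/kern.log
    grep -h 'Out of memory: Killed process' "$@" |
        sed -E 's/^(.*) [^ ]+ kernel: .*Killed process ([0-9]+) \(([^)]+)\).*anon-rss:([0-9]+)kB.*/\1 pid=\2 name=\3 anon_rss_kb=\4/'
}

# Example (not run here):
#   oom_kills /var/log/kern.log /var/log/kern.log.1
```

If each stop lines up with a kill of mono at a similar anon-rss, that supports the idea of steady memory growth rather than a one-off spike.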


Cumulus seems to stop at this point, as mono looks like it has been killed, and a restart of Cumulus resolves it.

Do you think this points to a potential memory issue with mono?

I have attached both kernel files if anybody is interested in taking a look.
Andy

***Edit*** I also just noticed that when running "top" on my Pi, mono was using 342.3% CPU. When I restarted Cumulus it dropped to 7% with an occasional rise to 23%. My brother's Pi shows 7% to 23%, but that one was restarted this morning.
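
Rather than catching it in top by chance, the memory growth could be recorded over time. A rough sketch that reads VmRSS from /proc, so it is Linux only; the helper names rss_kb and log_rss and the log path in the example are assumptions, not anything MX provides:

```shell
#!/bin/sh
# Hypothetical helper: print the resident set size (kB) of a PID,
# taken from the VmRSS line of /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Hypothetical helper: append "UTC-timestamp pid rss_kb" to a log file.
log_rss() {
    pid=$1
    logfile=$2
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$pid" "$(rss_kb "$pid")" >> "$logfile"
}

# Example (not run here): sample the mono process, e.g. every 10 minutes from cron
#   log_rss "$(pgrep -x mono)" /home/pi/mono-rss.log
```

A steadily climbing rss_kb column between restarts would point to a leak, while a flat line with a sudden jump would suggest something else.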

Re: MX stops - nothing strange in MX diags

Posted: Fri 08 Jan 2021 11:57 am
by mcrossley
OK, interesting. I am seeing a memory leak when running under Mono with the Davis WLL station. I am running on a Pi 4, so it has more memory than most Pis. If the GW1000 is having the same problem, it possibly points to the HTTP calls being the issue, as both these stations use that mechanism to obtain the data. (I had also been looking at the JSON decoder, but the GW1000 does not use JSON.)

The memory leak does not occur when running on Windows, so it does appear to be a problem in Mono. Unfortunately, my dev/debug environment is on Windows, and I have limited knowledge of debugging and tracking memory leaks on Linux.

Whilst I investigate this, it may be a good idea to schedule a shutdown/start of Cumulus every few days to clear the memory usage down.
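
One way to do that is a system cron entry that restarts the service. This is only a sketch: the helper name restart_cron_entry, the file name /etc/cron.d/cumulusmx-restart and the 04:30 schedule are all assumptions, and it relies on the service being called cumulusmx as in the status output earlier in the thread. A systemctl restart stops the service with SIGTERM by default, so MX should get its orderly shutdown.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/cron.d style entry
# (minute hour day-of-month month day-of-week user command)
# that restarts the cumulusmx service every N days of the month.
restart_cron_entry() {
    minute=$1
    hour=$2
    every_days=$3
    printf '%s %s */%s * * root systemctl restart cumulusmx\n' \
        "$minute" "$hour" "$every_days"
}

restart_cron_entry 30 4 3   # prints: 30 4 */3 * * root systemctl restart cumulusmx

# To install it (not run here):
#   restart_cron_entry 30 4 3 | sudo tee /etc/cron.d/cumulusmx-restart
```

Note that */3 in the day-of-month field means days 1, 4, 7 and so on, so the interval resets at the start of each month rather than being a strict three days.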

Re: MX stops - nothing strange in MX diags

Posted: Fri 08 Jan 2021 12:24 pm
by Big Daddy
Thanks Mark,

Both stations are using the GW1000. I will set up a regular shutdown / restart using cron on the Pis.

As I can recreate the issue on 2 systems, please let me know if you need any information. Unfortunately I am not a Linux expert either, but I am happy to help out where I can.

Andy

Re: MX stops - nothing strange in MX diags

Posted: Fri 08 Jan 2021 1:03 pm
by water01
mtrace may help trace a memory leak.

Usage is explained here: https://www.raspberrypi.org/forums/view ... p?t=206290

Re: MX stops - nothing strange in MX diags

Posted: Fri 08 Jan 2021 1:42 pm
by mcrossley
Unfortunately, mtrace appears to be for C programs rather than .NET.