Crash in 3101

Posted: Sat 06 Feb 2021 9:13 am
by sfws
I had been running 3094 quite happily on my unattended RPI since that build was released (last September), and it proved reliable month after month. I will have restarted MX a few times, but not on any regular schedule.

On 22 January, I looked at the forum, saw that Mark had been very busy developing MX, and decided to try release build 3101. On that day it was the latest available, and there was no evidence of people being annoyed with it. I did not try to learn about the new functionality of running as a service; I used sudo mono CumulusMX.exe -debug, as I had with the previous installation here.

It seemed to be working okay when I checked it the next day, and so I left it unattended after that. It happened I was beside my RPi yesterday morning, and checked it was still running, without any issues.

I discovered this morning that my MX software installation had crashed, although I don't know when. It appears the last entry in the MXDiags file is from yesterday afternoon, but the stack report does not mention any time, and might have been generated then or earlier this morning for all I know.

I attach, in the zip, the stack trace it showed on the console, the then current MX diags file (with debug on) that just stopped being written to without recording any reason, and a new file created when that release started running. None of them tell me why unattended MX decided to crash. The only thoughts I have are:
1) that releases happen so often that perhaps I am the only one who expects MX to keep running unattended for longer than two weeks?
2) or maybe too much functionality has been rushed into MX (I see Mark is saying he wants to take a break from adding functionality)

Anyway, I have regressed back to 3094 that did work reliably, and I rewound to rollover yesterday in case any data files were corrupted by whatever happened whenever it happened.

Re: Crash in 3101

Posted: Sat 06 Feb 2021 11:32 am
by freddie
Looks like a crash in Mono to me, rather than MX. What version of Mono are you running?

Maybe even a hardware problem?

Code:

Bus error
pi@tiny-computer:/media/pi/portable/CumulusMX $ Crash Reporter has timed out, sending SIGSEGV
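
For anyone wanting to answer the "what version of Mono" question from a script rather than by eye, here is a minimal sketch. It assumes `mono` is on the PATH and that its `--version` output contains the word "version" followed by a dotted number; the function names are made up for illustration and are not part of MX or Mono.

```python
import re
import subprocess


def parse_mono_version(output):
    """Return the dotted version number found in `mono --version` output, or None."""
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", output)
    return match.group(1) if match else None


def installed_mono_version():
    """Run `mono --version` and return the parsed version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["mono", "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # mono is not installed or not on the PATH
        return None
    return parse_mono_version(result.stdout)


if __name__ == "__main__":
    print(installed_mono_version() or "mono not found or version not recognised")
```

Running `mono --version` directly at the prompt gives the same information; the script is only useful if you want to log it alongside other diagnostics.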

Re: Crash in 3101

Posted: Sat 06 Feb 2021 11:41 am
by mcrossley
It's not an obvious crash in Cumulus, it could have been in Mono. It may be worth a check in the system logs.
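
A rough sketch of one way to do that check, assuming the system log is at /var/log/syslog (it may be elsewhere on your system, and may need sudo to read). The keyword list is only a guess at what might be worth searching for, not something MX or Mono defines.

```python
def matching_log_lines(lines, keywords):
    """Yield (line_number, text) for lines containing any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(keyword in text for keyword in lowered):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Assumed path and hypothetical keywords; adjust both to suit your system.
    log_path = "/var/log/syslog"
    keywords = ["mono", "bus error", "segfault", "i/o error"]
    try:
        with open(log_path, errors="replace") as log_file:
            for number, text in matching_log_lines(log_file, keywords):
                print(f"{number}: {text}")
    except OSError as error:
        print(f"Could not read {log_path}: {error}")
```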

Re: Crash in 3101

Posted: Sat 06 Feb 2021 11:58 am
by HansR
If it is a mono crash you will find a file containing that info in the cumulus directory.
I don't remember the naming but it is unmistakably a mono crash dump info file.

Re: Crash in 3101

Posted: Sat 06 Feb 2021 9:22 pm
by sfws
I was out in today's mist still weeding the garden, so I did not read your replies until this evening.
In brief: following advice, I upgraded Mono and also upgraded to b. 3107, and the new combination is so far running problem-free.
mcrossley wrote: Sat 06 Feb 2021 11:41 am It's not an obvious crash in Cumulus, it could have been in Mono. It may be worth a check in the system logs.
freddie wrote: Sat 06 Feb 2021 11:32 am Looks like a crash in Mono to me, rather than MX. What version of Mono are you running?
HansR wrote: Sat 06 Feb 2021 11:58 am If it is a mono crash you will find a file containing that info in the cumulus directory.
I don't remember the naming but it is unmistakably a mono crash dump info file.
@HansR: I looked in the CumulusMX directory, but did not see any file not derived from the release. Maybe I should search elsewhere for a mono crash dump file?

@Freddie: I just stopped 3094 MX, then tried an upgrade to the latest mono. It did update many components, so my RPi had been running an older version. In the 12 hours since I reverted to the old 3094 MX release this morning, it and the old mono worked happily together, just as they had from September to January. But now that I have got mono definitely up to date, I will install an up-to-date MX too before I restart.
The older MX release has never suggested any hardware issues, and manually accessing my hardware does not report any problems. I am too tired now for any further investigation.

@mcrossley: The various logs in /var/log did have extra entries from yesterday afternoon, and these continued last night. The system log worried me in the little time I spent looking at it. Many of the lines, after that sudden change, were talking about new users (being assigned incrementing numbers) joining, without explaining why! There were a few references to devices by numbers that also seemed to be incrementing. Could mono have been creating the new users each time MX did some access, or could an intermittent hardware issue make each access get treated as fresh, or was someone breaking into my LAN while I slept last night? Again, I can try investigating further, including looking at older logs, when I am not yawning.

Re: Crash in 3101

Posted: Sat 06 Feb 2021 9:44 pm
by freddie
@sfws if you like, you could PM me the appropriate part of your syslog and I will take a look (it's my day job).

Re: Crash in 3101

Posted: Sun 07 Feb 2021 1:03 am
by HansR
@sfws: no, I don't think so as CMX is running in that directory. I never found them elsewhere.

Re: Crash in 3101

Posted: Sun 07 Feb 2021 3:10 pm
by mcrossley
Are the "new" users just related to you viewing the admin pages? Each page load is logged as a new client connection, as the static content does not maintain persistent connections.

Re: Crash in 3101

Posted: Sun 07 Feb 2021 10:41 pm
by sfws
Just spotted that the MXdiags file reports a rainfall rate field with a format error in the dayfile.txt line for 23rd March 2017. I was running Cumulus 1 (2 homes ago) when that line was created. Pity the admin interface datalog editing page for dayfile does not let you pick which lines to show, as the correction was approximately half-way between start and end (on p216 of 368 pages), and getting to that page therefore takes over a hundred clicks.
(Incidentally, the format would have been acceptable in Cumulus 1, but I don't believe it was stored like that, I suspect subsequent corruption has made it unacceptable for MX).
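
As a way of locating such a line without clicking through the editor, here is a minimal sketch that scans a delimited text file for fields that do not parse as numbers. The delimiter, the decimal separator, the list of numeric columns and the rows-per-page figure are all assumptions you would need to supply for your own dayfile.txt; none of them are taken from MX, and the function names are made up.

```python
def find_bad_numeric_fields(lines, numeric_columns, delimiter=",", decimal="."):
    """Yield (line_number, column_index, raw_value) for fields that are not numbers.

    numeric_columns holds the zero-based indexes of the fields expected to be
    numeric. A missing field is reported with raw_value None.
    """
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        for column in numeric_columns:
            if column >= len(fields):
                yield line_number, column, None
                continue
            raw = fields[column].strip()
            try:
                # Normalise the assumed decimal separator before parsing
                float(raw.replace(decimal, "."))
            except ValueError:
                yield line_number, column, raw


def page_for_line(line_number, rows_per_page):
    """Return the 1-based page on which a 1-based line number would appear."""
    return (line_number - 1) // rows_per_page + 1
```

Opening the file with `open("dayfile.txt")` and passing the file object as `lines` would list each suspect line number, which can then be fixed in a text editor while MX is stopped.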

I now realise that reading the whole of dayfile.txt is something that has changed between the old release I was running and newer releases. I take it MX now stores a duplicate of the whole log file somewhere in memory. That is a lot of extra i/o operations when, like me, you have over 1 1/4 decades stored in that file. I also see there are a lot more .json files in /web, and although I don't think I'm uploading them, just generating them represents a huge increase in i/o actions. Perhaps I will go back to a simpler MX release, abandoning this bloatware. One of the strengths of the original Cumulus software was its small uploads (https://cumuluswiki.org/a/FAQ#What_is_t ... _update.3F).

freddie wrote: Sat 06 Feb 2021 11:32 am Maybe even a hardware problem?
mcrossley wrote: Sun 07 Feb 2021 3:10 pm Are the "new" just related to you viewing the admin pages
Yes Mark, I wanted to check the overnight low temperature before restarting my gardening, so I did use the admin interface on my mobile phone that morning, and I did navigate between pages. It was seeing admin interface pages with blanks instead of figures that made me discover the crash. Thinking about it, although the MX diags file ended the previous afternoon, MX was still partly working on the RPi that morning: its web server was serving the admin pages to my mobile, but the api was not populating the pages with figures, suggesting that an i/o failure (ta Niall) potentially caused the crash. Anyway, I'm glad it is not an outsider hacking in!

Re: Crash in 3101

Posted: Sun 07 Feb 2021 11:11 pm
by water01
You can now choose which .json files to upload for the graphs.

Re: Crash in 3101

Posted: Mon 08 Feb 2021 7:30 am
by sfws
sfws wrote: Sun 07 Feb 2021 10:41 pm there are a lot more .json files in /web and although I don't think I'm uploading them,
water01 wrote: Sun 07 Feb 2021 11:11 pm You can now choose which .json files to upload for the graphs.
Read the last paragraph of the release announcement again, and then read the quote from my post. Put those together, and your response is not relevant.

The fact that more .json files are always created locally is the cause of the considerable increase in i/o operations and related h/w wear I described.

Re: Crash in 3101

Posted: Mon 08 Feb 2021 11:42 am
by mcrossley
Yeah, that is a temporary situation to get around the issue of people wanting the files, but not wanting to FTP them. Long-needed, much finer-grained control of all file output and transfer is on the way...