
CMX crashes and web uploading

Posted: Tue 04 Jul 2023 7:01 am
by billy
Just reporting that my CMX has crashed three times in the last 3 weeks. All seem related to uploads to the web server, so maybe this is the same issue as in viewtopic.php?p=172284#p172284 but I'm not sure of this, hence this separate thread.

On the first two occasions … 13 and 29 June, see the first two diags files … the crashes were associated with uploads in the early hours of the morning. It may be of interest that typically around 00:30, on many/most mornings, a MySQL upload fails and an alarm is triggered, but usually the data do end up being uploaded. I have assumed/guessed this “routine” problem was triggered by a web server issue that was, maybe, not handled smoothly by CMX. Though note the 29 June crash seems to have started with a PHP upload failure.

The third “crash”, again in the early hours of the morning ... this morning 04 July, see third diag file ... was a little different but was again associated with web uploading. The timeline seems to be this:

00:31:32 MySQL uploading error, i.e. the “regular” occurrence that “seems” benign. There are other uploading errors occurring in the early hours of the morning, none seemingly fatal.

02:03:14 This error seems more significant. From this point on, uploads are unsuccessful - all of them.

04:09:07 Something more sinister happens here and it is the last entry in the diag file for about 2 hours until:

06:11:08 Exiting system due to external SIGTERM signal ... that's me attempting a restart. This didn't "seem" to work so I rebooted the rpi. I was in a hurry, so maybe didn't give it enough time.
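A silence like the one between 04:09:07 and 06:11:08 can be located mechanically rather than by eye. The following is a minimal sketch, assuming each diag entry begins with a `YYYY-MM-DD HH:MM:SS` timestamp as in the lines quoted in this thread; the function name `find_gaps`, the 10-minute threshold and the placeholder sample lines are illustrative assumptions, not part of CMX.

```python
from datetime import datetime, timedelta

def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (gap, last_before, first_after) for each silence longer than threshold."""
    stamps = []
    for line in lines:
        try:
            # Only the first 19 characters are parsed, so a trailing ".859" is ignored
            stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
        except ValueError:
            continue  # continuation lines (exception details etc.) have no timestamp
    return [(b - a, a, b) for a, b in zip(stamps, stamps[1:]) if b - a > threshold]

# Placeholder lines built only from the two times in the timeline above;
# the text of the 04:09:07 entry is not quoted in the post, so it is omitted here.
SAMPLE = [
    "2023-07-04 04:09:07 (entry text omitted)",
    "    (continuation line, no timestamp)",
    "2023-07-04 06:11:08 Exiting system due to external SIGTERM signal",
]

for gap, before, after in find_gaps(SAMPLE):
    print(f"{gap} of silence between {before} and {after}")
```

Run against a full diag file (for example `find_gaps(open(path))`, with `path` pointing at your own log), this would list every quiet period longer than the threshold.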

I noticed that during this two-hour gap, the monthly file continues to be updated. But the “RecentData” table in cumulusmx.db goes from an entry every minute to once every 5 minutes. (I guess the RecentData table is where CMX gets its one-minute data for the recent graphs?)

I've only picked the eyes out of the diag files, so maybe those with a better understanding will see significance in things I haven't paid attention to. Anyhow I would be grateful for insights. From my experience crashes are rare so I don't have much experience in reading these tea leaves.

Edit: and I forgot to add that WLL stopped reporting during some of the time this morning

Re: CMX crashes and web uploading

Posted: Tue 04 Jul 2023 10:17 am
by HansR
A short remark: it seems to be a network error with some duration and it seems to be once a week. You might ask your provider if that can be confirmed.

I had an outage of 2.5 hrs during the night (only once since I started using PHP upload), which was definitely related to a modem and network software update on the provider side. Apparently the PHP upload is more sensitive to that than the FTP subsystem. CMX and my CUtils failed any time they tried to connect, but CMX continued and kept storing data (I use no MySQL so no true crash occurred). Then when the network came back on, everything in turn restored and the whole system recovered within 5 minutes.

1) My guess from what I read in your first logfile is that the CMX error handling/recovery of the MySQL part needs some additions/improvements.
2) I don't see an error in the 4/7/23 logfile => wrong logfile (of the restart after the error?)

Re: CMX crashes and web uploading

Posted: Tue 04 Jul 2023 11:09 am
by billy
Hans,
Thanks for pointing out the incorrect upload :groan: The original post now has the correct one. I'd be interested to know your view of what that might tell us.
The problem is not regular enough to be classified as "weekly".
There were some more-than-usual network issues last night - as evidenced by the WLL issues I alluded to, although that may have been a WL issue.
I will go back to the web service provider, but will wait for Mark's view of this before taking that step.

Re: CMX crashes and web uploading

Posted: Tue 04 Jul 2023 12:33 pm
by HansR
billy wrote: Tue 04 Jul 2023 11:09 am Thanks for pointing out the incorrect upload :groan: The original post now has the correct one. I'd be interested to know your view of what that might tell us.
Well, I am far from an expert on WLL, but my first reaction is that the errors start when, at:

Code:

2023-07-03 10:09:26 No broadcast data received from the WLL for 30 seconds
it has its first real error after multiple "WLL: Missed a WLL broadcast message" messages. When it misses again 30 seconds later it sends an alarm mail to the user.

This indicates a failure of either the external network, the internal network, or the WLL itself. I have no idea which.
Then it recovers and continues until:

Code:

2023-07-03 14:24:54.067 PHP[225]: Error uploading to realtimegauges.txt - Exception Type: System.Net.Http.HttpRequestException
Message: An error occurred while sending the request.
Inner Exception... 
Exception Type: System.IO.IOException
Message: Unable to read data from the transport connection: Connection reset by peer.
Inner Exception... 
Exception Type: System.Net.Sockets.SocketException
Message: Connection reset by peer
where it fails on the sockets to the webserver.

It recovers again with failing WLL broadcasts.

The WLL broadcasts fail again at: 2023-07-03 19:14:57 & 2023-07-03 20:05:27

Then at:

Code:

2023-07-04 00:29:41.027 PHP[84]: Upload to realtimegauges.txt: Response text follows:
Error: TimeStamp is out of date
Data TS   = 1688401764
Server TS = 1688401780
meaning either network or the processing on server side is too slow. It recovers.
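As a worked check of the two values in that response, here is a small sketch that converts them to local time and computes the lag. The `+08:00` offset is an assumption taken from the `CONVERT_TZ(...,'+00:00','+08:00')` call in the SQL quoted later in this thread; how old a timestamp the server-side script tolerates is not stated anywhere here, so no limit is assumed.

```python
from datetime import datetime, timedelta, timezone

DATA_TS = 1688401764    # "Data TS" from the response quoted above
SERVER_TS = 1688401780  # "Server TS" from the same response

# Assumption: the station's local time is UTC+08:00 (see the CONVERT_TZ call in the SQL)
LOCAL = timezone(timedelta(hours=8))

def describe(ts):
    """Format a Unix timestamp as local wall-clock time."""
    return datetime.fromtimestamp(ts, tz=LOCAL).strftime("%Y-%m-%d %H:%M:%S")

lag_seconds = SERVER_TS - DATA_TS

print("data stamped :", describe(DATA_TS))
print("server clock :", describe(SERVER_TS))
print("lag (seconds):", lag_seconds)
```

Under that assumption the data were stamped at 00:29:24 and reached the server at 00:29:40, one second before the 00:29:41 log entry, so the request took about 16 seconds to get there (or the two clocks disagree by that much).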

Then at:

Code:

2023-07-04 00:31:32.727 CustomSqlMins[0]: Error encountered during MySQL operation = One or more errors occurred. (Unable to read data from the transport connection: Operation on non-blocking socket would block.)
from which it seems to recover until:

Code:

2023-07-04 02:03:14.859 PHP[202]: Error uploading to realtimegauges.txt - Exception Type: System.Threading.Tasks.TaskCanceledException
Message: The operation was canceled.
Inner Exception... 
Exception Type: System.ObjectDisposedException
Message: Cannot access a disposed object.
Object name: 'MobileAuthenticatedStream'.
after which the network seems to fail continuously, with No route to Host as the main reason, which suggests it cannot reach the DNS. This eventually becomes network unreachable. During this error state it appears to recover, but it does not really, and it keeps hanging in

Code:

2023-07-04 02:16:54.902 Realtime[2]: Warning, a previous cycle is still processing local files. Skipping this interval.
etc...
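A warning of that kind is what a "skip if still busy" guard around a periodic job would typically print. The sketch below is purely illustrative and is not CMX's code (CMX is a .NET application and its real logic is not shown in this thread); the class name `IntervalGuard` is hypothetical. It shows why a cycle that never returns, for example an upload with no timeout, would make every later interval skip.

```python
import threading

class IntervalGuard:
    """Run a periodic job, skipping a tick if the previous one has not finished."""

    def __init__(self):
        self._busy = threading.Lock()
        self.skipped = 0

    def run(self, work):
        # Non-blocking acquire: returns False at once if a cycle is still running
        if not self._busy.acquire(blocking=False):
            self.skipped += 1  # "previous cycle is still processing", skip this interval
            return False
        try:
            work()
            return True
        finally:
            # Only reached when work() returns or raises; a hung work() never releases
            self._busy.release()

guard = IntervalGuard()
results = []

# Simulate a timer tick arriving while the previous cycle is still inside work()
guard.run(lambda: results.append(guard.run(lambda: None)))
print(results, guard.skipped)
```

With a guard like this, recovery depends on the stuck cycle eventually ending, which is one reason a timeout on each network operation matters.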

So to summarize: the network starts failing, degrades, and CMX loses track. In the end no recovery is possible. So when using PHP upload a stricter recovery scheme would seem useful, but we have to remember that the network may be out for longer periods, outside CMX's influence. I assume the local network may keep working, but as I experienced personally, that may not be the case if the provider starts updating the modem remotely.

That is somewhat my view of what it might tell us ;) .

Re: CMX crashes and web uploading

Posted: Tue 04 Jul 2023 12:55 pm
by billy

Code:

2023-07-03 10:09:26 No broadcast data received from the WLL for 30 seconds
I suspect this particular instance is not of great significance as (1) it occurs often in my system - usually a couple of times a day; and (2) this particular case is about 14 hours before things went haywire. Mind you, I have assumed it is a problem with my local network and thought it might be time to replace my modem/router.

Re: CMX crashes and web uploading

Posted: Tue 04 Jul 2023 1:25 pm
by HansR
billy wrote: Tue 04 Jul 2023 12:55 pm

Code:

2023-07-03 10:09:26 No broadcast data received from the WLL for 30 seconds
I suspect this particular instance is not of great significance as (1) it occurs often in my system - usually a couple of times a day; and (2) this particular case is about 14 hours before things went haywire. Mind you, I have assumed it is a problem with my local network and thought it might be time to replace my modem/router.
Agreed, it is not of great significance (but it is what you find when searching for "error" 8-) )

It seems to me that at a certain point the socket used for the PHP upload fails (is there a time limit on open HTTP connections? I would not be surprised) and should be restarted. I restart my CMX automatically every 24 hrs (because of a backup), so that might be why it is not happening in my system.

Anyway, enough babbling...

Re: CMX crashes and web uploading

Posted: Wed 05 Jul 2023 5:31 pm
by mcrossley
Odd, the CustomMySqlMins function fully encloses the MySQL call in a try, and you can see that it catches the error and reports it quite a few times, then the unhandled exception seems to occur in Mono....

Code:

2023-06-13 00:22:00.496 CustomSqlMins[0]: MySQL executing - INSERT IGNORE INTO realtime (LogDateTime,temp,hum,dew,wspeed,rrate,rfall,press,intemp,inhum,wchill,wgust,heatindex,UV,SolarRad,avgbearing,apptemp,CurrentSolarMax,IsSunny,IsSunUp,FeelsLike,pm2p5,pm10,pm2p5_1hr) Values('23-06-13 00:22:00',11.7,92,10.5,0,0.2,1.0,1023.6,17.2,60,11.7,3,11.7,0.0,0,'172',11.8,0,'0','0','11.8','7.1','7.9','7.6'); DELETE FROM realtime WHERE LogDateTime < DATE_SUB(CONVERT_TZ(UTC_TIMESTAMP(),'+00:00','+08:00'), INTERVAL 7 DAY)

> a little while later that command fails, the error is caught and reported

2023-06-13 00:22:38.305 CustomSqlMins[0]: Error encountered during MySQL operation = One or more errors occurred. (Unable to read data from the transport connection: Operation on non-blocking socket would block.)

> the same command is immediately sent again - I don't understand this yet, there is no retry in the code

2023-06-13 00:22:38.305 CustomSqlMins[0]: SQL = INSERT IGNORE INTO realtime (LogDateTime,temp,hum,dew,wspeed,rrate,rfall,press,intemp,inhum,wchill,wgust,heatindex,UV,SolarRad,avgbearing,apptemp,CurrentSolarMax,IsSunny,IsSunUp,FeelsLike,pm2p5,pm10,pm2p5_1hr) Values('23-06-13 00:22:00',11.7,92,10.5,0,0.2,1.0,1023.6,17.2,60,11.7,3,11.7,0.0,0,'172',11.8,0,'0','0','11.8','7.1','7.9','7.6'); DELETE FROM realtime WHERE LogDateTime < DATE_SUB(CONVERT_TZ(UTC_TIMESTAMP(),'+00:00','+08:00'), INTERVAL 7 DAY)

> Then there are two unhandled exception errors

2023-06-13 00:22:38.306 !!! Unhandled Exception !!!
2023-06-13 00:22:38.307 !!! Unhandled Exception !!!

> And CMX prints its handling of the error from the second try - the MySQL object no longer exists

2023-06-13 00:22:38.308 CustomSqlMins[0]: Error - Object reference not set to an instance of an object.

> then the unhandled exception details at the end of the log

System.AggregateException: One or more errors occurred. (Unable to read data from the transport connection: Operation on non-blocking socket would block.) ---> System.IO.IOException: Unable to read data from the transport connection: Operation on non-blocking socket would block. ---> System.Net.Sockets.SocketException: Operation on non-blocking socket would block
...
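For anyone reading these traces, the ` ---> ` separators mark nested inner exceptions, outermost first, so the last segment is the root cause. A minimal sketch for pulling the chain apart, using only the text quoted above (the function name `exception_chain` is an illustrative choice, and the parsing assumes the `Type: message ---> Type: message` layout seen in this log):

```python
def exception_chain(text):
    """Split a .NET-style exception string into (type, message) pairs, outermost first."""
    chain = []
    for part in text.split(" ---> "):
        # The type name ends at the first ": "; the rest of the segment is the message
        exc_type, _, message = part.partition(": ")
        chain.append((exc_type.strip(), message.strip()))
    return chain

LOG = (
    "System.AggregateException: One or more errors occurred. "
    "(Unable to read data from the transport connection: "
    "Operation on non-blocking socket would block.) ---> "
    "System.IO.IOException: Unable to read data from the transport connection: "
    "Operation on non-blocking socket would block. ---> "
    "System.Net.Sockets.SocketException: Operation on non-blocking socket would block"
)

for depth, (exc_type, message) in enumerate(exception_chain(LOG)):
    print("  " * depth + exc_type + " | " + message)
```

Here the innermost entry is the `System.Net.Sockets.SocketException`, i.e. the failure originates at the socket level and is only wrapped by the two outer exceptions.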

Re: CMX crashes and web uploading

Posted: Thu 06 Jul 2023 2:34 am
by billy
Thanks Hans & Mark,

Guess it's time to do a complete refresh of the rpi .... and I'll go back to the web server provider to see if they have detected anything untoward.

I now have some independent evidence that my internet connection was having difficulties during the hours prior to the third "crash", so I'm going to pretend that one didn't happen :roll: