Weird issue after long run of CMX

Posted: Sat 05 Sep 2020 8:10 am
by HansR
Something I have noticed before, and which Steinar is now seeing as well, is that CMX seems to become unstable after running for a long time on the RPi. Specifically, when the system has been up for more than 3 weeks, I get an uptime of -49. It happens when I process the file through CMX, and I get the same value using the API.

When using the API, the logfile shows that my code in cutils goes really haywire, meaning it ends up where it should not go.

It may be Mono related, but it happens with both Mono version 5 and 6.
A reboot has always solved the issue for me, but now it is a bit different, as Steinar can't reboot because he is far away from the RPi.

It could be in any of several subsystems. I point to Mono (and the C# libraries) first, but it could be anywhere. The fact that it happens when CMX translates webtags, and that the same values appear when asking through the API, worries me a lot. For now I am focusing on the CMX uptime, which becomes -49 when the problem occurs.

Any suggestions?
How to progress?

Re: Weird issue after long run of CMX

Posted: Sat 05 Sep 2020 10:45 am
by mcrossley
Looks like a bug in Mono: probably a wrapping counter, using a signed int instead of an unsigned long as the millisecond counter? A signed 32-bit int will wrap to negative values after about 24.9 days of counting milliseconds.
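
To make the arithmetic concrete, here is a minimal sketch of what a signed 32-bit millisecond counter would do. It only illustrates the hypothesis above; whether Mono really uses such a counter is not confirmed, and the class and method names are made up for the example.

```csharp
using System;

// Hypothetical helper, not part of CMX or Mono.
public static class WrapDemo
{
    // Number of days a signed 32-bit millisecond counter can count
    // before it overflows (int.MaxValue ms, roughly 24.86 days).
    public static double DaysUntilSignedWrap()
    {
        return TimeSpan.FromMilliseconds(int.MaxValue).TotalDays;
    }

    // Simulates the value such a counter would hold after the given
    // true elapsed time: only the low 32 bits survive the cast.
    public static int AsSignedCounter(long trueElapsedMs)
    {
        return unchecked((int)trueElapsedMs);
    }
}
```

For example, `WrapDemo.AsSignedCounter(25L * 86400000)` simulates 25 days of real uptime and yields a negative number of milliseconds, which is the kind of value that would show up as a negative uptime.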

One of my test systems: {"ProgramUpTime":"-132 days -23 hours"}

Re: Weird issue after long run of CMX

Posted: Sat 05 Sep 2020 10:57 am
by mcrossley
I guess as a workaround I could add a new tag <#ProgramUpTimeMs> that would return the uptime in milliseconds. You could then do some two's complement arithmetic on that to get the real value?

Re: Weird issue after long run of CMX

Posted: Sat 05 Sep 2020 11:17 am
by mcrossley
Or I could do it for you in the tag code

If mono *is* using an int value, that of course would still only get you up to 49 days before it wrapped back to zero again. :(
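
The two's complement idea from the posts above could look roughly like this. It is a sketch under the assumption that the raw value is a wrapped signed 32-bit millisecond count, which nobody in this thread has verified; `UptimeRecovery` and its methods are hypothetical names, not an existing CMX tag or API.

```csharp
using System;

// Hypothetical helper, not part of CMX.
public static class UptimeRecovery
{
    // Reinterprets a wrapped signed 32-bit millisecond count as unsigned,
    // which recovers the true elapsed time as long as it is below 2^32 ms.
    public static TimeSpan FromWrappedMs(int wrappedMs)
    {
        uint unsignedMs = unchecked((uint)wrappedMs);
        return TimeSpan.FromMilliseconds(unsignedMs);
    }

    // The unsigned reading itself wraps back to zero after
    // uint.MaxValue ms, roughly 49.7 days.
    public static double DaysUntilUnsignedWrap()
    {
        return TimeSpan.FromMilliseconds(uint.MaxValue).TotalDays;
    }
}
```

This only moves the limit from about 24.9 to about 49.7 days, so it would be a stopgap rather than a real fix.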

Re: Weird issue after long run of CMX

Posted: Sat 05 Sep 2020 3:04 pm
by HansR
Yes, no doubt we could find a trick to get around the uptime problem. But there is more to it, because it seems none of my API calls work anymore. The logs are inconsistent.
I'll try something else with Steinar and then will come back to this.

Re: Weird issue after long run of CMX

Posted: Sun 06 Sep 2020 10:58 am
by HansR
OK, I got confirmation that it really looks like only a local problem for the uptime of Cumulus (as for the system uptime, the webtag method does not work at all anymore and I determine it myself).

However, I also looked into it a bit myself and I have another solution in CMX which works for everybody without having to rework the result of the webtag (meaning it can be used directly if the uptime webtag is used in a webpage). I'll send the suggestion by PM.

Re: Weird issue after long run of CMX

Posted: Fri 11 Sep 2020 6:32 pm
by sutne
I have upgraded Raspbian and then rebooted, so the CumulusMX program uptime is back to 0.

What I do not understand is why the program uptime counter is not reset when I stop and start CumulusMX.

Re: Weird issue after long run of CMX

Posted: Fri 11 Sep 2020 7:40 pm
by HansR
Hi Steinar, I don't know how much you know about coding, but that does not happen because of the way Cumulus asks the system for the uptime. It does not save the start time itself; instead it asks for the process start time with a system call:

TimeSpan ts = DateTime.Now - Process.GetCurrentProcess().StartTime;
And that is where control is lost and where the error is. Probably Mono, maybe Linux, possibly a signed/unsigned problem in some counter. We don't know for sure. Anyway, restarting CMX apparently does not solve this. Looking at it, I don't understand it either, because I would expect the timer to restart with a new process. Apparently something in that call does not work. We have no influence on this; all we can do is calculate the time difference in CMX ourselves, as a workaround for the current method.
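
To illustrate what "calculate the time difference in CMX" could mean, here is a minimal sketch that records the start time inside the program instead of asking the system for the process start time. This is not the workaround agreed with Mark, which is not described in this thread; `UptimeTracker` is a hypothetical class for the example only.

```csharp
using System;

// Hypothetical helper, not the actual CMX implementation.
public sealed class UptimeTracker
{
    private readonly DateTime startUtc;

    public UptimeTracker(DateTime startUtc)
    {
        this.startUtc = startUtc;
    }

    // Call once when the program starts.
    public static UptimeTracker StartNow()
    {
        return new UptimeTracker(DateTime.UtcNow);
    }

    // Elapsed time at a given moment; taking the moment as a parameter
    // makes the calculation easy to check.
    public TimeSpan ElapsedAt(DateTime nowUtc)
    {
        return nowUtc - startUtc;
    }

    // Elapsed time right now, based on the wall clock.
    public TimeSpan Elapsed
    {
        get { return ElapsedAt(DateTime.UtcNow); }
    }
}
```

The program would call `UptimeTracker.StartNow()` once at startup and read `Elapsed` whenever the uptime webtag is requested. One caveat of this approach: it depends on the wall clock, so a large clock correction after startup (for example on an RPi without a real-time clock, before the time is synchronised) would distort the result.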

In discussion with Mark we found a workaround, so it will probably be implemented in the next release.
But he's away now, so it will take some time.