Welcome to the Cumulus Support forum.

Latest Cumulus MX release 3.13.1 (build 3147) - 13 September 2021

Legacy Cumulus 1 release v1.9.4 (build 1099) - 28 November 2014 (a patch is available for 1.9.4 build 1099 that extends the date range of drop-down menus to 2030)

Download the Software (Cumulus MX / Cumulus 1 and other related items) from the Wiki

If you are interested in supporting Cumulus then maybe you would like to contribute to the maintenance of the Wiki? This need not take hours of your time - just a half hour here and there can make a big difference, particularly if many people are doing this. Any contributions are very welcome, whether they involve new content or editing of existing content. It will be very helpful to current and future users of Cumulus software if the Wiki is kept well-maintained and current. If you are interested then please contact forum user saratogaWX and ask for a Wiki account.

Cleaning up Data files.

Talk about anything that doesn't fit elsewhere - PLEASE don't put Cumulus queries in here!
Phil23
Posts: 759
Joined: Sat 16 Jul 2016 11:59 pm
Weather Station: Davis VP2+ & GW1000 (Standalone)
Operating System: Win10 Pro / rPi Buster
Location: Australia

Cleaning up Data files.

Post by Phil23 »

I am currently experimenting with my new Ecowitt & a rPi, preparing the Pi for a remote site.

I've got both a PC & the rPi reading the GW1000 & it's been logging for over a month, with a few hiccups.

Part 1 of my fix-up.

What I need to do to start with is delete some duplicate records created in the Pi's data from when I had two sessions running.

Code: Select all

07/07/21,16:50,14.4,40,1.0,2,2,237,0.0,0.0,953.8,0.0,26.0,32,1,14.4,14.4,0.0,14,0.00,579.70,12.3,4,0.1,237,0.0,0.0,13.9,12.5
07/07/21,16:50,14.4,40,1.0,2,2,237,0.0,0.0,953.8,0.0,26.0,32,1,14.4,14.4,0.0,14,0.00,579.70,12.3,4,0.1,237,0.0,0.0,13.9,12.5
07/07/21,17:00,13.4,43,1.1,0,1,237,0.0,0.0,953.7,0.0,25.7,33,0,13.4,13.4,0.0,7,0.00,579.70,11.5,0,0.3,237,0.0,0.0,13.1,11.5
07/07/21,17:00,13.4,43,1.1,0,1,237,0.0,0.0,953.7,0.0,25.7,33,0,13.4,13.4,0.0,7,0.00,579.70,11.5,0,0.3,237,0.0,0.0,13.1,11.5
07/07/21,17:10,12.7,46,1.4,0,0,0,0.0,0.0,953.9,0.0,25.3,32,0,12.7,12.7,0.0,2,0.00,579.70,10.9,0,0.4,237,0.0,0.0,12.5,10.9
07/07/21,17:20,11.3,51,1.5,0,0,0,0.0,0.0,953.8,0.0,25.1,33,0,11.3,11.3,0.0,0,0.00,579.70,9.5,0,0.5,237,0.0,0.0,11.2,9.5
07/07/21,17:20,11.3,51,1.5,0,0,0,0.0,0.0,953.8,0.0,25.1,33,0,11.3,11.3,0.0,0,0.00,579.70,9.5,0,0.5,237,0.0,0.0,11.2,9.5
07/07/21,17:30,10.3,55,1.6,0,0,0,0.0,0.0,954.0,0.0,24.8,33,0,10.3,10.3,0.0,0,0.00,579.70,8.6,0,0.5,237,0.0,0.0,10.3,8.6
07/07/21,17:30,10.3,55,1.6,0,0,0,0.0,0.0,954.0,0.0,24.8,33,0,10.3,10.3,0.0,0,0.00,579.70,8.6,0,0.5,237,0.0,0.0,10.3,8.6
I'm not really good with Regular Expressions, but found this: ^(.*?)$\s+?^(?=.*^\1$)
& it gets 168 hits, but it is obviously missing duplicate entries where the data after the date & time fields is slightly different.

As in these records where one field has changed by say 0.1.

Code: Select all

09/07/21,10:00,11.6,100,11.6,5,13,330,6.0,2.8,1014.0,33.0,23.8,44,5,11.6,11.6,0.0,75,0.00,579.70,11.0,444,0.0,320,0.0,31.7,11.4,13.6
09/07/21,10:00,11.6,100,11.6,6,13,330,6.0,2.8,1014.0,33.0,23.8,44,5,11.6,11.6,0.0,75,0.00,579.70,11.0,444,0.0,320,0.0,31.7,11.4,13.6
09/07/21,10:10,11.7,100,11.7,6,15,337,1.2,3.3,1013.9,33.5,23.9,44,4,11.7,11.7,0.0,82,0.00,579.70,11.1,465,0.0,330,0.0,32.2,11.4,13.8
09/07/21,10:10,11.7,100,11.7,6,15,337,1.2,3.3,1013.9,33.5,23.9,44,4,11.7,11.7,0.0,82,0.00,579.70,11.0,465,0.0,330,0.0,32.2,11.4,13.8
09/07/21,10:20,11.9,100,11.9,6,11,337,3.0,3.8,1013.9,34.0,24.0,44,6,11.9,11.9,1.0,198,0.00,579.70,11.2,484,0.0,354,0.0,32.7,11.5,14.1
09/07/21,10:20,11.9,100,11.9,6,11,336,3.0,3.8,1013.9,34.0,24.0,44,6,11.9,11.9,1.0,198,0.00,579.70,11.3,484,0.0,354,0.0,32.7,11.6,14.1
09/07/21,10:30,12.1,100,12.1,5,13,345,1.8,4.1,1013.7,34.3,24.0,45,9,12.1,12.1,1.0,177,0.00,579.70,11.7,502,0.0,355,0.0,33.0,12.0,14.4
09/07/21,10:30,12.1,100,12.1,6,13,344,1.8,4.1,1013.7,34.3,24.0,45,9,12.1,12.1,1.0,182,0.00,579.70,11.7,502,0.0,337,0.0,33.0,11.9,14.4
Could anyone help with a Regex that only compares the first 15 characters, and then, I presume, selects the entire line for deletion with search & replace?
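(As an aside, the same cleanup can also be done outside the editor. Below is a minimal Python sketch, not from the thread: it keys each line on its first 15 characters — the date, the time and the following comma — and keeps the last record seen for each timestamp. The function name is made up for illustration.)

```python
def dedupe_log_lines(lines):
    """Keep the last line seen for each date/time key (first 15 chars)."""
    latest = {}
    order = []
    for line in lines:
        key = line[:15]          # "dd/mm/yy,hh:mm," - date and time fields
        if key not in latest:
            order.append(key)    # remember first-seen order of timestamps
        latest[key] = line       # later duplicates overwrite earlier ones
    return [latest[key] for key in order]
```

Read the monthly log in with `open(...).readlines()`, run it through this, and write the result back out — the lines are kept verbatim, so no decimal reformatting is needed.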

Thanks

Phil.
:Now: :Today/Yesterday:


Main Station Davis VP2+ Running Via Win10 Pro.
Secondary Stations, Ecowitt HP2551/GW1000 Via rPi 3 & 4 Running Buster GUI.
:Local Inverell Ecowitt Station: :Remote Ashford Ecowitt Station:

mcrossley
Posts: 8780
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Buster Lite rPi
Location: Wilmslow, Cheshire, UK
Contact:

Re: Cleaning up Data files.

Post by mcrossley »

I'm not a great expert in the black art of regex, but how about something like: ^(.{14}).*\s+?^(?=.{0,14}^\1.*)
??
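(For anyone wanting to check the behaviour, the pattern can be tried in Python's re module — a sketch, assuming multiline mode and an empty-string replacement, with shortened sample lines:)

```python
import re

# mcrossley's pattern: it matches a whole line (plus its newline) whose
# FOLLOWING line starts with the same first 14 characters (the date/time
# fields), so substituting '' deletes the earlier of each duplicate pair.
pattern = re.compile(r'^(.{14}).*\s+?^(?=.{0,14}^\1.*)', re.MULTILINE)

log = (
    "07/07/21,16:50,14.4,40\n"
    "07/07/21,16:50,14.4,41\n"   # same timestamp, one field differs
    "07/07/21,17:00,13.4,43\n"
)
cleaned = pattern.sub('', log)
# cleaned keeps the LAST of each duplicate pair:
# "07/07/21,16:50,14.4,41\n07/07/21,17:00,13.4,43\n"
```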

Phil23
Posts: 759
Joined: Sat 16 Jul 2016 11:59 pm
Weather Station: Davis VP2+ & GW1000 (Standalone)
Operating System: Win10 Pro / rPi Buster
Location: Australia

Re: Cleaning up Data files.

Post by Phil23 »

mcrossley wrote:
Thu 15 Jul 2021 9:04 am
I'm not a great expert in the black art of regex, but how about something like: ^(.{14}).*\s+?^(?=.{0,14}^\1.*)
??
Hmmm,

Think you have nailed it. 259 hits, which I think matches up with my Excel attempt today.
The Excel method is a pain though, as it needs reformatting to write the CSV out with correct decimal padding.

Thanks for that.

Phil.

beteljuice
Posts: 3268
Joined: Tue 09 Dec 2008 1:37 pm
Weather Station: None !
Operating System: W10 - Threadripper 16core, etc
Location: Dudley, West Midlands, UK

Re: Cleaning up Data files.

Post by beteljuice »

I'm not a great expert in the black art of regex ....
That makes two of us Mark :lol:
I got almost identical code but forgot the .* at the end.



I often think regex codesters are a bit insane :?

That code effectively starts at the back end and recurses everything.

In this test the first three (duplicate timestamp) entries have temp values of 11.a, 11.c, 11.b - in that order
As can be seen by a null replacement, the last entry (11.b) is the one that remains.

Of course once you have done the regex you still need to ensure remaining entries are in timestamp order ...
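(That re-ordering step can also be sketched in Python — assuming the dd/mm/yy,hh:mm layout of the monthly log files; the function name is made up:)

```python
from datetime import datetime

def sort_log_lines(lines):
    """Sort log lines chronologically on their leading date/time fields."""
    # The first 14 characters of each line are "dd/mm/yy,hh:mm"
    return sorted(lines,
                  key=lambda line: datetime.strptime(line[:14], "%d/%m/%y,%H:%M"))
```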
......................Imagine, what you will KNOW tomorrow !

HansR
Posts: 2301
Joined: Sat 20 Oct 2012 6:53 am
Weather Station: Davis Vantage Pro 2+
Operating System: Raspbian GNU/Linux 10 (Buster)
Location: Wagenborgen (NL)
Contact:

Re: Cleaning up Data files.

Post by HansR »

beteljuice wrote:
Thu 15 Jul 2021 3:50 pm
I'm not a great expert in the black art of regex ....
That makes two of us Mark :lol:
I got almost identical code but forgot the .* at the end.



I often think regex codesters are a bit insane :?
May I join the club of regex haters?
Hans

https://meteo-wagenborgen.nl
Cumulus build 3147 ● Davis Vantage Pro 2+ ● RPi 3B+ ● Raspbian 5.10.52-v7+ ● Mono 5.18.0.240

Phil23
Posts: 759
Joined: Sat 16 Jul 2016 11:59 pm
Weather Station: Davis VP2+ & GW1000 (Standalone)
Operating System: Win10 Pro / rPi Buster
Location: Australia

Re: Cleaning up Data files.

Post by Phil23 »

HansR wrote:
Fri 16 Jul 2021 10:53 am
May I join the club of regex haters?
Don't think I'd like to join any Regex related Club.
Has the same appeal as knitting.
beteljuice wrote:
Thu 15 Jul 2021 3:50 pm

In this test the first three (duplicate timestamp) entries have temp values of 11.a, 11.c, 11.b - in that order
As can be seen by a null replacement, the last entry (11.b) is the one that remains.
That site is a brilliant tool. Especially the Explanation box.
Started trying to write my own decoded explanations of a few samples yesterday before reconsidering its effect on my sanity.
