Cleaning up Data files.

Phil23
Posts: 884
Joined: Sat 16 Jul 2016 11:59 pm
Weather Station: Davis VP2+ & GW1000 (Standalone)
Operating System: Win10 Pro / rPi Buster
Location: Australia

Cleaning up Data files.

Post by Phil23 »

I am currently experimenting with my new Ecowitt & a rPi, preparing the Pi for a remote site.

I've got both a PC & the rPi reading the GW1000 & it's been logging for over a month, with a few hiccups.

Part 1 of my fix-up.

What I need to do to start with is delete some duplicate records that were created in the Pi's data when I had 2 sessions running.

Code: Select all

07/07/21,16:50,14.4,40,1.0,2,2,237,0.0,0.0,953.8,0.0,26.0,32,1,14.4,14.4,0.0,14,0.00,579.70,12.3,4,0.1,237,0.0,0.0,13.9,12.5
07/07/21,16:50,14.4,40,1.0,2,2,237,0.0,0.0,953.8,0.0,26.0,32,1,14.4,14.4,0.0,14,0.00,579.70,12.3,4,0.1,237,0.0,0.0,13.9,12.5
07/07/21,17:00,13.4,43,1.1,0,1,237,0.0,0.0,953.7,0.0,25.7,33,0,13.4,13.4,0.0,7,0.00,579.70,11.5,0,0.3,237,0.0,0.0,13.1,11.5
07/07/21,17:00,13.4,43,1.1,0,1,237,0.0,0.0,953.7,0.0,25.7,33,0,13.4,13.4,0.0,7,0.00,579.70,11.5,0,0.3,237,0.0,0.0,13.1,11.5
07/07/21,17:10,12.7,46,1.4,0,0,0,0.0,0.0,953.9,0.0,25.3,32,0,12.7,12.7,0.0,2,0.00,579.70,10.9,0,0.4,237,0.0,0.0,12.5,10.9
07/07/21,17:20,11.3,51,1.5,0,0,0,0.0,0.0,953.8,0.0,25.1,33,0,11.3,11.3,0.0,0,0.00,579.70,9.5,0,0.5,237,0.0,0.0,11.2,9.5
07/07/21,17:20,11.3,51,1.5,0,0,0,0.0,0.0,953.8,0.0,25.1,33,0,11.3,11.3,0.0,0,0.00,579.70,9.5,0,0.5,237,0.0,0.0,11.2,9.5
07/07/21,17:30,10.3,55,1.6,0,0,0,0.0,0.0,954.0,0.0,24.8,33,0,10.3,10.3,0.0,0,0.00,579.70,8.6,0,0.5,237,0.0,0.0,10.3,8.6
07/07/21,17:30,10.3,55,1.6,0,0,0,0.0,0.0,954.0,0.0,24.8,33,0,10.3,10.3,0.0,0,0.00,579.70,8.6,0,0.5,237,0.0,0.0,10.3,8.6

I'm not very good with Regular Expressions, but found this: ^(.*?)$\s+?^(?=.*^\1$)
It gets 168 hits, but it obviously misses duplicate entries where the data after the date & time fields is slightly different.

As in these records where one field has changed by say 0.1.

Code: Select all

09/07/21,10:00,11.6,100,11.6,5,13,330,6.0,2.8,1014.0,33.0,23.8,44,5,11.6,11.6,0.0,75,0.00,579.70,11.0,444,0.0,320,0.0,31.7,11.4,13.6
09/07/21,10:00,11.6,100,11.6,6,13,330,6.0,2.8,1014.0,33.0,23.8,44,5,11.6,11.6,0.0,75,0.00,579.70,11.0,444,0.0,320,0.0,31.7,11.4,13.6
09/07/21,10:10,11.7,100,11.7,6,15,337,1.2,3.3,1013.9,33.5,23.9,44,4,11.7,11.7,0.0,82,0.00,579.70,11.1,465,0.0,330,0.0,32.2,11.4,13.8
09/07/21,10:10,11.7,100,11.7,6,15,337,1.2,3.3,1013.9,33.5,23.9,44,4,11.7,11.7,0.0,82,0.00,579.70,11.0,465,0.0,330,0.0,32.2,11.4,13.8
09/07/21,10:20,11.9,100,11.9,6,11,337,3.0,3.8,1013.9,34.0,24.0,44,6,11.9,11.9,1.0,198,0.00,579.70,11.2,484,0.0,354,0.0,32.7,11.5,14.1
09/07/21,10:20,11.9,100,11.9,6,11,336,3.0,3.8,1013.9,34.0,24.0,44,6,11.9,11.9,1.0,198,0.00,579.70,11.3,484,0.0,354,0.0,32.7,11.6,14.1
09/07/21,10:30,12.1,100,12.1,5,13,345,1.8,4.1,1013.7,34.3,24.0,45,9,12.1,12.1,1.0,177,0.00,579.70,11.7,502,0.0,355,0.0,33.0,12.0,14.4
09/07/21,10:30,12.1,100,12.1,6,13,344,1.8,4.1,1013.7,34.3,24.0,45,9,12.1,12.1,1.0,182,0.00,579.70,11.7,502,0.0,337,0.0,33.0,11.9,14.4

Could anyone help with a Regex that only compares the first 15 characters (the date & time fields), and then, I presume, selects the entire line for deletion with search & replace?
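
Before deleting anything, it may help to list which timestamps actually repeat. The following is only a rough Python sketch, not part of Cumulus: the function names are made up for illustration, and it assumes every record starts with a 15-character "dd/mm/yy,hh:mm," prefix as in the samples above.

```python
from collections import Counter

# Assumption: the date and time fields plus the trailing comma
# occupy the first 15 characters, e.g. "09/07/21,10:00,".
KEY_LEN = 15


def repeated_timestamps(lines):
    """Return {timestamp prefix: count} for prefixes seen more than once."""
    counts = Counter(line[:KEY_LEN] for line in lines if line.strip())
    return {key: n for key, n in counts.items() if n > 1}


def redundant_count(lines):
    """Number of lines that would go if one record per timestamp is kept."""
    return sum(n - 1 for n in repeated_timestamps(lines).values())
```

The second function gives a figure that could be compared against the number of hits a search & replace reports.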

Thanks

Phil.
:Now: :Today/Yesterday:

Image

Main Station Davis VP2+ Running Via Win10 Pro.
Secondary Stations, Ecowitt HP2551/GW1000 Via rPi 3 & 4 Running Buster GUI.
:Local Inverell Ecowitt Station: :Remote Ashford Ecowitt Station:
mcrossley
Posts: 12689
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Bullseye Lite rPi
Location: Wilmslow, Cheshire, UK

Re: Cleaning up Data files.

Post by mcrossley »

I'm not a great expert in the black art of regex, but how about something like: ^(.{14}).*\s+?^(?=.{0,14}^\1.*)
??
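
For anyone who would rather try this outside a text editor, here is a minimal Python sketch of the same pattern. It assumes Python's `re` module with the MULTILINE flag behaves like the editor's regex engine for this expression. The function name is hypothetical, and the sample rows are shortened versions of the records posted above.

```python
import re

# Same expression as above. MULTILINE makes ^ match at the start of
# every line. Because . does not match a newline, the lookahead only
# succeeds when the NEXT line starts with the same 14 characters.
PATTERN = re.compile(r"^(.{14}).*\s+?^(?=.{0,14}^\1.*)", re.MULTILINE)


def drop_earlier_duplicates(text):
    """Delete each line that is directly followed by a line with the
    same first 14 characters (date and time), keeping the later one."""
    return PATTERN.sub("", text)


sample = (
    "09/07/21,10:00,11.6,100,11.6,5\n"
    "09/07/21,10:00,11.6,100,11.6,6\n"
    "09/07/21,10:10,11.7,100,11.7,6\n"
)
print(drop_earlier_duplicates(sample), end="")
```

In this form the pattern only catches duplicates that sit on consecutive lines, which is the case in the records posted here.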
Phil23
Posts: 884
Joined: Sat 16 Jul 2016 11:59 pm
Weather Station: Davis VP2+ & GW1000 (Standalone)
Operating System: Win10 Pro / rPi Buster
Location: Australia

Re: Cleaning up Data files.

Post by Phil23 »

mcrossley wrote: Thu 15 Jul 2021 9:04 am I'm not a great expert in the black art of regex, but how about something like: ^(.{14}).*\s+?^(?=.{0,14}^\1.*)
??
Hmmm,

Think you have nailed it. 259 hits which I think matches up with my Excel attempt today.
The Excel method is a pain though, as it needs reformatting to write the CSV out with correct decimal padding.

Thanks for that.

Phil.
beteljuice
Posts: 3292
Joined: Tue 09 Dec 2008 1:37 pm
Weather Station: None !
Operating System: W10 - Threadripper 16core, etc
Location: Dudley, West Midlands, UK

Re: Cleaning up Data files.

Post by beteljuice »

I'm not a great expert in the black art of regex ....
That makes two of us Mark :lol:
I got almost identical code but forgot the .* at the end.



I often think regex codesters are a bit insane :?

That code effectively starts at the back end and recurses everything.

In this test the first three (duplicate timestamp) entries have temp values of 11.a, 11.c, 11.b - in that order
As can be seen by a null replacement, the last entry (11.b) is the one that remains.

Of course once you have done the regex you still need to ensure remaining entries are in timestamp order ...
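
As a rough illustration of that last point, a short Python sketch can keep the last record for each timestamp and put the result back in time order in one pass. This is an alternative to the regex, not what was tested above. The helper names are invented, and it assumes the first two fields are "dd/mm/yy" and "hh:mm" as in the posted samples.

```python
from datetime import datetime


def timestamp_of(line):
    """Parse the first two comma-separated fields as a datetime.
    Assumption: dates are dd/mm/yy and times are 24-hour hh:mm."""
    date_part, time_part = line.split(",", 2)[:2]
    return datetime.strptime(f"{date_part},{time_part}", "%d/%m/%y,%H:%M")


def dedupe_and_sort(lines):
    """Keep the last record seen for each timestamp, sorted by time."""
    latest = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        latest[timestamp_of(line)] = line  # a later duplicate overwrites
    return [latest[ts] for ts in sorted(latest)]
```

Because the records are keyed on the parsed timestamp, this also catches duplicates that are not on adjacent lines.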
Image
......................Imagine, what you will KNOW tomorrow !
HansR
Posts: 5870
Joined: Sat 20 Oct 2012 6:53 am
Weather Station: GW1100 (WS80/WH40)
Operating System: Raspberry OS/Bullseye
Location: Wagenborgen (NL)

Re: Cleaning up Data files.

Post by HansR »

beteljuice wrote: Thu 15 Jul 2021 3:50 pm
I'm not a great expert in the black art of regex ....
That makes two of us Mark :lol:
I got almost identical code but forgot the .* at the end.



I often think regex codesters are a bit insane :?
May I join the club of regex haters?
Hans

https://meteo-wagenborgen.nl
CMX build 4017+ ● RPi 3B+ ● Raspbian Linux 6.1.21-v7+ armv7l ● dotnet 8.0.3
Phil23
Posts: 884
Joined: Sat 16 Jul 2016 11:59 pm
Weather Station: Davis VP2+ & GW1000 (Standalone)
Operating System: Win10 Pro / rPi Buster
Location: Australia

Re: Cleaning up Data files.

Post by Phil23 »

HansR wrote: Fri 16 Jul 2021 10:53 am May I join the club of regex haters?
Don't think I'd like to join any Regex related Club.
Has the same appeal as knitting.
beteljuice wrote: Thu 15 Jul 2021 3:50 pm
In this test the first three (duplicate timestamp) entries have temp values of 11.a, 11.c, 11.b - in that order
As can be seen by a null replacement, the last entry (11.b) is the one that remains.
That site is a brilliant tool. Especially the Explanation box.
I started trying to write my own decoded explanations of a few samples yesterday, before reconsidering its effect on my sanity.