
Cleaning up Data files.

Posted: Thu 15 Jul 2021 12:47 am
by Phil23
I am currently experimenting with my new Ecowitt & an rPi, preparing the Pi for a remote site.

I've got both a PC & the rPi reading the GW1000, & it's been logging for over a month, with a few hiccups.

Part 1 of my fix-up.

What I need to do to start with is delete some duplicate records in the Pi's data, created when I had 2 sessions running.

Code: Select all

07/07/21,16:50,14.4,40,1.0,2,2,237,0.0,0.0,953.8,0.0,26.0,32,1,14.4,14.4,0.0,14,0.00,579.70,12.3,4,0.1,237,0.0,0.0,13.9,12.5
07/07/21,16:50,14.4,40,1.0,2,2,237,0.0,0.0,953.8,0.0,26.0,32,1,14.4,14.4,0.0,14,0.00,579.70,12.3,4,0.1,237,0.0,0.0,13.9,12.5
07/07/21,17:00,13.4,43,1.1,0,1,237,0.0,0.0,953.7,0.0,25.7,33,0,13.4,13.4,0.0,7,0.00,579.70,11.5,0,0.3,237,0.0,0.0,13.1,11.5
07/07/21,17:00,13.4,43,1.1,0,1,237,0.0,0.0,953.7,0.0,25.7,33,0,13.4,13.4,0.0,7,0.00,579.70,11.5,0,0.3,237,0.0,0.0,13.1,11.5
07/07/21,17:10,12.7,46,1.4,0,0,0,0.0,0.0,953.9,0.0,25.3,32,0,12.7,12.7,0.0,2,0.00,579.70,10.9,0,0.4,237,0.0,0.0,12.5,10.9
07/07/21,17:20,11.3,51,1.5,0,0,0,0.0,0.0,953.8,0.0,25.1,33,0,11.3,11.3,0.0,0,0.00,579.70,9.5,0,0.5,237,0.0,0.0,11.2,9.5
07/07/21,17:20,11.3,51,1.5,0,0,0,0.0,0.0,953.8,0.0,25.1,33,0,11.3,11.3,0.0,0,0.00,579.70,9.5,0,0.5,237,0.0,0.0,11.2,9.5
07/07/21,17:30,10.3,55,1.6,0,0,0,0.0,0.0,954.0,0.0,24.8,33,0,10.3,10.3,0.0,0,0.00,579.70,8.6,0,0.5,237,0.0,0.0,10.3,8.6
07/07/21,17:30,10.3,55,1.6,0,0,0,0.0,0.0,954.0,0.0,24.8,33,0,10.3,10.3,0.0,0,0.00,579.70,8.6,0,0.5,237,0.0,0.0,10.3,8.6
I'm not real good with Regular Expressions, but found this: ^(.*?)$\s+?^(?=.*^\1$)
It gets 168 hits, but it's obviously missing duplicate entries where the data after the date & time fields is slightly different (a scripted version of this pass is sketched below the samples).

As in these records, where one field has changed by, say, 0.1:

Code: Select all

09/07/21,10:00,11.6,100,11.6,5,13,330,6.0,2.8,1014.0,33.0,23.8,44,5,11.6,11.6,0.0,75,0.00,579.70,11.0,444,0.0,320,0.0,31.7,11.4,13.6
09/07/21,10:00,11.6,100,11.6,6,13,330,6.0,2.8,1014.0,33.0,23.8,44,5,11.6,11.6,0.0,75,0.00,579.70,11.0,444,0.0,320,0.0,31.7,11.4,13.6
09/07/21,10:10,11.7,100,11.7,6,15,337,1.2,3.3,1013.9,33.5,23.9,44,4,11.7,11.7,0.0,82,0.00,579.70,11.1,465,0.0,330,0.0,32.2,11.4,13.8
09/07/21,10:10,11.7,100,11.7,6,15,337,1.2,3.3,1013.9,33.5,23.9,44,4,11.7,11.7,0.0,82,0.00,579.70,11.0,465,0.0,330,0.0,32.2,11.4,13.8
09/07/21,10:20,11.9,100,11.9,6,11,337,3.0,3.8,1013.9,34.0,24.0,44,6,11.9,11.9,1.0,198,0.00,579.70,11.2,484,0.0,354,0.0,32.7,11.5,14.1
09/07/21,10:20,11.9,100,11.9,6,11,336,3.0,3.8,1013.9,34.0,24.0,44,6,11.9,11.9,1.0,198,0.00,579.70,11.3,484,0.0,354,0.0,32.7,11.6,14.1
09/07/21,10:30,12.1,100,12.1,5,13,345,1.8,4.1,1013.7,34.3,24.0,45,9,12.1,12.1,1.0,177,0.00,579.70,11.7,502,0.0,355,0.0,33.0,12.0,14.4
09/07/21,10:30,12.1,100,12.1,6,13,344,1.8,4.1,1013.7,34.3,24.0,45,9,12.1,12.1,1.0,182,0.00,579.70,11.7,502,0.0,337,0.0,33.0,11.9,14.4
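Here is the scripted version of the exact-duplicate pass mentioned above: a minimal Python 3 sketch using a simplified form of the same pattern. The file names are placeholders, not real log names.

Code: Select all

import re

# Placeholder file name; substitute the real log file.
with open('datalog.txt') as f:
    text = f.read()

# Delete a line whenever the following line is byte-for-byte identical,
# so the last copy in each run of duplicates is the one kept.
cleaned = re.sub(r'^(.*)\n(?=\1$)', '', text, flags=re.MULTILINE)

with open('datalog.clean.txt', 'w') as f:  # placeholder output name
    f.write(cleaned)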
Could anyone help with a Regex that only compares the first 15 characters, and then, I presume, selects the entire line for deletion with search & replace?

Thanks

Phil.

Re: Cleaning up Data files.

Posted: Thu 15 Jul 2021 9:04 am
by mcrossley
I'm not a great expert in the black art of regex, but how about something like: ^(.{14}).*\s+?^(?=.{0,14}^\1.*)
??
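For anyone wanting to try that outside an editor, a minimal Python 3 sketch of the same prefix idea: delete a line whenever the next line starts with the same 14 date & time characters. File names are placeholders.

Code: Select all

import re

with open('datalog.txt') as f:  # placeholder file name
    text = f.read()

# Delete a line whenever the next line starts with the same first 14
# characters (the dd/mm/yy,hh:mm fields); the last copy survives.
cleaned = re.sub(r'^(.{14}).*\n(?=\1)', '', text, flags=re.MULTILINE)

with open('datalog.clean.txt', 'w') as f:  # placeholder output name
    f.write(cleaned)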

Re: Cleaning up Data files.

Posted: Thu 15 Jul 2021 9:37 am
by Phil23
mcrossley wrote: Thu 15 Jul 2021 9:04 am I'm not a great expert in the black art of regex, but how about something like: ^(.{14}).*\s+?^(?=.{0,14}^\1.*)
??
Hmmm,

Think you have nailed it. 259 hits, which I think matches up with my Excel attempt today.
The Excel method is a pain though, as it needs reformatting to write the CSV back out with the correct decimal padding.

Thanks for that.

Phil.

Re: Cleaning up Data files.

Posted: Thu 15 Jul 2021 3:50 pm
by beteljuice
I'm not a great expert in the black art of regex ....
That makes two of us, Mark :lol:
I got almost identical code but forgot the .* at the end.
[Attached screenshot of the regex test in an online tester; image not reproduced here]
I often think regex codesters are a bit insane :?

That code effectively starts at the back end and recurses everything.

In this test the first three (duplicate timestamp) entries have temp values of 11.a, 11.c, 11.b, in that order.
As can be seen from a null replacement, the last entry (11.b) is the one that remains.

Of course, once you have done the regex you still need to ensure the remaining entries are in timestamp order ...
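To tie both points together, a minimal Python 3 sketch that keeps the last entry per timestamp and re-sorts the result, assuming the dd/mm/yy,hh:mm stamp occupies the first 14 characters of each line (file names are placeholders):

Code: Select all

from datetime import datetime

# Keep only the last line seen for each date,time key.
records = {}
with open('datalog.txt') as f:  # placeholder file name
    for line in f:
        line = line.rstrip('\n')
        if line:
            records[line[:14]] = line  # later duplicates overwrite earlier ones

# Rewrite in timestamp order.
def stamp(key):
    return datetime.strptime(key, '%d/%m/%y,%H:%M')

with open('datalog.clean.txt', 'w') as f:  # placeholder output name
    for key in sorted(records, key=stamp):
        f.write(records[key] + '\n')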

Re: Cleaning up Data files.

Posted: Fri 16 Jul 2021 10:53 am
by HansR
beteljuice wrote: Thu 15 Jul 2021 3:50 pm
I'm not a great expert in the black art of regex ....
That makes two of us Mark :lol:
I got almost identical code but forgot the .* at the end.
I often think regex codesters are a bit insane :?
May I join the club of regex haters?

Re: Cleaning up Data files.

Posted: Fri 16 Jul 2021 10:45 pm
by Phil23
HansR wrote: Fri 16 Jul 2021 10:53 am May I join the club of regex haters?
Don't think I'd like to join any Regex-related club.
It has the same appeal as knitting.
beteljuice wrote: Thu 15 Jul 2021 3:50 pm
In this test the first three (duplicate timestamp) entries have temp values of 11.a, 11.c, 11.b, in that order.
As can be seen from a null replacement, the last entry (11.b) is the one that remains.
That site is a brilliant tool, especially the Explanation box.
I started trying to write my own decoded explanations of a few samples yesterday, before reconsidering its effect on my sanity.