Page 1 of 2

Historic data import

Posted: Sat 31 Dec 2022 12:06 pm
by packman2008
I'm just starting out testing CumulusMX and have got the latest version up and running on Linux and reading my Davis WLL data (and presumably Davis WL Pro data if there are local outages) so that's a good start :)

I have a load of historic data that I'd like to import if possible. I've searched the forum and found various threads, some very old, some newer, but I'm still not really clear on how to approach this. The data I have is in a Meteotemplate MySQL Database. I've got 5 minute data for about the last 2 years (possibly with a few short gaps) and 1 hourly data from the late 1970's until early 2020 when I got my Davis VP2 up and running at my current location. There may be data in other formats/locations, but the MeteoTemplate database is the most complete with the data all in one place and easily accessible using SQL, so I'm looking to use that if possible. After reading a bunch of messages I'm still confused about what I need to do...

1) Some discussions say to create monthly CMX log files, some seem to say create dayfiles and some say to create both? I can create either or both from the MySQL database but I need to know which to create.

2) The monthly log filename format is MmmYY.txt. The two digit year suggests it's only possible to import data post-2000? Is that correct or would Dec90.txt be recognized as from Dec-1990?

3) After creating the required log files I believe I'd need to run CreateMissing to fill any data holes?

4) If I have to create monthly log files does it matter that some (maybe just one) has a mixture of hourly and 5-minute data?

5) I believe there's no processing of historic data to update records, e.g. highs and lows? I saw comments suggesting that this is a manual process, loading each monthly log and then cutting/pasting records into another file or files? If so, this is probably not something that I'd want to do with the best part of 500 monthly files to process. Is there any way to automate this process? I'm able to code so if there's not currently a way to do this then maybe it's something I could consider writing?

6) Is there any benefit in having 40+ years of data stored in CMX, assuming I could achieve that? I would like to have CMX feed a local MySQL database to maintain an ongoing set of weather data (as well as the MeteoTemplate database to drive my web site) so would it make more sense to load the MT data into the CMX MySQL database and not create the monthly/day CSV log files? If I only had recent data in the CMX log files could I have records in the system that pre-dated the data in the CSV log files? Does this approach have any merit or downsides?

Sorry for all the questions, but as you can see I'm a little bit confused at the moment!

Re: Historic data import

Posted: Sun 01 Jan 2023 2:53 am
by flort
I see there have been quite a number of views of your post but no responses so I'll try and answer some of your questions to get you started.
1) Some discussions say to create monthly CMX log files, some seem to say create dayfiles and some say to create both? I can create either or both from the MySQL database but I need to know which to create.
You will need to create the monthly CMX log files only as the Dayfile can be created using the CreateMissing utility. You will need to ensure your log data files are in the correct format with the fields in the correct order.
2) The monthly log filename format is MmmYY.txt. The two digit year suggests it's only possible to import data post-2000? Is that correct or would Dec90.txt be recognized as from Dec-1990?
Sorry, but I don't know the answer to this one. I assume it would see it as 1990 but I'm only guessing.
3) After creating the required log files I believe I'd need to run CreateMissing to fill any data holes?
Running CreateMissing will generate a Dayfile entry for every day that is contained in your log files. It doesn't fill data holes for days where there is no data in the log files.
4) If I have to create monthly log files does it matter that some (maybe just one) has a mixture of hourly and 5-minute data?
It makes no difference that there is a mix of hourly and 5-minute data. It is just using the data to obtain your highs and lows for each day to populate the Dayfile.
5) I believe there's no processing of historic data to update records, e.g. highs and lows?
Once you have your completed Dayfile you can use the Admin interface to simplify this process. Under the Edit menu you will find options for updating All Time Records, Monthly Records, This Month's Records, and This Year's Records. The editor will show your current values on the left with the Dayfile records to the right. By clicking each entry in the Dayfile list it will update the current values.
6) Is there any benefit in having 40+ years of data stored in CMX
It would be nice to have a complete set of records and ensure your highs and lows are all captured but you would have to weigh up the effort verses benefit to make a decision there. CMX can upload both the Dayfile and the Monthly logs to separate MySQL databases. It depends what information you need as to whether the Monthly logs are required. Once you have your complete set of Monthly logs and the Dayfile then you can automatically create the Databases using the Admin interface. Go to Settings then MySQL settings. Once you set all of your parameters there is a section down the bottom to create the database tables. You will then need to run the ExportToMySQL utility to populate the database tables (refer https://www.cumuluswiki.org/a/Software#Export_To_MySQL for details).

I'm no expert but hopefully this will get you going in the right direction and someone else might like to post if I have any incorrect information.

Regards,
Trevor

Re: Historic data import

Posted: Sun 01 Jan 2023 10:45 am
by packman2008
Thanks Trevor :)

I've started a script to export the MeteoTemplate data into a set of monthly log files so it sounds like I'm on the right track. After I posted that I thought that with my data originating in a database it would be relatively easy to find the records, particularly all time records that were probably many years ago.

Re: Historic data import

Posted: Sun 01 Jan 2023 12:24 pm
by packman2008
Number of records processed: 16072
Added : 15940
Updated: 0
No Data: 61 - please check the log file for the errors
Were OK: 71
I got the data loaded. The 71 OK records were from my first test. The 61 with no data are 60 days where my station was down whilst I moved it to my new house and 1 day (01/01/1979) which CreateMissing wouldn't create even though the data was present. This seems to be something wrong in CreateMissing because it wouldn't add 01/11/2022 when I did a test load of my Nov22log.txt file.

To answer the final question a monthly log file called Dec79log.txt is treated by CreateMissing as containing data from Dec 1979, or maybe it uses the dates in the file. Either way the pre-2000 log files were processed correctly.

Re: Historic data import

Posted: Sun 01 Jan 2023 1:06 pm
by packman2008
I've got a problem with the imported rain data. If I look at the historic rainfall chart I can see values for the rainfall rate covering the last 40 years but there's only daily rain data for the last few years.

The monthly log file has four rain related values...

Rainfall rate
Rain so far
Rainfall counter
Rain since midnight

That's excluding the RG-11 rain which doesn't apply in my case. As I'm using a midnight reset Rain so far and Rain since midnight are the same value. The vast bulk of my data is hourly records so the rainfall rate is simply the current hour rain amount minus the previous hour rain amount. I wasn't sure what rainfall counter was (annual rainfall?) so I didn't add it.

I think I'm only seeing daily rain values for my more recent 5-minute data from my station rather than the older 1 hourly data. I don't understand why different amounts of data per hour would affect the daily rainfall totals. Does anyone have any suggestions?

Re: Historic data import

Posted: Sun 01 Jan 2023 1:59 pm
by freddie
The historic charts get their data from dayfile.txt, so you should look in there to check for missing entries.

Re: Historic data import

Posted: Sun 01 Jan 2023 2:14 pm
by mcrossley
Sorry I don't have time for a comprehensive reply at the mo. But CMX (and CreateMissing) uses the rainfall counter to do its calculations. For Davis stations this is actually the yearly rainfall total that resets on 1 Jan.

Re: Historic data import

Posted: Sun 01 Jan 2023 5:29 pm
by packman2008
Thanks Mark and HNY!

I'll modify my script to update the rainfall count and hopefully that will fix the problem.

Re: Historic data import

Posted: Mon 02 Jan 2023 9:48 pm
by packman2008
I've changed my script to calculate the Rainfall Counter as the annual rainfall to date for each monthly log entry but I'm still seeing odd things. For example from the dayfile...
259 17/09/79 43 230 17:00 13.9 00:00 17.2 14:00 1004.0 23:00 1014.0 00:00 8.0 19:00 0.0
...the final three numbers are the maximum rain rate (8.0mm/hr) which occurred at 19:00 and the total rain for the day which is 0.0mm. That rings alarm bells...how can there have been 8mm/hr between 18:00 and 19:00 (based on hourly logs) but have a daily total of 0mm?

The monthly log file entries for that day are...
Sep79log.txt:17/09/1979,00:00,13.9,93,12.8,27.7,0,0,0.30,0.30,1014.0,491.40,0,0,27.7,0,0,0.0,0.0,0,0,9.4,0,0.0,220,0,0.30,0,0
Sep79log.txt:17/09/1979,01:00,14.0,93,12.9,38.9,0,0,0.37,0.67,1014.0,491.77,0,0,38.9,0,0,0.0,0.0,0,0,7.3,0,0.0,240,0,0.67,0,0
Sep79log.txt:17/09/1979,02:00,14.0,93,12.9,35.3,0,0,0.26,0.93,1013.0,492.03,0,0,35.3,0,0,0.0,0.0,0,0,8.0,0,0.0,240,0,0.93,0,0
Sep79log.txt:17/09/1979,03:00,14.4,87,12.2,38.9,0,0,0.00,0.93,1012.0,492.03,0,0,38.9,0,0,0.0,0.0,0,0,7.5,0,0.0,240,0,0.93,0,0
Sep79log.txt:17/09/1979,04:00,14.5,87,12.3,38.9,0,0,0.00,0.93,1012.0,492.03,0,0,38.9,0,0,0.0,0.0,0,0,7.7,0,0.0,240,0,0.93,0,0
Sep79log.txt:17/09/1979,05:00,14.8,87,12.6,38.9,0,0,0.00,0.93,1011.0,492.03,0,0,38.9,0,0,0.0,0.0,0,0,8.1,0,0.0,240,0,0.93,0,0
Sep79log.txt:17/09/1979,06:00,14.8,87,12.6,35.3,0,0,0.00,0.93,1010.0,492.03,0,0,35.3,0,0,0.0,0.0,0,0,8.8,0,0.0,240,0,0.93,0,0
Sep79log.txt:17/09/1979,07:00,14.8,93,13.7,40.7,0,0,0.50,1.43,1010.0,492.53,0,0,40.7,0,0,0.0,0.0,0,0,8.0,0,0.0,240,0,1.43,0,0
Sep79log.txt:17/09/1979,08:00,14.8,87,12.6,40.7,0,0,0.50,1.93,1010.0,493.03,0,0,40.7,0,0,0.0,0.0,0,0,7.7,0,0.0,230,0,1.93,0,0
Sep79log.txt:17/09/1979,09:00,15.0,87,12.8,35.3,0,0,0.00,1.93,1010.0,493.03,0,0,35.3,0,0,0.0,0.0,0,0,9.0,0,0.0,240,0,1.93,0,0
Sep79log.txt:17/09/1979,10:00,15.3,87,13.1,33.5,0,0,0.00,1.93,1010.0,493.03,0,0,33.5,0,0,0.0,0.0,0,0,9.8,0,0.0,230,0,1.93,0,0
Sep79log.txt:17/09/1979,11:00,16.3,87,14.1,27.7,0,0,0.00,1.93,1010.0,493.03,0,0,27.7,0,0,0.0,0.0,0,0,12.2,0,0.0,230,0,1.93,0,0
Sep79log.txt:17/09/1979,12:00,16.8,82,13.7,31.7,0,0,0.00,1.93,1009.0,493.03,0,0,31.7,0,0,0.0,0.0,0,0,11.8,0,0.0,230,0,1.93,0,0
Sep79log.txt:17/09/1979,13:00,16.8,82,13.7,35.3,0,0,0.00,1.93,1009.0,493.03,0,0,35.3,0,0,0.0,0.0,0,0,11.1,0,0.0,210,0,1.93,0,0
Sep79log.txt:17/09/1979,14:00,17.2,77,13.1,37.1,0,0,0.00,1.93,1008.0,493.03,0,0,37.1,0,0,0.0,0.0,0,0,11.0,0,0.0,220,0,1.93,0,0
Sep79log.txt:17/09/1979,15:00,16.7,72,11.6,37.1,0,0,0.00,1.93,1008.0,493.03,0,0,37.1,0,0,0.0,0.0,0,0,10.0,0,0.0,230,0,1.93,0,0
Sep79log.txt:17/09/1979,16:00,15.8,82,12.7,37.1,0,0,0.00,1.93,1007.0,493.03,0,0,37.1,0,0,0.0,0.0,0,0,9.4,0,0.0,230,0,1.93,0,0
Sep79log.txt:17/09/1979,17:00,15.9,82,12.8,42.5,0,0,0.00,1.93,1006.0,493.03,0,0,42.5,0,0,0.0,0.0,0,0,8.5,0,0.0,230,0,1.93,0,0
Sep79log.txt:17/09/1979,18:00,15.5,87,13.3,31.7,0,0,0.50,2.43,1006.0,493.53,0,0,31.7,0,0,0.0,0.0,0,0,10.4,0,0.0,220,0,2.43,0,0
Sep79log.txt:17/09/1979,19:00,15.4,93,14.3,31.7,0,0,8.00,10.43,1006.0,501.53,0,0,31.7,0,0,0.0,0.0,0,0,10.6,0,0.0,230,0,10.43,0,0
Sep79log.txt:17/09/1979,20:00,15.2,93,14.1,38.9,0,0,0.00,10.43,1006.0,501.53,0,0,38.9,0,0,0.0,0.0,0,0,8.9,0,0.0,230,0,10.43,0,0
Sep79log.txt:17/09/1979,21:00,15.2,93,14.1,37.1,0,0,0.11,10.54,1005.0,501.64,0,0,37.1,0,0,0.0,0.0,0,0,9.3,0,0.0,230,0,10.54,0,0
Sep79log.txt:17/09/1979,22:00,15.2,100,15.2,40.7,0,0,0.00,10.54,1005.0,501.64,0,0,40.7,0,0,0.0,0.0,0,0,9.0,0,0.0,230,0,10.54,0,0
Sep79log.txt:17/09/1979,23:00,16.0,93,14.9,42.5,0,0,0.00,10.54,1004.0,501.64,0,0,42.5,0,0,0.0,0.0,0,0,9.3,0,0.0,230,0,10.54,0,0
The two values before the barometric pressure column are the rain rate and rain so far. The column following the pressure is the rainfall counter and the third last value is the rain since midnight. These all seem reasonable...there are various hourly amounts of rain, which add to the rain so far, rain since midnight and rainfall counter values. The 8mm/hour rain rate at 19:00 is there so we're looking at the right day/time.

If you compare the rainfall counter at 23:00 to the value at 00:00 then that indicates 10.2mm of rain for the day. I know there's weird stuff going on with the rain at 00:00 being included in the current day totals but for now I'm not worrying about that when most of the rain seems to be missing!

There are many other days which show zero or low amounts of rainfall (less than 1mm for the day) when there's actually been significantly more.

I'm going to grab the CreateMissing source code to see if I can see what might be happening but as I've never looked at C# before I have a learning curve to climb. I'm also intrigued to see if I can find out why CreateMissing ignores the first day of entries in my log files (01/01/1979) with the first entry added to the dayfile being 02/01/1979.

PS. I just noticed the final two values in the dayfile are 'High 24 Hour Rain' and 'High 24 Hour Rain Time' which for 17/09/1979 are 10.8mm and 21:00. So it looks like part of the code is counting rain correctly. I've not added the numbers up, but 10.8mm looks about right for the 24 hours up to 21:00.

Re: Historic data import

Posted: Tue 03 Jan 2023 10:30 am
by mcrossley
You will find create missing does some "weird stuff" with the rain at midnight. This is because the log files changed at some point from the 00:00 value being to total for the previous day to it being reset to zero.

I'm not at my machine to check the exact logic at the moment. But...

That may be clue if your logs also have an issue with the 00:00 value.

Re: Historic data import

Posted: Tue 03 Jan 2023 11:44 am
by packman2008
I've seen the references to end of day rainfall in Cumulus versions from 2012 and earlier in the CreateMissing code, but the code doesn't seem to have any references to 2012 in it so it looks like it processes the data the same way whenever it was created. As my data is all freshly generated from a non-Cumulus source it's the same all the way through so I don't think the 2012 change would affect it.

My comment about weird end of day things in my data is because my data is hourly only and on the hour, so I have data for 00:00 each day. Now for most values that's just an instantaneous reading at that time, but when I processed the archive data into the MeteoTemplate database I used the rain reported for 00:00 as belonging to the current (new) day. That clearly doesn't make sense because it's the rain that fell between the last reading (23:00 on the previous day) and 00:00 on the new day. So for each of my days where it rained between 23:00 and 00:00 the rainfall total is a bit low and the next day total is a bit high. If CreateMissing is using the Rainfall Counter to figure out to figure out the rainfall then my data weirdness wouldn't matter because it's just calculating the difference between two values. I'll correct my data in the data extraction script but at the moment it wouldn't make any difference because the new values wouldn't appear after processing by CreateMissing.

I'm part way through setting up a C# development environment on a server so I'll see if I can get CreateMissing to compile and then I can add some debug code to try to figure out what's happening. If you get a chance to look and maybe figure it out before I get my head round a new language then that would be great!

Re: Historic data import

Posted: Tue 03 Jan 2023 10:06 pm
by flort
Just a thought - do you have your rain counter resetting to zero at the start of each day? The standard is for it to start at zero on 1st January and be cumulative throughout the year to 31st December. It may affect it's calculation if you start each day at zero.

Regards,
Trevor

Re: Historic data import

Posted: Tue 03 Jan 2023 10:41 pm
by packman2008
Thanks for the suggestion.

My script generates a running annual rainfall counter starting on 1st Jan through to 31st Dec. The example data I posted shows that by the start of the 17th Sep the total is 491mm and 501mm by the end of that day. So that wouldn't appear to be the problem.

Re: Historic data import

Posted: Tue 03 Jan 2023 10:54 pm
by mcrossley
A quick scan of CreateMissing shows this logic.

The raintoday value is taken from the last raintoday value of the day. With 10-minute logging that would normally be at 23:50 (midnight rollover)

Then the 00:00 record for the following day is examined.
- If it has a value then this is an "old" type record, and the 00:00 raintoday value is used for the previous days total.
- if it does not have a value (=0) then the previous raincounter value (from 23:50 in this case) is subtracted from 00:00 raincounter value, the difference is added to the raintoday to take account of any rainfall missing in the 23:50 to 00:00 period.

It also ignores big jumps in the raincounter.

Re: Historic data import

Posted: Tue 03 Jan 2023 11:13 pm
by packman2008
Ahhh, I think that explains why I'm seeing what I do. I'll re-write my extraction script to make sure that any rain between 23:00 and 00:00 gets added into the 23:00 value and then the 00:00 value for the next day is zeroed. It does have the side effect that I'll never see any rain between 23:00 and 00:00 and I might get records for higher rainfall between 22:00 and 23:00.

Another way to do it might be to turn all the 00:00 records into 23:59 records for the previous day.