Welcome to the Cumulus Support forum.

Latest Cumulus MX V3 release 3.28.6 (build 3283) - 21 March 2024

Cumulus MX V4 beta test release 4.0.0 (build 4019) - 03 April 2024

Legacy Cumulus 1 release 1.9.4 (build 1099) - 28 November 2014
(a patch is available for 1.9.4 build 1099 that extends the date range of drop-down menus to 2030)

Download the Software (Cumulus MX / Cumulus 1 and other related items) from the Wiki

MX and mysql data indexes

From build 3044 the development baton passed to Mark Crossley. Mark has been responsible for all the Builds since. He has made the code available on GitHub. It is Mark's hope that others will join in this development, but at the very least he welcomes your ideas for future developments (see Cumulus MX Development suggestions).

Moderator: mcrossley

Post Reply
jlmr731
Posts: 225
Joined: Sat 27 Aug 2016 12:11 am
Weather Station: Davis vantage pro 2
Operating System: Debian
Location: Wickliffe, Ohio
Contact:

MX and mysql data indexes

Post by jlmr731 »

Would like to know if anyone has done any indexing to tables, like monthly or realtime ? I only have a few years on it and about 1.7 million records so in time it will grow even larger, most query's are quick and a few taking less than half to almost a full second to do, as it has been increasing over time. I do one pivot table that takes 2.7 secs and this has me starting to read on optimization. I see there are no indexes in either of these tables and if anyone has something would like to hear about it. Or any one that is a sql guru that can help, with work and life its going to take time to study (yes im looking for a short cut) but also to help others out.

Thanks to all
User avatar
HansR
Posts: 5957
Joined: Sat 20 Oct 2012 6:53 am
Weather Station: GW1100 (WS80/WH40)
Operating System: Raspberry OS/Bookworm
Location: Wagenborgen (NL)
Contact:

Re: MX and mysql data indexes

Post by HansR »

My question would be: what would you like to query ?
In meteorology it is time series which are most interesting and all logfiles are nicely chronological . Personally and through experience, I don' see much optimisation here. Reading ascii files is pretty fast I can tell you and databases have overhead both in searching as in building indexes.

Did you do some measurement and calculations?

Cheers, Hans
Hans

https://meteo-wagenborgen.nl
CMX build 4017+ ● RPi 3B+ ● Raspbian Linux 6.1.21-v7+ armv7l ● dotnet 8.0.3
User avatar
mcrossley
Posts: 12756
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Bullseye Lite rPi
Location: Wilmslow, Cheshire, UK
Contact:

Re: MX and mysql data indexes

Post by mcrossley »

My experience is that *nearly all* of the queries I use invoke full table scans, so indexes are of no use there. I have tried adding some indexes, but saw little performance improvement.

As Hans says you need to analyse how your queries are interrogating the database, then decide if an index could help that query. Over indexing can hit your performance too (at work I once had one Oracle DB query that was taking over 2 hours to process - I removed one index and it took ~20 seconds!) , it is not a simple question, nor a one size fits all answer.
User avatar
HansR
Posts: 5957
Joined: Sat 20 Oct 2012 6:53 am
Weather Station: GW1100 (WS80/WH40)
Operating System: Raspberry OS/Bookworm
Location: Wagenborgen (NL)
Contact:

Re: MX and mysql data indexes

Post by HansR »

mcrossley wrote: Mon 11 Nov 2019 12:02 pm Over indexing can hit your performance too (at work I once had one Oracle DB query that was taking over 2 hours to process - I removed one index and it took ~20 seconds!) , it is not a simple question, nor a one size fits all answer.
Right! Been there, done that :|
Always be careful for overkill and idealising databases. Dedicated algorithm may cost less time, than building and maintaining a database. SQL is not free (in resources).
Hans

https://meteo-wagenborgen.nl
CMX build 4017+ ● RPi 3B+ ● Raspbian Linux 6.1.21-v7+ armv7l ● dotnet 8.0.3
ExperiMentor
Posts: 214
Joined: Tue 24 Nov 2015 11:30 pm
Weather Station: Fine Offset & Davis Vantage Vue
Operating System: Windows 10; Raspbian Buster
Location: Switzerland

Re: MX and mysql data indexes

Post by ExperiMentor »

I have at the back of my mind a bit of a plan to see if I can import the archive files into Tibco Spotfire.
That'll not be running on my Raspberry Pi of course, but I think it wouldd be a good package to do the heavy lifting without needing dedicated programming.

Any thoughts?
jlmr731
Posts: 225
Joined: Sat 27 Aug 2016 12:11 am
Weather Station: Davis vantage pro 2
Operating System: Debian
Location: Wickliffe, Ohio
Contact:

Re: MX and mysql data indexes

Post by jlmr731 »

Ok makes sense , Like i said im not close to an expert I like to read about how things work and saw indexing helps for certain data type and figured this was going to grow over the years so maybe start now while its young. Now hans you made perfect point that it is chronological and just a bunch of numbers and yes mark full scans is what i see take the longest but still very fast. guess its the one query i made that does take up the most time, and im sure i could make it better but it works and gives me the output i want. seems like a simple pivot type (or maybe its case statements) query and doing math averaging does take a bit to do, it was faster with out the pivot but then the php would of been harder (for me) to format it the way i want. Guess ill go back to the basics and relearn problem is I dont do any of this as work, but I do it for fun.
Thanks you for the input and here is the one query that started me down this rat hole :)

Code: Select all

SELECT LogDateTime, Temp as temp , round(avg(CASE WHEN month = 1 THEN temp END),2) January , round(avg(CASE WHEN month = 2 THEN temp END),2) Febuary , round(avg(CASE WHEN month = 3 THEN temp END),2) March , round(avg(CASE WHEN month = 4 THEN temp END),2) April , round(avg(CASE WHEN month = 5 THEN temp END),2) May , round(avg(CASE WHEN month = 6 THEN temp END),2) June , round(avg(CASE WHEN month = 7 THEN temp END),2) July , round(avg(CASE WHEN month = 8 THEN temp END),2) August , round(avg(CASE WHEN month = 9 THEN temp END),2) September , round(avg(CASE WHEN month = 10 THEN temp END),2) October , round(avg(CASE WHEN month = 11 THEN temp END),2) November , round(avg(CASE WHEN month = 12 THEN temp END),2) December FROM (SELECT Monthly.* , EXTRACT(YEAR FROM LogDateTime) year , EXTRACT(MONTH FROM LogDateTime) month FROM Monthly ) Monthly GROUP BY year 
Also i do this on data i get from noaa ids and it does work better but it has separate columns for year, mouth, day
It seemed like a nice way to get average monthly temps
User avatar
HansR
Posts: 5957
Joined: Sat 20 Oct 2012 6:53 am
Weather Station: GW1100 (WS80/WH40)
Operating System: Raspberry OS/Bookworm
Location: Wagenborgen (NL)
Contact:

Re: MX and mysql data indexes

Post by HansR »

@experimentor: you got me to a tool which is new to me. Interesting! But very suspicious that you do not see meteorology among their fields of solutions. I am looking forward to your results of the experiment.

@jlmr731: thanks for the query, not sure on which data you run this, but if you run it only on dayfile and not on monthly log files - use those only if really necessary - you have a lot less records and probably the same results. The query itself ,I think, is going to determine the month first and then starts to loop over the month. Difficult to say from a distance, but I assume more than one (probably more) tablescan will be involved. You might start by simply entering the months as literals. There are only twelve and they are known. Why calculate with a query? And for analytical purposes you might want to extract only one (known) year and do some math on the speed (try to find a timing technique).
Hans

https://meteo-wagenborgen.nl
CMX build 4017+ ● RPi 3B+ ● Raspbian Linux 6.1.21-v7+ armv7l ● dotnet 8.0.3
User avatar
mcrossley
Posts: 12756
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Bullseye Lite rPi
Location: Wilmslow, Cheshire, UK
Contact:

Re: MX and mysql data indexes

Post by mcrossley »

Jeff, I'm not 100% sure what you wanted from that query, but as Hans said, definitely use the day data table where ever possible. You already have an average temp per day in there, so you can average those values over a month.

So in MySQL something like this...

Code: Select all

SELECT
    YEAR(LogDate),
    MONTHNAME(LogDate),
    AVG(AvgTemp)
FROM daydata
GROUP BY YEAR(LogDate), MONTH(LogDate)
Which takes 23 ms on my Raspberry Pi with data since January 2009
jlmr731
Posts: 225
Joined: Sat 27 Aug 2016 12:11 am
Weather Station: Davis vantage pro 2
Operating System: Debian
Location: Wickliffe, Ohio
Contact:

Re: MX and mysql data indexes

Post by jlmr731 »

hans yes I run that on monthly and yes i know dayfile will give me same results but I had to see if it you work just a little slower than i thought but sometimes i do stuff cause its hard and i will learn something.

Mark yes and thanks

Both of you have been a bit of help so tahnks a bunch
Now if there was only a way to automatically record snowfall Yes we had a few inches today
ExperiMentor
Posts: 214
Joined: Tue 24 Nov 2015 11:30 pm
Weather Station: Fine Offset & Davis Vantage Vue
Operating System: Windows 10; Raspbian Buster
Location: Switzerland

Re: MX and mysql data indexes

Post by ExperiMentor »

HansR wrote: Tue 12 Nov 2019 6:11 am @experimentor: you got me to a tool which is new to me. Interesting! But very suspicious that you do not see meteorology among their fields of solutions. I am looking forward to your results of the experiment.
Well, I had to rise to my own challenge!

Ignoring the warnings above, I started with 6 years of monthLog.txt files.Mainly at 10 minute intervals, but I seem to have a few months in 2019 with 1-minute intervals (someone must have fiddled :bash: ) and March 2016 is an empty file ... real world data this!

So I concatenated them (any order as the filenames aren't the cleverest - the rows are time stamped so Spotfire can sort that out :o ). I's 61MB and has 0.5 million rows.

Run Spotfire, 'Add data tables'. Spotfire noted European text format, comma separated, first column a date, second a time, and the rest a mix of integer and real numbers. Click OK
and it imported the whole lot in 5 seconds (as a contrast, my complicated multi-table data sets in my day job take several minutes to load). Manually adding sensible column labels took a little while. Now I can play.

From the date field, I'm automatically given options like Year, Quarter, Month, Day of month, Day of year. Weather varies a lot from start of a month to the end, and Day of year is too granular so I chose 'week of year' to please our Swedish friends. Drag and drop to plot a histogram of Max daily rain by week of year. Immediately see that some weeks have silly maxima (300 mm/day? Not here!). Knowing that high winds rattle the pole and cause false highs, it's easy to 'colour code by max wind speed', and YES that seem to explain some of the high values as shown below. Changed graphs plot instantly.
Weather plot.jpg
Then if I click on the bar for Week 7, select 'Details-on-Demand' a spreadsheet of the values plotted in that bar appears. Click on the column header to sort and find the highest values and I scan start to sort things out. I also notice That I haven't got the column headings right ... things to fix for another day ....

Proper work may follow ...
You do not have the required permissions to view the files attached to this post.
User avatar
HansR
Posts: 5957
Joined: Sat 20 Oct 2012 6:53 am
Weather Station: GW1100 (WS80/WH40)
Operating System: Raspberry OS/Bookworm
Location: Wagenborgen (NL)
Contact:

Re: MX and mysql data indexes

Post by HansR »

Interesting!
ExperiMentor wrote: Thu 14 Nov 2019 12:49 am [...] and it imported the whole lot in 5 seconds (as a contrast, my complicated multi-table data sets in my day job take several minutes to load). [...]
I can hardly imagine Spotfire making a difference that big (5 secs to 3(?) minutes is 36 times as fast). There is fysical limit to reading a fysical file from disk / SD-card. I actually assume Spotfire reads a small part to satisfy you with some data and while you are busy looking at the first results it reads the rest on the background. Simple response trick to make customers believe.

Now that I write this, @mcrossley: that could be a solution while reading the monthly files for the edit functions :!:

You could time the real read yourself by simply making a very short program opening the file and do a File.ReadAllLines. Time before/after and there you are. If that is done in 5 seconds, then Spotfire is as fast as can be. It also means your own method of reading is queued for improvement (is that SQL?) :)
ExperiMentor wrote: Thu 14 Nov 2019 12:49 am Proper work may follow ...
Looking forward.
I may try something myself if I find the time.
Hans

https://meteo-wagenborgen.nl
CMX build 4017+ ● RPi 3B+ ● Raspbian Linux 6.1.21-v7+ armv7l ● dotnet 8.0.3
ExperiMentor
Posts: 214
Joined: Tue 24 Nov 2015 11:30 pm
Weather Station: Fine Offset & Davis Vantage Vue
Operating System: Windows 10; Raspbian Buster
Location: Switzerland

Re: MX and mysql data indexes

Post by ExperiMentor »

HansR wrote: Thu 14 Nov 2019 5:08 am
ExperiMentor wrote: Thu 14 Nov 2019 12:49 am [...] and it imported the whole lot in 5 seconds (as a contrast, my complicated multi-table data sets in my day job take several minutes to load). [...]
I can hardly imagine Spotfire making a difference that big (5 secs to 3(?) minutes is 36 times as fast). There is fysical limit to reading a fysical file from disk / SD-card. I actually assume Spotfire reads a small part to satisfy you with some data and while you are busy looking at the first results it reads the rest on the background. Simple response trick to make customers believe.
Sorry, you have misunderstood me!
I was making an aside comment that my 0.5 million record single file (14 million datapoints) is handled by Spotfire much faster than the incredibly complicated database I use it on in my day-job. On the larger dataset (relational database) it's definitely doing pre-processing as it loads - linking items from different tables etc.

Our huge 'weather' dataset seems complicated to us, but it's trivial compared to a high-res photo. And yet it's not until I try to process every frame of a video that my PC refuses to even try to do it in real time.

Selectively pre-loading data definitely has its uses (eg, only need the first few pages of a Word document, or the low-then-increasing resolution as sequential passes of a JPG are loaded, but that is not the explanation here. I said later that the graph drawing is instant - and that histogram involves data from 4 columns and all 0.5 million rows. All rows x 27 columns must be larger than my 16GB RAM, so this must be being read then swapped back to disk. These Operating Systems do clever things under the bonnet.
User avatar
mcrossley
Posts: 12756
Joined: Thu 07 Jan 2010 9:44 pm
Weather Station: Davis VP2/WLL
Operating System: Bullseye Lite rPi
Location: Wilmslow, Cheshire, UK
Contact:

Re: MX and mysql data indexes

Post by mcrossley »

@hans, nice idea, but the records data needs to read all the data to get any results. The process is CPU bound on a rPi rather than IO bound. On my laptop the import and processing only takes a couple of seconds.

I tried importing the data into an in-memory sqlite db, but so far the import performance is even worse than then whole import and processing using flat files. If the data is already in MySQL (as it is on my rPi) then the processing is v. fast even on pi.
User avatar
HansR
Posts: 5957
Joined: Sat 20 Oct 2012 6:53 am
Weather Station: GW1100 (WS80/WH40)
Operating System: Raspberry OS/Bookworm
Location: Wagenborgen (NL)
Contact:

Re: MX and mysql data indexes

Post by HansR »

mcrossley wrote: Thu 14 Nov 2019 1:17 pm but the records data needs to read all the data to get any results.
True, should have thought about that.
And interesting SQL experiment.
Hans

https://meteo-wagenborgen.nl
CMX build 4017+ ● RPi 3B+ ● Raspbian Linux 6.1.21-v7+ armv7l ● dotnet 8.0.3
Post Reply