I'd like to see how well my initial experiments with Google Fusion Tables will scale up.
I've only been logging since Nov 2011 (at 5 min window) so I've only got 8 full months of data (8 x about 950k each).
This works fine, but does anyone have a large amount of data they wouldn't mind letting me copy and use?
Sort of like 3 or 4 years - I'm trying to see what happens with 20-30 MB data sets - 3 years of 5 minute window logs should give me about 311,000 rows to experiment with.
You can either zip them up or I can grab them raw from your site or ... whatever is easiest!
Welcome to the Cumulus Support forum.
Latest Cumulus MX V4 release 4.4.2 (build 4085) - 12 March 2025
Latest Cumulus MX V3 release 3.28.6 (build 3283) - 21 March 2024
Legacy Cumulus 1 release 1.9.4 (build 1099) - 28 November 2014
(a patch is available for 1.9.4 build 1099 that extends the date range of drop-down menus to 2030)
Download the Software (Cumulus MX / Cumulus 1 and other related items) from the Wiki
If you are posting a new Topic about an error or if you need help PLEASE read this first viewtopic.php?p=164080#p164080
Anyone got any fairly massive data se
-
lardconcepts
- Posts: 35
- Joined: Sat 26 Nov 2011 10:11 pm
- Weather Station: Maplin WH1801
- Operating System: Win 8.1 Pro
- Location: Mid-wales
- steve
- Cumulus Author
- Posts: 26672
- Joined: Mon 02 Jun 2008 6:49 pm
- Weather Station: None
- Operating System: None
- Location: Vienne, France
- Contact:
Re: Anyone got any fairly massive data se
I've got just under 5 years worth of data (from when we moved here) but at 10 minute log intervals (about 40 MB). It's likely that there are people with more than that, but I'm happy to let you have mine to be going on with. I can zip it and put it on the web for you to download if you want?
Steve
-
lardconcepts
- Posts: 35
- Joined: Sat 26 Nov 2011 10:11 pm
- Weather Station: Maplin WH1801
- Operating System: Win 8.1 Pro
- Location: Mid-wales
Re: Anyone got any fairly massive data se
steve wrote:I've got just under 5 years worth of data (from when we moved here) but at 10 minute log intervals (about 40 MB). It's likely that there are people with more than that, but I'm happy to let you have mine to be going on with. I can zip it and put it on the web for you to download if you want?
Hey, 40 MB would be brilliant! That's still going to be enough rows to push the limits. Thanks a lot, Steve.
-
lardconcepts
- Posts: 35
- Joined: Sat 26 Nov 2011 10:11 pm
- Weather Station: Maplin WH1801
- Operating System: Win 8.1 Pro
- Location: Mid-wales
Re: Anyone got any fairly massive data se
Right, Steve's data worked fine, and I'll post again when I've done something interesting with it!
But another member PM'd me with more data (he's not replied to whether I can use it publicly yet, so I won't post links here until he does).
It's 5 years of data, at 1 minute resolution. I quickly found the limits on Google Fusion Tables and Excel - just over 1 million rows PER IMPORT.
But it seems to suggest you can MERGE tables and make them bigger.
So I batched it up into years - made 5 folders called 2008, 2009, etc. and copied the monthly files for each year into them.
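A quick sanity check on why yearly batches fit under the import limit. This is only a back-of-envelope sketch in Python: it assumes one row per logging interval with no gaps and ignores leap days, and rows_per_period is just an illustrative name.

```python
# Rough row-count estimate for fixed-interval logging.
# Assumes one row per interval, no gaps, and 365-day years.
MINUTES_PER_DAY = 24 * 60


def rows_per_period(days: int, interval_minutes: int) -> int:
    """Number of log rows produced over `days` days at one row per interval."""
    return days * MINUTES_PER_DAY // interval_minutes


one_year_1min = rows_per_period(365, 1)        # 525,600 rows
five_years_1min = rows_per_period(5 * 365, 1)  # 2,628,000 rows

print(f"1 year at 1 minute:  {one_year_1min:,}")
print(f"5 years at 1 minute: {five_years_1min:,}")
```

So a full year at 1 minute resolution stays well under the roughly 1 million rows per import, while the whole 5 years in one file would not.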
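Then open cmd in Windows and, in each year's folder, run: copy *.txt 2008.csv (and the same for 2009.csv, etc.) to join the monthly files into one file per year.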
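For anyone not on Windows, the same joining step can be sketched in Python. This is an illustration rather than what was actually run above: concat_logs and the folder and file names are made up for the example, and it assumes the monthly files are plain text with no header row of their own.

```python
from pathlib import Path


def concat_logs(folder: Path, out_file: Path, pattern: str = "*.txt") -> int:
    """Append every file matching `pattern` in `folder` to `out_file`.

    Returns the number of lines written. Files are taken in file-name
    order, which may not be chronological, so sort by date afterwards.
    """
    lines_written = 0
    with out_file.open("w", encoding="utf-8", newline="") as out:
        for path in sorted(folder.glob(pattern)):
            with path.open("r", encoding="utf-8", errors="replace", newline="") as src:
                for line in src:
                    if not line.strip():
                        continue  # skip blank lines
                    # make sure the last line of each file ends with a newline
                    out.write(line if line.endswith("\n") else line + "\n")
                    lines_written += 1
    return lines_written


if __name__ == "__main__":
    year_folder = Path("2008")  # hypothetical folder of monthly logs
    if year_folder.is_dir():
        print(concat_logs(year_folder, Path("2008.csv")), "lines written")
```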
Now we need to get the standard date and time into column A, but spreadsheets don't seem to get on well with standard concatenate for date/time.
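So, you need: =INT(B2)+MOD(C2,1) - INT takes the whole-day part of the date in column B, and MOD(...,1) takes the fractional time-of-day part of column C, so adding them gives one combined date/time value. This assumes you've copied in the header row and are starting at row two.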
Copy down to the end of the sheet. Save as CSV.
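The same date-plus-time join can also be done outside a spreadsheet. A minimal sketch in Python, assuming the date is stored as dd/mm/yy text and the time as hh:mm text (both are assumptions about the log layout, so adjust the format strings to match your files); combine_date_time is a hypothetical helper name.

```python
from datetime import datetime


def combine_date_time(date_text: str, time_text: str,
                      date_format: str = "%d/%m/%y",
                      time_format: str = "%H:%M") -> datetime:
    """Join a separate date column and time column into one datetime."""
    day = datetime.strptime(date_text.strip(), date_format).date()
    clock = datetime.strptime(time_text.strip(), time_format).time()
    return datetime.combine(day, clock)


stamp = combine_date_time("26/11/11", "22:15")
print(stamp.isoformat(sep=" "))  # 2011-11-26 22:15:00
```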
Then I hit another well-hidden limit in Fusion Tables - 250 MB per user across all fusion tables, despite my having bought extra storage (25 GB per year for £5 across all services including Drive is still a bargain though!).
So, I used another account to import the last table, then a third, fresh account with no fusion data at all to do the merge, having shared the other tables.
That third account was a Google Apps account, which said that Fusion Tables was not enabled, despite also saying that it can be enabled.
Turns out Fusion Tables CAN'T be enabled for Apps accounts, yet:
So I created a brand new "standard" account to test - I managed to import one more table, then started to merge it with one other table (so 2008 and 2009 are merged).
It seemed to go OK, but when I went to sort by date, it said:
Failed to create table.
OR
Could not fetch exact count. Try reloading the page.
This latter message also appears when sorting by date, descending.
As does "Could not fetch data. Try reloading the page."
When I go back to the docs list and click that table I mentioned above, after about a minute of working, it says:
Not Found
Error 404
It also seems to have created another identically named table with a different ID which only has 59,158 rows. This is well short of any of the single tables.
I've posted all this in the fusion help group - and it's very useful having this large data set, precisely so I could see what problems I might run into in the future.
One way round it would be to somehow average the 1 minute data down to hourly values, but given that there aren't always 60 samples per hour, this is beyond my current spreadsheet knowledge.
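One possible way to cope with the uneven sample counts is to group readings by the hour they fall in and average whatever is there, however many samples that turns out to be. The Python below is only a sketch of that idea: hourly_means and the sample values are invented for illustration, and a plain mean only suits temperature-like readings (cumulative or circular fields such as rain totals or wind bearing would need different handling).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_means(samples):
    """Average (timestamp, value) pairs per clock hour.

    Returns {hour_start: (mean_value, sample_count)}. Hours with fewer
    than 60 samples are averaged over whatever samples exist, and the
    count is kept so thin hours can be spotted or dropped later.
    """
    buckets = defaultdict(list)
    for stamp, value in samples:
        hour_start = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return {hour: (mean(values), len(values))
            for hour, values in sorted(buckets.items())}


# made-up readings: two in the first hour, one in the second
readings = [
    (datetime(2008, 1, 1, 0, 0), 10.0),
    (datetime(2008, 1, 1, 0, 30), 12.0),
    (datetime(2008, 1, 1, 1, 15), 8.0),
]
for hour, (avg, count) in hourly_means(readings).items():
    print(hour.isoformat(sep=" "), avg, count)
```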
And besides, the way Excel has been struggling with this data even on my 3 GHz quad-core, 4 GB RAM, SSD PC, it's been generating enough heat and fan activity to keep my little office nice and warm on these cold summer days!
I'll post again when I get anything useful or interesting.
- yv1hx
- Posts: 223
- Joined: Mon 05 Apr 2010 10:40 pm
- Weather Station: No station yet ...
- Operating System: Win XP Professional
- Location: Some point in the Earth
Re: Anyone got any fairly massive data se
Dear lardconcepts:
If you still require a bunch of data, please have a look at my site: http://met.ivic.gob.ve/red/Cabimas/data/
I have records from 2008, most of them at one-minute intervals.
Marco
-
lardconcepts
- Posts: 35
- Joined: Sat 26 Nov 2011 10:11 pm
- Weather Station: Maplin WH1801
- Operating System: Win 8.1 Pro
- Location: Mid-wales
Re: Anyone got any fairly massive data se
yv1hx wrote:Dear lardconcepts:
If you still require a bunch of data, please have a look to my site: http://met.ivic.gob.ve/red/Cabimas/data/
I have registers form 2008 most of them every minute.
Thank you very much, I have taken a copy to experiment with. I am still trying to figure out a way to average data out into hourly resolution; still hitting those Excel limits!
I think SQL might be one solution - going to have a look at that possibility next week!
Again, thank you!
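As a rough idea of what the SQL route could look like, here is a sketch using SQLite through Python's built-in sqlite3 module. The table name readings, its columns and the helper hourly_average are all invented for the example, and it assumes timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text.

```python
import sqlite3


def hourly_average(rows):
    """Load (timestamp_text, temperature) rows and average them per hour.

    Returns a list of (hour, mean_temp, sample_count) tuples.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE readings (stamp TEXT, temp REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT strftime('%Y-%m-%d %H:00', stamp) AS hour,
               AVG(temp)                         AS mean_temp,
               COUNT(*)                          AS samples
        FROM readings
        GROUP BY hour
        ORDER BY hour
        """
    ).fetchall()
    conn.close()
    return result


# made-up readings: two in the first hour, one in the second
sample_rows = [
    ("2008-01-01 00:00:00", 10.0),
    ("2008-01-01 00:30:00", 12.0),
    ("2008-01-01 01:15:00", 8.0),
]
for hour, mean_temp, samples in hourly_average(sample_rows):
    print(hour, mean_temp, samples)
```

AVG works over however many rows exist in each hour, so missing samples do not need special handling, and the samples column shows which hours were thin.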
- yv1hx
- Posts: 223
- Joined: Mon 05 Apr 2010 10:40 pm
- Weather Station: No station yet ...
- Operating System: Win XP Professional
- Location: Some point in the Earth
Re: Anyone got any fairly massive data se
lardconcepts:
At some point I was stuck in the same way as you with Excel's limits and bad behaviour (I was trying to convert old pre-Cumulus data to the Cumulus format), but I solved it by learning some PHP scripting and writing a script in that language. It was a good solution because there are plenty of tutorials and support available.
Marco