UTF-8 (and HTML 5)

Posted: Wed 23 Apr 2014 11:11 am
by steve
Some questions for the experts out there. As I understand it, we should all really be using utf-8 encoding for our web pages now, and probably HTML 5 also. When Cumulus processes a page to replace the web tags with real data, it always saves the page using ANSI encoding (approximately equivalent to iso-8859-1). This seems to come up quite often on the forum.

So, question 1 - how big an issue is this currently?

It's quite a small change for me to get Cumulus to save ALL processed files in utf-8 instead of ANSI. But if I do this, and change the standard templates to utf-8, this would presumably mean that anyone not currently using utf-8 for their own non-standard pages would have to change them, yes?
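For anyone who would have to convert their own non-standard pages by hand, a minimal Python sketch of the re-encoding step is below. It is not how Cumulus does it internally. The function name `reencode_to_utf8` is hypothetical, and it assumes that "ANSI" means Windows-1252 (`cp1252`), which may not hold on every system.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Read src as source_encoding and write dst as UTF-8 without a BOM.

    Assumption: the input really is in source_encoding. A few byte values
    are undefined in cp1252 and will raise UnicodeDecodeError.
    """
    # Work on raw bytes so that line endings are preserved exactly.
    text = src.read_bytes().decode(source_encoding)
    # Plain "utf-8" (not "utf-8-sig") emits no byte order mark.
    dst.write_bytes(text.encode("utf-8"))
```

The page's own charset declaration (for example the `<meta>` tag) would still need to be changed to match, which this sketch does not attempt.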

Question 2 - is this approach not acceptable?

With a bit more effort, I could provide an option to save processed files in utf-8. One setting for the 'standard' files, and a setting for each 'extra' file. I would still change the supplied standard files to utf-8, but people who are using customised versions of the standard files would need the option to turn it off for standard files.

Question 3 - is doing it this way worth the extra effort?

Question 4 - should the utf-8 files have a BOM or not?

I'm thinking that I would convert the standard files to HTML 5 at the same time. Not to use any of the new facilities, just to make them compatible.

Question 5 - is this going to cause any serious issues? Everyone should be using a browser which supports HTML 5 now, yes? Particularly as the pages wouldn't be using any new stuff?

Re: UTF-8 (and HTML 5)

Posted: Wed 23 Apr 2014 12:18 pm
by mcrossley
Not in any order...

4. No BOM.

5. Yes, they should. People who are still using non-HTML 5 compatible browsers are going to have a pretty poor web experience nowadays.

2. Would not a single 'universal' UTF-8 switch that applies to all files be sufficient?
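On the BOM point (4), a small Python sketch of how a leading UTF-8 BOM can be detected and removed from file contents. The helper name `strip_utf8_bom` is made up for illustration. One commonly cited reason to avoid the BOM is that in PHP pages its three bytes are sent as output before anything else.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```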

I think you need input from Ken on how this would affect the Saratoga templates as well.


aside:
All my pages are already UTF-8, though I just use the web tags to construct them (apart from the NOAA reports, for which you have already provided a UTF-8 option that I use). There aren't too many extended characters in use, and where there are I tend to use the corresponding HTML escape (&deg; etc.).
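As an illustration of the HTML escape approach, here is a rough Python sketch that replaces every non-ASCII character with a named entity where HTML 4 defines one, and a numeric reference otherwise. The function name `escape_non_ascii` is hypothetical. The result is pure ASCII, so it reads the same whether the page is served as iso-8859-1 or utf-8.

```python
from html.entities import codepoint2name


def escape_non_ascii(text: str) -> str:
    """Replace non-ASCII characters with HTML character references.

    ASCII characters, including markup such as < and >, are left untouched.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity from the HTML 4 set, e.g. U+00B0 becomes &deg;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # Fall back to a decimal numeric character reference.
            out.append(f"&#{cp};")
    return "".join(out)
```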

Re: UTF-8 (and HTML 5)

Posted: Wed 23 Apr 2014 12:27 pm
by BCJKiwi
I currently edit (and have converted) all my website scripts (Saratoga and packages like the Davis Console scripts) with Notepad++.
Files are saved as:
Dos/Windows - ANSI as UTF-8 without BOM
Confused? So am I, but that is the only thing that works cleanly without issues in XHTML 1.0 transitional AND HTML5.
HTML5, as I understand it, must be utf-8. (Note: normally written as utf-8, not UTF-8. It usually does not make any difference, but I have found the odd occasion where it had to be utf-8.)

There is an exception: scripts such as the cloudbase script, where the script actually generates a graphic image, must be coded as ANSI - NOT utf-8.

However, this is what works and follows the recommendations for XHTML 1.0 transitional and for HTML5.
I do not claim sufficient knowledge to be authoritative - just to advise what I have found works in practice.
You might note that both the cloudbase and Davis console scripts referred to above function in both the Saratoga template environment and in your current standard Cumulus generated web pages.

Having said all that, HTML5 is the current standard and if you are undertaking a major development (which you obviously are) then it should be HTML5 throughout. That means utf-8, and CSS, as many of the traditional formatting techniques used in XHTML 1.0 transitional no longer work or won't pass validation as HTML5.

Re: UTF-8 (and HTML 5)

Posted: Wed 23 Apr 2014 12:32 pm
by BCJKiwi
Re the Saratoga scripts - Ken already has code to manage a mixture of XHTML 1.0 transitional and HTML5 within his templates.

Re: UTF-8 (and HTML 5)

Posted: Wed 23 Apr 2014 12:35 pm
by water01
I would agree with Mark. I changed all my pages to utf-8 without BOM when I switched to PHP and webtags, to get around the problem of extraneous characters that kept popping up with iso-8859-1.
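One common cause of extraneous characters like these is an encoding mismatch: bytes written as utf-8 but displayed as iso-8859-1. A tiny Python sketch reproduces the effect (the function name `misread_as_latin1` is made up for illustration, and this may not be the exact cause in every case):

```python
def misread_as_latin1(text: str) -> str:
    """Show how utf-8 encoded text looks when decoded as iso-8859-1."""
    # Each non-ASCII character becomes two or more bytes in utf-8,
    # and iso-8859-1 displays every byte as a separate character.
    return text.encode("utf-8").decode("iso-8859-1")
```

For example, a degree sign followed by `C` comes out with a stray `Â` in front of the degree sign.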

With the demise of XP, most people should be on IE9 at least, which means HTML5 shouldn't be a problem, but I guess there will be some diehards that will stick with XP!!

Re: UTF-8 (and HTML 5)

Posted: Wed 23 Apr 2014 2:48 pm
by steve
mcrossley wrote: 2. Would not a single 'universal' UTF-8 switch that applies to all files be sufficient?
I was thinking about the case where the standard files were utf-8 but someone has extra files which are not.