The charsetalias.properties file is gone in Firefox 4? Most web sites in Hong Kong use the Big5-HKSCS charset but declare the charset as Big5. Without charsetalias.properties, the encoding needs to be manually changed very often when viewing them.

  • 7 replies
  • 39 have this problem
  • 8 views
  • Last reply by ExpHP

I have been defining Big5 as an alias of Big5-HKSCS in charsetalias.properties. This makes Firefox render pages declared as "Big5" in Big5-HKSCS instead. Big5-HKSCS is a standard published by the Hong Kong government, because the Big5 charset lacks many common Chinese characters used in Hong Kong. However, Big5-HKSCS is just an extension of Big5, so most (if not all) Chinese web sites in Hong Kong that use Big5-HKSCS declare their pages as Big5 in the <meta> tag.

I am now trying out Firefox 4 RC and found that the charsetalias.properties file is no longer there. So in order to read pages containing characters that are in Big5-HKSCS but not in Big5, I need to switch the encoding manually. This is extremely inconvenient. Can you consider bringing charsetalias.properties back?

Someone noticed this issue as well: http://groups.google.com/group/mozilla.dev.i18n/browse_thread/thread/415caa98a6246a00

All Replies (8)

cor-el, thanks for the recommendation, but it doesn't really solve the problem. There are just too many sites in Hong Kong using Big5-HKSCS but declaring their pages as encoded in Big5.

As a possible workaround, are you (or a volunteer) good with JavaScript?

There is an add-on called Greasemonkey that lets you write JavaScript scripts that modify the web pages you view. Theoretically, a script could be written to detect a Big5 encoding in the header and replace it with Big5-HKSCS.

I myself don't know enough about JavaScript to write such a script...

Edit: I am teaching myself JavaScript right now.

Edit: Um. I have a script that properly finds the places where encodings are specified, but when I try using .setAttribute() to change them, nothing visibly happens...

Modified by ExpHP

jkmlee, which websites are you having trouble with? So far, all the HK websites I've visited with FF4 RC display properly.

jackfeed, it is not that FF4 RC breaks HK websites. It just cannot display some HKSCS characters. For example, this is a government press release:

http://www.info.gov.hk/gia/general/201007/19/P201007190079.htm

Pay attention to the third Chinese character in the article's title. It cannot be displayed because it is defined in Big5-HKSCS but not in Big5. The rest of the characters on the page are fine because they are Big5 characters.

ExpHP, can you share your script, or try the link I posted in my reply to jackfeed? Could you check whether that character renders once the Big5 encoding in the meta tag is changed to Big5-HKSCS?

Anyway, I still think using a charset alias mapping is more efficient than manipulating the DOM...

Here's the script I was working on.

It determines the encoding of the site by checking a couple of different ways the meta tag can supply encodings. One way is to have a charset attribute on a meta tag, and the other way is to have a meta tag with http-equiv="content-type" and have charset= be part of the content attribute.
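For anyone following along, the two checks described above could look roughly like the sketch below. This is not the pastebin script, just an illustration. The function name `declaredCharset` is made up, and it works on plain objects (one per meta tag, keyed by attribute name) so it can be tried outside a browser; in a userscript those objects would have to be built from the page's meta elements.

```javascript
// Hypothetical helper, not taken from the pastebin script.
// Each entry in metaTags is a plain object holding one meta tag's attributes.
function declaredCharset(metaTags) {
  for (const tag of metaTags) {
    // Form 1: <meta charset="...">
    if (tag.charset) {
      return tag.charset;
    }
    // Form 2: <meta http-equiv="content-type" content="text/html; charset=...">
    const httpEquiv = (tag["http-equiv"] || "").toLowerCase();
    if (httpEquiv === "content-type" && tag.content) {
      const match = /charset\s*=\s*["']?([^\s;"']+)/i.exec(tag.content);
      if (match) {
        return match[1];
      }
    }
  }
  // No meta tag declared an encoding.
  return null;
}
```

It returns the label exactly as the page wrote it, so any comparison against "Big5" still has to deal with case and whitespace separately.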

(note: I think document.characterSet might be a far more reliable way to detect the encoding, since it is a property of the document and doesn't depend on the specific way the meta tag was written. But I haven't tested it.)

Debugging messages showed that it could detect the specified encodings fine on the few test pages I was trying it on. The problem is in the way I change the encoding.

It currently uses .setAttribute to attempt to modify the meta tags once it finds the encodings. Unfortunately, this doesn't change the selected encoding in Firefox. It doesn't affect the page source, either.

I have to replace those setAttribute calls with something that will actually affect the selected encoding in Firefox.... but I'm not sure what would accomplish that.

http://pastebin.com/Yt7fygXe

Just a little warning, it's case- and whitespace-sensitive because I didn't get around to using regular expressions yet. So if a page says charset="BIG5", you'll probably need to make it uppercase in the script as well. Sorry, I'm lazy!
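A regular expression would remove that restriction. Here is a minimal sketch, assuming the label has already been pulled out of the meta tag as a string; the names `isPlainBig5` and `upgradeBig5` are made up for illustration and are not in the script.

```javascript
// Hypothetical helpers for comparing charset labels.
// True for "Big5", "BIG5", " big5 " and so on, but not for "Big5-HKSCS".
function isPlainBig5(label) {
  return /^\s*big5\s*$/i.test(label);
}

// Maps a plain Big5 label to Big5-HKSCS; any other label is only trimmed.
function upgradeBig5(label) {
  return isPlainBig5(label) ? "Big5-HKSCS" : label.trim();
}
```

This only decides which label the page should have had. It does nothing about the remaining problem of getting Firefox to actually switch encodings.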


Nonetheless, I hope this is a start...

Modified by ExpHP

I found that this add-on can force a charset based on the URL of the page: https://addons.mozilla.org/en-US/firefox/addon/charset-switcher/
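To show what I mean by forcing a charset based on the URL, here is a rough sketch of the idea. It is not the add-on's code. The rule table, the `hostSuffix` field and the function name `forcedCharsetFor` are all made up, and the sketch only decides which charset to force; it does not touch Firefox itself.

```javascript
// Hypothetical rule table: hostname suffix -> charset to force.
const rules = [
  { hostSuffix: ".gov.hk", charset: "Big5-HKSCS" },
];

// Returns the charset to force for a page URL, or null if no rule matches.
function forcedCharsetFor(pageUrl, ruleList) {
  const host = new URL(pageUrl).hostname.toLowerCase();
  for (const rule of ruleList) {
    // Match the bare domain ("gov.hk") as well as any subdomain of it.
    if (host === rule.hostSuffix.slice(1) || host.endsWith(rule.hostSuffix)) {
      return rule.charset;
    }
  }
  return null;
}
```

The drawback is the same as before: every affected site has to be listed by hand, which is why a Big5 to Big5-HKSCS mapping would be nicer.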

I tried to hack this add-on so that it always changes the charset to "Big5-HKSCS" if it is detected to be "Big5". However, it seems that the detection takes place before the parser has found the content-type meta tag, because the value of docShell.QueryInterface(Ci.nsIDocCharset).charset I obtain is always "UTF-8". I assume the parser initially treats a page as UTF-8 until it reaches a content-type meta tag that specifies the charset.

I might try implementing nsICharsetResolver next. It seems that I might be able to do the "Big5" to "Big5-HKSCS" mapping inside notifyResolvedCharset()... Does anyone know any add-on out there using this interface?

Modified by Michael Lee