Cannot open .eml files containing Unicode characters in the filename
When I try to open .eml files in Thunderbird whose filenames contain Unicode characters (most likely originating from Unicode subject fields), Thunderbird doesn't find the file. After manually removing all Unicode characters, it works as expected.
Unicode characters in filenames are allowed and common in W10. MS Mail and Outlook open these files without complaint.
Windows 10 Pro, latest build 19041.746; Thunderbird 78.7.1 (latest). Not related to a recent update.
What can I do?
Edit: better title
In TB, please choose the Unicode UTF-8 encoding.
In TB, go to Options/Preferences > General > Languages & Appearance > Advanced (Font & Encodings) > Text Encodings and set Outgoing Mail: Unicode (UTF-8), Incoming Mail: Western (ISO 8859-1).
If the outgoing mail encoding is different, set it to what is shown above.
Whether the incoming mail encoding differs from or matches what is shown above, you may also change it to Unicode (UTF-8).
Thanks for the reply!
It's not about the content or encoding of an email, but about the filename it is saved under.
When an email is saved as an .eml file, e.g. on the desktop, with Unicode in the filename like "Test
My bad, sorry, I did not fully understand.
Please see: https://en.wikipedia.org/wiki/Comparison_of_file_systems
Lately I'm using macOS a lot. I prefer the option that saves each email as an individual (.eml) file, so that the AV has a better opportunity to handle and scan each one, and other AV scanners can also check/quarantine it more easily without halting TB. On macOS, .eml filenames with Unicode characters are handled fine.
macOS filesystems (APFS, HFS, HFS+/Mac OS Extended (Journaled), etc.), the common GNU/Linux filesystems (ext2/ext3/ext4), Unix filesystems (too many types to list), ... all accept Unicode filenames through byte-based (UTF-8) interfaces.
But Windows NTFS needs filenames and folder names with Unicode characters encoded as UTF-16.
So on Windows, TB needs to call the filesystem APIs that take UTF-16 names to read/write files, or use its own UTF-16-based read/write routines or DLL.
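To make the difference concrete, here is a small illustrative Python sketch (not Thunderbird code). It encodes one made-up filename as UTF-8, the byte form typically used on Linux/macOS, and as UTF-16LE, the 16-bit form NTFS stores. The last part assumes, purely for illustration, that the UTF-8 bytes get misread through a legacy single-byte code page (Windows-1252); the resulting name no longer matches the real one.

```python
# Illustrative only: one hypothetical filename, encoded two ways.
# It contains a fullwidth colon (U+FF1A) and a balloon emoji (U+1F388).
name = "RE\uff1aabc \U0001F388.eml"

utf8_bytes = name.encode("utf-8")       # typical byte form on Linux/macOS
utf16_bytes = name.encode("utf-16-le")  # 16-bit code units, as NTFS stores names

print(len(name), "code points")
print(len(utf8_bytes), "bytes as UTF-8:   ", utf8_bytes.hex(" "))
print(len(utf16_bytes), "bytes as UTF-16LE:", utf16_bytes.hex(" "))

# Hypothetical failure mode: the UTF-8 bytes are misread through a legacy
# single-byte code page (Windows-1252), giving a different, wrong name.
garbled = utf8_bytes.decode("cp1252", errors="replace")
print(ascii(garbled))
print(garbled == name)  # False
```

The emoji takes 4 bytes in UTF-8 and a surrogate pair (also 4 bytes) in UTF-16, which is why a program has to convert names correctly rather than pass raw bytes around.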
OK, that seems to be an explanation. Is there a solution?
I'm asking for a customer. The files are there, but they can't open them using TB, only with Windows Mail and Outlook. Hard to explain to the end user.
...especially given that Unicode characters in subjects are becoming more and more common.
I'm using W10 and Thunderbird 78.7.1. I have some files saved as .eml in a folder on the desktop. Those files are named after the subject, but none has a " in the filename. So I tried to edit a filename manually, and a pop-up said "A filename can't contain any of the following characters", so I'm not able to edit and test. See image below. This is because certain characters have a special meaning and cannot be used, so I'm surprised you have them.
I did not even get to the point where I could open the .eml file.
So I sent an email to myself with "special char" in subject as the subject. I received it, chose 'Save As', and picked the folder on the desktop to save it as .eml. At this point I noticed the " double quote had been automatically changed to a ' single quote. I clicked 'Save', then looked in the folder: the filename was 'special char' in subject, and the .eml file opened OK.
So when I use Thunderbird to save, it automatically replaces the characters that Windows says cannot be used.
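For reference, Microsoft's file-naming documentation lists < > : " / \ | ? * and the ASCII control characters as not allowed in Windows filenames (reserved names like CON and trailing dots/spaces are further rules this sketch ignores). The Python sketch below checks a name against that list and swaps forbidden characters for ASCII stand-ins. Only the " to ' substitution matches what Thunderbird did in the test above; mapping everything else to _ is an assumption, not Thunderbird's actual logic.

```python
# Characters Windows does not allow in file names; control chars 0-31 too.
FORBIDDEN = set('<>:"/\\|?*')

def is_windows_safe(name: str) -> bool:
    """Return True if no forbidden or control character appears in name."""
    return not any(ch in FORBIDDEN or ord(ch) < 32 for ch in name)

def ascii_substitute(name: str) -> str:
    """Replace forbidden characters with ASCII stand-ins.

    '"' -> "'" mirrors what Thunderbird did in the test above;
    mapping every other forbidden character to '_' is hypothetical.
    """
    out = []
    for ch in name:
        if ch == '"':
            out.append("'")
        elif ch in FORBIDDEN or ord(ch) < 32:
            out.append("_")
        else:
            out.append(ch)
    return "".join(out)

subject = '"special char" in subject'
print(is_windows_safe(subject))            # False
print(ascii_substitute(subject) + ".eml")  # 'special char' in subject.eml
```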
Did they originally save the .eml files using Outlook or MS Mail, or import the .eml files from a Linux system?
The files are downloaded from a web server that uses a [subject].eml naming convention.
The characters forbidden in Windows filenames, like / : > < etc., have Unicode "fullwidth" equivalents, which aren't forbidden under Windows and still look like the regular characters.
Example: https://www.compart.com/en/unicode/U+FF1A; likewise ／ vs. /
Filenames like "RE：abc／xyz.eml" are "legal" even under Windows, but cannot be opened using TB.
The same applies to all other Unicode characters, like
The files are downloaded from a web server. It replaces the characters "forbidden" in Windows, like : / < > ? etc., with their "fullwidth" counterparts, which are allowed.
'/' becomes '／', etc.
This way, readability is preserved. "RE：abc ／ xyz.eml" is an allowed filename under W10.
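The web server's substitution can be sketched like this. For printable ASCII, each fullwidth form sits at a fixed offset of 0xFEE0 (':' U+003A becomes '：' U+FF1A, '/' U+002F becomes '／' U+FF0F). This is a guess at the approach in Python, not the server's actual code; the function name is made up.

```python
# Windows-forbidden filename characters (control characters aside).
FORBIDDEN = '<>:"/\\|?*'

# Fullwidth forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at this offset.
FULLWIDTH_OFFSET = 0xFEE0

def to_fullwidth_safe(subject: str) -> str:
    """Replace each forbidden character with its fullwidth look-alike."""
    return "".join(
        chr(ord(ch) + FULLWIDTH_OFFSET) if ch in FORBIDDEN else ch
        for ch in subject
    )

filename = to_fullwidth_safe("RE:abc/xyz") + ".eml"
print(ascii(filename))  # 'RE\uff1aabc\uff0fxyz.eml'
```

Names produced this way are accepted by NTFS (and, per this thread, Outlook and Windows Mail open them), so the problem lies in how the file is opened rather than in the name itself.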
Likewise, when people use other Unicode characters in subject fields (thumbs-up, pointing hands, etc.), these .eml files can't be opened by TB on Windows.
Edit: sorry, Unicode seems not to be allowed here
Eh, this forum can't handle Unicode?
Here comes a balloon, followed by text:
Edit:
No, it can't. All after the balloon was cut off.
OK. This subject seems to be complicated on many levels ;)
tigerf said
Eh, this forum can't handle Unicode? Here comes a balloon, followed by text: Edit: No, it can't. All after the balloon was cut off. OK. This subject seems to be complicated on many levels ;)
It's not displayed in the forum, but it is displayed in the email I received. ❉ 🎈 (using HTML hex entities :)
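For anyone else who wants to post such characters here: the trick is to write each one as an HTML numeric character reference in hex (&#x…;). A minimal Python helper, with a made-up function name, that escapes every non-ASCII character this way and checks the round trip with the standard html module:

```python
import html

def to_html_hex_entities(text: str) -> str:
    """Escape every non-ASCII character as an HTML hex character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)

encoded = to_html_hex_entities("Balloon: \U0001F388")
print(encoded)                                          # Balloon: &#x1F388;
print(html.unescape(encoded) == "Balloon: \U0001F388")  # True
```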
Anyway, TB is the only Windows program I know of that cannot open otherwise readable files because of filename issues.
That's unfortunate, imho.
Edit: Being a developer myself, I tried to contribute a fix in the sources. It turned out to be more time-consuming than I anticipated and could afford, so I gave up on this approach.
You are right, my earlier post did not offer a solution, just related information. Sorry.
On macOS, TB uses a Latin/English alphanumeric common name (for IMAP/POP mail accounts) and then a code for all .eml files inside the mail subfolders, instead of "<subject>.eml" filenames. I think that practice is good, even better for privacy, since the actual subject is kept inside the file. A file's content can contain Unicode characters, because filesystem limitations do not affect file contents, while the filename uses only filesystem-compatible characters.
So maybe TB for Windows should also adopt that kind of naming policy: use only the first 128 (ASCII) or 256 Unicode characters in filenames, so that they are compatible with all major OSes/distros and users can move a TB profile to another OS, if needed, without further migration hiccups.
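A minimal Python sketch of that idea, under a hypothetical naming scheme (not Thunderbird's actual one): keep only ASCII letters, digits, spaces, hyphens and underscores from the subject, then append a short SHA-1 digest of the full subject so that different subjects which reduce to the same ASCII text still get distinct names. The readable subject stays inside the .eml file.

```python
import hashlib
import re

def ascii_eml_name(subject: str, max_len: int = 60) -> str:
    """Build an ASCII-only .eml filename from a subject (hypothetical scheme).

    Runs of characters outside [A-Za-z0-9 _-] become '_'; a short digest of
    the full subject keeps names distinct when subjects collapse together.
    """
    slug = re.sub(r"[^A-Za-z0-9 _-]+", "_", subject)[:max_len].strip(" _")
    digest = hashlib.sha1(subject.encode("utf-8")).hexdigest()[:8]
    return f"{slug or 'message'}-{digest}.eml"

# Fullwidth colon/solidus and an emoji all collapse to ASCII; prints
# something like RE_abc_xyz-<8 hex digits>.eml
print(ascii_eml_name("RE\uff1aabc\uff0fxyz \U0001F388"))
```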
I also notice that TB on macOS saves the domain name/server name as a filename; since macOS can handle it, that's fine.
So if a domain name/server name contains non-English IDN/Unicode characters and is passed to Windows as UTF-16, that should not be a problem; but Unicode characters that are not converted to UTF-16 may cause problems on Windows.
macOS, Linux, etc. filesystems handle Unicode characters more easily than the Windows filesystem here, because they accept multi-byte (UTF-8) names directly.