Unable to filter/match UTF-8 in subject or from address

  • 1 reply
  • 1 has this problem
  • 5 views
  • Last reply by jjoun

I asked this question Dec 9, 2018 with a slightly different subject but did not get one reply... I think I have been waiting patiently long enough to re-post. Doesn't anyone use message filters?

Thunderbird message filters do not match UTF-8 in To/Cc/Bcc/From/Subject headers/body

   No replies
   1 has this problem
   4 views

Dan Liston 12/9/18, 6:51 PM

I have been a Thunderbird user since its roots in Netscape. My currently installed version is 60.3.3 (32-bit) on Windows 10 (64-bit). Edit by author: Now using 78.5.0 (64-bit) on Windows 10 with the same problem. My OS language [and keyboard] setting is "only" English (US).

I have IMAP accounts on all the major [free] email services such as Gmail, Hotmail, MSN, Yahoo, AIM, and more.

It has become more and more annoying to receive emoji-laden Subject headers, and absolutely aggravating when these appear in a From field. In my case, these have always been 100% positively spam. And anyone who knows me (emails me) knows I evangelize ivory-snow-clean 7-bit ASCII messages.

Under Tools > Options > Display > Advanced I have "Show only display name for people in my address book" unchecked, and under Formatting I have "Display emoticons as graphics" unchecked. I have checked/enabled "Adaptive junk mail controls for this account" on all my accounts. And I "never" send HTML-bloated email.

I installed the FiltaQuilla 1.4.1 add-on only today to try to resolve the problem myself, but it also fails [I'm probably doing it wrong] to match UTF-8 encoded characters above decimal 127. Since my target Subject/Sender/Recipient fields "contain" the encoding rather than start or end with it, I would like the equivalent of a *UTF-8* wildcard or a regex such as .*[Uu][Tt][Ff][-]8.* to match: ? U T F - 8 ? B ? somebase64encoding ?

The =?UTF-8?B? is enough for my purposes, as some base64 encoding varies wildly, but the leading "=?", "UTF-8?", "B?", and closing "?=" strings are reliable constants. When viewing these fields, the characters appear as a domino (actually a box with 00 above 0E) or a black diamond containing a white question mark. But examination of the raw message by hitting Ctrl-U shows the ASCII text I am trying to match.
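To make the pattern described above concrete, here is a minimal sketch in Python (not Thunderbird or FiltaQuilla syntax) of a regex that matches the constant parts of such an encoded word in a raw header line. The function name and the sample Subject line are made up for illustration, and the sketch assumes the matcher sees the raw header text exactly as Ctrl-U shows it, which is the very thing in question in this thread.

```python
import re

# Encoded-word shape as seen in raw headers: =?charset?encoding?payload?=
# Only the constant parts are pinned down; the payload is allowed to vary.
ENCODED_WORD = re.compile(r"=\?utf-8\?[bq]\?[^?\s]*\?=", re.IGNORECASE)

def has_utf8_encoded_word(raw_header: str) -> bool:
    """Return True if the raw header line contains a UTF-8 encoded word."""
    return ENCODED_WORD.search(raw_header) is not None

# Hypothetical sample headers, for illustration only.
raw_subject = "Subject: =?UTF-8?B?8J+OgSBGcmVlIGdpZnQ=?="
plain_subject = "Subject: Plain 7-bit ASCII subject"

print(has_utf8_encoded_word(raw_subject))
print(has_utf8_encoded_word(plain_subject))
```

The character class `[bq]` also accepts the quoted-printable form (`=?UTF-8?Q?...?=`), since a sender can choose either encoding for the same text.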

How do I prevent the filters from first translating the text before attempting to match it? Edit by author: I don't really care about the above question. I just want to know how to match these emoji-like characters, to flag the messages containing them as spam/trash. Filters should be applied before translation/transformation of characters to allow proper matching. Especially if = and ? do not mean anything to the filters (taken literally) anyway. Shouldn't they be literal in the case of UTF matching as well?
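If the filter only ever sees the header after decoding, the literal "=?UTF-8?B?" is no longer there to match, and the test has to be "does the decoded text contain anything above 127" instead. The following is a minimal Python sketch of that idea using the standard library's `email.header` module. It illustrates the logic only and is not a description of how Thunderbird's filters work internally; the helper names and sample values are assumptions.

```python
from email.header import decode_header, make_header

def decoded(raw_value: str) -> str:
    """Decode any encoded words in a raw header value to plain text."""
    return str(make_header(decode_header(raw_value)))

def has_non_ascii(text: str) -> bool:
    """Return True if any character lies outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in text)

# Hypothetical sample values, for illustration only.
encoded_value = "=?UTF-8?B?8J+OgSBGcmVlIGdpZnQ=?="
plain_value = "Plain 7-bit ASCII subject"

print(has_non_ascii(decoded(encoded_value)))
print(has_non_ascii(decoded(plain_value)))
```

In regex terms the same test is a character class such as `[^\x00-\x7F]` applied to the decoded text. Whether a given filter add-on accepts that syntax is something to check in its own documentation.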

If there is a way to match UTF without specifically looking for UTF, I am open to suggestions. I have also tried looking for a custom header "Content-Type" and matching "charset=utf-8" without any luck. The raw messages I am trying to match only had "Content-Type: text/html;" anyway.
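The failed "charset=utf-8" match is consistent with the raw messages carrying no charset parameter at all. As a rough illustration, and assuming a header like the one quoted above, Python's standard `email` parser reports no charset for such a message. The sample messages below are invented for the sketch.

```python
from email import message_from_string

# Hypothetical raw messages: one without a charset parameter, one with.
without_charset = "Content-Type: text/html;\nSubject: test\n\n<p>hi</p>\n"
with_charset = 'Content-Type: text/html; charset="UTF-8"\nSubject: test\n\n<p>hi</p>\n'

def declared_charset(raw_message: str):
    """Return the charset named in Content-Type, or None if absent."""
    return message_from_string(raw_message).get_content_charset()

print(declared_charset(without_charset))
print(declared_charset(with_charset))
```

A filter keyed on the Content-Type charset would therefore miss these messages even if header matching worked, because the encoding is declared per header word rather than for the message as a whole.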

Modified December 9, 2018 at 6:51:14 PM CST by Dan Liston Last Modified November 22, 2020 by Dan Liston

All Replies (1)

Sorry, this is off-topic, but I have no idea how to post my problem. It has taken my entire afternoon to compose my email message, and now I can't merge. And this is late.

Thank you.