libxul.so on-disk corruption observed on 2 different platforms
I'm not sure I would have believed this if I hadn't seen it with my own eyes.
Symptom: One day, out of nowhere and without any user-initiated Firefox installation, configuration or plugin change, a deterministic set of websites begins to crash tabs immediately on page load. Reloading an affected URL always produces another immediate crash, so the page can never be loaded. Websites outside that set are unaffected. Restarting Firefox does not fix the problem - the same sites crash tabs after restart. Disabling ASLR does not fix it either.
I first observed this on a Mac OS X laptop (MacBook Pro 2017) running the then-current Firefox versions. It happened three times in the last six months. As I did not have time to investigate, and observed that merely reinstalling Firefox from upstream archives cleared the problem, I moved on.
Starting yesterday, I began to observe this on my up-to-date x86-64 Ubuntu Eoan laptop:
/var/log/syslog.1:Jan 5 18:19:56 achpee kernel: [3642722.464002] Web Content[17741]: segfault at 3 ip 00007f6881995024 sp 00007ffd2548d010 error 6 in libxul.so[7f687d5ad000+4665000]
/var/log/syslog.1:Jan 5 18:20:40 achpee kernel: [3642765.713880] Web Content[18209]: segfault at 3 ip 00007f2f71605024 sp 00007fff08ecc490 error 6 in libxul.so[7f2f6d21d000+4665000]
/var/log/syslog.1:Jan 5 18:20:53 achpee kernel: [3642778.682450] Web Content[18253]: segfault at 3 ip 00007f46d16fd024 sp 00007ffdf062b6e0 error 6 in libxul.so[7f46cd315000+4665000]
/var/log/syslog.1:Jan 5 21:18:29 achpee kernel: [3653435.096133] Web Content[27212]: segfault at 3 ip 00007f081696d024 sp 00007ffd774ee3e0 error 6 in libxul.so[7f0812585000+4665000]
/var/log/syslog.1:Jan 5 21:19:52 achpee kernel: [3653518.422883] Web Content[28191]: segfault at 3 ip 00007faa3c23d024 sp 00007ffe1e89e1c0 error 6 in libxul.so[7faa37e55000+4665000]
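One thing worth pulling out of these lines is where inside the mapping each crash lands. The following is a minimal Python sketch, not part of my original investigation, that subtracts the reported mapping base from the instruction pointer for each line. The helper names (`parse_segfault`, `decode_error`) are made up for illustration, and the bit meanings in `decode_error` are the general x86 page-fault error code layout rather than anything Firefox-specific. Note that the result is an offset into the mapping the kernel reports, which is not necessarily the same as an offset into the file on disk.

```python
import re

# Kernel segfault messages copied from the syslog excerpt above
# (file name, timestamp and hostname prefix trimmed).
LOG_LINES = [
    "Web Content[17741]: segfault at 3 ip 00007f6881995024 sp 00007ffd2548d010 error 6 in libxul.so[7f687d5ad000+4665000]",
    "Web Content[18209]: segfault at 3 ip 00007f2f71605024 sp 00007fff08ecc490 error 6 in libxul.so[7f2f6d21d000+4665000]",
    "Web Content[18253]: segfault at 3 ip 00007f46d16fd024 sp 00007ffdf062b6e0 error 6 in libxul.so[7f46cd315000+4665000]",
    "Web Content[27212]: segfault at 3 ip 00007f081696d024 sp 00007ffd774ee3e0 error 6 in libxul.so[7f0812585000+4665000]",
    "Web Content[28191]: segfault at 3 ip 00007faa3c23d024 sp 00007ffe1e89e1c0 error 6 in libxul.so[7faa37e55000+4665000]",
]

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Extract the fields of one kernel segfault message (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "lib": m["lib"],
        "fault_addr": int(m["addr"], 16),
        "offset": ip - base,          # offset of the faulting instruction in the mapping
        "error": int(m["err"], 16),   # page-fault error code
    }


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 1),  # False means the page was not present
        "write": bool(code & 2),                 # False means the access was a read
        "user_mode": bool(code & 4),             # False means the fault came from kernel mode
    }


offsets = {hex(parse_segfault(line)["offset"]) for line in LOG_LINES}
print(sorted(offsets))
print(decode_error(0x6))
```

For the five lines above this prints `['0x43e8024']`, so every crash hits the same instruction despite the different load addresses, and error 6 decodes to a user-mode write to a page that is not present (here, address 3).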
No mitigation I attempted worked until I remembered reinstallation. On a Linux box? Seriously? Follow me (in the attachment, since this forum doesn't support markdown or any sane way of formatting).
By all appearances, a 64-bit word in the middle of the libxul binary was altered by some process which did not affect its mtime. Since Firefox apparently does not validate its components on startup or at any other time, this went undetected and resulted in that code crashing the process whenever it was reached.
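For anyone who wants to check their own install, here is a rough Python sketch of one way to locate such a change: compare the suspect file byte for byte against a known-good copy and report the ranges that differ. This is not the procedure from my attachment. The function name `differing_ranges` is made up, and the demo runs on two small synthetic files rather than on real libxul.so copies; you would substitute the installed library and one extracted from a freshly downloaded package.

```python
import os
import tempfile


def differing_ranges(path_a, path_b, chunk_size=1 << 16):
    """Return (start, end) byte ranges, end exclusive, where two files differ.

    If one file is shorter, the missing tail counts as differing.
    """
    ranges = []
    start = None  # start offset of the differing run currently open, if any
    pos = 0       # file offset of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk_size), fb.read(chunk_size)
            if not a and not b:
                break
            if a == b and start is None:
                pos += len(a)  # fast path: identical chunk, no open run
                continue
            n = max(len(a), len(b))
            for i in range(n):
                same = i < len(a) and i < len(b) and a[i] == b[i]
                if not same and start is None:
                    start = pos + i
                elif same and start is not None:
                    ranges.append((start, pos + i))
                    start = None
            pos += n
    if start is not None:
        ranges.append((start, pos))
    return ranges


# Demo on synthetic data: a 4 KiB "good" file and a copy with one
# 64-bit word overwritten at offset 1000.
good = bytes(4096)
bad = good[:1000] + b"\xff" * 8 + good[1008:]

with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.bin")
    bad_path = os.path.join(tmp, "bad.bin")
    with open(good_path, "wb") as f:
        f.write(good)
    with open(bad_path, "wb") as f:
        f.write(bad)
    demo_ranges = differing_ranges(good_path, bad_path)

print(demo_ranges)
```

The demo prints `[(1000, 1008)]`, a single 8-byte run. On a real pair of files, a single short run like that would be consistent with one altered word, while many scattered runs would more likely mean the two copies are simply different builds.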
I have to say this is a bit disconcerting, because I have no explanation at all for it - the package on the machine where I did the investigation was installed in a routine automatic update weeks ago and only yesterday began to exhibit the described symptoms. Perhaps it was corrupted when it was written out (by CPU heat? memory? disk firmware? cosmic ray? No MCE events have been logged), and I happened not to visit a website which reached the corrupted code until yesterday. That seems unlikely, since the crash has happened very frequently since yesterday on a variety of websites:
$ sudo zgrep libxul /var/log/kern.log | wc -l
32
There is a prior bug report describing a similar experience here: https://trac.torproject.org/projects/tor/ticket/20100
For what it's worth, a coworker on another team at my company also ran into these symptoms and mitigated them on his corporate laptop simply by reinstalling.
Is there a way to mine the automated crash reporter data to see if this is an emerging trend of some kind?
All Replies (1)
This forum does not appear to be allowing me to attach the terminal log in a sane fashion, so here it is as an image.