FF string search "find in tab" ignores manual line break, good, can it also ignore Paragraph Mark, this would be wonderful.

  • 15 replies
  • 3 have this problem
  • 98 views
  • Last reply by HugoLudwig

It would be wonderful if FF "find in tab" would ignore the Paragraph Mark Sign. This is all I want - string search and Paragraph Marks ignored - professional search tools do this by default. Please help.

All replies (15)

more options

I'm not aware of any built-in feature for this. Maybe there's an add-on that has this capability?

Unfortunately feature suggestions tend to get lost in the support forum.

To get this where you want it to go, here are 3 options (you can use more than 1):

more options

Here's something to experiment with: the built-in find will not cross certain boundaries, but if you hack the page, then you can search more freely. The appearance is somewhat wrecked, but it might be worth it in some cases. To test out this idea:

When viewing the page, open the web console in the lower part of the tab using either:

  • Ctrl+Shift+K
  • "3-bar" menu button > Developer > Web Console
  • (menu bar) Tools menu > Web Developer > Web Console

At the bottom, next to the caret (>>), paste the following code and press Enter:

// Replace each p, div, and h1-h3 element with a block-styled span.
var allblocks = document.querySelectorAll("p,div,h1,h2,h3");
// Walk backward so nested elements are converted before their ancestors.
for (var i = allblocks.length - 1; i >= 0; i--) {
  var block = allblocks[i];
  var spanNew = document.createElement("span");
  // Keep the block layout; headings additionally stay bold.
  var style = "display:block; margin:1em 0;";
  if (block.nodeName.charAt(0) == "H") style += " font-weight:bold;";
  spanNew.setAttribute("style", style);
  block.parentNode.insertBefore(spanNew, block);
  // Move every child node into the new span, then drop the emptied original.
  while (block.firstChild) spanNew.appendChild(block.firstChild);
  block.remove();
}

This should relocate the contents of div, p, and h1-h3 tags into span elements, which Find will search across. Is that useful at all?

more options

Jefferson,

thank you for your replies - you are great as always. The issue with the paragraph mark is a little broader for me: I have thousands of HTML files that I converted from PDF to HTML for better accessibility, and I search tens of them at a time in a FF window, so hacking a specific page is unfortunately not practical. If you look closer at the paragraph mark issue, it is actually a bug in FF: the "find in tabs" search algorithm is not reliable because of this limitation. In addition, if FF can ignore the manual line break, it must be possible to ignore the paragraph mark as well. BTW, this problem exists wherever paragraph marks are used for formatting purposes in a web page. Who owns the code for "find in tab"?

more options

Oh, I see. The PDF-to-HTML converter probably only cares about appearance and isn't forming logical segments of text as you would find in normal web pages. I guess my question is, how bad is it? Could you paste a sample selection in a Pastebin and provide a link to it?

As for how it might be addressed, I would try the previous links (mailing list, Bugzilla) once we can articulate what needs to be changed.

more options

I see you are smart and you understand the issue. The PDF converter puts a Paragraph Mark at the end of each line for formatting purposes - this is how bad it is. All that is required is the option that Find in Tabs ignores the Paragraph Mark.

Thank you for your help.

more options

Okay, I think it would be useful for the benefit of developers who might look at this bug report to see some of the HTML that is generated.

It sounds as though you are saying every line might start and end with p tags:

<p>and some more text... ending here</p>

But the very specific details probably matter in assessing whether this is something that could be added quickly.

http://pastebin.com/ (no registration required)

more options

Thank you very much for your reply.

I have copied a test file to the pastebin - I hope it works. But to give you better access to my problem, here is the link to the product I use: http://www.investintech.com/prod_a2e_pro.htm. Able2Extract Professional 9 is, to my knowledge, the best PDF conversion product available. If you download a trial version you can produce authentic output. Thank you again.

more options

Is the issue with any PDF or only with PDFs where the program performs OCR?

To share your Pastebin, you'll need to post the link (or maybe you intend to post it only in the bug report).

more options

Thank you for your reply - the issue is with any PDF. Do you really want me to provide a link to my file system? I'm not so sure that I want to do this....

But we are getting a little bit off topic and seeing the problem backwards: the issue is that FF "Find in Tabs" could easily "IGNORE PARAGRAPH MARKS" and the problem would be solved. Any other approach to solving the Paragraph Mark issue is much, much more complicated and will produce unpredictable results. I would really, really appreciate it if one of the developers of "Find in Tabs" would simply insert a few lines of code and fix that FF bug. Thank you very much for your help and assistance.

more options

I meant a link to the Pastebin you created. You don't have to give an example from a sensitive PDF. If it happens to all PDFs, or all PDFs of a particular type, you can pick any one you like, convert it, then view source of the converted page and paste that into a Pastebin. I guess my point is that having someone else convert something at random may not yield the specific HTML structure that is causing you a problem, which would be a waste of everyone's time.

more options

I did as advised:

The source text is copied to pastebin with the name: PDF Paragraph Marks.

I hope I did it right.

Thank you for your time and help

more options

Okay, I found it: http://pastebin.com/RtTUHAyr

Every line is a fragment encapsulated in two elements, a span and a div:

<div style="position:absolute;left:70.96px;top:187.80px" class="cls_006"><span class="cls_006">"Delivered at Frontier" means that the seller delivers when the goods are placed at the disposal</span></div>
<div style="position:absolute;left:70.96px;top:200.52px" class="cls_006"><span class="cls_006">of the buyer on the arriving means of transport not unloaded, cleared for export, but not cleared</span></div>

Find does not cross the boundary between two div elements, just as it does not cross the boundary between two p elements. So you would want to file a bug referring to the ability to search across both of these kinds of elements.
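To illustrate what searching across those fragments would involve, here is a minimal sketch. It is not how Firefox's Find is implemented, and the function name findAcrossFragments is hypothetical. It assumes the text of each line fragment has already been collected into an array, joins the fragments with single spaces, and reports which fragments a match spans.

```javascript
// Hypothetical helper (not Firefox's Find code): look for a phrase across
// separate line fragments, such as the text of each positioned div.
function findAcrossFragments(fragments, phrase) {
  // Collapse whitespace and ignore case so wrapping and capitals do not matter.
  const normalize = (s) => s.replace(/\s+/g, " ").trim().toLowerCase();
  const needle = normalize(phrase);
  if (needle === "") return null;

  // Build one searchable string and remember where each fragment starts in it.
  let joined = "";
  const starts = [];
  fragments.forEach((fragment) => {
    const text = normalize(fragment);
    // Separate non-empty fragments with exactly one space.
    if (text !== "" && joined !== "") joined += " ";
    starts.push(joined.length);
    joined += text;
  });

  const pos = joined.indexOf(needle);
  if (pos === -1) return null;
  const end = pos + needle.length - 1;

  // Map a character offset in the joined string back to a fragment index.
  const fragmentAt = (offset) => {
    let i = starts.length - 1;
    while (i > 0 && starts[i] > offset) i--;
    return i;
  };
  return { firstFragment: fragmentAt(pos), lastFragment: fragmentAt(end) };
}

// The two sample lines from the Pastebin excerpt above.
const sampleLines = [
  '"Delivered at Frontier" means that the seller delivers when the goods are placed at the disposal',
  "of the buyer on the arriving means of transport not unloaded, cleared for export, but not cleared",
];
const hit = findAcrossFragments(sampleLines, "at the disposal of the buyer");
console.log(hit);
```

In a converted page, the fragments could be gathered in the web console with something like Array.from(document.querySelectorAll("div > span"), el => el.textContent). That selector is an assumption based on the sample above and may need adjusting for other converter output.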

more options

Thank you very much for the explanation and clarification. I opened an account in Bugzilla and filed a bug report under: Bug 1120148 - quick find search recognizes Paragraph Mark as character

I'm afraid I'm not particularly good at reporting this technical matter properly. Could you do me a great favor and file the appropriate bug report so that the developers understand what the issue is and can better address this issue? PLEASE!

more options

Okay, I added a comment with a link to this thread for further background.

more options

Thank you sooooooo much! I hope we are successful. Thank you.