Highlight annotation is broken for some PDFs

I am using version 7.2.1

For some PDFs, the highlight annotation is not working properly. The highlighted text does not keep the spaces that the text in the PDF has. The same thing happens in the official demo.

Hi there!

Are you able to provide us with that document so we can reproduce and find a fix?

Thank you!
Logan

pone.0192022.pdf (2.2 MB)

Here it is

Thanks @ehtesham.ahmad.nadim! We can reproduce this issue and are looking into a fix now.

Hey there!

I looked into it a bit, and the error comes from the core PDF.js library, which unfortunately we do not support. The issue is reproducible in vanilla PDF.js: open your document here, select some text, copy it, and paste it somewhere else. You will see that there are no spaces in the text.
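For anyone who would rather check this in code than by copy and paste, here is a minimal sketch. It assumes you have already pulled the text items for a page out of PDF.js (`page.getTextContent()` returns an object whose `items` carry their text in a `str` field). The helper names `whitespaceRatio` and `looksSpaceless`, and the thresholds, are made up for illustration and are not part of PDF.js or our SDK.

```typescript
// Minimal shape of a text item; PDF.js text items expose their text as `str`.
interface TextItemLike {
  str: string;
}

// Hypothetical helper (not a PDF.js API): fraction of whitespace characters
// in the concatenated text of all items.
function whitespaceRatio(items: TextItemLike[]): number {
  const text = items.map((item) => item.str).join("");
  if (text.length === 0) return 0;
  const whitespace = (text.match(/\s/g) || []).length;
  return whitespace / text.length;
}

// Hypothetical heuristic: ordinary prose contains plenty of whitespace, so a
// reasonably long run of text with almost none suggests word spacing was lost.
// The default thresholds are assumptions, not tuned values.
function looksSpaceless(
  items: TextItemLike[],
  minLength = 40,
  threshold = 0.02
): boolean {
  const length = items.reduce((n, item) => n + item.str.length, 0);
  return length >= minLength && whitespaceRatio(items) < threshold;
}
```

Running `looksSpaceless` on the items of an affected page should return `true`, while a page that extracts correctly should return `false`. It is only a rough check, so very short pages or pages that are mostly formulas may need different thresholds.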

To get this issue fixed you will have to open a ticket in the open source PDF.js repo.

If you wish, we can also open a ticket there on your behalf, and share the document you posted with your permission. Let us know if you would like us to do that.

If you do open a ticket there, please post a link to it here so we can track it and make sure the fix gets merged in once the ticket is resolved.

Thank you!
Logan

Hey @Logan,
Please open a ticket in the PDF.js repo and keep me updated.

Hey @Logan,
It’s been a while and I haven’t heard anything from you about the spacing issue. I have found another issue, which I think is related to the spacing issue I told you about earlier. When I highlight a mathematical formula in a document, PDF.js cannot recognize the special characters.

I am attaching the document for your convenience.
pbio.3000210.pdf (2.6 MB)

Hey everyone,

Thanks for the additional bug reports here.

There seem to be a lot of other PDF.js users facing the same issue, and there are already multiple bug reports open.

I will try out some of the suggested “fixes” in those threads and let you know if I find a solution. I will also track those issues and see if an actual fix is ever released.

Thanks!

Hello,

I tried out some of the fixes that people suggested in the PDF.js bug reports. Some of them did fix the issue, but they caused multiple other issues throughout the SDK, so unfortunately I cannot use them.

The issue is deeply rooted inside the PDF.js source code, so unfortunately I cannot do much to help here.

If this is a major issue for you, consider checking out our sister product, PDFTron WebViewer. It is able to parse the text correctly.

I’ll continue to play around with this issue when I get some time, but for now all I can do is track the GitHub issues and hope a fix gets pushed to the PDF.js repo.

Thanks!
Logan