PDFJSExpress vs PDFTron highlighting problem

Which product are you using?
PDF.js Express Plus

PDF.js Express Version
|UI version|‘8.1.0’|
|Core version|‘8.1.0’|
|Build|‘OS8yMy8yMDIxfGM3YjM4YzBmOQ==’|

PDFTron Version
|UI version|‘8.1.0’|
|Core version|‘8.1.0’|
|Build|‘OS85LzIwMjF8NTU4Zjg4N2Fk’|

Detailed description of issue
If I select or highlight text in a PDF, not all characters are selected or highlighted.
It also happens if I upload the PDF to your demo application.

Problem picture: https://drive.google.com/file/d/1EeyTZ8tmvwJN3AqmQsQNoyaYQ9Lpv2KI/view?usp=sharing

The weird thing is that I tried the same PDF (and different PDFs as well) with PDF.js Express Plus and PDFTron (the demo versions, both of which can be downloaded for free), and the problem only happens in PDF.js Express Plus. In PDFTron the issue doesn’t happen. The screenshot of the expected behaviour is from PDFTron, and the screenshot of the issue is from PDF.js Express Plus.

Expected behaviour
I expect that all characters which are selected are highlighted.

Expected picture: https://drive.google.com/file/d/1HOPSJIHpWb692KfYtyqH2HYUp9znYuVS/view?usp=sharing

Does your issue happen with every document, or just one?
It does not happen with every document, only with some.

Link to document

Code snippet

  • No code implemented for this. I am using the out-of-the-box functionality.

Hmm, sorry, I don’t know how to edit my post.

Here are the missing pictures.

  1. Problem picture:
    Pasteboard - Uploaded Image

  2. Expected picture:
    Pasteboard - Uploaded Image

  3. Link to the PDF which causes the issue:
    https://pseudohub.de/pages/wow/demo/pdf/diploma.pdf

Hey there!

Thank you for the detailed bug report. This is a known issue. It stems from the core rendering engine (PDF.js), which we do not normally support. However, I will investigate this and see if I can find a fix.

I’ll keep you posted.

Thanks,
Logan

Hi,

Thank you for your reply. What I don’t understand is why no such error occurs with highlighting in PDFTron. Isn’t PDFTron also using PDF.js?

thanks
zesman

Hi!

No, PDFTron uses a custom rendering engine that has been developed in-house over the last 20 years. It has much higher accuracy and better text parsing, which is why text selection is better in some scenarios.

Thanks,
Logan

Hi,

Did you find a solution to this problem?

Hi,

I am also interested in a solution for this issue. Here is my case:

Thanks a lot,

Evrard

Hi everyone,

As mentioned, text selection is a function of the PDF.js Core library, which we do not support.

I tried to resolve this issue by digging into the PDF.js Core, but could not come up with a good solution. The library is simply providing us with invalid text location data. This can happen for a wide variety of reasons, and it is impossible to come up with a single generic fix.

We will continue investigating text selection issues over time, but I cannot guarantee when we will fix this or whether it will ever be fixed.

Thanks,
Logan
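
For readers who want to check whether a particular document is affected by the invalid text location data mentioned above, here is a minimal TypeScript sketch. It is not an official fix and not part of PDF.js Express. It assumes you can obtain text items shaped like those returned by PDF.js `page.getTextContent()` (`str`, `transform`, `width`, `height`); whether and how PDF.js Express exposes them is not covered in this thread. `findSuspectTextItems` is a hypothetical helper name, and the checks are only a heuristic for obviously invalid geometry.

```typescript
// Shape of a text item as returned by PDF.js `page.getTextContent()`.
// Only the fields used by this sketch are listed.
interface TextItemLike {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; e and f are the x/y origin
  width: number;
  height: number;
}

interface SuspectItem {
  index: number;
  str: string;
  reason: string;
}

// Hypothetical helper (not a PDF.js or PDF.js Express API):
// flags text items whose reported geometry cannot be correct.
function findSuspectTextItems(items: TextItemLike[]): SuspectItem[] {
  const suspects: SuspectItem[] = [];
  items.forEach((item, index) => {
    // Whitespace-only items are skipped; this sketch does not check them.
    if (item.str.trim().length === 0) return;

    const x = item.transform[4];
    const y = item.transform[5];
    if (!Number.isFinite(x) || !Number.isFinite(y)) {
      suspects.push({ index, str: item.str, reason: "non-finite origin" });
    } else if (!(item.width > 0)) {
      suspects.push({ index, str: item.str, reason: "zero or invalid width" });
    } else if (!(item.height > 0)) {
      suspects.push({ index, str: item.str, reason: "zero or invalid height" });
    }
  });
  return suspects;
}
```

An empty result does not prove the document is fine: positions can be wrong while still being finite and positive, so this only catches the most obvious cases.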