Lag in PDF.js Express WebViewer when opening a 5-6 MB PDF with 700+ pages

Hi!
There is significant lag in my React app when opening the following PDF.
Everything on the page freezes and becomes unresponsive for a few seconds.

https://www.fda.gov/food/fda-food-code/food-code-2017
FDA Food Code 2017 09252019_0.pdf (5.7 MB)

Hi there,

Thank you for reaching out to the PDF.js Express forums.

I was unable to reproduce this issue on our demo site:

Are you using the latest version of PDF.js Express?
Could you please share a minimal runnable sample project along with the operating system + browser information?

Best regards,
Kevin Kim

Hi there,

I have the same problem with some PDF files. I experience it if the file contains links (though not every such file behaves like that). In this case, the PDF is scanned (?) at the beginning, which may take some time. The more links, the longer it takes. It is not difficult to reproduce. Just add an annotationChanged listener, e.g.:

    annotationManager.addEventListener(
      'annotationChanged',
      (annotations, action) => {
        console.log('Annotation changed', action, annotations[0])
      }
    );

and check your console. My output (for the file attached by @vazearchis) is as follows (it’s printed many, many times):

Kevin Kim, indeed it cannot be reproduced on your demo page (even with my file), and I don’t know why. I use your library in a React application and I simply remove from AnnotationManager the annotations that don’t fit my logic, but this “scanning” is impossible to skip, or I don’t know how to do it. Currently I use "@pdftron/pdfjs-express": "8.4.0". If I remember correctly, the same thing happened with the previously used "@pdftron/webviewer": "8.2.0".

My testing file is here:
django.pdf (6.0 MB)

Maybe my case is the same, maybe not, but maybe my investigation will help.

Best regards,
Dorota

Hi there,

It is possible that the link annotations are being created when you load the PDF, causing a delay in the annotationsLoaded event.

Could you please try the same with disableAutomaticLinking:
https://pdfjs.express/api/Core.DocumentViewer.html#disableAutomaticLinking

Best regards,
Kevin Kim

Thanks for your help! I will try disabling the auto linking and see if it improves the speed.

Thanks for the tip. Unfortunately it doesn’t work for me. I have disabled automatic linking. I checked that isAutomaticLinkingEnabled() returns false, but ‘scanning’ of links starts as before.
@vazearchis, I am really curious about your result.

Best regards,
Dorota

Hi Dorota,

Could you try linearizing the PDF file to see if there is any improvement?

If this does not work, could you please share a minimal runnable sample project for us to see the issue?

Best regards,
Kevin Kim

Hi Kevin,

I’ll try to give you a sample.

Here we had an issue that ended successfully, and after that I applied every tip locally to the sample. My application is more developed, but the base is the same, so let’s use the sample.
I needed to change several files to run the sample, as follows:

package.json file:

"@pdftron/pdfjs-express": "8.4.0",
"react": "17.0.2",
"react-dom": "17.0.2",

index.html file:

<script src="%PUBLIC_URL%/pdfjsexpress/core/webviewer-core.min.js"></script>
<script src="%PUBLIC_URL%/pdfjsexpress/core/pdfjs/PDFJSDocumentType.js"></script>

instead of:
<script src="%PUBLIC_URL%/lib/webviewer.min.js"></script>

copy-webviewer-files.js file (the whole file is as follows):

const fs = require('fs-extra');
const path = require('path');

const copyFrom = path.resolve(__dirname, '../node_modules/@pdftron/pdfjs-express/public/core');
const copyTo = path.resolve(__dirname, '../public/pdfjsexpress/core');

fs.ensureDirSync(copyTo);
fs.copySync(copyFrom, copyTo)

and finally App.js, which I changed the most:

import React, { useRef, useEffect } from 'react';
import './App.css';

const App = () => {
  const viewer = useRef(null);
  const scrollView = useRef(null);

  useEffect(() => {
    if (!viewer.current) { return; }
    const Core = window.Core;
    Core.setWorkerPath('/pdfjsexpress/core');
    const documentViewer = new Core.DocumentViewer();
    // here I applied disableAutomaticLinking and it didn't stop "scanning"
    documentViewer.disableAutomaticLinking();
    window.documentViewer = documentViewer;
    window.WebViewer = {};
    window.WebViewer['l'] = () => 'Insert commercial license key here after purchase';

    documentViewer.setScrollViewElement(scrollView.current);
    documentViewer.setViewerElement(viewer.current);
    documentViewer.loadDocument('/files/PDFTRON_about.pdf').then(instance => {
      console.log('Document loaded');
    });

    // this section was added only for checking "scanning" PDF file for links
    const annotationManager = documentViewer.getAnnotationManager();
    annotationManager.addEventListener(
      'annotationChanged',
      (annotations, action) => {
        const annotation = annotations[0];
        console.log('Annotation changed', action, annotation);
      }
    );

  }, [viewer]);

  return (
    <div className="App">
      <div id="scroll-view" ref={scrollView}>
        <div className="header">React sample</div>
        <div className="webviewer" ref={viewer}></div>
      </div>
    </div>
  );
};

export default App;

You don’t even have to use my django.pdf file: the PDFTRON_about.pdf file attached to the repository has some links too (fewer than mine, so it’s faster to test). At the beginning the preview looks bad; please resize the window and then the PDF will be readable.

Hopefully, I didn’t miss any change in any of my files and you can easily reproduce the issue.

Best regards,
Dorota

Hi Dorota,

I was able to replicate the sample with your changes and can see that the LinkAnnotationElement gets added to django.pdf very slowly (one every 3 seconds). When scrolling down in the UI (while the previews are minimized), the pages are not rendered yet, so the annotations are created as the pages load in.
Another thing I noticed is that there is no iframe, so the pages are not in a document container. Could this be the issue, with the entire document being rendered all at once?

Best regards,
Kevin Kim

Hi Kevin,

Thank you for your answer. My further remarks are as follows:
If you remove the div with the scroll-view id, the “scanning” runs faster. I use scrollView in my application, and there LinkAnnotationElement gets added very fast. So I don’t know whether it depends on scrollView or on some other styles.
You are right, no document is injected into an iframe. I can see an iframe as a DOM element below, but it’s almost empty and has display: none. I have no idea why I have so many divs (one div per PDF page) instead of about twenty, as I can see on the PDF.js Viewer Demo page. Is this the “missing document container issue” you mentioned? Hm… maybe. If there are only several pages, adding links is fast and almost imperceptible to the user. Why doesn’t disableAutomaticLinking help in this case anyway? Do you have any tips for me?

Best regards,
Dorota

Hi Dorota,

disableAutomaticLinking prevents WebViewer from creating additional annotations when there are text links on the page. For example, if page 50 contains the text ‘google.com’, then when you scroll down to page 50 (or when page 50 is rendered) the link text becomes an annotation. This usually happens after the document has been loaded into WebViewer.

The div showing up with the id ‘virtualListContainer’ is for processing documents with a lot of pages, and they are rendered every 20 pages.
https://pdfjs.express/api/Core.DisplayModeManager.html#isVirtualDisplayEnabled

I think the main issue comes from the fact that no iframe is instantiated. The original App.js instantiated a WebViewer instance. However, I believe you modified it to run without the iframe, and without the iframe your ‘viewer’ tries to load everything at once on initial load. This could be a huge performance issue for documents with many pages and many annotations.
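
As a rough sketch of the iframe-based setup, assuming the `WebViewer` constructor from `@pdftron/pdfjs-express` and an instance that exposes `Core` (as in version 8.x), the initialization could look like the following. The `path` value and the `mountViewer` helper are assumptions for illustration and are not taken from the sample repository:

```javascript
// Hypothetical helper. `WebViewer` is passed in as a parameter so the sketch
// does not depend on how the package is imported in your project.
async function mountViewer(WebViewer, element, docUrl) {
  // `path` must point at the copied library assets; '/webviewer/lib' is an assumption.
  const instance = await WebViewer({ path: '/webviewer/lib', initialDoc: docUrl }, element);
  // The document is rendered inside the iframe that WebViewer creates in `element`.
  const { documentViewer, annotationManager } = instance.Core;
  return { instance, documentViewer, annotationManager };
}

// Usage inside a React effect (viewer is the ref of the container div):
// import WebViewer from '@pdftron/pdfjs-express';
// useEffect(() => { mountViewer(WebViewer, viewer.current, '/files/django.pdf'); }, []);
```

Passing the constructor in keeps the helper independent of the bundler setup; in an application you would normally call `WebViewer` directly.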

Best regards,
Kevin Kim

Hi Kevin,

Thanks for your tips, and I’m sorry for the long break in this topic. I got busy with other projects.
I followed your advice and indeed, if I use instantiated WebViewer (with some tools/bars turned off), I can see “virtualListContainer”, as you mentioned. That’s cool, the number of pages in the DOM is now limited. Then, I tried to use the mentioned disableAutomaticLinking method and it didn’t work for my “scanning links”.
But… now my code is quite similar to this repository, so I modified it a bit and the result is similar to the one in my application. You can reproduce it by changing some parts of the code:

  1. Update dependencies:
    [image: updated dependencies; screenshot not preserved]

  2. Change the application code a bit and of course use the django.pdf file (the same happens for other files):

This part shows that links are still being scanned somehow, although automatic linking is turned off:

      annotManager.addEventListener(
        'annotationChanged',
        (annotations, action) => {
          console.log('action', action, annotations[0]);
        }
      );

Could you tell me what I’m doing wrong? The code above doesn’t include a licence key, but we do have one and when I use it, it’s exactly the same.

Best regards,
Dorota

Hi there,

Thank you for your response,

On the demo (PDF.js Viewer Demo | PDF.js Express), if I add a listener for the annotationsLoaded event, it takes over 200 seconds for the annotations to load after the document loads:

Doing the same on the react sample, there’s a considerably longer time for the annotations to load:

I believe this is expected behaviour. If you call annotationManager.getAnnotationsList(), you will see a very large number of annotations and adding them into the document is what is causing the delay.
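
To check this in your own setup, a small sketch like the one below can time the wait and summarize what was loaded. It assumes each annotation exposes an `elementName` property (for example `link`); `countByType` and `reportOnLoad` are hypothetical helpers, not part of the library:

```javascript
// Hypothetical helper: counts annotations per type.
function countByType(annotations) {
  const counts = {};
  for (const annotation of annotations) {
    // `elementName` is assumed to hold the annotation's XML element name.
    const type = annotation.elementName || 'unknown';
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Hypothetical helper: once annotationsLoaded fires, logs the time elapsed
// since this function was called and a per-type summary of the annotations.
function reportOnLoad(documentViewer, annotationManager) {
  const start = Date.now();
  documentViewer.addEventListener('annotationsLoaded', () => {
    console.log(`annotationsLoaded after ${Date.now() - start} ms`);
    console.log(countByType(annotationManager.getAnnotationsList()));
  });
}
```

Call `reportOnLoad(documentViewer, annotationManager)` before loading the document so the listener is registered in time.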

Best regards,
Kevin Kim

Hi Kevin,

Thank you for your answer. OK, I see…

I’ve found a quite similar topic here, but it has no solution for my case. If this behaviour is expected, is there any way to turn off collecting links as annotations? I don’t need them: my application lets users create their own annotations on a PDF, and links like these are not allowed there (I only use HIGHLIGHT and RECTANGLE annotations). Collecting unnecessary links affects performance.

Sorry if this is no longer on the topic of this thread. But maybe it is, who knows.

Best regards,
Dorota

Hi Dorota,

If you do not need to use the link annotations that come with the PDF, then possible solutions are:

  1. Create the original PDF without any unnecessary annotations (this may not be feasible, e.g. for django.pdf).
  2. Load the PDF in your application, then delete all the unnecessary annotations (this could take a while, as you would need to wait for the annotationsLoaded event as outlined above).
  3. Use third-party software to flatten the annotations, although this would still require you to wait for the annotationsLoaded event (or its equivalent in the other software). One example is Apryse (formerly PDFTron) WebViewer: Flatten Annotations in JavaScript PDF Viewer | PDF.js Express SDK
  4. Use a low-level PDF editor (see the Apryse documentation).
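
Option 2 could be sketched as follows. This is a minimal example, assuming `annotationManager.deleteAnnotations` accepts an array of annotations and that link annotations are instances of `Annotations.Link`; `removeUnwantedAnnotations` is a hypothetical helper. It can only run once the annotations have been loaded, so it does not remove the initial wait:

```javascript
// Hypothetical helper: deletes every annotation for which isUnwanted returns true
// and returns how many annotations were removed.
function removeUnwantedAnnotations(annotationManager, isUnwanted) {
  const unwanted = annotationManager.getAnnotationsList().filter(isUnwanted);
  if (unwanted.length > 0) {
    annotationManager.deleteAnnotations(unwanted);
  }
  return unwanted.length;
}

// Usage with a WebViewer instance (assumed to expose Core.Annotations):
// const { documentViewer, annotationManager, Annotations } = instance.Core;
// documentViewer.addEventListener('annotationsLoaded', () => {
//   removeUnwantedAnnotations(annotationManager, (a) => a instanceof Annotations.Link);
// });
```

The predicate is kept separate so the same helper can drop any annotation type that does not fit the application, not only links.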

Best regards,
Kevin Kim

Hi Kevin,

Thank you for your answer. We will consider which solution would be best for our team.

Best regards,
Dorota