How to merge individual annotation XFDFs into one smaller combined annotations XFDF on the backend?

Which product are you using?
PDF.js Express Plus

PDF.js Express Version
UI version 8.7.0
Core version 8.7.5
Build ‘Ny80LzIwMjR8MzMxOTBmNGM5YQ==’

Detailed description of issue
To allow real-time collaboration, we followed the best practices described under “Store annotations” in the PDF.js Express Viewer Integration Best Practices documentation.

We currently save each annotation action (add, modify, delete) in a separate table row. Until recently we loaded these annotations one at a time without issues, like this:

Back-end:

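def get_annotations
  # One row per annotation command (add, modify, delete) for this document.
  annotations = DevPdfAnnotation.where(dev_pdf_id: params[:dev_pdf_id]).map do |annotation|
    { author: annotation.author, xfdfstring: annotation.xfdfstring }
  end

  # The list is serialized to a JSON string here, so the client must JSON.parse it.
  render json: { annotations: annotations.to_json }
end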

Front-end:

const annotations = JSON.parse(response.annotations);
_.each(annotations, function (annotation) {
  instance.annotManager.importAnnotationCommand(annotation.xfdfstring)
    .then(importedAnnotations => {
      instance.annotManager.showAnnotations(importedAnnotations);
    });
});

This worked until recently, but now that some users make 400+ annotations on a single document, we are running into issues. The individual XFDF strings add up to a lot of data, over 35 MB in one of our cases.

In the best practices section mentioned above, you say:

“One way to handle this to store the XFDF data for each annotation separately, for example as rows in a database. Then when PDF.js Express requests the annotation data for a document you can run a query to get all the annotations with that document id, append the data together and send it back to PDF.js Express.”

My question is: how can we “append the data together and send it back”? Is there a way to merge the individual XFDF strings into one smaller XFDF string like the one exportAnnotations() returns? Could you give us some guidance on how to do this? We’ve already spent days on this issue. Our backend is in Ruby.

We can’t keep doing things the way we are because (1) it takes many minutes for the annotations to load individually (35 MB of XFDF strings in the case I mentioned), and (2) merging and downloading afterwards via the /merge endpoint fails because the request is too large.

Do you provide any backend libraries that can be used to do what exportAnnotations() does?

Hi there,

I believe what the documentation means is that you could set up a database in which each row is one annotation XFDF to be imported.
For example:
You have a table of all documents. Each row has a docID that links to a table of annotations. Each row in the table of annotations holds an individual XFDF.

When you need to import all these annotations into the document, you are simply querying all the rows of the annotations based on the docID. Once you have the XFDF, you will need to make adjustments as necessary to merge the XFDF together and then import it via annotationManager.importAnnotations API.

Unfortunately we do not have a library to make those merges. However, this would be a pretty simple task because the XFDF needs to simply follow this format:

You can query individual XFDFs (rows), merge the strings all under the tag and then call importAnnotations.

There should be a lot of XML manipulation libraries available (XFDF is an XML format); here’s a forum post that came up when searching for the topic:

and one in Ruby:

Best regards,
Kevin Kim

Hi Kevin,

Yes, that’s exactly what we’re doing right now, as far as using separate rows for each annotation XFDF string. The problem is that this method also saves a row for each modification and deletion, for every little action users take. Over time the total size of the combined XFDF strings of these individual rows gets enormous. For one user document of ours it is 28 MB.

But after all the annotations are imported, the single XFDF string for all annotations that exportAnnotations() returns is WAY smaller, around just 100 KB. So things load much faster, and I can actually merge the annotations with the PDF using the /merge endpoint when I use the string returned by exportAnnotations().

Surely I’m not the only one who has faced this problem? Am I missing something? How can I use individual rows like this for real-time collab without completely killing performance? I can’t just append all the individual XFDF strings into one because the result will still be 28 MB in size. What exportAnnotations() does is just list the current state of the annotations without the history of months of edits, etc. Do you know of any way of combining the individual XFDF strings into one like exportAnnotations() does, which removes all the redundant data, so that the initial annotations load can be fast?

Thank you for your response,

I understand your issue but unfortunately we don’t have an API that can simply condense lots of annotationCommands into one XFDF.

There are potentially 2 methods you could try:

  • If annotation history is not important to you, you can drop import/export annotationCommand entirely and use only import/export Annotations. This way, you always keep track of a single XFDF state for the entire document. On the backend you only need to keep one row per document, holding the XFDF that contains all of its annotations.
  • Once a collaboration session has concluded, you could create a copy of the PDF without annotations, apply all the annotation commands that the users made, then export the final result via exportAnnotations and save that into your database as well. Then when the document is loaded again, you import the XFDF that contains all the annotations.
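
For readers who want to do the condensing on the backend instead, here is a minimal Ruby sketch of the same replay idea. It is not a PDF.js Express API. It assumes each stored row is an annotation command of the shape `<xfdf><add>…</add><modify>…</modify><delete><id>…</id></delete></xfdf>`, that every annotation element has a unique `name` attribute, and that a `<modify>` entry carries the full updated annotation element. The method name `compact_annotation_commands` is made up for illustration, and the output should be compared with what exportAnnotations() produces for the same document before relying on it.

```ruby
require "rexml/document"

# Replays annotation command XFDF strings (oldest first) and returns one XFDF
# string that holds only the final state of each surviving annotation.
# ASSUMPTIONS (not confirmed by the thread): commands contain <add>, <modify>
# and <delete> sections; annotations are keyed by a unique `name` attribute;
# <modify> contains the complete updated annotation element.
def compact_annotation_commands(command_xfdfs)
  latest = {} # annotation name => REXML::Element (Hash keeps insertion order)

  command_xfdfs.each do |xfdf|
    root = REXML::Document.new(xfdf).root
    next if root.nil?

    root.each_element do |section|
      case section.name
      when "add", "modify"
        # Later versions of an annotation overwrite earlier ones.
        section.each_element do |annot|
          name = annot.attributes["name"]
          latest[name] = annot if name
        end
      when "delete"
        # Each <id> names an annotation that no longer exists.
        section.each_element { |id| latest.delete(id.text.to_s.strip) }
      end
    end
  end

  # Wrap the survivors in a single <xfdf><annots>…</annots></xfdf> document.
  out = REXML::Document.new
  out << REXML::XMLDecl.new("1.0", "UTF-8")
  xfdf_root = out.add_element(
    "xfdf",
    { "xmlns" => "http://ns.adobe.com/xfdf/", "xml:space" => "preserve" }
  )
  annots = xfdf_root.add_element("annots")
  latest.each_value { |annot| annots.add_element(annot.deep_clone) }
  out.to_s
end
```

With the table from the question, the input could be something like `DevPdfAnnotation.where(dev_pdf_id: id).order(:created_at).pluck(:xfdfstring)` (the `created_at` ordering column is an assumption), and the returned string would be passed to importAnnotations on the front end in a single call. Replies or grouped annotations that refer to a deleted annotation are not handled by this sketch.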

Best regards,
Kevin Kim

I’ll do that then, and I’ll make sure my own real-time collab code keeps all users in sync on the front end whenever the main XFDF string is saved. Thanks for the help.