Reading an S3 bucket PDF object using the bucket access key and secret key

Hi Team,

I am fetching a PDF file from an S3 bucket with the PDF.js Express WebViewer and saving annotations, but currently without an S3 access key and secret key.
Now I need to READ and SAVE the PDF object using the AWS S3 bucket access key and secret key.

How can I pass the S3 bucket access key and secret key when reading and saving the PDF object?

Please advise.

Warm Regards,
Kartik Shah

Hi Kartik,

PDF.js Express will not be able to read/write files from S3 that require secret access keys. It is only able to read remote files that support CORS and are publicly accessible.

To do this, you would have to download the document in your own code outside of PDF.js Express, and then pass the document into the loadDocument API.
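As a rough illustration of "download in your own code, then pass it to loadDocument", here is a minimal sketch. The helper name `downloadPdfBlob`, the URL, and the headers are hypothetical placeholders for whatever your storage or backend requires; the `fetchImpl` parameter is only there so the helper can be exercised without a network.

```typescript
// Hypothetical helper: download a PDF in application code, outside the viewer.
// `url` and `headers` are placeholders; `fetchImpl` is injectable for testing.
async function downloadPdfBlob(
  url: string,
  headers: Record<string, string> = {},
  fetchImpl: typeof fetch = fetch,
): Promise<Blob> {
  const response = await fetchImpl(url, { headers });
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}`);
  }
  // Read the body as raw bytes and label the result as a PDF for the viewer
  const bytes = await response.arrayBuffer();
  return new Blob([bytes], { type: 'application/pdf' });
}

// Usage, assuming `instance` comes from WebViewer(...).then(...):
// const blob = await downloadPdfBlob('https://example.com/private/file.pdf', { Authorization: 'Bearer <token>' });
// instance.loadDocument(blob);
```

The viewer only ever sees the finished Blob, so whatever authentication the download needs stays in your own code.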

Let me know if there are any more questions,

Thanks!
Logan

Hi Logan,
Thank you for the accurate reply.

A few questions:
With the “loadDocument” API, can we also check or pass AWS S3 credentials before we load the document (PDF)?

Also, could you please share the “loadDocument” reference page in PDF.js Express where I could see an example and integrate it on my side?

Hi,

The loadDocument API allows you to pass headers that will be forwarded when we try to fetch the document from the URL you provide. If authentication happens via request headers, this option might work for you. In your case it might look something like this:

WebViewer(...).then((instance) => {
    instance.loadDocument('https://mys3.aws.s3.com/path/to/file', {
        customHeaders: {
            header1: 'value1',
        }
    })
})

Keep in mind this will still only work if the file is publicly accessible. If it is not, you have to fetch the file first (probably using the S3 SDK) and then pass the resulting blob into loadDocument.

Thanks Logan

I am going to try customHeaders first by passing my S3 access_key and secret_key. If that doesn’t work, I will look into the loadDocument approach.
I will let you know if I have any other questions.

Hi Logan,

I tested this by enabling public READ access on my S3 bucket PDF file, applying the required CORS policy to the S3 bucket, and passing the credentials as custom headers:
customHeaders: {
  apiKey: [apiKey],
  secretKey: [secretKey]
}

Yes, I can read the PDF file. However, if I remove the S3 credentials from the custom headers, I can still read the file, because it has public read access.
So what is the point of passing S3 credentials if the file has public read access?

Yes, that sounds right.

As I mentioned before, PDF.js Express can only read public files. Your S3 apiKey and secretKey are not valid headers for S3, which is why it works with or without them.
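For context on why raw key headers have no effect: private S3 requests are authenticated with AWS Signature Version 4, where the secret key is never sent in a header at all. Instead it seeds an HMAC-SHA256 chain that yields a signing key, and that key signs a canonical description of each request, which is sent in the Authorization header (or the query string). The sketch below shows only the key-derivation step using Node's `crypto` module; it is illustrative, the function names are hypothetical, and in practice the S3 SDK performs the full signing for you.

```typescript
import { createHmac } from 'crypto';

// HMAC-SHA256 helper returning the raw digest bytes
function hmac(key: string | Buffer, data: string): Buffer {
  return createHmac('sha256', key).update(data, 'utf8').digest();
}

// Derive the SigV4 signing key. The secret never leaves this function;
// only signatures produced with the derived key are sent to AWS.
function deriveSigV4SigningKey(
  secretAccessKey: string,
  dateStamp: string, // yyyymmdd, e.g. '20240101'
  region: string,    // e.g. 'us-east-1'
  service: string,   // 's3' for S3 requests
): Buffer {
  const kDate = hmac('AWS4' + secretAccessKey, dateStamp);
  const kRegion = hmac(kDate, region);
  const kService = hmac(kRegion, service);
  return hmac(kService, 'aws4_request');
}
```

Because the key is scoped to a date, region, and service, a header that simply carries the secret key is meaningless to S3 and is ignored.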

The apiKey and secretKey you have are used to download files programmatically via the S3 SDK. If you want to keep your files private but still use them with Express, you have to use the SDK to download the blob and then pass it to WebViewer.

This might look something like this:

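import AWS from 'aws-sdk';
import WebViewer from '@pdftron/pdfjs-express';

// Configure the AWS SDK (v2) with your region and credentials (placeholders)
AWS.config.update({
  region: 'YOUR_REGION',
  accessKeyId: 'YOUR_ACCESS_KEY_ID',
  secretAccessKey: 'YOUR_SECRET_ACCESS_KEY',
});

WebViewer(...).then(async instance => {
  const s3 = new AWS.S3();
  try {
    // Download the private object with the authenticated SDK client
    const resp = await s3.getObject({
      Bucket: 'YOUR_BUCKET',
      Key: 'KEY_OF_FILE',
    }).promise();

    // Wrap the bytes in a Blob and hand it to the viewer
    const file = new Blob([resp.Body as BlobPart], { type: 'application/pdf' });
    instance.loadDocument(file);
  } catch (err) {
    // Stop here on failure instead of passing an undefined body to the viewer
    console.error(err);
  }
});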

See the AWS SDK for JavaScript documentation for S3.

If you don’t mind keeping your files public, then the method you are currently using works great.

I hope this helps,
Logan

Hi Logan,

Is this the JavaScript S3 SDK? If so, is there no need to use the PDF.js Express REST API?

Yes, Amazon provides a JS SDK to fetch files from your S3 bucket. However, this has nothing to do with the PDF.js Express REST API so I am unsure of your question.

The PDF.js Express REST API is for merging annotations into a PDF and extracting them from it, along with a few other useful tools.

Hi Logan,
As per your earlier post below:

Keep in mind this will still only work if the file is publicly accessible. If it is not, you have to fetch the file first (probably using the S3 SDK) and then pass the resulting blob into loadDocument.

That’s why I asked whether we have to use the PDF.js Express REST API to load the document/PDF with “loadDocument” after we fetch the S3 object using the SDK.
Please confirm.

Thank you

Hi,

You do not have to use the REST API to load a document. The REST API is a separate service for merging and extracting annotations from a document.

You need to use the viewer’s loadDocument function to load a document, as shown in the code snippet I sent earlier.

I hope this clears things up!
Thanks,
Logan

Hi Logan,

It’s much clearer now, thanks!