Want to read S3 bucket PDF object using bucket access key and secret key

shahkartikk · June 24, 2020, 2:08am

Hi Team,

I am fetching a PDF file from an S3 bucket using the PDF.js Express Web viewer and saving annotations, currently without an S3 access key and secret key.
Now I am required to READ and SAVE the PDF object using the AWS S3 bucket access key and secret key.

So how can I pass the S3 bucket access key and secret key while reading and saving the PDF object?

Please advise.

Warm Regards,
Kartik Shah

Logan · June 24, 2020, 2:39pm

Hi Kartik,

PDF.js Express will not be able to read/write files from S3 that require secret access keys. It is only able to read remote files that support CORS and are publicly accessible.

To do this, you would have to download the document in your own code outside of PDF.js Express, and then pass the document into the loadDocument API.
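
A minimal sketch of that "download it yourself" step, assuming the file is served from an endpoint you can call with the browser's fetch API and that the endpoint allows CORS. The helper name fetchDocumentBlob and the injectable fetchImpl parameter are illustrative assumptions, not part of PDF.js Express:

```javascript
// Hypothetical helper: download a document in your own code and return it as a Blob.
// `headers` carries whatever authentication your endpoint expects (an assumption).
// `fetchImpl` defaults to the global fetch and can be swapped out for testing.
async function fetchDocumentBlob(url, headers = {}, fetchImpl = fetch) {
  const response = await fetchImpl(url, { headers });
  if (!response.ok) {
    // Fail loudly instead of handing an error page to the viewer.
    throw new Error(`Download failed with HTTP status ${response.status}`);
  }
  return response.blob();
}
```

Inside the WebViewer(...).then(...) callback, the resulting Blob could then be passed to instance.loadDocument(blob).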

Let me know if there are any more questions,

Thanks!
Logan

shahkartikk · June 24, 2020, 2:52pm

Hi Logan,
Thank you for the accurate reply.

A few queries:
With the “loadDocument” API, can we check or pass AWS S3 credentials before we load the document or PDF?

Also, could you please share the “loadDocument” reference page for PDF.js Express, where I could see an example and integrate it on my side?

Logan · June 24, 2020, 3:01pm

Hi,

The loadDocument API allows you to pass headers that will be forwarded when we try to fetch the document from the URL you provide. If the authentication happens via request headers, this option might work for you. In your case it might look something like this:

WebViewer(...).then((instance) => {
    instance.loadDocument('https://mys3.aws.s3.com/path/to/file', {
        customHeaders: {
            header1: 'value1',
        }
    })
})

Keep in mind this will still only work if the file is publicly accessible. If it is not, you have to fetch the file first (probably using the S3 SDK) and then pass the resulting blob into loadDocument.

shahkartikk · June 24, 2020, 3:53pm

Thanks Logan

I am going to try customHeaders first, passing my S3 access_key and secret_key. If that doesn’t work, I will check with the loadDocument API.
I will let you know if I have any other queries.

shahkartikk · June 25, 2020, 5:20pm

Hi Logan,

I enabled public READ access on my S3 bucket PDF file, applied the required CORS policy to the S3 bucket, and passed the credentials as custom headers, as below:
customHeaders: {
    apiKey: [apiKey],
    secretKey: [secretKey]
}

Yes, I can read the PDF file. However, if I remove the S3 credentials from the custom headers, I can still read the file, because the file has public read access.
So what is the point of passing S3 credentials if the file has to be given public read access?

Logan · June 25, 2020, 5:31pm

Yes, that sounds right.

As I mentioned before, PDF.js Express can only read public files. Your S3 apiKey and secretKey are not valid headers for S3, which is why it works with or without them.

The apiKey and secretKey you have are used to download files programmatically via the S3 SDK. If you wanted to keep your files private but still use them with Express, you would have to use the SDK to download the blob and then pass it to WebViewer.

This might look something like this:

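import AWS from 'aws-sdk';
import WebViewer from '@pdftron/pdfjs-express';

// Configure the SDK with your region and credentials (placeholders).
AWS.config.update({
  region: 'your region',
  credentials: {YOUR_CREDENTIALS}
});

WebViewer(...).then(async instance => {
  const s3 = new AWS.S3();
  // Download the private object with the SDK.
  const body: BlobPart = await new Promise<BlobPart>((resolve, reject) => {
    s3.getObject({
      Bucket: 'YOUR_BUCKET',
      Key: 'KEY_OF_FILE',
    }, (err, resp) => {
      if (err) {
        // Reject so a failed download is not passed on to the viewer.
        console.log(err);
        reject(err);
        return;
      }
      resolve(resp.Body as BlobPart);
    });
  });
  // Wrap the bytes in a Blob and hand it to the viewer.
  const file = new Blob([body], { type: 'application/pdf' });
  instance.loadDocument(file);
});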

See the S3 SDK documentation for details.
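
The callback-based download above could also be written with the SDK's promise interface. This is a minimal sketch, assuming the AWS SDK for JavaScript v2, where request objects expose .promise(). The helper name fetchPdfBlob is hypothetical, and the s3 client is passed in as a parameter so it can be configured elsewhere:

```javascript
// Hypothetical helper: download a private S3 object and wrap it in a PDF Blob.
// `s3` is assumed to be an AWS SDK v2 `AWS.S3` client (or any object with the
// same getObject(params).promise() shape). Errors from the SDK propagate to the caller.
async function fetchPdfBlob(s3, bucket, key) {
  const resp = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  // resp.Body holds the raw bytes of the object.
  return new Blob([resp.Body], { type: 'application/pdf' });
}
```

With this in place, the body of the WebViewer callback reduces to awaiting fetchPdfBlob(new AWS.S3(), 'YOUR_BUCKET', 'KEY_OF_FILE') and passing the result to instance.loadDocument.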

If you don’t mind keeping your files public, then the method you are currently using works great.

I hope this helps,
Logan

shahkartikk · June 25, 2020, 5:46pm

Hi Logan,

Is this the JavaScript S3 SDK? If so, is there no need to use the PDF.js Express REST API?

Logan · June 25, 2020, 6:00pm

Yes, Amazon provides a JS SDK to fetch files from your S3 bucket. However, this has nothing to do with the PDF.js Express REST API so I am unsure of your question.

The PDF.js Express REST API is for merging/extracting annotations from a PDF, and a few other useful tools.

shahkartikk · June 25, 2020, 6:22pm

Hi Logan,
As per your earlier post below:

Keep in mind this will still only work if the file is publicly accessible. If it is not, you have to fetch the file first (probably using the S3 SDK) and then pass the resulting blob into loadDocument.

That is why I asked whether we have to use the PDF.js Express REST API to load the document/PDF with “loadDocument” after we fetch the S3 object using the SDK.
Please confirm.

Thank you

Logan · June 26, 2020, 6:20pm

Hi,

You do not have to use the REST API to load a document. The REST API is a separate service to merge and extract annotations from a document.

You need to use the viewer's loadDocument function to load a document, as shown in the code snippet I sent earlier.

I hope this clears things up!
Thanks,
Logan

shahkartikk · June 29, 2020, 11:26am

Hi Logan,

It’s much clearer now, thanks!
