Hi everyone,
This post is to provide transparency into what happened on July 17 and 18, when the REST API started returning 502 errors.
Description
On July 17 at ~10AM PT, the PDF.js Express REST API started returning 502 errors, essentially shutting the service down. The incident was found and reported by customers, and eventually resolved on July 18 at 8:30AM PT once the proper engineers were made aware of the outage.
Root Cause
During routine maintenance to our infrastructure, an access key that the REST API used to interact with our cloud storage was updated. The update removed some permissions that the API needed to upload files to cloud storage, which caused “access denied” errors in our internal code when trying to upload files. This caused the REST API to return 502 errors, as the error was unhandled.
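To illustrate the failure mode, here is a minimal sketch (not the actual service code) of how an upload handler could catch a storage permission failure and return a controlled response instead of letting the error go unhandled. The names `handleUpload`, `Uploader`, `ApiResponse`, and the `"AccessDenied"` error code are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical response shape for the sketch.
interface ApiResponse {
  status: number;
  body: { error?: string; fileId?: string };
}

// Hypothetical uploader: a real one would call the cloud storage SDK
// and resolve to an identifier for the stored file.
type Uploader = (data: Uint8Array) => Promise<string>;

async function handleUpload(data: Uint8Array, upload: Uploader): Promise<ApiResponse> {
  try {
    const fileId = await upload(data);
    return { status: 200, body: { fileId } };
  } catch (err) {
    // Assumption: the storage client marks permission failures with a `code` field.
    const code = (err as { code?: unknown } | null)?.code;
    if (code === "AccessDenied") {
      // Log loudly so the permission problem is visible to operators.
      console.error("Storage permission failure during upload");
      return { status: 503, body: { error: "Storage temporarily unavailable" } };
    }
    console.error("Unexpected upload failure");
    return { status: 500, body: { error: "Internal error" } };
  }
}
```

Handling the error explicitly does not fix the missing permissions, but it does turn an opaque gateway error into a logged, identifiable failure.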
Resolution
The outage was resolved by updating our API to have the required permissions to access the cloud storage provider.
Moving forwards
This outage took much longer to catch and resolve than is acceptable. We will ensure we have the proper mechanisms in place to both a) prevent these kinds of issues and b) ensure our engineers are made aware of outages as soon as they happen. We should not rely on customers to report these kinds of outages. These changes will be made this week.
Additionally, we will be putting a process in place to make sure there is proper internal documentation on whom to contact in case of an outage.
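As one possible shape for the alerting goal above, here is a minimal sketch of a synthetic health check that notifies engineers when the API responds with a server error. It is an illustration under stated assumptions, not a description of the mechanism that will actually be deployed: `runHealthCheck`, `Probe`, and `Notifier` are hypothetical names, and the probe and notifier are passed in so that any HTTP client or paging tool could be plugged in.

```typescript
// Hypothetical probe: performs a request against the API and resolves
// to the HTTP status code it received.
type Probe = () => Promise<number>;

// Hypothetical notifier: a real one might page an on-call engineer.
type Notifier = (message: string) => void;

async function runHealthCheck(probe: Probe, notify: Notifier): Promise<boolean> {
  try {
    const status = await probe();
    if (status >= 500) {
      // Server-side errors such as 502 trigger a notification.
      notify(`Health check failed: API returned ${status}`);
      return false;
    }
    return true;
  } catch (err) {
    // A probe that throws (for example, on a network failure) also counts as unhealthy.
    const reason = err instanceof Error ? err.message : String(err);
    notify(`Health check failed: ${reason}`);
    return false;
  }
}
```

Run on a short interval, a check like this would surface a 502 within minutes rather than depending on customer reports.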
I apologize for the inconvenience this may have caused and I appreciate your patience with us as we harden these systems moving forwards.
Thanks,
Logan Bittner
Head of Web Applications