Jan 31, 2022 Authentication Server Outage

Hi everyone!

I just want to get in touch with everyone and apologize for the server outages we had last night. The issues should be resolved now.

What happened

We use software called PM2 to run our server as a daemon. PM2 does a lot behind the scenes, and one of those things is logging. What we did not know, however, was that PM2 keeps a massive log file that does not get cleared out unless you explicitly tell it to.

Last night that log file grew large enough to essentially fill up the entire hard drive on our server. At that point, our server could no longer handle requests, as there was no disk space left for logging (trying to log crashed the server).

How we fixed it

We cut down the size of the log file to give the server room to breathe while we implement further fixes.

Preventing this moving forward

In response to the recent server outages, we will be implementing the following:

  • We have installed PM2 Logrotate, which ensures this file does not grow to that size again.

  • We will be spinning up a backup server that PDF.js Express can fall back to in case the main server goes down again.

  • We will be setting up better alarms and monitoring that will ensure we can resolve these kinds of issues sooner in the future. 8+ hours of downtime is not acceptable and I apologize for that. (We do currently have alarms and notifications, but they do not go to anyone’s personal email/phone so these notifications were not seen until this morning.)
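
For anyone curious what the first and third items look like in general terms, here is a minimal sketch in Python. It is not our actual setup (we use PM2 Logrotate, not this code), and the function names, the 10 MB cap, and the 90% threshold are all assumptions picked for illustration. It shows a log file that rotates at a size cap so it cannot grow forever, plus a simple disk usage check that an alarm could be built on.

```python
import logging
import shutil
from logging.handlers import RotatingFileHandler


def make_capped_logger(path, max_bytes=10 * 1024 * 1024, backups=5):
    """Return a logger whose file is rotated before it exceeds max_bytes.

    Only `backups` old files are kept, so total disk use stays bounded.
    (Hypothetical helper; the defaults are illustrative assumptions.)
    """
    logger = logging.getLogger(f"capped:{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output in the capped file only
    if not logger.handlers:
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        logger.addHandler(handler)
    return logger


def disk_used_fraction(path="/"):
    """Fraction (0.0 to 1.0) of the disk holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_alert(used_fraction, threshold=0.9):
    """True when disk usage is at or above the alert threshold (assumed 90%)."""
    return used_fraction >= threshold
```

A scheduled job could call `should_alert(disk_used_fraction("/"))` every few minutes and page a person directly when it returns true, instead of sending the notification to a shared inbox.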

Thank you everyone for your patience. I will do my best to make sure this does not happen again.

Thanks,
Logan

Hi @Logan,

Thanks for this update and resolving the issue.

In addition to these mitigation steps, given this is the second time that this licensing system has been impacted in the last 6 months (30th August 2021 being the last time, with a certificate problem), is it possible to discuss an alternative approach to the current “phone home” licensing server structure? Both of these issues impacted our production environments and all of our clients for periods of 8+ hours. (We are GMT+10.)

I.e. is there a way to set up an alternative approach for annual licenses whereby the key is hosted/cached by our own infrastructure and is updated, say, every 12 months? This would reduce the network traffic and our reliance on PDF.js Express infrastructure.

Thank you.

Hi there,

We do offer OEM licenses that are facilitated through our sales team. These OEM keys do not phone home at all. If this is of interest to you, send me a DM and I can get you in touch with our sales team to talk about the terms.

Besides that, we will not be investigating a new approach to our current licensing system, as this is a SaaS product and phoning home will always be required.

Thanks,
Logan