Hi everyone!
I just want to get in touch with everyone and apologize for the server outages we had last night. The issues should be resolved now.
What happened
We use software called PM2 to run our server as a daemon. PM2 does a lot behind the scenes, and one of those things is logging. What we did not know, however, was that PM2 keeps a massive log file that does not get cleared out unless you explicitly tell it to.
Last night that log file grew large enough to fill essentially the entire hard drive on our server. At that point, our server could no longer handle requests because there was no disk space left for logging (trying to log crashed the server).
How we fixed it
We cut down the size of the log file to give the server room to breathe while we implement further fixes.
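For anyone curious what cutting down a log file can look like, here is a minimal sketch in Python. It is not the exact command we ran: the function name and the `keep_bytes` parameter are made up for illustration. It rewrites the file in place so that no extra disk space is needed, which matters when the drive is already full.

```python
import os


def trim_log_in_place(path: str, keep_bytes: int) -> None:
    """Shrink the file at `path` so that only its last `keep_bytes` bytes remain.

    Hypothetical helper for illustration. The kept tail is held in memory,
    so `keep_bytes` should be a modest size.
    """
    size = os.path.getsize(path)
    if size <= keep_bytes:
        return  # already small enough, nothing to do

    with open(path, "r+b") as f:
        f.seek(size - keep_bytes)  # jump to the start of the part we keep
        tail = f.read()            # read the most recent log data
        f.seek(0)                  # go back to the beginning of the file
        f.write(tail)              # overwrite the old data with the tail
        f.truncate()               # drop everything after the tail
```

If the old logs are not needed at all, PM2's own `pm2 flush` command empties its log files.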
Preventing this going forward
In response to the recent server outages, we are making the following changes:
- We have installed PM2 Logrotate, which ensures this file does not grow to that size again.
- We will be spinning up a backup server that PDF.js Express can fall back to in case the main server goes down again.
- We will be setting up better alarms and monitoring so that we can resolve these kinds of issues sooner in the future. 8+ hours of downtime is not acceptable and I apologize for that. (We do currently have alarms and notifications, but they do not go to anyone’s personal email/phone, so these notifications were not seen until this morning.)
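To give a rough idea of the kind of disk-space alarm we have in mind, here is a minimal sketch in Python. It is not our actual monitoring setup: the function names and the 90% threshold are assumptions for illustration, and a real alarm would notify a person rather than print a message.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def should_alert(used_percent: float, threshold: float = 90.0) -> bool:
    """Hypothetical rule: alert once usage reaches the threshold (assumed 90%)."""
    return used_percent >= threshold


if __name__ == "__main__":
    percent = disk_usage_percent("/")
    if should_alert(percent):
        # A real setup would send this to someone's phone or email.
        print(f"ALERT: disk is {percent:.1f}% full")
    else:
        print(f"OK: disk is {percent:.1f}% full")
```

Run on a schedule, a check like this would have flagged the growing log file well before the drive filled up.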
Thank you, everyone, for your patience. I will do my best to make sure this does not happen again.
Thanks,
Logan