Uptime caused minor downtime

February 7, 2008

About an hour ago you may have experienced what seemed like sporadic downtime as one of our slices went down. We were out of the office, but luckily our Twitter-based SMS alerts let us know. During the drive back, before I could get to a computer, I called one of our users and asked him to run some diagnostics. His tests seemed to work fine, which left two possibilities: either our alert script was broken, or something had happened that we hadn’t considered when building the automatic alert tests, namely that one of the slices was down.

Sure enough, one of the slices had stopped working because of a memory leak in logging. It was a slow leak, and it would not have surfaced if we hadn’t had such consistent uptime recently. Luckily, we’re refactoring out the component that had the memory leak, so we don’t expect to see this happen again for a bit.
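
For the curious, here is a rough sketch of the kind of check that catches this case: probing every slice on its own, so that one dead slice is reported even when a user's spot check happens to pass. This is not our actual alert script. The function name, the slice names, and the `probe` callable are all made up for illustration, and what a real probe does (an HTTP request, a ping) is left open.

```python
from typing import Callable, Iterable, List


def find_down_slices(slices: Iterable[str], probe: Callable[[str], bool]) -> List[str]:
    """Return the names of slices whose probe fails.

    Hypothetical helper: `probe` is any callable that returns True when
    the named slice responds and False (or raises) when it does not.
    """
    down = []
    for name in slices:
        try:
            healthy = probe(name)
        except Exception:
            # A probe that raises (timeout, refused connection) counts as down.
            healthy = False
        if not healthy:
            down.append(name)
    return down
```

An alert would then fire whenever the returned list is non-empty, rather than only when the site as a whole stops responding.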

