Archive for the 'Downtime' Category

Issues today

May 15, 2009

We’re having some load issues today, most likely due to intensive export operations. The site may go up and down as we try to keep it running. If one slice goes down, all the load shifts to the other slice, which then becomes unresponsive. It’s not been a fun day. I think as soon as we get both slices up and stable, we should be good.
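To make the failover problem concrete, here is a rough sketch of the arithmetic. The load and capacity numbers are made up for illustration, and the function names are hypothetical; none of this is measured from our slices.

```python
def load_per_slice(total_load: float, healthy_slices: int) -> float:
    """Split the total load evenly across the slices that are still up."""
    if healthy_slices < 1:
        raise ValueError("no healthy slices left to take the load")
    return total_load / healthy_slices


def survives_failover(total_load: float, capacity_per_slice: float, slices: int = 2) -> bool:
    """True if the remaining slices can absorb the load after one slice fails."""
    return load_per_slice(total_load, slices - 1) <= capacity_per_slice


# Hypothetical numbers: each slice comfortably handles 100 units of load.
# At 150 units total, two slices carry 75 each, but one slice alone cannot cope.
print(load_per_slice(150, 2))
print(survives_failover(150, 100))
```

The takeaway is that two slices only give real redundancy while the total load stays below what a single slice can carry on its own.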

If you’re having trouble exporting, don’t keep trying to export. Just email support@devjavu.com and I’ll do it for you this weekend.

-Jeff

More upstream problems today

April 1, 2009

Engine Yard suffered an outage at their data center today. Here’s what they had to say:

We have been in constant contact with our Herakles data center throughout this service interruption. Unfortunately, root cause determination at Herakles for the ISP outage is unknown at this time. Herakles started troubleshooting at the core switching, then to the routers coming into the facility, and are now focused on the distribution switches inside the data center. At this time, Herakles data does not have an ETA for resolution of this problem. We will continue to keep you updated as we hear information from the Herakles data center operations team.

Unfortunately, after the issue was resolved, DevjaVu was still down. Since then I’ve been in contact with them and they’re on it. There’s no ETA, but my guess is within the next hour, although I’m sure they have a number of follow-up issues with their clients after this massive outage.

Anyway, sorry about this outage. If only DevjaVu had an offline mode, no?

Update @ 4/1 10am: No April Fools here. Downtime was extended overnight due to a billing glitch. It’s been resolved and Engine Yard is currently restoring our slices, which should be up momentarily.

Update @ 4/1 1pm: We’re back! After much unnecessary delay from problem after problem after problem. *sigh* I feel like I could have done better. Feel free to yell at me next time you see me.

-Jeff

Engine Yard slices going nuts, back up soon

March 24, 2009

There seems to be a problem with the network storage on our slices and now they’re going a bit crazy. I’ve pinged Engine Yard and they’re usually quick to help fix the problem. It is something that’s happened before. Sorry if it’s interrupting any work!

-Jeff

Authorization bug again

January 29, 2009

Just to let you know, we’re experiencing some downtime with Subversion due to a bug with our authentication mechanism. It might take a while to fix, but we’re working on it and we’ll keep you posted here.

Update: Things should be back to normal.

-Jeff

High load brought us down, resolving now

October 15, 2008

Apparently our slices at Engine Yard died because of excessive load and we’re working with them to bring them back up. Sorry for this bit of unexpected downtime in the middle of the day. They should be back up within the next 30 minutes.

-Jeff

Recovered from Subversion outage

October 9, 2008

Early this morning the Subversion authorization configuration for many of our projects became corrupted, preventing access to those repositories. Unfortunately, I am a major bottleneck in response to DevjaVu service emergencies. This happened to coincide with me moving and not having Internet access, and this morning having my phone die without being able to charge it. I didn’t know there was a problem until very late into the emergency and I wasn’t able to get online to do anything about it until even later.

Obviously this is a terrible excuse for a full day of service interruption. I do the best I can with the resources I have, but I know this is unacceptable. Particularly the lack of communication while handling it. That’s an easy one to fix for next time. I’ll also be looking at hiring a part-time emergency technician and putting together an SLA for premium users.

Not to mention that the cause of this outage should no longer be possible once we migrate to our new system, which has been in development for some time. I want to share more about that soon, but for now I’d just like to apologize for this outage and the lack of communication regarding it.

-Jeff

More growing pains

February 20, 2008

We got a couple of reports that Subversion service was becoming slow or unresponsive today during peak usage hours. This makes it a bit difficult to debug, but it would help if anybody could report recent SVN downtime or stalls, including when it happened, to support@devjavu.com.

Thanks. : )

-Jeff

Uptime caused minor downtime

February 7, 2008

About an hour ago you may have experienced what seemed like sporadic downtime as one of our slices went down. We were out of the office, but luckily our Twitter-based SMS alerts let us know. I called one of our users to do some pre-computer-access diagnostics during the drive back. His tests seemed to work fine, which meant either our alert script was broken or something had happened that we hadn’t considered when building the automatic alert tests: only one of the slices was down.

Sure enough, one of the slices had stopped working because of a memory leak in logging. It was a slow leak and it wouldn’t have happened if we hadn’t had such consistent uptime recently. Luckily, we’re refactoring out the component that had the memory leak, so we don’t expect to see this happen again for a bit.
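For the curious, here is a minimal sketch of why a test through the front door can pass while a slice is down, and how checking each slice directly catches it. This is not our actual alert script. The slice names and the `probe` callback are hypothetical stand-ins for whatever per-slice check you have.

```python
from typing import Callable, List


def find_down_slices(slices: List[str], probe: Callable[[str], bool]) -> List[str]:
    """Probe every slice directly and return the ones that failed."""
    down = []
    for name in slices:
        try:
            ok = probe(name)
        except Exception:
            # A probe that blows up counts as a failed slice.
            ok = False
        if not ok:
            down.append(name)
    return down


def site_looks_up(slices: List[str], probe: Callable[[str], bool]) -> bool:
    """What a visitor may see: the site answers as long as any slice is alive."""
    return len(find_down_slices(slices, probe)) < len(slices)


# Hypothetical scenario: "slice-2" has stopped responding, "slice-1" is fine.
slices = ["slice-1", "slice-2"]
fake_probe = lambda name: name != "slice-2"

print(site_looks_up(slices, fake_probe))     # a manual test can still pass
print(find_down_slices(slices, fake_probe))  # a per-slice check names the culprit
```

An alert should fire when `find_down_slices` returns anything at all, not only when the whole site stops answering.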

-Jeff

We’re working on it

December 17, 2007

Yes, Trac is down. It happened after a deploy that worked fine on staging. Rollback failed, so we can only rush to fix it. Unfortunately, the issue has taken us completely by surprise. We may have to disable the SpamFilter plugin…

-Jeff

Resolved upstream connectivity issues

November 12, 2007

Our excellent host Engine Yard has informed us that there have been some routing issues upstream from our data center that may have caused some connectivity issues. If you experienced any downtime, it should be resolved now or will be momentarily as Engine Yard continues to make sure it’s completely resolved.

We weren’t personally affected, but our remote alert server was, so we’ve been getting a nice stream of text messages claiming that the site was down even though it worked for us. If you had trouble accessing your project, we’re sorry for the temporary inconvenience.

-Jeff