Archive for the 'Infrastructure' Category

More growing pains

February 20, 2008

We got a couple of reports that Subversion service was becoming slow or unresponsive today during peak usage hours. Problems that only show up under peak load are a bit difficult to debug, so it would help if anybody who has seen recent SVN downtime or stalls could report them, including when they happened, to support@devjavu.com

Thanks. : )

-Jeff

Uptime caused minor downtime

February 7, 2008

About an hour ago you may have experienced what seemed like sporadic downtime as one of our slices went down. We were out of the office, but luckily our Twitter-based SMS alerts let us know. I called one of our users to do some pre-computer-access diagnostics during the drive back. His tests seemed to work fine, which meant either our alert script was broken or something had happened that we hadn’t considered when building the automatic alert tests. It was the latter: only one of the slices was down.

Sure enough, one of the slices had stopped working because of a memory leak in logging. It was a slow leak and it wouldn’t have happened if we hadn’t had such consistent uptime recently. Luckily, we’re refactoring out the component that had the memory leak, so we don’t expect to see this happen again for a bit.
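For the curious, here is a rough sketch of the kind of check that would have pointed straight at the dead slice: probe every slice directly instead of only going through the load balancer, where a request can land on a healthy slice and look fine. This is not our actual alert script. The slice URLs and the names `probe` and `down_slices` are made up for illustration.

```python
from urllib.request import urlopen
from urllib.error import URLError

# Hypothetical slice addresses; the real hostnames are not part of this post.
SLICES = ["http://slice1.example.com/", "http://slice2.example.com/"]


def probe(url, timeout=5, opener=urlopen):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, HTTP error status, etc.
        return False


def down_slices(urls, check=probe):
    """Return the slices that failed their probe, in the order given."""
    return [url for url in urls if not check(url)]
```

Calling `down_slices(SLICES)` from a cron job and alerting on a non-empty result would report the individual slice that is failing, not just "the site seems flaky".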

-Jeff

Resolved upstream connectivity issues

November 12, 2007

Our excellent host Engine Yard has informed us that there have been some routing issues upstream from our data center that may have caused some connectivity issues. If you experienced any downtime, it should be resolved now or will be momentarily as Engine Yard continues to make sure it’s completely resolved.

We weren’t personally affected, but our remote alert server was, so we’ve been getting a nice stream of text messages claiming that the site was down even though it worked for us. If you had trouble accessing your project, we’re sorry for the temporary inconvenience.

-Jeff

WSGI makes us faster, simpler, nicer!

September 20, 2007

We’ve been working for a long time on getting certain infrastructural things fixed. For example, did you know that up until now we’ve been running on CGI? As we posted before, the original plan was to get Trac running on a cluster of tracd processes, much like the cluster of Mongrels we use for the Rails part of DevjaVu.

Just think about that. At our simplest setup, we’d have 2 Mongrel processes and 2 tracd processes. Plus we have to run Apache for Subversion given our setup. But wait, then we have this load balanced across several machines! With every machine we add, we’d get another set of these processes to worry about.

I was okay with Mongrels, but really, I was a bit upset coming from PHP about how much more I had to worry about to make sure the application was running. With mod_php you can just assume that if Apache is up, the app is up. With Rails and any FCGI or Mongrel-like setup, there were now two layers, one of which is usually a cluster of processes that could die or leak memory.

So recently we started writing a new application to replace our Rails part of DevjaVu. We’re writing it in Pylons, which is really nice and Rails-like, while still very Pythonic. This led us to mod_wsgi, which boasted faster performance than mod_python, and was even easier to configure. We tried our new app on our dev server with mod_wsgi and it was very nice. Then I realized Trac had WSGI support, so I tried Trac on mod_wsgi. Perfect!

After a few changes, DevjaVu production is now running Trac on mod_wsgi. Pages are loading almost twice as fast! Our setup is simpler, too. Once we replace our Rails app with the Pylons app, we can run everything with Apache. We’re stuck with Apache for a while anyway, so why not roll everything in? Faster. Simpler.
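If you haven’t seen WSGI before, here is a minimal sketch of what Apache ends up calling. This is not our Trac script (Trac ships its own WSGI entry point, so the real one is mostly configuration). It only shows the shape of the interface: mod_wsgi loads a Python file and, by default, looks for a callable named `application` in it. The response body is just a placeholder.

```python
def application(environ, start_response):
    """A minimal WSGI app: Apache/mod_wsgi calls this once per request."""
    # environ is a dict of CGI-style variables (PATH_INFO, REQUEST_METHOD, ...).
    path = environ.get("PATH_INFO", "/")
    body = ("You asked for %s\n" % path).encode("utf-8")

    # start_response takes the status line and a list of (name, value) headers.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])

    # The return value is an iterable of byte strings.
    return [body]
```

There is no separate cluster of processes to babysit here: the app lives inside whatever Apache manages, which is exactly the "if Apache is up, the app is up" property I missed from mod_php.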

And we can finally have nice URLs for real! Let me explain:

Before, we had to have URLs like this:
http://myproject.devjavu.com/projects/myproject/wiki

That was redundant, long and ugly. We were able to rewrite our ideal shorter URLs to the longer version, so you could use URLs like this: http://myproject.devjavu.com/wiki

However, that was just a cosmetic rewrite and Trac would always generate the long URLs.

Now, Trac generates the short URLs all the time and the short URLs are the proper, non-rewritten way Apache serves the request. We have the long versions that you probably have posted around the web do a permanent redirect to the short ones. This way you can still use them, but you should update your links at some point since they are now deprecated.
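To make the redirect concrete, here is a small sketch of the mapping as WSGI middleware. Our real redirect lives in the Apache configuration, so treat this purely as an illustration. The names `short_path` and `LegacyRedirect` are invented, and the sketch assumes the project name in the old path can simply be dropped because the subdomain already identifies the project.

```python
import re

# Old-style path: /projects/<project>/<rest of the Trac path>
LONG_PATH = re.compile(r"^/projects/(?P<project>[^/]+)(?P<rest>/.*)?$")


def short_path(path):
    """Return the short form of an old long path, or None if it is not one."""
    match = LONG_PATH.match(path)
    if match is None:
        return None
    return match.group("rest") or "/"


class LegacyRedirect:
    """Answer old long URLs with a 301; pass everything else to the app."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        target = short_path(environ.get("PATH_INFO", ""))
        if target is None:
            return self.app(environ, start_response)
        query = environ.get("QUERY_STRING")
        if query:
            target += "?" + query
        # A relative Location keeps the request on the same project subdomain.
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Length", "0")])
        return [b""]
```

With this, `/projects/myproject/wiki` redirects to `/wiki`, while a request that is already short goes straight through to the wrapped application.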

Anyway, that was fun. Hope you enjoy the changes. We have a lot more coming up. Let us know if you run into problems.

-Jeff

Engine Yard Maintenance Notification

September 20, 2007

Engine Yard, our host, will be taking a 2 – 4 hour maintenance this Sunday September 23rd at 12:05 AM Pacific (GMT -8). Therefore, DevjaVu may be down for a bit Sunday night. They will be installing new load balancers on our cluster. They’re calling a 2 – 4 hour maintenance but expect only about 15 minutes of downtime while the network is re-configured to allow for the new machines.

-Jeff

Fun with tracd clusters

June 2, 2007

Actually it wasn’t all that fun. Thanks for putting up with the random Trac outages. We’ve reverted to CGI-based Trac to keep things stable until we can make sure tracd clusters work without dying all the time. We would have reverted sooner, but we needed to gather enough information in the production environment to debug the problem back in development.

To explain this in full technical detail, we have Adrian Perez, an up-and-coming writer of children’s stories:

Once upon a time, I had some fish. They used Trac, but they all died. Then there was a picture of some fish, eating insects and larva. But they died too. It was a fanciful time for all, said the Fox. I knew that would happen said the Dog. I couldn’t tell however, as I was precipitously on the shitter. Sorry kids. It was only a mistake. Before that though there was obviously an outage to say the least. I popped it. Brace with the hips when using Subversion. I could tell you about the events of this tremulous time, but if I were to, Ninjas, as always, would attack. Most of the server outages are actually Ninja based or related. In the time of Pirates, I tried to count them all, but they were legion.

Thanks for being such awesome beta users!

-Jeff

Zippiness upgrade tonight

May 29, 2007

Tonight at midnight PDT there will be a bit of downtime as we switch to our new tracd cluster setup. This will make Trac super zippy and a bit more friendly on our servers. We were hoping to return Trac SSL with this update, but we weren’t able to get it properly working in time and we need to push this upgrade.

We’ll also let a few more projects in after this, so if you asked for an invite, keep an eye on your inbox.

We’re a few projects short of 1024 beta projects. When we reach this power of two we’re going to have ourselves a little hacker celebration. Just thought I’d share that. :P

-Jeff

Recent downtime, holding off on new projects

May 10, 2007

Since the original prototype was launched about a year ago, there hasn’t been a down month in traffic. Every month we’ve gotten more and more beta users and the growth just seems to come out of the sky. Obviously, you guys have been talking about DevjaVu because we don’t do any real marketing. We really appreciate that, so we’ve been inviting just about everybody that comes knocking.

However, just at the point where we’re trying to get out our premium plans, we’re starting to hit the limits of what we can handle without doing more refactoring. The recent problems we’ve been having have shown us this, and we’re sorry if we’ve affected your development. Our goal is to stay out of your way and help you focus on getting things done, so when this happens it’s counter-productive for both of us.

The problem we had this morning was triggered by our new backup scheme, which malfunctioned and used up more space than it should have. The disk filled up, and write operations to both our user and permission stores failed, leaving them corrupted. We had to restore our user database and regenerate our SVN authorization file. The regeneration tripped over an old project with a legacy bug, which killed SVN authorization, hence the Forbidden error.
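One common guard against this particular failure, a full disk leaving a half-written file behind, is to write the new contents to a temporary file and rename it over the old one only once the write has succeeded. The sketch below shows the general technique and is not a description of our code; the function name `write_atomically` and the `min_free_bytes` safety margin are invented for illustration.

```python
import os
import shutil
import tempfile


def write_atomically(path, data, min_free_bytes=0):
    """Replace `path` with `data` (bytes) without ever exposing a partial file."""
    directory = os.path.dirname(os.path.abspath(path))

    # Refuse up front if the disk is too full; the old file stays untouched.
    if shutil.disk_usage(directory).free < min_free_bytes + len(data):
        raise OSError("not enough free disk space; leaving %s untouched" % path)

    # Write to a temp file in the same directory so the rename stays on one
    # filesystem. Note that mkstemp creates the file with restrictive permissions.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # If anything failed before the rename, clean up the temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the write fails partway, the previous version of the file is still there, so a failed regeneration costs a retry rather than a restore.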

We’ve since recovered and more than doubled our disk space thanks to the awesome guys at Engine Yard. We now have to go in and refactor some of these pieces so that this kind of chain reaction can’t happen again. We’re also hoping to get a beta of our Basic premium plan released to our current beta users by the end of the month.

It’s definitely a tough balance to move forward enough to be able to generate revenue, while also refactoring to maintain system integrity. We really appreciate you all holding out while we slowly become the most awesome developer tool in the world.

So for now we’re going to hold back on letting new projects in and focus on making the DevjaVu innards more ninja-like. Then we’ll see if we can get some of you guys to try our first premium plan. It’ll be hard to turn away beta users though, especially with pleas like this one:

my boss wants me to work with some pirates on a new project, but I don’t trust their swashbuckling ways, and I know I’ll have to roll back a lot of their changes. I h-a-t-e pirates, and I reckon the best thing to do is get ninja-hosted SVN–that’ll teach ’em!

-Jeff

Better SSL coming

May 2, 2007

Since we released SSL access, we’ve been running into certain limitations with the way we configured it. Some of you may have noticed, for example, that you were unable to use svn move over secure SVN. Our new setup should fix this. However, for the time being, SSL will be disabled for Trac until we can rework the rest of our configuration around the new SSL setup.

-Jeff

Scheduled downtime on April 28

April 25, 2007

The Engine Yard folks are scheduling a maintenance window on the night of Saturday, April 28 at 11pm PST for 5 hours. Part of this maintenance procedure is patching a bug in the Xen virtualization software their clusters are built on. The bug would essentially cause random 10-15 minute reboots, so this should improve our overall uptime.

More information in plenty of detail at the Engine Yard blog.

-Jeff