Feb 27 2007
But, as the intermittent 4xx and 5xx errors coming out of relevancellc.com over the last few days indicate, things didn't go so well for the main public site. Basically this boils down to a Big Problem and a small problem:
- Big Problem: Under even modest loads, the new stack blows up. Apache CPU usage skyrockets until we have to reset the box. There is some corner case in effect here, as both we and lots of other people are using this stack successfully in other deployments.
- small problem: We have a lot of cruft from website moves in the past. We have old URLs from our Typo days, and old URLs from our WordPress days, and on and on. Because these URLs do not match anything, they all pass through Apache and hit Mephisto. The small problem magnifies the load, which triggers the Big Problem very quickly.
I know how to fix the small problem--just watch the logs and add some RewriteRules. But that wouldn't be sporting, and besides, it would just make it harder for me to trigger the Big Problem. So I spent most of yesterday exploring the Big Problem. Some people would say that my explorations were unsuccessful, but I am a glass-half-full kind of guy. I now know a lot about various deployment issues that are not causing my problem.
This morning I took a different tack, and created the RewriteRules to solve the small problem. Surprise! With the small problem solved, the Big Problem doesn't happen anymore. (And even if it does, it happens rarely enough that the monit instance I installed yesterday is an acceptable workaround, plus a warning should the problem ever worsen.)
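For anyone wanting to do the same log-watching, here is a rough Python sketch of one way to find the stale URL families worth a RewriteRule. It is not the procedure actually used here. It assumes the access log is in Apache's common or combined format and that unmatched legacy URLs come back as 404s; the function name `tally_unmatched`, the `/legacy-a` and `/legacy-b` paths, and the sample lines are all made up for illustration.

```python
import re
from collections import Counter

# Pulls the request path and status code out of an Apache common/combined
# log line, e.g. ... "GET /some/path HTTP/1.1" 404 512
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def tally_unmatched(lines, depth=1):
    """Count 404 responses, grouped by the first `depth` path segments."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # Skip lines that do not parse and requests that did not 404.
        if match is None or match.group("status") != "404":
            continue
        path = match.group("path").split("?", 1)[0]  # drop any query string
        segments = [s for s in path.split("/") if s]
        counts["/" + "/".join(segments[:depth])] += 1
    return counts


# Hypothetical log lines; real ones would come from the Apache access log.
SAMPLE = [
    '10.0.0.1 - - [27/Feb/2007:09:00:01 -0500] "GET /legacy-a/one HTTP/1.1" 404 512',
    '10.0.0.2 - - [27/Feb/2007:09:00:02 -0500] "GET /legacy-a/two?x=1 HTTP/1.1" 404 512',
    '10.0.0.3 - - [27/Feb/2007:09:00:03 -0500] "GET /legacy-b/feed HTTP/1.1" 404 512',
    '10.0.0.4 - - [27/Feb/2007:09:00:04 -0500] "GET /current/page HTTP/1.1" 200 2048',
    'not a log line',
]

if __name__ == "__main__":
    # Most frequent stale prefixes first: the best candidates for a rule.
    for prefix, hits in tally_unmatched(SAMPLE).most_common():
        print(prefix, hits)
```

Grouping by leading path segment is a deliberate simplification: one rule per old URL family is usually enough to keep that traffic from ever reaching the application.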
Sometimes it makes sense to ignore the Big Problem. Since I have isolated the problem so that it no longer does harm, there is no business need to solve the problem at all.