As some of you might know, my workplace is no stranger to odd server problems, such as instances that just want to return 503 errors. Today I have a new one: one of our applications refuses to run at full speed if it's in anything other than the default instance.
I think it's best to spell out the hardware and software configuration of the servers I'm talking about, so it's not lost in the text. It's important to note that all of these tests talk to the same database and database server. All server names have been changed to protect the innocent.
Komodo Pair
Load-balanced identical pair of servers, Komodo1 and Komodo2.
Hardware: Quad core 3 GHz (4 cores), 6 GB of RAM
OS & CF: Microsoft Windows Server 2003 R2 (32-bit), CF 8 32-bit
Instances: CMS was running in its own instance pair, Komodo1i1 and Komodo2i1, and later in cfusion.

Jaguar Pair
Load-balanced identical pair of servers, Jaguar1 and Jaguar2.
Hardware: Dual quad core 1.6 GHz (8 cores), 8 GB of RAM
OS & CF: Microsoft Windows Server 2003 R2 (32-bit), CF 8 32-bit
Instances: CMS was running in its own instance pair, Jaguar1i1 and Jaguar2i1.

Jaybird
Hardware: Dual core 1.8 GHz (2 cores), 2 GB of RAM
OS & CF: Server 2003 (32-bit), ColdFusion 8 32-bit
Instances: CMS was running in the main instance, cfusion.

Adder
Our test machine.
Hardware: Dual core 3 GHz (2 cores), 4 GB of RAM
OS & CF: Server 2003 64-bit, ColdFusion 8 64-bit
Instances: CMS was running in the main instance, cfusion.
The Problem
We have a CMS that powers some of our client websites. Several months ago we moved it to a load-balanced pair of servers (the Jaguar Pair), hoping for better performance, since the site was query heavy (older coding and lots of recursive querying) and had been housed with a number of other sites on a shared server (Jaybird). After moving it we didn't really see a performance boost, but we assumed it had to be faster. I mean, it's a better server, how could it not be faster?
Fast forward to now, and after some user complaints, we're observing serious speed differences between our CMS on the Jaguar Pair and a development copy on Adder, our test machine. I'm not talking about a little sluggishness, but about requests taking 6-8 times longer to return on the Jaguar Pair than on Adder. Specifically, we were testing a page that reliably returned in less than a second on Adder, but took around 8 seconds on the Jaguar Pair.
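For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch of how the same page could be timed repeatedly on each server. It is not the tool we used; the function names (`fetch`, `time_requests`, `summarize`, `compare`) and any URLs you pass in are assumptions for illustration only.

```python
import statistics
import time
import urllib.request


def fetch(url, timeout=30):
    """Download the page and discard the body; only the elapsed time matters."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def time_requests(url, runs=5, fetcher=fetch):
    """Return the wall-clock duration (seconds) of each of `runs` requests."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fetcher(url)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def compare(targets, runs=5, fetcher=fetch):
    """Time every URL in `targets` (a name -> URL dict) and summarize each."""
    return {
        name: summarize(time_requests(url, runs, fetcher))
        for name, url in targets.items()
    }
```

Calling `compare` with a dictionary that maps a label such as "Adder" or "Jaguar Pair" to the URL of the same test page on that server would give one summary per server. The median is less sensitive than the mean to a single slow first request.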
So, the first thing we tried was to see if the network traffic was the same. Our server team tracked packet transfers back and forth between the servers to make sure that everything was going to the same location, and along the same paths. That part checked out, but it still showed that it simply took 8 times as long to get a response back from the Jaguar Pair.
Next up was to make a copy of the CMS back onto Jaybird and do some testing, so that we could look at a 32-bit OS running a single instance. Testing showed that Jaybird did not experience slowness either, so the 32-bit OS was not to blame.
After that we wanted to see if it was something to do with server configurations, so we made a new instance pair on our other load-balanced server pair, Komodo, and got everything set up, only to see the same performance problems on the Komodo Pair as we saw on the Jaguar Pair. So this behavior was not limited to a bad configuration on the Jaguar Pair or something like that.
Failing that test, we decided to see if there was some sort of overhead being incurred by CF instance clustering, so we isolated Komodo1i1 and removed it from the CF cluster. Tests showed that Komodo1i1 still lagged at 8 seconds to return results, even though it was running as a single instance.
Lastly, we removed our test site from an instance entirely and let it run in the main instance, cfusion, only to find that the response times suddenly dropped back to normal.
The only conclusion that we can reach is that there is some sort of overhead that the instances are running into that the main cfusion instance is not experiencing. In all of these tests, the only way we found to bring the response times back in line was to remove the application from an instance and let it run in the default instance, as on Jaybird, Adder, and later Komodo. Any time we have it in an instance, such as on the Komodo Pair or the Jaguar Pair, the system slows to a crawl.
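One way to hunt for whatever differs between a named instance and cfusion would be to dump each one's settings to a text file and compare them key by key. The Python sketch below assumes the settings are available as simple `key=value` lines with `#` comments; that file format and the function names are assumptions for illustration, not something we have confirmed about our setup.

```python
def parse_settings(text):
    """Parse `key=value` lines into a dict, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs.

    A key missing from one side is reported with None on that side.
    """
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs
```

Feeding it the text of the two settings dumps (one from the default instance, one from a slow instance) returns only the keys whose values differ, which keeps the list of things to investigate short.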
Running our CMS under the main instance obviously isn't a desirable solution, since we want to be able to have multiple instances on these load-balanced servers. We do have other applications running in other instances on the Komodo Pair and the Jaguar Pair, so we can't really just run everything in the main instance, and we haven't noticed a slowdown with those applications. It's possible that they are slowed down as well, but that it's not as noticeable since they are not as query heavy as the CMS.
Does anyone have an idea what might be causing this slowdown?