Rails/FastCGI vs. J2EE scaling, take 2
[ tirsen ] 07:20, Friday, 15 April 2005


By popular demand, I present to you the follow-up to my extensively slaughtered last blog post. This time I'll try something new: balanced analysis! ;-)


Just a disclaimer to start off: I have extensive experience designing, developing and maintaining J2EE-based systems, but I have no practical in-production experience with FastCGI whatsoever (apart from my own dabbling around with various internal ThoughtWorks stuff). Yeah, yeah, call me a kid that's too young to remember and blah blah. (Yes, I'm looking at you Jason.) ;-)


Without further ado, these are the various issues that I see with both solutions to the scalability problem (jumbled together with latency and perceived speed, but hopefully you'll get my point):


Sharing data between processes


Both J2EE and FastCGI share this problem: J2EE applications that need to scale are usually deployed to a cluster with multiple processes on multiple machines, and FastCGI of course always runs in multiple processes. There are two solutions to the problem: implement session affinity so that all requests in a session always end up at the same process, or implement a way of sharing session data using some form of inter-process communication. Doing the latter is actually so easy and well-understood nowadays (both in J2EE and FastCGI) that session affinity is rarely needed (session affinity also has an availability cost). The main strategies are to use shared in-memory storage (memcached in Rails land), session replication (JBoss uses JavaGroups for this), storing the session state in a database (both Rails and J2EE can do this) or on the filesystem (this is the default out-of-the-box behavior of Rails). Storing session state on the filesystem is a simple solution that actually works amazingly well; distributed filesystems with good enough performance are very easy to set up and use in all environments nowadays. As load increases it's probably worth investigating the other options.
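
To make the filesystem option a bit more concrete, here is a minimal sketch in Python. It is not how Rails implements its session store; the class name, the file naming scheme and the JSON format are all my own assumptions for illustration. The only point is that any process that can see the same directory can pick up the session, so no affinity is needed.

```python
import json
import os
import tempfile


class FileSessionStore:
    """Hypothetical store: one JSON file per session in a shared directory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, session_id):
        # Assumes session ids are filename-safe (e.g. hex strings).
        return os.path.join(self.directory, f"sess_{session_id}.json")

    def load(self, session_id):
        # An unknown session simply starts out empty.
        try:
            with open(self._path(session_id), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def save(self, session_id, data):
        # Write to a temp file and rename it into place, so another
        # process never reads a half-written session.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp_path, self._path(session_id))
```

Two store objects pointed at the same directory stand in for two processes: whatever one saves, the other can load on the next request.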


Cost of running multiple processes on the same node


Processes are more expensive than threads because of the cost of maintaining an extra heap and other kernel resources. That said, most modern operating systems have highly optimized process management. Linux, for example, can easily run thousands of processes on well-specced hardware, and with the new scheduler in 2.6 the limit should be even higher. I'm not completely up-to-date with this, but the JVM used to fall over after a couple of hundred threads. Also, the standard way of limiting multi-threading issues is to limit the number of objects shared between threads, so the benefit of having one single heap is somewhat limited.


Simplicity of programming models


FastCGI has a simpler programming model, as each request is processed completely separately from other requests going on at the same time. This may not be a big difference to application developers, but it does make a huge difference to the framework code in Rails. I've discussed simplicity before (it is one of my favorite subjects), and it should mean that if this simplicity is harnessed, a framework based on FastCGI can be easier to maintain, stabilize faster, get more committers and have features added to it at a more rapid pace. Of course, simplicity is an elusive thing, and if the Rails developers make too many design mistakes then they might not be able to reap the rewards. Complexity always lurks around the corner, ready to attack when you least suspect it. Personally I have quite a lot of confidence in the Rails developers: the introduction of Ajax to Rails was one of the crossroads where complexity could have started to cripple the entire framework, but the implementation is such a wonderful showcase of elegance and simplicity (whereas my own design quite frankly... ehm... sucked...).


Multi-threading safe objects


J2EE's multi-threaded design requires shared objects to be designed and implemented with multi-threading in mind. Again, this may or may not be a problem for application developers; in Struts, for example, the action instances are shared by multiple threads, whereas in WebWork each request gets its own instance. It does make a big difference to the framework though. A lot of the objects that are shared by multiple threads are in the framework, and this is tricky indeed to get exactly right. Just an example of this: we did a minor upgrade of Jetty on a system I maintained some years ago. Everything seemed to work fine, but it blew up completely on some machines in our server farm. Eventually we tracked it down to the fact that the version of Jetty we upgraded to didn't function properly on a multi-processor machine. Granted, our QA process was lacking, as we didn't do regression testing on the same hardware as we had in our server farm, but this was a small shop that just couldn't afford this.
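
A small sketch of the difference, in Python rather than Java and with made-up class names (this is not Struts or WebWork code): the counter is shared by every request thread and therefore needs a lock, while the per-request action can keep state in plain fields because no other thread ever sees it.

```python
import threading


class SharedHitCounter:
    """One instance shared by all request threads, so updates need a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = 0

    def record(self):
        # The read-modify-write must be atomic; without the lock,
        # concurrent updates can be lost.
        with self._lock:
            self._hits += 1

    @property
    def hits(self):
        with self._lock:
            return self._hits


class PerRequestAction:
    """A fresh instance per request; its fields belong to a single thread."""

    def __init__(self, counter):
        self.counter = counter
        self.params = {}

    def execute(self, params):
        self.params = params  # safe: nobody else holds this instance
        self.counter.record()
        return f"hello {self.params['name']}"


def simulate(num_threads, requests_per_thread):
    """Run fake requests on several threads and return the total hit count."""
    counter = SharedHitCounter()

    def worker():
        for i in range(requests_per_thread):
            PerRequestAction(counter).execute({"name": f"user{i}"})

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.hits
```

In a one-request-per-process model the shared counter would not need the lock at all, which is exactly the kind of thing that makes the framework code simpler.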


Cost of inter-process communication between webserver and FastCGI process


This extra cost is limited if the webserver and FastCGI process are running on the same machine, as most operating systems have highly optimized IPC solutions. If they are deployed to two physical nodes, which is the case in a FastCGI cluster, then there is certainly an extra cost, which can give high latency. Most J2EE applications (particularly enterprise ones) are deployed in a double firewall architecture where the J2EE application server and the webserver reside on two different machines. In this case the J2EE solution also suffers from this problem.


One database connection allocated to each FastCGI process


For each FastCGI process you need to allocate one connection, and this connection is open for the entire lifetime of the process. When using connection pools in a multi-threaded process, the connection is only allocated during actual transactions with the database. This could potentially limit how far you can scale a FastCGI architecture, although how high this limit is is very hard to say, as it depends on the type of load your application has, what type of database you use and so forth. (Remember that many databases can also be clustered to handle a large number of connections.) Also, it is best practice when using, for example, Hibernate to allocate a connection early in a request's lifecycle and keep it to the end, so this problem could show up in a Java application too. Also worth noting is that one of Rails' caching strategies is to generate static HTML to the filesystem and then let the webserver deliver it, which means that a FastCGI process isn't used for that request at all. How applicable this caching strategy is depends very much on your application. On some applications (blogging engines and content management systems, for example) this can be used extensively, whereas on other applications (internet banks, for example) it cannot be used at all.
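
To illustrate the pooling side of this, here is a toy pool in Python. Everything in it is hypothetical (FakeConnection is a stand-in, no real database driver is involved); it only shows that a connection is held for the duration of a block of work and handed back afterwards, whereas a FastCGI process would simply keep its one connection open for its whole lifetime.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    def __init__(self, number):
        self.number = number


class ConnectionPool:
    """A fixed set of connections shared by many request threads."""

    def __init__(self, size):
        self.size = size
        self._idle = queue.Queue()
        for n in range(size):
            self._idle.put(FakeConnection(n))

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is busy; raises queue.Empty
        # if a timeout is given and nothing frees up in time.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Hand the connection back as soon as the work is done.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()
```

With a pool of 2, any number of threads can take turns using the database, but at most two are talking to it at any one time; with 50 FastCGI processes the database sees 50 open connections whether they are busy or not.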


Summary


Do I dare a summary, or will I get shredded to pieces in the blogosphere? Well, I suppose I'll get abuse anyway so I'll just give it a go: FastCGI and J2EE have both been proven to scale in production over many years and are both very well understood. They should both be considered low risk. Where FastCGI has a simpler programming model, J2EE seems to possibly have lower latency and a higher theoretical limit to how far it will scale (although where this limit is is hard to tell).

Both FastCGI and J2EE are good solutions to the scalability problem.


Comments

As I sort of recognize the app you maintained a couple of years ago :), I just want to refresh your memory and make a small correction. We had been using the same version of Jetty for quite a while, and the problem surfaced when a customer upgraded their old systems to the new version, and they happened to be the first ones running on multiple processors. So really the problem was not about regression testing on the same server hardware as our own server farm, but on the same hardware as our customer.

Which makes for a good question not exactly related to your post - how many hardware platforms should you run the acceptance tests on? Today we have six test machines in total - one running unit tests in Cruise Control and five attempting to cover the most common permutations of operating system - app server - database. As the app server of customer choice in our case is quite static we get away with only five, but we still spend, in my view, too much time keeping them running (WebSphere is not the optimal choice for automatic testing). And that is not even addressing the multi-processor issue.

I know - another topic but if you have any thoughts I'd love to hear them.

--Marcus Ahnve, April 21, 2005 07:45 PM

Thanks for reminding me about the details Marcus!

Two thoughts come to mind about your particular situation:
1) Automate, automate, automate. I know you guys are already doing this and I know this is hard when dealing with WebSphere. I just felt I needed to reiterate the importance.
2) Do a cost-benefit analysis to see whether you're getting sufficient benefit from the investments you're putting into your testing platforms. Are two of them so similar that problems in one always also show up in the other? Can you get rid of that one?

You might consider contacting Mike Cannon-Brookes. He also has a machine park where he is testing a big variety of appserver/database/operating system combinations. I know Charles Miller wrote some clever and nice Ruby scripts for all of it. (It started out as a mix of Perl/Python/Bash/Ant/Maven which proved to be completely unmaintainable, and was later refactored to Ruby.)

--Jon Tirsen, April 22, 2005 02:48 AM