Ruby on Rails and FastCGI: Scaling using processes instead of threads
[ tirsen ] 00:30, Tuesday, 12 April 2005

There's a lot of buzz around the Ruby on Rails framework at the moment. Something that doesn't get a lot of attention is its scalability solution: FastCGI. That is partly because it doesn't seem very exciting at first glance and partly because it's so fundamentally different from how "enterprise systems" have traditionally scaled. I think Java developers in particular should invest some time in understanding FastCGI.

A Java virtual machine is extremely expensive to start. When started it occupies huge amounts of memory and system resources. This property of the Java platform has led us down the path of scaling using one single virtual machine per physical machine.

One way of doing this is to use some form of non-blocking event-driven architecture and have a fixed small number of threads (typically one per CPU). Each thread handles a number of requests at the same time: as one request waits for IO, the thread moves on and processes another request. This is a complicated (but interesting) way of building systems and not very well suited to less experienced developers. Most enterprise software projects have a large proportion of junior or mid-level developers, so this is not really a practical way of scaling the Java architecture for enterprise systems.
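To make the event-driven model concrete, here is a minimal sketch in Python using the standard `asyncio` module. It is only an illustration of the idea, not how any particular Java server does it: the request names and the sleep times standing in for IO waits are made up.

```python
import asyncio
import threading


async def handle_request(name, io_delay, log):
    """Handle one request; asyncio.sleep stands in for waiting on IO."""
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(io_delay)  # the thread is free to run other requests here
    log.append((name, "done", threading.get_ident()))


async def main():
    log = []
    # Both requests are in flight at once on a single thread.
    await asyncio.gather(
        handle_request("slow", 0.05, log),
        handle_request("fast", 0.01, log),
    )
    return log


log = asyncio.run(main())
for name, event, thread_id in log:
    print(name, event, thread_id)
```

The "fast" request finishes while the "slow" one is still waiting, even though only one thread is involved. The price is that every piece of code that waits on IO has to be written in this cooperative style.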

The other way of scaling this is with a larger number of threads, each thread processing only one request at a time. As one request waits on some form of IO operation, its thread blocks and another thread is dispatched. This approach is much easier to develop in, and it has been shown that even if it does not scale as well as the non-blocking IO solution, it can still scale pretty well. (Take a look at, for example, the mixed-threading model of JRockit, where a lot of IO optimizations are done at the virtual machine level.)

Another interesting issue is how expensive resources are handled (typically database connections). In a one-VM solution they are pre-allocated in pools so that the cost of allocating one isn't incurred on each request. When a request completes it's very important to return the connection to the pool so it is available to other requests. Another complication is that the heap is shared by multiple threads, so all shared objects need to be built for multi-threading safety. Designing and implementing thread-safe objects that don't deadlock and have high throughput is extremely hard. Because of this complexity the traditional solution has been to simply dodge the issue and minimize the number of objects that are used by multiple threads at the same time. Those objects that are used by multiple threads have little or no state. This effectively solves the problem, but minimizes the benefits of having a shared heap. Because many systems are clustered on multiple physical nodes, the shared heap cannot even be used as a means of inter-process communication.
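As a rough illustration of the pooling discipline described above, here is a small Python sketch. `FakeConnection` and `ConnectionPool` are hypothetical names invented for this example; a real pool would wrap an actual database driver and deal with broken connections, timeouts and so on.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for an expensive database connection (hypothetical)."""

    def __init__(self, ident):
        self.ident = ident

    def query(self, sql):
        return f"conn {self.ident} ran: {sql}"


class ConnectionPool:
    """Fixed-size pool shared by all request threads in one process."""

    def __init__(self, size):
        self._idle = queue.Queue()  # thread-safe FIFO of idle connections
        for ident in range(size):
            self._idle.put(FakeConnection(ident))

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks while the pool is empty
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back

    def idle_count(self):
        return self._idle.qsize()


pool = ConnectionPool(size=2)
with pool.connection() as conn:
    result = conn.query("SELECT 1")
print(result, "| idle connections:", pool.idle_count())
```

The `try`/`finally` is the important part: if a request forgets to return its connection, or fails before it can, the pool slowly drains and other requests block. That is exactly the kind of lifecycle bookkeeping the threaded model imposes on application code.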

These are all interesting characteristics of the typical enterprise software architecture. We've slowly learnt to deal with them and have built up a toolbox of quite effective tools around them (J2EE being one of them).

Ruby on Rails and FastCGI scale in a completely different way.

FastCGI is an extension to the old and ultra-simple CGI architecture which, to put it bluntly, doesn't scale one single bit. For each request a CGI implementation forks off a new process containing the application code, with a bunch of environment variables telling the application what to do; it then captures the output of the process and sends it back to the web client. Starting a new process is usually a very expensive operation (depending on the operating system, the type of process and so on), and no resources can be kept alive between requests, so database connections and the like all have to be reallocated on each request.

In contrast, a FastCGI implementation will on startup pre-fork a number of CGI processes. Each process listens on standard input (or any other IPC mechanism such as named pipes, domain sockets or even network sockets in the case of a cluster). As a request comes in, an available process is chosen and the content of the request is sent to it as name-value pairs. The FastCGI implementation captures the output and sends it back to the client. The process is then returned to the pool of available processes.
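The shape of this model can be sketched in a few lines of Python. To be clear, this is not the FastCGI wire protocol (which uses binary records), just a toy dispatcher under simplifying assumptions: workers are long-lived child processes that read one request per line on standard input and write one reply line on standard output, and `WorkerPool` and `WORKER_SOURCE` are names made up for the example.

```python
import subprocess
import sys

# Code run by every worker: pay the start-up cost once, then loop over requests.
WORKER_SOURCE = r"""
import os, sys
served = 0                       # per-process state survives between requests
for line in sys.stdin:           # one request at a time, no threads
    served += 1
    sys.stdout.write(f"pid={os.getpid()} served={served} path={line.strip()}\n")
    sys.stdout.flush()
"""


class WorkerPool:
    """Toy dispatcher that starts its workers up front and reuses them."""

    def __init__(self, size):
        self.idle = [
            subprocess.Popen(
                [sys.executable, "-c", WORKER_SOURCE],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
            )
            for _ in range(size)
        ]

    def handle(self, path):
        worker = self.idle.pop(0)  # pick an available process
        try:
            worker.stdin.write(path + "\n")
            worker.stdin.flush()
            return worker.stdout.readline().strip()
        finally:
            self.idle.append(worker)  # return it to the pool

    def close(self):
        for worker in self.idle:
            worker.stdin.close()  # EOF ends the worker's loop
            worker.wait()


pool = WorkerPool(size=2)
replies = [pool.handle(path) for path in ["/a", "/b", "/c"]]
pool.close()
for reply in replies:
    print(reply)
```

With two workers and three requests, the first and third replies come from the same process, and its `served` counter has gone from 1 to 2. That surviving per-process state is where a worker would keep its one open database connection.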

This means that each process can pre-allocate one single database connection (for each database that it talks to). There are no issues of multi-threading as each process processes only one request at a time. No objects need to be written to handle multi-threading, as there is just one single thread per process. Expensive resources don't need to be allocated in pools and application code doesn't need to return the resources once done with them. Complicated non-blocking IO solutions or muxer/demuxer architectures don't need to be used. You can even allocate FastCGI processes on multiple physical nodes, effectively implementing a cluster. In high-security situations a double-firewall security architecture can be set up so that the web server is protected by one firewall and the back-end FastCGI servers are protected by an additional one.

There seem to be some indications that FastCGI scales at least as well as the typical application server architecture. If this is the case then it's great news. Dealing with the complexities of a multi-thread/one-process system is very expensive. In practice the real performance and scalability of FastCGI applications might be even higher, as the much easier development model decreases the risk of programmer errors.

Java 5 seems to have done some clever optimizations on some platforms for starting up additional virtual machines (you pay up front for the first one, the rest is very cheap). Maybe it's time we try this stuff out in Java too?

For more information:

  • http://www.fastcgi.com/
  • http://www.fastcgi.com/devkit/doc/fcgi-perf.htm
  • http://cryp.to/publications/fastcgi/
  • ...and much more.

    Update 1: I may come off as bashing the multi-threading solution here, which really wasn't my intent. I am intrigued by the simplicity of scaling with processes and am curious as to whether this will actually work.

    Update 2: Cameron Purdy points out a very valid flaw with the FastCGI architecture: if you, for example, create one hundred FastCGI processes per server in a 20-blade cluster, then each separate process needs to allocate one database connection. This would amount to 2000 database connections, which occupy a lot of memory (both in the database server and on the client) and, bluntly put, just doesn't scale. ;-) Does anybody who has real experience using FastCGI know how to solve this problem? Is there a solution or does FastCGI simply not scale to these dimensions? (Please ignore the unfriendly tone Cameron uses in his post, which surprises me as it's quite unlike Cameron "Peace" Purdy. It's really sad that Hani is making such an impact on the Java blogging community.)
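The arithmetic behind that objection is easy to check. In the Python sketch below, the 100 processes and 20 blades come from the example above; the pool size of 10 connections per server for the threaded model is purely a hypothetical figure chosen for comparison.

```python
def connections_process_model(processes_per_server, servers):
    """One database connection per worker process."""
    return processes_per_server * servers


def connections_pooled_model(pool_size_per_server, servers):
    """One shared pool per server, whatever the number of threads."""
    return pool_size_per_server * servers


fastcgi_total = connections_process_model(100, 20)
pooled_total = connections_pooled_model(10, 20)  # pool size 10 is hypothetical

print("process-per-request:", fastcgi_total)
print("pooled threads:     ", pooled_total)
```

In the process model the connection count is tied to the number of workers; in the pooled model it is tied to the pool size, which can be tuned separately from the number of request threads.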

  • Comments

    How is being able to run FastCGI serving processes on multiple machines a cluster any more than being able to run mod_jk workers on multiple machines? It's an apples and oranges comparison to the earlier cluster you raised, which is that one's IPC may also need to be inter-machine. That's just as true with FastCGI.

    --Robert Sanders, April 12, 2005 05:12 AM

    Okay, I don't know much about mod_jk but if it only processes one request per worker process at a time (that is, no threading) then it uses the same scalability model as FastCGI. It's not really FastCGI itself that I'm interested in here, it's the principle of scaling with processes instead of threads.

    On the other hand, if mod_jk workers do process multiple requests at the same time then they are exposed to the same issues as the Java appserver solution. They need to be designed for multi-threading, implement resource pools, be very careful about resource lifecycle management and so forth.

    Yes, in some sense I'm comparing apples and oranges; the two solutions are completely different. But they are solutions to the same problem. And if the reports on scalability for the one-process-per-request model are correct, then it might have some big repercussions for how we build server-side applications.

    --Jon Tirsen, April 12, 2005 08:58 AM

    But it seems if you have separate processes instead of threads then having session-scoped data becomes complex? I am not familiar with Ruby's approach but the FastCGI whitepaper (http://www.fastcgi.com/devkit/doc/fastcgi-whitepaper/fastcgi.htm) talks about session affinity where you can bind requests with a specific parameter to a specific process. In that case you must have quite complex load-balancing logic in the server to distribute the sessions equally and even then you cannot guarantee proper balancing between the processes because if two users bound to the same process click at the same time you will have one user waiting for the other while other processes are sitting idle. Or have I missed something important?

    --Erik Jőgi, April 12, 2005 11:29 AM

    Erik, I think you are raising a good point. It is not too much of a problem when all your processes are running on the same physical machine, as you can then simply share sessions via the file system. But if you use multiple machines you will usually want to share them by putting them onto a central machine -- ways of doing this are via DistributedRuby (which lets you do Ruby IPC quite trivially) or by a session backend that stores them inside your SQL database.

    I'm sure that the FastCGI process binding approach makes sense as well, but storing the session data inside the database seems like the simplest solution -- after all databases have already solved the problem of load balancing with multiple machines (via data replication and all that) in case one machine is no longer able to handle all the database load.

    Perhaps there are still parts of the bigger picture that I don't know about -- in that case feel free to inform me about them. It certainly is an interesting problem we are talking about.

    --Florian Gross, April 12, 2005 12:33 PM

    Rails doesn't depend on FastCGI's session affinity patch. It stores session data in a configurable store (several hard-drive options, one that talks to another Ruby instance over the network, one that uses the database), and otherwise each request is completely independent.

    Very simple.

    --Ryan Platte, April 12, 2005 12:42 PM

    I would like to point out that not everything in the JVM world needs to be, or is, thread safe; many structures are not thread safe for speed reasons, nor do they need to be. The benefits of threads vs. processes come into effect when you are serving hundreds if not thousands of smaller requests, ESPECIALLY if there is any shared state, since in a process that state will need to be reloaded and saved for each request, vs. a message for the threaded app.

    I would say that all you have managed to do is say that FastCGI MIGHT scale as well as JVM-based app servers, depending on how the application is designed. As a web developer for almost a decade at this point, I still see Rails as a superior environment for 95% of the web apps out there; for the other 5%, JVM or .NET might be appropriate. One of the reasons I make that statement is that even though Java is theoretically more powerful, if the average developer cannot keep up with the technology and program to the interfaces, they might as well not exist. Without being constraining, Rails allows quick, rich development and easy access to very advanced tools. The example I have given to friends is the caching tools. Given even a few minutes I can describe how they work and how sweeping keeps the cache current. I would much rather have a system that is approachable and understandable than one that COULD BE 10% faster if I took another month to optimize the system.

    We need more technology like rails.

    --Eric, April 12, 2005 02:41 PM

    It should be noted that the FastCGI spec does indeed provide support for concurrent requests to a single process, and thus your claim that one does not need to be concerned with multi-threading issues is not entirely true unless you are running FastCGI in a non-concurrent manner.

    http://www.fastcgi.com/devkit/doc/fcgi-spec.html#S3.2

    --Pete, April 12, 2005 02:56 PM

    In my list of session stores above, I failed to mention the memory cache store, which is evidently what's used at 43Things, one of the current headliner Rails apps.

    --Ryan Platte, April 12, 2005 03:02 PM

    One difference between threads vs. processes is that threads can directly share memory. This is difficult or at least tricky for processes. Sharing memory means requests can easily share cache, which can be a huge performance factor. Performance is certainly not the same as scalability, but they are related.

    I believe that threaded apps can be made to perform and scale way better than FastCGI and/or ruby on rails, but the extra trouble far outweighs this advantage. Hardware is cheap compared to time.

    --Carl Free, April 12, 2005 03:42 PM

    See http://www.jroller.com/page/cpurdy/20050412#fastcgi_not_so_fast

    --Cameron, April 12, 2005 04:34 PM

    Since Ruby threads are in process, implemented totally by the interpreter, they exhibit subpar performance under heavier load. Therefore FastCGI clustering is the only viable route to handle simultaneous requests with acceptable performance.

    Basically you don't have any choice in building a production environment, you HAVE to use FastCGI.

    --Zsolt Szász, April 12, 2005 04:52 PM

    Congratulations, you've discovered how server side developers deployed applications 10 and more years ago, back when threads were still new and spooky.

    In other words process-level scaling is practically ancient technology, and also very applicable in some situations. It's worth noting that you can do this just as easily in Java as you can with other technologies - unless you're severely challenged starting a JVM is not all that difficult :-)

    It's also worth noting that while Java can move to this model, moving Ruby towards a threading model is much more problematic. Java can go either way but Ruby really can't.

    --Mike Spille, April 12, 2005 06:04 PM

    Mike,

    Although process scaling may be "ancient" technology, how does that make it worse than newer technologies?

    --Joe, April 12, 2005 06:21 PM

    Jon,

    I think what Mike is saying is that although you may be too young to remember, there were REASONS people moved away from multi-process scaling to a threading model. See Apache 1.x vs. Apache 2. Yes, there's some times when multi-process is a better idea, but I think it's probably when your code is shit and you want to protect different requests from the others blowing up (I'm not saying RoR is shit, just that that's the best reason I can think of for using this concurrency model).

    --Jason Carreira, April 12, 2005 07:16 PM

    Jason, you got it spot on. As I said "...and also very applicable in some situations". The problem is that while it works for some scenarios, it's really, really sucky for other ones.

    What you want is the capability to go either way with your toolset - use processes if that works, use threads if processes are too heavy weight. From what I've seen Ruby doesn't give a very good threading option.

    Another reason for my post was to point out that there's nothing new at all in anything being discussed here - and there seemed to be an implication that this wasn't a well-known technique (it is).

    Last - I really hate it when people confuse scalability with latency and perceived speed. Jon, FYI most of what is described in this blog entry is about latency and perceived speed (aka wall clock time) _not_ about scalability. For example, plain old CGI can be shown to scale very well to a certain level if you stick a load balancer in front of many machines. The problem is that the latency from a user perspective is sucky.

    --Mike Spille, April 12, 2005 07:26 PM

    Mike, The ruby implementation of threads is currently not so good, but I thought I read where a patch is coming that will make use of native OS threads where available. I think this will greatly improve its performance.

    --David Morton, April 12, 2005 07:45 PM

    David, the problem is that Ruby has some unpaid people working on a threading model...

    Java has SUN and IBM and BEA working on threading models, garbage collection algorithms, etc...

    The advances in the JVM over the last 5 years are just amazing, and it's not going to be that easy for an opensource project to do that.

    --Jason Carreira, April 12, 2005 09:28 PM

    Mike, thanks for pointing out my confusion regarding "latency", "perceived speed" and "scalability". These are all important but are kind of confused and jumbled together in my post. Hopefully this doesn't get in the way of getting my point across.

    Jason, Mike, you are correct in saying that Ruby and Rails cannot at this stage move to a threading model as the entire stack has been optimized for a one-process-per-request model. This has massively simplified the Rails code itself and the entire programming model of Rails.

    Also of interest is that while there have been massive advancements in the JVM over the last 5 years, there have also been some massive advancements at the OS level regarding processes. Given the complexity that threading introduces, isn't it worthwhile to reexamine the choice we made 5 years ago on how to scale Java apps (and enterprise apps in general)?

    Jason, don't think paying people is enough to get good results. One of the best threading solutions is in the JRockit JVM which was built by a couple of Swedish guys straight out of university. They lived off bread and water (not quite literally) and had to do consulting on the side to make it all hang together financially. In the end BEA bought them and hopefully they got off pretty well, but I think you understand my point.

    --Jon Tirsen, April 13, 2005 01:28 AM

    The perception that Java performance has increased greatly over the last 5 years is true, but the technology that they did it with is almost 20 years old. Most of Hotspot comes from the research project "Self", which was a fully dynamic language based on Smalltalk. Additionally, the GC algorithms that Java uses are still not up to snuff with some of the better ones that Lisp or Smalltalk use in their VMs. There is still room for lots of improvement in both Java and Ruby. The neat thing in all of this is that we have a new, simpler tool to do web apps in that will serve us well for a large portion of our user base. Additionally, tie that to the performance increases we've seen in HW and the standard amounts of RAM available to processes, etc., and we can solve problems in a different way and consider solutions that weren't viable years ago. I remember when I got laughed at because I wanted 4 meg of RAM to run the commercial Smalltalk environment. They didn't laugh for long, as I got problems solved quicker and with many times less code than would have been required in C or C++. The trade-off of giving me memory gave them solutions much quicker. In the end that is what it is about a lot of the time. With our computers getting faster every 18 months and having more memory etc., we should focus on looking at different ways of solving our problems. Ok - I'm getting off my high horse now, but let's all remember that speed allows us to simplify our own lives by using different, sometimes less efficient solutions in a trade-off for our own sanity.... :-)

    --Sam Griffith Jr., April 13, 2005 06:39 AM

    As an alternative to mod_fastcgi, look at mod_fcgid:
    http://fastcgi.coremail.cn/

    I have been using it with Apache 2, and have been very happy with it.

    --Paul Querna, April 13, 2005 09:33 AM

    For the database problem described in "update 2" we use SQLRelay ( http://sqlrelay.sourceforge.net/ ) for database pooling.

    --sx, April 13, 2005 12:51 PM

    Mike: "unless you're severely challenged starting a JVM is not all that difficult :-)"

    No, but it's very expensive memory wise. I think a lot of the reason Java has worked hard on its threading implementation is that it has to.

    True, rails can't make as much use of threads as java, but conversely you'd have to be on crack to be running java cgi scripts, so both approaches have their limitation flexibility-wise.

    Forking a (small) process isn't as expensive as it was when threads were touted as the Next Big Thing (at least on *NIX), and most webserver fcgi plugins allow pool management of the fcgi workers, so the expense of 1 db connection per fcgi is not as great as you might expect.

    And developing/debugging multiple processes is an order of magnitude easier than not shooting yourself in the foot writing multithreaded apps.

    --Dick Davies, April 13, 2005 01:11 PM

    "There are no issues of multi-threading as each process processes only one request at a time."

    There are no issues with threading in Java if you use the single thread servlet model and you don't need to operate on any shared data.

    What application doesn't share data? HTTP sessions, caching, etc. How do you even implement HTTP sessions in the FastCGI model? It has to be expensive.

    --Bob Lee, April 13, 2005 04:07 PM

    Bob: Regarding FastCGI and sessions.
    PHP uses a file or database store (think local mysql via unix socket). I've not heard of any issues there, at least with the file store.

    Taking straight FastCGI now, i.e. C or C++, one can easily code a thread pool to service requests and keep the sessions in RAM as well. This obviously means that you won't be able to use multiple FastCGI processes, unless you use some session affinity at the web server level (to keep requests going to the same processes, thus partitioning the sessions in per-process/multithreaded sets).
    And we're not done yet. There's memcached too.

    I'm not sure how and if this is achievable in Ruby/RoR.

    All in all, I definitely enjoy the options at hand in the no-java-land. Disclaimer: I've been doing 98% Java for the past 5 years and gotten a little sick of it.

    --Radu-Adrian Popescu, April 13, 2005 07:50 PM

    Bob,

    The single thread model has been deprecated, to quote the JavaDoc:

    Note that SingleThreadModel does not solve all thread safety issues. For example, session attributes and static variables can still be accessed by multiple requests on multiple threads at the same time, even when SingleThreadModel servlets are used. It is recommended that a developer take other means to resolve those issues instead of implementing this interface, such as avoiding the usage of an instance variable or synchronizing the block of the code accessing those resources. This interface is deprecated in Servlet API version 2.4.


    But you are correct, something similar is usually how you solve the multi-threading issue in Java.

    Sharing data between processes is not a problem unique to Rails or FastCGI; Java clusters also have multiple processes which need to share session data. By default Rails will use the file system to store the session; this works out of the box and requires no time to set up. Most clusters I have worked with have consisted of blade servers with no disk, connected to a SAN/NAS or other networked disk solution, so storing session data on the disk is usually not a problem. If it does become a problem you can also store it in the database or using memcached. All of these solutions are also popular in Java land (not specifically memcached of course, but in general some form of cluster-wide in-memory storage) and have been proven to work well in production.

    --Jon Tirsen, April 14, 2005 01:29 AM

    In re Update 2.

    It seems to me that the amount of concurrency you buy by using the thread pools and database pools (let's call it the J2EE method vs. the FastCGI method) is limited.

    Let's imagine that I determine that the maximum number of connections my database can handle is N. And I have one system that creates N FastCGI processes and another system using the J2EE model. With all other things held equal, the only extra concurrency I will get out of the J2EE model is the time spent servicing requests during which the request does not have a database connection checked out. Otherwise those threads which exceed N are blocked waiting for a connection, whereas in the FastCGI server they are blocked waiting for a process.

    Most naively implemented J2EE implementations I have seen check out a database connection at the beginning of a request and return it at the end. Will there really be that much of a difference in extra concurrency?

    --Victor Lewis, April 14, 2005 06:56 PM

    I was going to make the same point as Victor. Many J2EE apps need a database connection for most of their requests, and hold on to it for all (or at least most of) the duration of the request. And I wouldn't even call that 'naive'; it's a good simple strategy that works in most situations. Hibernate calls this the 'Open Session in View' pattern.

    I'm not familiar with FastCGI, but it seems similar to how classic Tuxedo apps scaled. There's a bunch of 'server' processes, each of which handles one request at a time, and each server process has a single database connection. The Tuxedo apps I worked on only supported a relatively small number of users. But I understand this model scaled reasonably well. Lots of banks run on Tuxedo. (Tuxedo now supports multi-threading, but it didn't for a long time.)

    Having said that, I wish Ruby supported native threads. Cameron is right, it would be great if Ruby could support both thread and process pooling styles of scaling.

    --Steve Molitor, April 15, 2005 01:18 AM

    re: "Please ignore the unfriendly tone Cameron uses in his post"

    Hi Jon - you mistook the tone .. it was intended as friendly and humorous, not at all unfriendly. I find most of this arguing (Java "versus" Ruby for example) to be entertaining at worst, and I don't take any of it personally.

    Peace.

    --Cameron, May 4, 2005 03:34 PM

    No, that's not a valid flaw in FastCGI at all. It's a valid flaw with using processes instead of threads like Rails does. FastCGI apps can be multithreaded, and then work very similarly to a typical Java servlet setup.

    --Adam, June 6, 2005 07:52 PM

    Regarding the "a very valid flaw with the FastCGI architecture; ... This would amount to 2000 database connections which occupies a lot of memory ...) and bluntly put just doesn't scale... Does anybody that have real experience using FastCGI know how to solve this problem?"

    Yes. Use an out-of-process connection pool like "pgpool" for postgresql. This provides not only failover between database servers, but memory friendly connection pooling for exactly the problem you describe.

    --RonM, June 24, 2005 06:18 AM

    Just a small practical point: how come your web site doesn't print correctly in Firefox? Printing in both Portrait and Landscape results in lost text on the right-hand side!!

    --Frank Daley, August 26, 2005 01:41 AM

    If RoR is going to take a single process approach how will it scale now that all hardware vendors are emphasizing threads?

    Examples
    1) Sun's upcoming Niagara line with 8*4 = 32 threads
    2) AMD's dual core Opterons
    3) Intel's dual core Xeons

    Java will scale in this case, but how will RoR scale?

    --Amit Kulkarni, September 24, 2005 06:23 PM

    Amit,

    Don't confuse hardware threads and software threads. A hardware thread makes a processor look and behave like it's actually multiple processors. An architecture that scales using multiple single-threaded processes will scale perfectly in a multi-processor environment, you will simply have one process per processor (or more). A single process multi-threaded approach will also scale, but please note that writing multi-threaded code that works in a multi-processor scenario is even harder than in a "simulated" context-switching scenario. Not impossible just pretty damn hard!

    Cheers,
    Jon

    --Jon Tirsen, September 25, 2005 01:13 AM

    And for the Germans (like me), translated here from the German:

    FastCGI fixes the well-known CGI problem that one process is needed per request. FastCGI programs do not terminate after each use; multiple requests are possible over an existing connection (session).

    --Stefan, October 10, 2005 03:18 PM

    I'm curious why people seem to think Ruby thread support is sub-par. Ruby's thread implementation is described here http://www.rubycentral.com/book/tut_threads.html

    There's no mention of it "not scaling well."

    --Bert, October 11, 2005 09:36 PM

    Bert,

    They are sub-par. They're "green threads" emulated by the Ruby interpreter and not real OS threads. This means that they'll work fine on a single-processor system, but won't be able to take advantage of the extra processors on a multi-processor system. And they're probably not as performant as real OS threads either.

    Cheers,
    Jon

    --Jon Tirsen, October 11, 2005 09:52 PM

    A primary production environment factor that doesn't appear to have been addressed here is business continuity.

    Ultimately the reason virtually all development occurs is to support business activity, while various developers may like to spend their time discussing and actively seeking clever and time/space performant coding solutions often this effort only occurs to solve a business need.

    Scalability of process loading can be up and out. Currently "scaling up" is out of fashion because big-iron hardware is expensive, so commodity hardware and clustered or Grid environments are definitely in vogue and represent the academic and business communities' choice of scaling out.

    However, there are non-technical issues with both scaling up and scaling out, and these ironically fall to software vendors charging for their products on a CPU count basis. In particular I'm thinking of Microsoft's SQL Server and Oracle's RDBMS. Both of these are well designed, stable software platforms that scale up and out, though Oracle stomps all over SQL Server in all factors except ease of use.

    A busy online business does not consider performance, it considers "throughput", because this represents the total number of customer or advertising hits successfully serviced within a reasonably quick performance perception. I use the word perception here because outright speed of delivery is less important to the end user than the perception that they are obtaining a quick service, and this is often achieved using cache tricks, by delivering web furniture quickly or by simply capturing the audience's imagination; after all, web publishing is the domain of the designer, not the developer.

    However, I nearly digressed too far there. My point is that after lots of static web content caching and minimising of HTML generation (like JSP or some other mechanism), what is left is the dynamic content only. Minimising dynamic content is essential to increase the amount of potential caching, both in the browser and, most importantly to the user community at large, in the application cache servers and the ISP proxy servers. In fact any service that provides a static content cache increases the perception of performance, as well as bringing lots of lovely bandwidth reduction benefits. I'm not getting into the downsides of caching because that's another story and involves lots of interesting heuristic solutions.

    So, dynamic content is why we are here. We're all here reading and writing about the publishing of dynamic content, in particular content generated by a long-running process using the FCGI API. Excepting technical and opinion-based conversations about API pros and cons, what really matters is how quickly the dynamic content can be obtained, and ironically most of the content is obtained from backend databases, like Oracle or SQL Server. Any query against a database, either remote or local, is going to be the factor of largest delay, and generally it's a good idea to have the application server (FCGI or otherwise) employ a mechanism that allows multiple requests to occur simultaneously, since in physical reality much of the elapsed time is spent waiting for backend data. How this is achieved is independent of the implementation language, just so long as either a thread can block on a completion call, or a call-back can pop and push session data onto queues. In fact, I suspect the ideal solution is not a one-thread-per-request solution, but rather a series of threads that service queues: queues for new requests, queues for dispatching requests, queues for database object resolves and queues for forwarding to the recipient. Sure, this requires a little more thought than the single-thread-per-request solution, but then most developers are pretty darn shrewd and thoughtful anyway.

    In summary:

    Nobody uses the CGI mechanism any longer so we're only talking about FastCGI or J2EE or alternatives, like Ruby On Rails.

    Optimising the database and the SQL queries is essential to performant dynamic content, not the implementation of the application server.

    The application server needs to be developed in a stable manner which is where the choice of the threading strategy arrives. Again one thread per request or a series of threads servicing queues, it's a design choice.

    Ideally a single application instance should handle lots of concurrent requests. So long as performance is at least within tolerance, it often boils down to RAM and CPU consumption.

    I personally care for C++ because the memory footprint is smaller, or at least more controllable, than say Java (J2EE), and Standard C++ has nice string handling, funky templates and the terrific Boost libraries.

    But what about business continuity? Well, session affinity is no good because there is no session data sharing between instances. Session affinity is really an academic solution, not a business solution, because if one of the applications with a tied session goes off the network, what happens to the user experience? Personally I go for a session data database that exists either in an Oracle database situated on another server or in a localised database available to all the nodes in the cluster, shared by NFS from network attached storage. Before you wince at NFS, you should consider that commodity Grids make extensive and successful use of NFS against NAS with jumbo Ethernet frames. It's not rocket science any more, it's here and it's reasonably quick.
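
    As a rough sketch of the shared session database idea, the Python below keeps session data in a table that every application instance opens by the same path. The SessionStore class and its table layout are assumptions made for illustration. SQLite stands in for whichever database is actually used, and it is only a placeholder: SQLite itself is generally not recommended on NFS because of file-locking problems.

```python
import json
import sqlite3


class SessionStore:
    """Hypothetical session store backed by a database all nodes can reach."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        self.conn.commit()

    def save(self, session_id, data):
        """Write (or overwrite) the session as a JSON document."""
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.conn.commit()

    def load(self, session_id):
        """Return the stored session dict, or None if it does not exist."""
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None
```

    Because no instance holds the session in its own memory, any instance can pick up a user's next request if the one that served the previous request goes off the network.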


    --Michael Hartley, October 29, 2005 09:51 PM

    Interesting thread. It's been going on for quite a while, it seems. One thing which struck me as odd, though, was the repeated statements that "threading is hard". Compared to a single-threaded model it's a bit more to think about, but it's in no way an overarching obstacle.
    Since it seems like many here have previous experience working with various J2EE application servers, I'd have thought threading would be second nature to you all by now. If you've ever tried to make a performant desktop application, you know that threading is essential there as well. Why this bashing of the threaded model? It's an excellent and pleasant way of solving a lot of problems.

    Also, as for the discussion on session management: for the last few projects we've employed memory-to-memory federation (various formulas of "peer-to-peer" replication) with great success. It saves capacity on the RDBMSs so they can do other stuff and provide muscle for those of our services still left in DB land. We've also tried to reduce the heavy-handed use of databases by implementing server caching, primarily in the resource and application layers (thank god for the J2EE model, where you can share memory resources and federate them to other nodes as well!). Presentation caching is almost futile today, since customers are demanding personalized services (not much point in caching an account statement for one specific Joe).
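
    The application-layer caching mentioned above might look something like this minimal read-through cache in Python. It is a sketch under assumptions, not a description of any J2EE facility: ReadThroughCache, its loader callback and the ttl_seconds parameter are hypothetical names, there is no replication to other nodes, and the class is not thread-safe as written.

```python
import time


class ReadThroughCache:
    """Hypothetical in-memory cache placed in front of a slow data source."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self.loader = loader      # called on a miss, e.g. a database query
        self.ttl = ttl_seconds    # how long an entry stays fresh
        self.clock = clock        # injectable so expiry can be tested
        self.entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]       # fresh hit: no trip to the database
        value = self.loader(key)  # miss or expired: go to the source
        self.entries[key] = (now + self.ttl, value)
        return value
```

    Repeated reads of the same key within the time-to-live are answered from memory, which is where the capacity saving on the database comes from.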

    As for resource consumption, I'd say we've managed to spread our use in a natural way, by using the technologies deemed best for their specific purposes. E.g. MQ for system and enterprise integration, LDAP servers for identity management, reverse proxies for moving off some of the security aspects to another tier, and of course databases for about 50% of our data shuffling requirements.

    Some points about some specific techs which are worth repeating. Use LDAP servers for user management (all those people logging on and off your enterprise applications). LDAP servers are designed for frequent inquiries and infrequent updates, so they are blazing fast for that purpose. Split up your databases into different clusters tuned for the varying characteristics in your typical solution. For example, we've split off one of our databases serving public portals (mostly read requests) and lifted the entire DB up into server memory. The performance is blazingly fast and can be recommended (naturally, the schemas also need to be optimized, lest the CPUs become the bottleneck instead of IO).

    All is not great though. Currently our single biggest problem is actually web services. We've tried to stay away from them and use them only for integration where requirements are for low transaction volumes, but vendors are now starting to put web services (SOAP / XML transformation junk) in most of their integration components. Even MQ integration has been infested by the web service virus and I'm afraid the situation will get a lot worse before it gets better. (Three letters. SOA, shudder).

    --John Brogersen, November 12, 2005 07:39 PM

    The neat thing about a discussion of "performance" is that it means different things to everyone involved. Forking versus threading means nothing to the customer who wants their app to do what they want, when they want.


    In Computer Science, it's all about big-Oh notation anyway. These linear latencies go away (i.e. your reverse proxies) and you're simply left to stare at your bad design choices. Penalties related to fork are highly overrated... Threads and forks *should be* very similar on a modern OS-- both involve library calls, system calls, and the process table. (See the Linux clone man page.)


    Apache 2.0 *adds* additional models, it does not remove any... My understanding is that the threaded model was included (among other reasons) because it is faster on *Windows* where fork/spawn is very slow.


    Sun has built quite a structure around Java, but their inability to see around corners will be problematic. Doing anything useful with Java requires you to build quite a big structure around the language. (Hence the "Irritable Standard-itis" that afflicts most customers.)


    Web services rock, btw. If you don't like it, write something else. Java's extensible, right? Or is the point *not* to extend it? (Like a mobile home that never goes anywhere.)


    I think the Java-centric mindset is in for a shock at some point in the not-so-distant future. It's expensive to run and inflexible. A new technology will arise, and Java simply won't be the best solution, and it will take another 5 years for them to catch up to that.


    Meanwhile everyone who's invested in something "that'll be around 20 years from now" will wonder why the heck they thought it was still 1970.


    --Beaned by Java, February 24, 2006 07:18 AM

    Beaned by Java, first off I didn't see John writing anything about using reverse proxies as a way of reducing latencies. From what I read he seemed to use them for consolidating certain services (like security), i.e. separation of concerns, which in itself has nothing to do with performance, but to an extent relates to scalability.

    As to your mentioning of it being all about the "big-Oh" in CS, well, for computational cost yes. However, in a distributed world there are a lot of latencies outside your control which you can't mathematically fit into a valid formula for your solution. Yes, approximations can be done, but they're not logically correct for a valid formula (non-predictable outcome), so it's not as cut and dried a problem as that.

    I'm all for new approaches and agree that there is much to be said regarding many of the Java third-party framework complexities in general. However, sometimes you do need to do advanced stuff, and it just happens that the Java platform stack with all its third-party vendor components solves a lot of the problems which many projects do face.

    Ever created data processing infrastructure for the Olympic Games, some sports event or other highly transaction-oriented solution? If so, then it would be apparent that some stuff is just really hard to do well, and the more you can rely on quality components from third-party experts, the less headache you'll have.

    Agreed, Java (or even J2EE) and Rails do not compete in the same league. RoR is a web framework addressing a very narrow (but important) problem space, while the Java stack is a general purpose platform and J2EE a bit more specialised at certain business problems (of which *one* is building web applications).

    If a solution is complex and RoR backed by Ruby doesn't provide components to do the main heavy lifting of a website, then one might be better off using, for example, a Java-based stack or DotNet (performance). However, if a problem can be addressed by something like RoR, then I think that choice would be a preferable one due to streamlining with its entailed benefits.

    Now, we both ventured a bit outside a pure scalability discussion, but my point is simply that performance and scalability have a lot to do with what kind of problems a solution is trying to address. Sometimes RoR might perform well enough and will be able to scale that performance (throughput) as required; in other cases, when more crunching, certain technologies / standards or integration is required, it may not. Seems to fall back on a "best tool for the job" reiteration.

    --Mark Gray, March 15, 2006 02:01 PM