Ruby Web Servers Benchmark
Feeling like my hardware wasn’t being used to the full extent of its capabilities, I decided to benchmark the most successful web servers from my previous trials in a different environment. This time, however, I used more workers, to see whether that brings significant performance gains, while keeping an eye on memory consumption.
Different versions of Ruby were also taken into account. As you’ve probably guessed, I’m benchmarking the application on both Ruby 1.8 and Ruby 1.9.
Setup
The setup was pretty simple. Nginx (0.7.64) was used in all the tests. The remaining components’ versions were:
Nginx acted as a proxy balancer in front of Thin and Unicorn, and as the main server for Passenger. In the first round, 30 workers were used; in the second, the number of workers was increased to 60.
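For the Thin setups, Nginx spreads incoming requests over a pool of Thin instances, and its default upstream strategy is round-robin. The Ruby sketch below only illustrates that idea; it is not Nginx’s code, and the backend addresses are hypothetical.

```ruby
# Minimal round-robin dispatcher, illustrating (not reproducing) how an
# Nginx upstream block spreads requests across several Thin backends.
# The backend addresses below are hypothetical.
class RoundRobin
  def initialize(backends)
    raise ArgumentError, "need at least one backend" if backends.empty?
    @backends = backends
    @next = 0
  end

  # Returns the backend that should receive the next request.
  def pick
    backend = @backends[@next]
    @next = (@next + 1) % @backends.size
    backend
  end
end

# Hypothetical pool: one Thin instance per port, 30 instances in total.
pool = RoundRobin.new((3000...3030).map { |port| "127.0.0.1:#{port}" })
picks = Array.new(60) { pool.pick }
puts picks.tally.values.uniq.inspect # → [2] (every backend chosen exactly twice)
```

Unicorn is wired differently: Nginx usually proxies to a single Unicorn listener, and the workers forked by the Unicorn master all accept connections from that shared socket, so Nginx itself does not pick a worker.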
The Ruby versions used were 1.8.7 (patchlevel 249) and 1.9.1 (patchlevel 376), compiled from source with the same flags: "-O2 -march=nocona -pipe".
Test
An awesome tool called autobench was used for this benchmark. While ab is a great tool, it lacks some of the features found in httperf. Autobench, which is built on top of httperf, makes it possible to run more complex benchmarks and obtain valuable results.
This specific test aimed at discovering how many requests per second each web server could handle for each web page. Autobench calibrated httperf to squeeze more and more juice out of the application until it hit a bottleneck. After that, it tried to stabilize the number of requests per second at the highest level the system could handle.
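The stepping idea can be sketched in a few lines of Ruby. This is not autobench itself: `measure` stands in for a single httperf run, and the server model (a hard cap of 12 requests per second) is purely hypothetical. The sketch offers increasing request rates and reports the highest one at which the server still answered nearly everything.

```ruby
# Sketch of a stepped load test in the spirit of autobench: offer an
# increasing request rate and record how many replies per second come back.
# `measure` stands in for one httperf run (hypothetical, not a real API).
def stepped_load(low:, high:, step:, &measure)
  (low..high).step(step).map { |rate| [rate, measure.call(rate)] }
end

# Highest offered rate at which the server still answered (almost) everything.
def saturation_point(results, tolerance: 0.95)
  sustained = results.select { |offered, replied| replied >= offered * tolerance }
  sustained.empty? ? nil : sustained.last.first
end

capacity = 12.0 # hypothetical req/s limit, not a measured value
results = stepped_load(low: 2, high: 20, step: 2) { |rate| [rate, capacity].min.to_f }
results.each { |offered, replied| puts format("%3d req/s offered -> %5.1f replies/s", offered, replied) }
puts "saturates at about #{saturation_point(results)} req/s" # → 12
```

The `low`, `high` and `step` parameters loosely mirror autobench’s own `--low_rate`, `--high_rate` and `--rate_step` options.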
Memory consumption of all the involved components, in each test, was also recorded using the information in /proc/{pid}/status.
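A minimal sketch of that measurement, assuming the VmRSS (resident set size) field is the one that was recorded; the post doesn’t say which field was used. It sums the value over a list of PIDs, for example a master process and its workers, and only returns real numbers on Linux, where /proc exists.

```ruby
# Extracts VmRSS (in kB) from the text of /proc/<pid>/status.
# Returns 0 when the field is absent (e.g. kernel threads).
def vm_rss_kb(status_text)
  line = status_text.each_line.find { |l| l.start_with?("VmRSS:") }
  line ? line.split[1].to_i : 0
end

# Sums VmRSS over several processes and converts the total to MB.
def total_rss_mb(pids)
  kb = pids.sum do |pid|
    vm_rss_kb(File.read("/proc/#{pid}/status"))
  rescue Errno::ENOENT, Errno::ESRCH
    0 # process exited (or /proc is missing) between listing and reading
  end
  kb / 1024.0
end

sample = "Name:\truby\nVmPeak:\t  120000 kB\nVmRSS:\t   51200 kB\n"
puts vm_rss_kb(sample) # → 51200
puts total_rss_mb([Process.pid]) if File.exist?("/proc/self/status")
```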
For this ride, three pages of escolinhas were used: the most visited one, the heaviest one and the lightest one. I’ll let you figure out which is which.
An important side note: a request must complete in less than 30 seconds to be considered valid. If the reply only arrives after, say, 32 seconds, the request counts as failed.
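The validity rule boils down to a simple cut-off. The sketch below uses made-up durations, with `nil` meaning no reply at all, and counts a request as successful only if it finished in under 30 seconds. In an httperf-based run, such a limit is presumably enforced through httperf’s `--timeout` option.

```ruby
# Classifies request durations (in seconds) against the 30-second window:
# anything at or above the limit, or with no reply (nil), is a failure.
TIMEOUT_S = 30.0

def classify(durations)
  ok, failed = durations.partition { |d| d && d < TIMEOUT_S }
  { ok: ok.size, failed: failed.size }
end

# Made-up sample values, not benchmark data.
p classify([0.4, 2.1, 29.9, 32.0, nil])
```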
Configuration
Each setup had a similar configuration. The important sections were as follows:
Wherever you see [30|60], it refers to the varying number of workers. Nginx had a pretty standard configuration for all the tests.
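The original configuration sections aren’t reproduced here. Purely to show where the [30|60] value would go, here is a hypothetical Unicorn configuration file (Unicorn configs are plain Ruby). The socket path, backlog and timeout are assumptions, not the settings used in this benchmark.

```ruby
# Hypothetical config/unicorn.rb: NOT the file used in the benchmark,
# only an illustration of where the worker count is set.
worker_processes 30                          # 60 in the second round
listen "/tmp/unicorn.sock", :backlog => 1024 # socket path and backlog are assumptions
preload_app true                             # load the app once in the master, then fork
timeout 60                                   # workers stuck longer than this are killed
```

For Thin, the corresponding knob is the number of instances launched (e.g. `thin start --servers 30`), and for Passenger under Nginx it is the `passenger_max_pool_size` directive.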
All tests were run on Gentoo Linux, with a tweaked sysctl to allow a higher throughput.
Results
The Ruby 1.8.7 results are shown first, followed by Ruby 1.9.1. Finally, a brief analysis of memory usage is presented.
Ruby 1.8.7
Starting with Ruby 1.8.7, the results were as follows.

As you’ve probably guessed, this is the heaviest page. Autobench was unable to find a stable point on this page; it’s simply too heavy to sustain repeated requests. Anyway, all web servers behaved similarly, dispatching 2.5 requests per second in the first iteration but completely suffocating after that.

This time every setup served the web page consistently, at 10~12 requests per second. Although the web servers perform quite similarly, we can see that Unicorn (with 60 workers) tends to take the lead.

Again, all setups perform similarly. Unicorn (with 60 workers) seems to take the lead on this one too, although by an insignificant margin.
Ruby 1.9.1
After these tests, the configuration was changed to use Ruby 1.9.1. Let’s see how it stacks up.

Wow. I mean, WOW. Switching the Ruby version increased the number of successful iterations to 15~16, and the average number of requests handled per second also got a huge boost. I already knew that Ruby 1.9 is considerably more efficient than Ruby 1.8, but we’re talking about a 15x increase in successful iterations and a 2x increase in requests per second!
Yes, there are exceptions:
- Thin couldn’t keep up with the rest of its folks. This is probably because Thin was designed to handle small requests, while this page is quite heavy, needing many database queries and running complex Ruby code;
- Passenger with 30 workers behaved intermittently, failing before Unicorn (and before Passenger itself with 60 workers) but then recovering to handle a few more requests.
Passenger with 60 workers seems to take the lead on this one. Unicorn also behaved quite well, remaining stable throughout.

The results were very similar to the previous tests with Ruby 1.8.7, probably because the page is quite light: Ruby code clearly isn’t the bottleneck here. Unicorn (both 30 and 60 workers) seems to be on top in most iterations.

The results were, again, very similar, probably for the same reason stated above. Passenger, however, behaved oddly throughout the benchmark. With 30 workers, things went normally until the 10th iteration, when it started failing and behaving erratically. With 60 workers, it behaved oddly from the start.
Since these strange requests took a surreal 0.1 seconds to complete, I’m disqualifying Passenger here: something clearly went wrong. I repeated these tests and got the same results. I haven’t tried to find the true cause since, as we’ve seen, it wouldn’t make much difference.
Memory
Here is the memory consumption in MB.

A few details regarding memory usage:
- Thin always uses the least memory, whether it’s page 1 with 30 workers or page 2 with 60. By larger or smaller margins, it’s always the memory champion;
- Passenger already used a lot of memory with 30 workers; going from 30 to 60 made an almost unnoticeable difference;
- When using Ruby 1.9, memory usage was lower than with Ruby 1.8, except for Unicorn with 30 workers on page 1.
Conclusions
After an exhaustive analysis of web server performance, scalability and memory usage, I can only state one fact:
The differences are very small, probably not noticeable and not really important to most people. One exception: Ruby 1.9. Start upgrading your applications, folks!
If you disagree with me, have another look at the two charts for page 1 (one for each Ruby version). Yes, that’s real stuff.
Diving into more detail, we can see that Unicorn with 60 workers generally yields better performance and scalability, although it uses a bit more memory than Thin.
We can also see that the difference between 30 and 60 workers is completely insignificant: the database is the major bottleneck here. Maybe with efficient caching solutions (I’m looking at you, memcached!) the results could be a bit different. MySQL’s native caching mechanisms don’t seem to be very effective here.
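The kind of caching hinted at above is commonly applied as cache-aside: look the value up in the cache, and only on a miss run the expensive query and store the result with an expiry. In the sketch below a plain Hash stands in for memcached (a real setup would talk to a memcached client instead), and the `expensive_query` lambda is a hypothetical placeholder for the database work a page needs.

```ruby
# Cache-aside sketch with per-entry expiry. A Hash stands in for memcached.
class TtlCache
  # The clock is injectable so expiry can be tested without sleeping.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @store = {}
    @clock = clock
  end

  # Returns the cached value for key if still fresh; otherwise runs the
  # block (the expensive work), caches its result for ttl seconds, returns it.
  def fetch(key, ttl:)
    entry = @store[key]
    return entry[:value] if entry && entry[:expires_at] > @clock.call
    value = yield
    @store[key] = { value: value, expires_at: @clock.call + ttl }
    value
  end
end

db_hits = 0
expensive_query = -> { db_hits += 1; "rendered page" } # hypothetical DB work

cache = TtlCache.new
3.times { cache.fetch("page:1", ttl: 60) { expensive_query.call } }
puts db_hits # → 1: only the first call reaches the "database"
```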
You can still compare 30/60 workers with only 4 workers, by having a look at my previous benchmarks:
Interesting, huh? Happy Easter!