Optimising Django on DreamHost

12th March 2009, 2030

After deploying the first version of this website to DreamHost I couldn't help noticing that the site felt very slow. This was to be expected, as I had read many horror stories about DreamHost's performance and support, but I wanted to investigate a little more closely and give them the benefit of the doubt. This blog post documents the process I went through and the improvements I made along the way.

Measuring up

The most important thing to remember when doing performance optimisation work is measurements, measurements, measurements. If you don't have a target, and a measurement of how far away you are from that target, you could be doing lots of pointless work. To kick off this work I need to know three things:

  1. The best performance I can get from DreamHost, i.e. the performance of serving a static file
  2. The current performance I'm getting from DreamHost
  3. The current performance I'm getting in a perfect environment, i.e. from the Django development server on my local machine

In order to take these measurements I used ApacheBench (ab). It comes pre-installed on OS X and is available on Debian-based systems by installing the apache2-utils package via apt.
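
On Debian or Ubuntu that is simply:

$ sudo apt-get install apache2-utils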

Using ApacheBench I made 100 requests, serially rather than in parallel, to my Medarbeidersamtale post in a test installation of the site. I used a test installation so that my performance testing would not affect my regular Apache log statistics. The test site mirrors the configuration of the production site exactly, except that it runs against a different database and under a subdomain.

From the terminal I ran the following commands:

  • sleep 300 This attempts to ensure that no Python processes are running on the DreamHost server before I run my tests; as you will see soon, this was an important step. During this time I did not run any requests manually against the test site.
  • ab -n 100 http://test.sharebear.co.uk/robots.txt This attempts to measure the best possible performance. My robots.txt is insanely small, so the time it takes Apache to serve it must be the fastest I can expect from DreamHost.
  • ab -n 100 http://test.sharebear.co.uk/blog/2009/01/10/medarbeidersamtale/ This measures the current performance of a Django-rendered page from DreamHost.
  • ab -n 100 http://localhost:8000/blog/2009/01/10/medarbeidersamtale/ This measures the current performance of the same Django page on my local machine.

Of course, as with all fair tests, I ran the above commands a few times over to verify I was getting consistent results. Let's take a look at the results.

$ ab -n 100 http://test.sharebear.co.uk/robots.txt
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking test.sharebear.co.uk (be patient).....done


Server Software:        Apache/2.0.63
Server Hostname:        test.sharebear.co.uk
Server Port:            80

Document Path:          /robots.txt
Document Length:        24 bytes

Concurrency Level:      1
Time taken for tests:   38.559 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      43800 bytes
HTML transferred:       2400 bytes
Requests per second:    2.59 [#/sec] (mean)
Time per request:       385.588 [ms] (mean)
Time per request:       385.588 [ms] (mean, across all concurrent requests)
Transfer rate:          1.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      180  190   6.8    190     219
Processing:   185  195  15.2    193     333
Waiting:      185  195  15.1    192     332
Total:        366  385  17.7    383     530

Percentage of the requests served within a certain time (ms)
  50%    383
  66%    387
  75%    389
  80%    391
  90%    398
  95%    404
  98%    419
  99%    530
 100%    530 (longest request)

First, a quick explanation of the connect, processing and waiting times as listed above.

Connect time is the time it took to establish a connection, and it's no surprise that this is around the same as the ping time to the server.

The waiting time is the time from sending the request until the first bits of the response come back, so included in this are the time to transmit the request, the time the server spends parsing and handling the request and starting to send the response, and the time it takes that first part of the response to get back to my machine. As we know the time taken by the network (the connect time), we can subtract that to find out how long the server took initially handling the request, which in this case was about 2 ms.

Processing time is the total time taken from the end of the connect until the complete response has been received, so it includes the waiting time discussed above. Subtracting the waiting time from the processing time gives a number that represents how long it took the server to render the rest of the response (if this is high, it's a likely indication your templates are doing too much work) and transport it back to you. In this case that time is about 1 ms, but as we're talking about a robots.txt that is only 24 bytes long, that's hardly shocking.
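
Using the median values from the table above, the arithmetic works out roughly as:

server handling      = waiting - connect    = 192 - 190 = 2 ms
render and transfer  = processing - waiting = 193 - 192 = 1 ms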

From the maths above we can see that the server only used about 3 ms to handle this request; most of the time was spent on the network. Perhaps DreamHost isn't that slow after all. Now that we have a baseline measurement, let's try testing a real URL from a test installation of my blog on DreamHost.

$ ab -n 100 http://test.sharebear.co.uk/blog/2009/01/10/medarbeidersamtale/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking test.sharebear.co.uk (be patient).....done


Server Software:        Apache/2.0.63
Server Hostname:        test.sharebear.co.uk
Server Port:            80

Document Path:          /blog/2009/01/10/medarbeidersamtale/
Document Length:        10037 bytes

Concurrency Level:      1
Time taken for tests:   98.984 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1040700 bytes
HTML transferred:       1003700 bytes
Requests per second:    1.01 [#/sec] (mean)
Time per request:       989.844 [ms] (mean)
Time per request:       989.844 [ms] (mean, across all concurrent requests)
Transfer rate:          10.27 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      179  189   6.0    188     216
Processing:   712  801 185.2    768    2537
Waiting:      340  423 184.3    395    2174
Total:        897  990 184.7    960    2721

Percentage of the requests served within a certain time (ms)
  50%    960
  66%    981
  75%    997
  80%   1003
  90%   1063
  95%   1123
  98%   1208
  99%   2721
 100%   2721 (longest request)

That's more like the numbers I was expecting to see from the way the site was reacting: 960 ms (i.e. almost a whole second) to get a page out. That one request you can see taking > 2.5 seconds is when Apache has to start up a new Python process to service the request. I have an idea for how to deal with that, but for now we'll concentrate on the rest of the requests. Subtracting the 380 ms we know is lost to the network means the server is spending around 580 ms to serve the page. That sounds a little high to me for such a simple blog page. Let's look at the results running locally.

$ ab -n 100 http://localhost:8000/blog/2009/01/10/medarbeidersamtale/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient).....done


Server Software:        WSGIServer/0.1
Server Hostname:        127.0.0.1
Server Port:            8000

Document Path:          /blog/2009/01/10/medarbeidersamtale/
Document Length:        10037 bytes

Concurrency Level:      1
Time taken for tests:   12.500 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1018400 bytes
HTML transferred:       1003700 bytes
Requests per second:    8.00 [#/sec] (mean)
Time per request:       124.998 [ms] (mean)
Time per request:       124.998 [ms] (mean, across all concurrent requests)
Transfer rate:          79.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   108  125  18.9    114     175
Waiting:      108  124  18.8    113     175
Total:        109  125  18.9    114     175

Percentage of the requests served within a certain time (ms)
  50%    114
  66%    121
  75%    141
  80%    154
  90%    157
  95%    159
  98%    160
  99%    175
 100%    175 (longest request)

Something is beginning to look a little suspicious here: 114 ms to render a very simple blog post on a pretty well spec'd MacBook Pro seems a little high, so this is the first thing I'm going to investigate. However, before I do that, I first need to set myself a goal for optimisation.

Creating a target number is part common sense, part experience and part determined by the numbers above. From the numbers above we know that the network imposes a cost of around 380 ms. Experience tells me that 50 ms would probably be a sane expectation for rendering a dynamic web page. Common sense tells me that, running on a shared host, I can expect that number to be a little higher, so I'll add a 50 ms fudge factor. Putting that all together gives me a final target of ~500 ms. Once I've reached that, I've reached the limits of what I can reasonably expect when accessing a server in the States from Europe.
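
Summing those components up gives:

network round trip (measured)       ~380 ms
dynamic page rendering (estimate)    ~50 ms
shared-host fudge factor             ~50 ms
-------------------------------------------
target                              ~480 ms, rounded up to ~500 ms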

Markup filters considered harmful

As mentioned above, rendering that one simple blog post shouldn't take 114 ms on my local machine; this indicates there is something really wrong. So what is the page doing? Answer: not much. It's using a generic view to render the blog post into a template. The template is insanely simple, just dumping out the relevant information from my blog.Entry model, apart from one small thing: {{ entry.content|restructuredtext }} (I'm using reST for authoring my blog posts). On a hunch I removed the markup filter and re-ran the test... 13 ms. Now that's a lot better.

From here the solution to this performance killer is obvious: move the post formatting into the save() method of my model. I'm not going to write up that process in detail in this article; try Google, there are many blog posts out there about it already.
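
For illustration only, here is a minimal sketch of the idea, calling docutils directly; the field names are assumptions, since my actual Entry model isn't shown in this post.

from django.db import models
from docutils.core import publish_parts

class Entry(models.Model):
    # Illustrative fields only; the real model has more to it than this.
    title = models.CharField(max_length=200)
    content = models.TextField()  # raw reST, as written in the admin
    content_html = models.TextField(editable=False, blank=True)  # rendered at save time

    def save(self, *args, **kwargs):
        # Do the expensive reST -> HTML conversion once, when the entry is
        # saved, rather than on every page view via the template filter.
        self.content_html = publish_parts(
            self.content, writer_name='html')['fragment']
        super(Entry, self).save(*args, **kwargs)

The template then just outputs {{ entry.content_html|safe }} instead of running the restructuredtext filter on every request.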

Pushing this change up to the test server now reveals the following performance statistics.

$ ab -n 100 http://test.sharebear.co.uk/blog/2009/01/10/medarbeidersamtale/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking test.sharebear.co.uk (be patient).....done


Server Software:        Apache/2.0.63
Server Hostname:        test.sharebear.co.uk
Server Port:            80

Document Path:          /blog/2009/01/10/medarbeidersamtale/
Document Length:        10037 bytes

Concurrency Level:      1
Time taken for tests:   77.831 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1048900 bytes
HTML transferred:       1003700 bytes
Requests per second:    1.28 [#/sec] (mean)
Time per request:       778.315 [ms] (mean)
Time per request:       778.315 [ms] (mean, across all concurrent requests)
Transfer rate:          13.16 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      180  189   6.0    188     209
Processing:   553  589 160.1    575    2170
Waiting:      188  213 158.5    197    1781
Total:        737  778 160.1    764    2356

Percentage of the requests served within a certain time (ms)
  50%    764
  66%    768
  75%    772
  80%    774
  90%    783
  95%    790
  98%    807
  99%   2356
 100%   2356 (longest request)

Now this is a reasonable improvement in median response time (20%) but still not close to my target, and we know that the app performs well in a perfect environment.

Further Investigations

Since the application responds so quickly on my machine, I next started looking at environmental differences that I could have some influence on. The biggest difference between the two environments is that on DreamHost I'm using MySQL as the database, while locally I'm using SQLite, which is known to perform well under read-only loads. Unfortunately, after modifying the configuration to run SQLite on the server, there was no noticeable change in the results.
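
For reference, in the Django 1.0-era settings format that switch is only a couple of lines; the database path below is hypothetical.

# settings.py (Django 1.0-style database settings)
DATABASE_ENGINE = 'sqlite3'              # was 'mysql' on the server
DATABASE_NAME = '/home/username/blog.db' # hypothetical path to the SQLite file
# DATABASE_USER, DATABASE_PASSWORD and DATABASE_HOST are not needed for SQLite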

Next I tried disabling mod_security from the DreamHost panel. This did give me a ~50 ms speed boost, and I'm pretty confident that mod_security doesn't really buy me anything when running Django. Every little helps, but this still isn't enough to reach my target.

I also tried deploying the site as an FCGI app instead of under mod_passenger; this made no noticeable difference.

The obvious next route to think of is to use some kind of caching. Unfortunately DreamHost doesn't offer any possibility of running a cache such as memcached, so I decided to try activating Django's in-process cache. This also made no difference to the speed.
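
For completeness, activating Django's local-memory cache at the time looked roughly like this; the timeout is an arbitrary example value.

# settings.py (Django 1.0-style cache settings)
CACHE_BACKEND = 'locmem:///'    # per-process, in-memory cache
CACHE_MIDDLEWARE_SECONDS = 600  # arbitrary example timeout

MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',    # must come first
    # ... the rest of the middleware stack ...
    'django.middleware.cache.FetchFromCacheMiddleware', # must come last
)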

Caching, always fastest

The fact that the in-memory cache didn't improve performance at all implies that it's not the time taken to render the page that is adding time over serving a static page; simply entering the Python process at all is what is costing me response time.

The obvious way to deal with this situation is to avoid running any Python code, and how do we do that? Render the pages to a file-based cache and let Apache serve the HTML directly. Luckily I'm not the first person to think of this, so someone else has already done all the hard work for me in the form of the StaticGenerator project.
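
The end result is an Apache/mod_rewrite configuration along these lines. This is only a sketch; the cache directory name and layout are assumptions, not my exact setup.

# .htaccess (sketch): if a pre-generated file exists for this URL,
# let Apache serve it directly; otherwise fall through to Django as before.
RewriteEngine On
RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{DOCUMENT_ROOT}/cache%{REQUEST_URI}index.html -f
RewriteRule .* /cache%{REQUEST_URI}index.html [L]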

One problem I had here was working out how to configure the Apache server to serve the cached files. The performance after making this change looks like this:

$ ab -n 100 http://test.sharebear.co.uk/blog/2009/01/10/medarbeidersamtale/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking test.sharebear.co.uk (be patient).....done


Server Software:        Apache/2.0.63
Server Hostname:        test.sharebear.co.uk
Server Port:            80

Document Path:          /blog/2009/01/10/medarbeidersamtale/
Document Length:        271 bytes

Concurrency Level:      1
Time taken for tests:   41.045 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Non-2xx responses:      100
Total transferred:      70300 bytes
HTML transferred:       27100 bytes
Requests per second:    2.44 [#/sec] (mean)
Time per request:       410.446 [ms] (mean)
Time per request:       410.446 [ms] (mean, across all concurrent requests)
Transfer rate:          1.67 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      181  211 102.5    192    1141
Processing:   181  200  38.5    191     434
Waiting:      181  199  38.5    191     434
Total:        366  410 119.9    385    1336

Percentage of the requests served within a certain time (ms)
  50%    385
  66%    390
  75%    395
  80%    398
  90%    410
  95%    636
  98%    835
  99%   1336
 100%   1336 (longest request)

This brings the results well within the goal I had set myself (even though we have cheated a little).

Keeping Python alive

The above change has now produced reasonable performance from the site, but we still need to do something about the 2.5 second start-up time for the first request when there is a cache miss, and for any dynamic pages I may create later.

It's well documented that DreamHost will kill any inactive processes. In order to work out when DreamHost kills the process I used the following script:

#!/usr/bin/env bash
# Probe how long the Python process survives idle on DreamHost: sleep for
# increasing intervals, then time a single request. A total time of over a
# second suggests the process had been killed and had to be restarted (FAIL);
# anything under that means it was still alive (PASS).

for x in 30 60 90 120 150 180 210 240 270 300; do
    for y in 1 2 3; do
        sleep $x;
        echo -n "After sleeping $x seconds "
        ab http://test.sharebear.co.uk/keepalive/ | grep Total: | awk '{if($2>1000) {print "FAIL: ", $2;} else {print "PASS: ", $2}}'
    done;
done;

Then in order to keep the process alive we just need to make sure there is a request at least every X minutes, where X is determined by the output of the above script. This is done by a simple entry in my crontab running something similar to the following command.

$ wget --output-document /dev/null http://test.sharebear.co.uk/keepalive/
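
As a concrete but hypothetical example, assuming a request every five minutes turns out to be frequent enough, the crontab entry could look like this (with --quiet added so cron doesn't mail the output):

*/5 * * * * wget --quiet --output-document /dev/null http://test.sharebear.co.uk/keepalive/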

Note the use of a separate URL for this. It ensures that I will never accidentally enable caching for this URL, and it makes it really simple to filter these requests out of my normal Apache request logs.
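
The keepalive view itself can be as trivial as it sounds. A purely illustrative sketch, as the real one isn't shown in this post:

# urls.py -- illustrative only
from django.conf.urls.defaults import patterns
from django.http import HttpResponse

def keepalive(request):
    # The only job of this view is to exercise the Python process so that
    # the server keeps it alive; the response body is irrelevant.
    return HttpResponse("OK")

urlpatterns = patterns('',
    (r'^keepalive/$', keepalive),
)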

Summary

After all this work I've now managed to gain a 20% speed improvement on uncached page rendering. I've also eliminated the start-up time for the first request after no one has visited my site for a while, so for an uncached first request there is a 70% performance increase, and cached pages are even faster. This means that even from Europe my site is beginning to feel snappy and responsive on a supposedly slow hosting service.

Interestingly, the most important performance increase was eliminating the use of the restructuredtext markup filter. Perhaps the markup filters should be documented as just a development convenience and not for use on a real site.

As I've been doing all of my performance measurements from Norway, anyone accessing my site from nearer DreamHost's datacenter (i.e. anyone in the USA) should get awesome response times. A quick test from a friend's SliceHost server (thanks Rune) reveals response times of around 265 ms; again, not bad for a host everyone slates for being slow. The only way I'm going to get that kind of response time in Europe, though, is to move to a more expensive host.

This Blog represents my personal views and experiences which are not necessarily the same as those of my employer.