= The fair load balancer module for Nginx (upstream_fair) =

== What? ==

`upstream_fair` is a load balancer module for the fantastic Nginx web server. It implements somewhat smarter logic than the built-in pure round-robin load balancer, and it may suit diverse workloads (a mix of fast and slow pages) better than the stock balancer does.

== Why? ==

=== Smarter load balancing ===

The main feature of `upstream_fair` is that it knows how many requests each backend is processing. A backend is simply one of the servers the load balancer has to choose among. With that knowledge, it can make a better-informed scheduling decision and avoid sending more requests to backends that are already busy.

=== Statistics ===

Another neat feature is the built-in status page (requires my StubStatus hook patches), which can tell you:
 * how many requests have been proxied
 * what the distribution between backends was
 * what the current workload is (per backend)

=== Special load balancer for special needs ===

`upstream_fair` has several modes of operation, making it suitable for diverse environments.

==== default ====

The default mode is a simple WLC-RR (weighted least-connection round-robin) algorithm. The caveat is that the weighted part isn't particularly fair under low load (under high load it all averages out anyway). This is the `upstream_fair` many of you already know. The other modes are the result of recent development, so grab a copy before your competition does ;)

==== no_rr ====

If you wish, you may disable the "-RR" part. Then, whenever the first backend is idle, it gets the next request. If it's busy, the request goes to the second backend, unless that one is busy too, and so on.

Why would you want to disable round-robin? A particularly good reason is when you're still unsure how many backends you need and are starting them on demand (e.g. using my [wiki:Spawner]).
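To make the difference between the default mode and `no_rr` concrete, here is a toy model in Python. It is a sketch under stated assumptions, not the module's code: the real scheduler is C code running inside Nginx. The `Backend` and `FairSketch` names are hypothetical, and so are the two rules below. Each pick scans the backends from a starting point and takes the first idle one (zero active requests). If every backend is busy, it falls back to the lowest active/weight ratio. Round-robin moves the starting point after each pick, while `no_rr` always starts from the first backend.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1
    active: int = 0  # requests this backend is currently processing


class FairSketch:
    """Hypothetical toy model: prefer an idle backend, else weighted least-connection.

    Illustration only; this is not upstream_fair's actual scheduling code.
    """

    def __init__(self, backends, round_robin=True):
        self.backends = backends
        self.round_robin = round_robin  # False models the no_rr mode
        self.cursor = 0                 # where the next scan starts (round-robin only)

    def pick(self):
        n = len(self.backends)
        start = self.cursor if self.round_robin else 0
        order = [(start + k) % n for k in range(n)]
        # Rule 1 (assumed): the first idle backend in scan order wins.
        chosen = next((i for i in order if self.backends[i].active == 0), None)
        if chosen is None:
            # Rule 2 (assumed): everyone is busy, so take the lowest active/weight
            # ratio; min() keeps the earliest backend in scan order on ties.
            chosen = min(order, key=lambda i: self.backends[i].active / self.backends[i].weight)
        if self.round_robin:
            self.cursor = (chosen + 1) % n  # rotate the starting point
        self.backends[chosen].active += 1
        return self.backends[chosen].name

    def done(self, name):
        """Mark one request on backend `name` as finished."""
        for b in self.backends:
            if b.name == name:
                b.active -= 1
                return


def sequential(round_robin, requests=6):
    """Very light load: each request finishes before the next one arrives."""
    lb = FairSketch([Backend("a"), Backend("b"), Backend("c")], round_robin)
    served = []
    for _ in range(requests):
        name = lb.pick()
        served.append(name)
        lb.done(name)
    return served


print(sequential(round_robin=True))   # ['a', 'b', 'c', 'a', 'b', 'c']
print(sequential(round_robin=False))  # ['a', 'a', 'a', 'a', 'a', 'a']
```

In this sketch the weights only matter once every backend is busy. Under light load an idle backend always wins regardless of its weight, which is one plausible reading of the caveat about weighting under low load.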
With round-robin enabled, requests get distributed roughly equally between backends, so all backends have to run all the time (even if you actually use only 10% of their capacity). With round-robin disabled, you use exactly as many backends as you really need.

==== weight_mode=idle no_rr ====

However, by default an "idle" backend (a rather central concept in `upstream_fair`) is exactly that: a backend with zero requests being processed. So two concurrent requests will cause two backends to start up, even if one could easily handle both.

Enter `weight_mode=idle`. This mode redefines "idle" to mean "fewer than ''weight'' concurrent requests". You can benchmark your backends and find the maximum number of concurrent requests, X, that works for you (e.g. while keeping latency below a limit or maximising throughput). Set the weight to X and that's it. `upstream_fair` will balance across the smallest possible pool of backends, adding new ones as the load increases. Even though these backends are all considered "idle" by the main algorithm, they are still scheduled using least-connection (without the weighted part).

==== weight_mode=peak ====

At the opposite end of the scale, you may find that your backends cannot keep up with the load. You may prefer to return 50x errors to the client rather than try to process too many requests (e.g. you might have a funky tiered load-balancing setup or want to keep latency under control). Simply enable `weight_mode=peak`, and Nginx will never send more than ''weight'' requests to any single backend. If all backends are full, you will start receiving 502 errors.

== Where? ==

You may browse the code (and download a tarball) on GitHub: http://github.com/gnosek/nginx-upstream-fair/tree/master

`upstream_fair` is also documented on the Nginx wiki: http://wiki.codemongers.com/NginxHttpUpstreamFairModule

== How? ==

=== Download ===

 tarball:: http://github.com/gnosek/nginx-upstream-fair/tarball/master
 git repo:: git://github.com/gnosek/nginx-upstream-fair.git

=== Install ===

Add the following option to your Nginx `./configure` command:

{{{
--add-module=path/to/upstream_fair/directory
}}}

Then run `make` and `make install` as usual.

=== Configure ===

To enable the fair balancer, simply add the `fair` directive to the upstream block, like this:

{{{
upstream backend {
    server server1;
    server server2;
    fair;
}
}}}

The `fair` directive accepts the parameters `no_rr`, `weight_mode=idle` and `weight_mode=peak` described above.

== Anything else? ==

=== Why all these modes? The syntax is ugly! ===

Yep, I know. However, load balancer modules currently cannot define their own parameters for the `server` directive (e.g. `server 1.2.3.4 idle=4 peak=20`). So we have to live with what we've got (`fair weight_mode=idle; server 1.2.3.4 weight=4`).

=== Performance ===

`upstream_fair` shouldn't impact your throughput significantly. Below you'll find a totally meaningless benchmark comparing the stock load balancer and `upstream_fair` under some synthetic conditions.

The upstream section from the config file:

{{{
upstream testing {
    # uncomment the next line to benchmark upstream_fair instead of the stock balancer
    # fair;
    server 127.0.0.1:81 max_fails=3 weight=2;
    server 127.0.0.1:81 max_fails=3 weight=2;
    server 127.0.0.1:81 max_fails=3 weight=2;
    server 127.0.0.1:81 max_fails=3 weight=2;
    server 127.0.0.1:81 max_fails=3 weight=2;
    server 127.0.0.1:81 max_fails=3 weight=2;
}
}}}

Port 81 is used by a Lighttpd instance serving whatever it serves by default on Ubuntu (a simple static page). Both Nginx and Lighttpd serve about 15000 requests per second without proxying.

NOTE: I have no idea where the extreme peak latency comes from. It happens regardless of the load balancer, and even without proxying at all: serving static content or the status page seems to be affected as well.

I used Nginx 0.6.31 for testing.
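To put a number on "shouldn't impact your throughput significantly", the mean figures from the two runs reported below can be compared directly. This is plain arithmetic on the published numbers and nothing more. With a single run per balancer and the unexplained latency spikes, the data cannot tell whether a gap of about 1% is real overhead or run-to-run variation.

```python
# Mean figures copied from the two benchmark runs reported in this section.
STOCK = {"requests_per_sec": 5450.89, "time_per_request_ms": 91.728}
FAIR = {"requests_per_sec": 5382.70, "time_per_request_ms": 92.890}


def relative_change_pct(before: float, after: float) -> float:
    """Signed change from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100.0


throughput = relative_change_pct(STOCK["requests_per_sec"], FAIR["requests_per_sec"])
latency = relative_change_pct(STOCK["time_per_request_ms"], FAIR["time_per_request_ms"])

print(f"throughput: {throughput:+.2f}%")          # throughput: -1.25%
print(f"mean time per request: {latency:+.2f}%")  # mean time per request: +1.27%
```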
==== default load balancer ====

{{{
Document Path:          /
Document Length:        3585 bytes

Concurrency Level:      500
Time taken for tests:   9.172814 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      192084569 bytes
HTML transferred:       179282265 bytes
Requests per second:    5450.89 [#/sec] (mean)
Time per request:       91.728 [ms] (mean)
Time per request:       0.183 [ms] (mean, across all concurrent requests)
Transfer rate:          20449.78 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   31 294.4      2    3012
Processing:     3   43 284.0     16    3043
Waiting:        2   41 283.9     14    3042
Total:          7   75 463.5     17    6045

Percentage of the requests served within a certain time (ms)
  50%     17
  66%     19
  75%     20
  80%     21
  90%     24
  95%     31
  98%     88
  99%   3024
 100%   6045 (longest request)
}}}

==== `upstream_fair` ====

{{{
Document Path:          /
Document Length:        3585 bytes

Concurrency Level:      500
Time taken for tests:   9.289024 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      192073046 bytes
HTML transferred:       179271510 bytes
Requests per second:    5382.70 [#/sec] (mean)
Time per request:       92.890 [ms] (mean)
Time per request:       0.186 [ms] (mean, across all concurrent requests)
Transfer rate:          20192.76 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   35 349.8      2    8999
Processing:     1   35 235.8     16    3034
Waiting:        0   33 235.8     14    3030
Total:          8   71 451.2     17    9019

Percentage of the requests served within a certain time (ms)
  50%     17
  66%     19
  75%     21
  80%     22
  90%     27
  95%     32
  98%     90
  99%   3020
 100%   9019 (longest request)
}}}

=== Algorithm and internals ===

TODO (`sched_score`, shared memory etc.)

=== Sites using `upstream_fair` ===

Feel free to add your site here!
 * !http://you?