Effects of thread pool size on concurrency

Let's imagine we have a web server that uses multi-threading as its concurrency approach with a fixed thread pool size of 100. Supposing that the server's request handler sleeps for 2 seconds for every 10th request, how many requests can this server handle every second?

This question persisted at the back of my mind as I was reading Deepu's series of posts on concurrency in modern programming languages. In that series, Deepu ran several web server benchmarks with the same 2-second delay on every 10th request, and the resulting requests-per-second figures appeared to be capped at a fixed ceiling. This post explores the factors that determine the maximum requests per second a multi-threaded web server can handle.

Benchmarks

To work out an answer to the opening question I ran several benchmark experiments on a simple web server that uses a thread pool with a fixed size for handling requests. One thread processes one request at a time. I performed the following load tests on the server using Apache Bench (ab):

  • 1 concurrent request
  • 10 concurrent requests
  • 100 concurrent requests
  • 500 concurrent requests

The number of concurrent requests refers to how many requests are being made at the same time.

For the web server I used Deepu's Rust web server:

main.rs
use rustws::ThreadPool;
use std::fs;
use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;
use std::thread;
use std::time::Duration;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:8080").unwrap();
    // create a thread pool with a fixed size of 100
    let pool = ThreadPool::new(100);

    let mut count = 0;

    for stream in listener.incoming() {
        let stream = stream.unwrap();
        count = count + 1;
        pool.execute(move || {
            // hand each connection to a thread from the pool
            handle_connection(stream, count);
        });
    }
}

fn handle_connection(mut stream: TcpStream, count: i64) {
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).unwrap();

    // add a 2 second delay to every 10th request
    if (count % 10) == 0 {
        println!("Adding delay. Count: {}", count);
        thread::sleep(Duration::from_secs(2));
    }

    let header = "HTTP/1.0 200 OK\r\nConnection: keep-alive\r\nContent-Length: 174\r\nContent-Type: text/html; charset=utf-8";
    let contents = fs::read_to_string("hello.html").unwrap();

    let response = format!("{}\r\n\r\n{}", header, contents);

    // write the response
    stream.write(response.as_bytes()).unwrap();
    stream.flush().unwrap();
}

Results

Here are the results of the benchmarks. We'll analyze them in the next section.

$ ab -n 5000 -c 1 http://localhost:8080/
...
Concurrency Level:      1
Time taken for tests:   1003.114 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1415000 bytes
HTML transferred:       880000 bytes
Requests per second:    4.98 [#/sec] (mean)
Time per request:       200.623 [ms] (mean)
Time per request:       200.623 [ms] (mean, across all concurrent requests)
Transfer rate:          1.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:     0  200 600.1      0    2001
Waiting:        0  200 600.1      0    2001
Total:          0  200 600.1      1    2002

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%   2000
  95%   2001
  98%   2001
  99%   2001
 100%   2002 (longest request)

$ ab -n 5000 -c 10 http://localhost:8080/
...
Concurrency Level:      10
Time taken for tests:   100.329 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1415000 bytes
HTML transferred:       880000 bytes
Requests per second:    49.84 [#/sec] (mean)
Time per request:       200.657 [ms] (mean)
Time per request:       20.066 [ms] (mean, across all concurrent requests)
Transfer rate:          13.77 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     0  200 600.1      0    2002
Waiting:        0  200 600.1      0    2002
Total:          0  201 600.1      0    2002

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      1
  75%      1
  80%      1
  90%   2000
  95%   2001
  98%   2001
  99%   2001
 100%   2002 (longest request)

$ ab -n 5000 -c 100 http://localhost:8080/
...
Concurrency Level:      100
Time taken for tests:   10.328 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1415000 bytes
HTML transferred:       880000 bytes
Requests per second:    484.10 [#/sec] (mean)
Time per request:       206.567 [ms] (mean)
Time per request:       2.066 [ms] (mean, across all concurrent requests)
Transfer rate:          133.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   2.2      0      10
Processing:     0  202 600.1      1    2013
Waiting:        0  201 600.2      1    2011
Total:          0  203 600.1      2    2013

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      5
  75%      7
  80%      9
  90%   2000
  95%   2003
  98%   2006
  99%   2009
 100%   2013 (longest request)

$ ab -n 5000 -c 500 http://localhost:8080/
...
Concurrency Level:      500
Time taken for tests:   10.243 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1415000 bytes
HTML transferred:       880000 bytes
Requests per second:    488.11 [#/sec] (mean)
Time per request:       1024.349 [ms] (mean)
Time per request:       2.049 [ms] (mean, across all concurrent requests)
Transfer rate:          134.90 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   28  81.6     17    1012
Processing:    10  830 1032.3     87    3909
Waiting:        0  820 1033.9     81    3901
Total:         32  858 1034.2    107    4075

Percentage of the requests served within a certain time (ms)
  50%    107
  66%   1863
  75%   1868
  80%   1872
  90%   2050
  95%   2098
  98%   3859
  99%   3883
 100%   4075 (longest request)

Making sense of the results

We'll focus on the following summary of concurrent requests vs requests per second (RPS):

Concurrent requests      1       10      100     500
Requests per second      4.98    49.84   484.10  488.11

Hmmm...

Why did we get these specific RPS figures?

Why did the RPS increase when we increased the concurrent requests from 1 to 10, and from 10 to 100?

It seemed like the RPS grew linearly with the number of concurrent requests, but this didn't hold when concurrent requests went from 100 to 500. Why do we observe this?

Let's figure it out...

1 concurrent request, 4.98 RPS

When the server receives one concurrent request, only one server thread actively processes the request at a time. For every 10 requests that are received, 9 of them complete nearly instantaneously (assume 0 seconds). The 10th request takes 2 seconds to complete because the thread sleeps for 2 seconds before returning the response. For this specific scenario, a single thread handles 10 requests in 2 seconds. This equates to ~5 RPS, which roughly matches the actual RPS value of 4.98.
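The batch arithmetic above can be captured in a tiny throughput function (a toy model for illustration, not part of the benchmarked server):

```rust
// Toy model of one thread's throughput: each batch of `batch_size` requests
// costs `delay_secs` seconds in total, because only the delayed request
// takes measurable time.
fn single_thread_rps(batch_size: u32, delay_secs: f64) -> f64 {
    f64::from(batch_size) / delay_secs
}

fn main() {
    // 10 requests per 2-second batch ≈ 5 RPS per thread.
    println!("{}", single_thread_rps(10, 2.0)); // prints 5
}
```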

10 concurrent requests, 49.84 RPS

When ab makes 10 concurrent requests to the server, the server handles the first 9 requests immediately and sleeps on the 10th request for 2 seconds. Since the first 9 requests have completed, ab sends the next 9 requests (the 11th to 19th), which are also handled instantly. ab then sends another batch of 9 requests. This time the 20th request is delayed for 2 seconds while the remaining 8 requests complete. The server now has 2 threads waiting on the 10th and 20th requests to complete.

Projecting what happens within a split second of starting the ab load test, we find the server blocked for 2 seconds handling 10 delayed requests on 10 threads: the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, 90th, and 100th requests. After 2 seconds have elapsed the server finishes processing these 10 delayed requests, and ab is ready to send more requests.

The server processed 100 requests within those 2 seconds: 90 instantly and 10 delayed. This represents 50 RPS, which is quite close to the actual value of 49.84.

Given the first two benchmarks, we can model the expected RPS with the following equation:

Requests handled per second = Concurrent requests * Average requests handled per thread per second

The equation tells us that the RPS can be determined by how many requests are hitting the server at a time and that they have a linear relationship.
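As a quick sketch (toy code for illustration, using the ~5 RPS per-thread figure derived earlier), the linear model looks like this:

```rust
// Linear model: RPS grows proportionally with the number of concurrent
// requests, with each thread contributing ~5 requests per second.
fn linear_rps(concurrent_requests: u32, per_thread_rps: f64) -> f64 {
    f64::from(concurrent_requests) * per_thread_rps
}

fn main() {
    for &c in &[1, 10, 100] {
        // Predicted: 5, 50, 500 — close to the measured 4.98, 49.84, 484.10.
        println!("{} concurrent -> {} RPS", c, linear_rps(c, 5.0));
    }
}
```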

100 concurrent requests, 484.10 RPS

Applying the above equation to guess the RPS for 100 concurrent requests gives us an estimated 500 RPS (100 concurrent requests * 5 RPS). This is still quite close to the actual 484.10 RPS figure we obtained from the benchmarks, so our mental model still works.

500 concurrent requests, 488.11 RPS

For 500 concurrent requests, if we naively apply the earlier equation we might expect the server to achieve 2500 RPS (500 concurrent requests * 5 RPS). However, the benchmark reveals 488.11 RPS, which is similar to the result when 100 concurrent requests are made. The reason for this behavior is that the server isn't handling 500 requests at a time. Instead, the server only processes 100 requests concurrently because it has a limited thread pool size of 100 threads. The server queues the remaining 400 requests and waits until a thread in the thread pool is available before assigning the request to it.

Now that we understand the constraints imposed by a fixed thread pool size we can update our previous equation to the following:

Requests handled per second = MIN(Concurrent requests, Thread pool size) * Average requests handled per thread per second
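A sketch of the updated model, with the thread pool cap applied (toy code for illustration, not part of the server):

```rust
// Capped model: concurrency beyond the pool size queues up instead of
// running, so effective parallelism is MIN(concurrent requests, pool size).
fn capped_rps(concurrent_requests: u32, pool_size: u32, per_thread_rps: f64) -> f64 {
    f64::from(concurrent_requests.min(pool_size)) * per_thread_rps
}

fn main() {
    for &c in &[1, 10, 100, 500] {
        // Predicted: 5, 50, 500, 500 — matching the measured plateau at ~488 RPS.
        println!("{} concurrent -> {} RPS", c, capped_rps(c, 100, 5.0));
    }
}
```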

Achieving greater concurrency

For our server to achieve a greater RPS we have several possible solutions.

Larger thread pool

We could increase the thread pool size. However, each thread we create consumes additional memory and the server incurs a higher context switching overhead. Consequently, continuously increasing the thread pool size will eventually lead to diminishing returns or an out-of-memory situation.
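As a rough back-of-the-envelope sketch of the memory cost (the 8 MiB figure is an assumed per-thread stack reservation, a common Linux default, not a measured value):

```rust
// Rough virtual memory reserved for thread stacks as the pool grows.
// Actual resident usage is usually smaller, but the reservation still
// bounds how far a thread pool can scale.
fn stack_reservation_mib(threads: u64, stack_mib_per_thread: u64) -> u64 {
    threads * stack_mib_per_thread
}

fn main() {
    // 10,000 threads at an assumed 8 MiB stack each ≈ 80,000 MiB reserved.
    println!("{} MiB", stack_reservation_mib(10_000, 8));
}
```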

Asynchronous processing

Several programming language runtimes support a different concurrency model using asynchronous processing. Node.js is one such runtime that can process non-blocking requests in a single-threaded Event Loop instead of spawning a new thread for every request. Handling requests asynchronously in a single thread mitigates the context switching overheads and out-of-memory situations that a multi-threaded approach might present.

Green threads

Other runtimes provide green threads. Green threads can be thought of as "virtual threads", which are created and managed by the runtime instead of the operating system. Such a runtime is responsible for scheduling green threads on native threads and deciding when to switch between green threads. The Go programming language is one popular language that provides green threads in the form of goroutines.

Suppose we have an example scenario where 500 concurrent requests are made to a server. A server with 8 CPU cores and support for green threads can spawn 500 green threads to handle the requests, but schedule them on 8 native threads. Because green threads are cheaper to create, quicker to switch between, and consume less memory than native threads, they can handle a greater number of concurrent requests.