Stop Wasting Connections, Use HTTP Keep-Alive

#tutorials #api #network #devops

With the proliferation of third-party APIs and microservice architectures, modern web servers can make as many outgoing HTTP requests as the number of incoming HTTP requests they serve. A typical web application can interact with third-party APIs to handle payment processing, send email, track analytics, dispatch text messages, verify mailing addresses, or even deliver physical mail. A server can also rely on internal APIs to fetch account information, start asynchronous processes, or perform complex searches. Programs that initiate a high volume of outgoing HTTP requests must minimize the overhead of each in order to remain performant and optimize resource utilization.

One of the best ways to minimize HTTP overhead is to reuse connections with HTTP Keep-Alive. This feature is commonly enabled by default for many HTTP clients. These clients will maintain a pool of connections—each connection initializes once and handles multiple requests until the connection is closed. Reusing a connection avoids the overhead of making a DNS lookup, establishing a connection, and performing an SSL handshake. However, not all HTTP clients, including the default client of Node.js, enable HTTP Keep-Alive.

One of Lob's backend services is heavily dependent on internal and external APIs to verify addresses, dispatch webhooks, start AWS Lambda executions, and more. This Node.js server has a handful of endpoints that make several outgoing HTTP requests per incoming request. Enabling connection reuse for these outgoing requests led to a 50% increase in maximum inbound request throughput, significantly reduced CPU usage, and lowered response latencies. It also eliminated sporadic DNS lookup errors.

Performance Benefits of HTTP Keep-Alive

Running a benchmark validates the performance benefits of HTTP Keep-alive. The following chart displays the total time taken to make 1000 GET requests for both non-reused and reused connections with a varying number of requests made concurrently. It shows that for all levels of tested concurrency, reusing connections reduces the total run time by a factor of roughly 3.

Another observed benefit of reusing HTTP connections is reduced CPU utilization. On Mac OS X, this reduction manifests in the Node process itself and in a process named mDNSResponder, an operating system service responsible for resolving DNS. Running top -stats pid,command,cpu | grep -E "(mDNSResponder|node)\s" during both benchmarks shows the contrast in CPU usage.

Without Connection Reuse

With Connection Reuse

Inspecting the flamegraph of the benchmark script without connection reuse reveals the reason for increased CPU utilization in Node. A large percentage of CPU time is spent on establishing connections and performing SSL handshakes. For example, the flame fragment below shows that 14% of measured CPU ticks occurred while creating a socket.

It should be noted that initiating connections also incurs overhead for HTTP servers. Therefore, reusing connections also reduces overhead for servers handling these requests.

Flamegraphs of each benchmark are available to explore: flamegraph without connection reuse, flamegraph with connection reuse.

The benchmarking scripts are documented in node-keep-alive-benchmark.

Reduced DNS Errors

Reusing connections also eliminated a set of DNS errors that occurred sporadically within our service. When connections are not reused, a new connection is initialized for each outgoing request. In Node, this initialization includes a DNS lookup to determine the IP of the domain to send the request to. A high volume of DNS lookups can lead to sporadic errors of the form Error: getaddrinfo ENOTFOUND.

Based on several issues in the Node repository (nodejs/node-v0.x-archive#7729, nodejs/node-v0.x-archive#5488, nodejs/node#5436) this error can occur when a DNS server fail to respond, perhaps due to it rate-limiting requests. Reducing DNS lookups can reduce or eliminate these errors.

Tips when Reusing Connections

Check Your Timeouts

In some cases, reusing connections can lead to hard-to-debug issues. Problems can arise when a client assumes that a connection is alive and well, only to discover that, upon sending a request, the server has terminated the connection. In Node, this problem surfaces as an Error: socket hang up.

To mitigate this, check the idle socket timeouts of both the client and the server. This value represents how long a connection will be kept alive when no data is sent or received. Make sure that the idle socket timeout of the client is shorter than that of the server. This should ensure that the client closes a connection before the server, preventing the client from sending a request down an unknowingly dead connection.

Don't Use Node's Default HTTP Agents

For Node services, the agentkeepalive library provides HTTP and HTTPS agents that enable connection reuse by default. These agents also have other sensible defaults that the standard libraries agents do not.

Wrap Up

Connection reuse should provide significant performance improvements to services written in any language that are making numerous outgoing HTTP requests. Some HTTP clients enable this behavior by default, but not all. Some widely used languages and libraries do not enable HTTP Keep-Alive by default, such as Node, so be sure to check the documentation and source code.

More on HTTP Keep-Alive from Mozilla

The Ops Community ⚙️