The BrotliLeaks Affair


We have offered Brotli compression in our products for more than a year now. Of course, wao.io serves Brotli-compressed responses, too. Usually this yields higher compression ratios than the common treatment with gzip.

At the end of last year, we switched our systems from the preliminary Apache integration by kjdev to the official mod_brotli implementation that has shipped with Apache httpd since 2.4.26. At the same time, we needed to upgrade to brotli 1.0. Our tests were passing and the update went into production. However, we noticed an unusual increase in memory consumption on our Web servers.

Screenshot of top, ordered by memory usage:

1.2 GB is about two orders of magnitude larger than the expected size of a single httpd process.

A quick test with curl or ab and -H ‘Accept-Encoding: br’ showed that the RSS of the httpd processes increased with each Brotli-encoded request, but not with gzip, deflate or no encoding. Therefore, we reverted the brotli and mod_brotli updates and pushed the rollback onto our production systems.

We could immediately see the RSS size dropping. Our memory graphs were returning to their old level. However, rolling back to old software could not be the solution, even more so because we realized that the old brotli setup bloated the httpd processes as well. Not as much, but still too much. We decided to investigate where the memory leak originated.
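The quick check described above (send requests with a given Accept-Encoding and watch the RSS of an httpd process) can be sketched roughly as follows. This is a minimal illustration, not the tooling we used: it assumes a Linux /proc filesystem, and the function names, URL and process ID are hypothetical.

```python
import re
import urllib.request


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kib(handle.read())


def build_request(url, encoding):
    """Build a GET request that advertises a single Accept-Encoding value."""
    return urllib.request.Request(url, headers={"Accept-Encoding": encoding})


def rss_growth_kib(pid, url, encoding, requests=20):
    """Send `requests` requests and return the change in RSS of `pid` in KiB."""
    before = read_rss_kib(pid)
    for _ in range(requests):
        with urllib.request.urlopen(build_request(url, encoding)) as response:
            response.read()  # drain the body so the server compresses all of it
    return read_rss_kib(pid) - before
```

Comparing `rss_growth_kib(pid, url, "br")` with `rss_growth_kib(pid, url, "gzip")` for the same httpd child would show the difference. With several children, a request may land on a different process than the one being measured, so pinning httpd to a single worker makes the numbers easier to read.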

Finding The Leak

The obvious choice to identify the source of a leak is Valgrind’s memcheck. The premise is simple: just let valgrind run your program, and it’ll provide you with a nice summary as well as a stack trace for each memory allocation that was lost:

Initial results looked promising:

So memcheck did find some potential issues with memory that didn’t get free()ed, and some of them – the largest blocks – involved mod_brotli and the brotli encoder. Hooray, something to work with!

But there were two obvious problems with that: There were 3 MB at most in potential leaks, but we were looking for 100 MB or more, and neither the size nor the number of leaks grew with the number of requests.

Since Apache httpd makes use of APR Memory Pools to limit the lifetime of memory allocations to a single request, and performs global cleanup of all pools when the httpd child terminates, we suspected that this may mask any leaks from memcheck. mod_brotli even redirects all internal allocations of brotli to use APR by setting alloc_func and free_func appropriately when calling BrotliEncoderCreateInstance(). To make sure that memcheck could find all potential leaks we removed any use of APR allocators from mod_brotli and brotli. The result: no change – the reported potential leaks were still tiny, the memory usage reported by top or ps stayed as big as before.

So back to the basics: add a printf() to every single call to malloc(), free(), apr_pcalloc(), apr_bucket_alloc() and friends, and check for corresponding pointers, in mod_brotli and brotli itself. After a few more rounds of ab and curl, combined with a bit of grepping, cutting, sorting and diffing our debug output, it turned out that every single call to malloc() had a matching call to free(), no matter the size or number of requests we inflicted upon our test setup.
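The pairing step can be sketched as below. The log format is an assumption made for illustration (lines such as `malloc 0x55d0c8a012a0 4096` and `free 0x55d0c8a012a0`); our actual debug output and shell pipeline looked different.

```python
def unmatched_allocations(log_lines):
    """Return {pointer: size} for allocations that were never freed.

    Assumed (hypothetical) line formats:
        malloc <pointer> <size>
        free <pointer>
    """
    live = {}
    for line in log_lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "malloc":
            # A reused address simply overwrites the earlier, already freed entry.
            live[fields[1]] = int(fields[2])
        elif len(fields) == 2 and fields[0] == "free":
            live.pop(fields[1], None)  # ignore frees of pointers we never saw
    return live
```

An empty result after a test run means every logged allocation had a matching free, which is what we observed.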

Spotting the difference: When a Leak is not a Leak

Considering everything we knew so far, whatever was causing the excessive memory use with mod_brotli could not be a traditional memory leak. The next potential cause was heap fragmentation. Instead of outright forgetting to free memory, heap fragmentation is caused by a sequence of allocations and deallocations that split the heap into non-contiguous blocks interspersed with free chunks that are neither reused nor returned to the system. Think of nicely chopped up memory sashimi. This results in memory that is reserved and possibly used from the point of view of the kernel, but is not practically usable for the program, for example because it is only available in chunks that are too small.
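A toy model may help to see why freed memory can stay with the process. It assumes a heap that can only give memory back from its top end, as a brk-style heap does; real allocators are more elaborate, and the chunk sizes below are made up.

```python
def free_bytes(chunks):
    """Total size of all chunks that are currently not in use."""
    return sum(size for size, in_use in chunks if not in_use)


def trimmable_bytes(chunks):
    """Free bytes at the top of a heap that can only shrink from the top.

    `chunks` lists (size, in_use) pairs from the bottom of the heap to the top.
    """
    total = 0
    for size, in_use in reversed(chunks):
        if in_use:
            break  # a live chunk pins everything below it
        total += size
    return total


# 100 large freed buffers, each followed by a small allocation that is still live.
fragmented_heap = [(1_048_576, False), (64, True)] * 100
```

In this model `free_bytes(fragmented_heap)` is 100 MiB, yet `trimmable_bytes(fragmented_heap)` is 0, because the last small chunk is still in use and nothing above it is free.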

The easiest way to confirm our suspicion was to switch to another malloc implementation. We’ve had good experiences with TCMalloc from the Google perftools, which is simple to use (just LD_PRELOAD the dynamic library). Lo and behold, the httpd processes still grew depending on the size of the response they had to compress, but always shrank back to about 20 MB when idle.

Now that we finally knew the cause of the increased memory consumption, we needed a solution. Since the makers of TCMalloc do not recommend the use via LD_PRELOAD, we had to look elsewhere.

Tuning glibc

The stock malloc() and free() implementations on most Linux distributions come from GNU libc. This ubiquitous library provides a small number of tunable parameters that can be used to influence when memory is acquired or returned to the system.

Some experiments later, we found that we had to limit both MALLOC_TRIM_THRESHOLD_ (the environment variable name ends with an underscore) and MALLOC_ARENA_MAX to effectively curtail mod_brotli’s hunger for memory. Since both variables control performance optimization mechanisms, we were reluctant to just set them to very small values to minimize memory usage. After further benchmarks, the following values were chosen, because they had no discernible effect on the CPU time used (utime and stime), but still reduced the memory usage of httpd to a level very close to the minimum observed with any other values:

We use these parameters in the Apache start script. The effect on memory usage was clearly visible:

Memory use after rollout
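As a rough sketch of how such tunables reach a process, the snippet below starts a command with both environment variables set. The numbers are placeholders for illustration only and are not the values we chose; the function name is hypothetical as well. glibc reads these variables when the process starts, so they have to be in the environment before httpd is launched.

```python
import os
import subprocess

# Placeholder values for illustration only; NOT the values chosen in this article.
EXAMPLE_TUNABLES = {
    "MALLOC_TRIM_THRESHOLD_": "131072",  # bytes of free memory at the heap top before trimming
    "MALLOC_ARENA_MAX": "2",             # upper bound on the number of malloc arenas
}


def run_with_malloc_tunables(command, tunables=EXAMPLE_TUNABLES):
    """Run `command` (a list of arguments) with the given variables added to its environment."""
    env = dict(os.environ)
    env.update(tunables)
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)
```

A start script achieves the same by exporting the variables before it executes httpd.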

See for yourself

We have written this article to provide information for everyone who may be experiencing the same problem. To isolate it from our application, we have built a reproducer with Docker. We took the official httpd Dockerfile, adapted it to work with CentOS and included mod_brotli. It’s available as a GitHub gist.

Let’s try it out! To see the increased memory use for yourself, first build an image from the Dockerfile:

Then run it in the background:

The fresh, idling httpd processes need about 4 MB of RAM each:

In the Dockerfile, we had to create a larger index file for brotli to compress in order to increase bloat. Request that large file a few times:

Let’s check the size of the httpd processes again:

That is more than 200 megs per process!

To stop the container, run:

To see the effect of tuning glibc’s malloc, specify the environment variables when you run the container and exec everything as above:

Run ab again:

Now the httpd children return their used memory and shrink back to less than 10 MB when idle:

Conclusion

We had to learn that the cause of this kind of memory leak is hard to find. All instruments show “green”, but the memory is trapped. And even worse, there is no one to blame 🙂 glibc, brotli, and Apache all behave correctly from their own point of view. It’s not a bug. However, the combination of these software components creates an undesirable effect, and it is up to you, the operator or developer, to make sure they play together nicely.
