Reusable GZIP Streams

Details: 14 August 2016

This post explains how to use a pooled GZIP OutputStream. It was motivated by performance measurements of the GZIP compression inside logstash-gelf.

Creating a new GZIPOutputStream for every compression run is costly: each instance allocates several objects, including a byte buffer and a Deflater.

A reusable GZIPOutputStream differs from the JDK-provided class in three ways:

- It does not write the GZIP header upon instance creation but exposes a writeHeader() method instead
- It exposes a reset() method to reset the deflater state
- It does not close the stream, so the instance stays reusable
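
The three points above can be sketched roughly as follows. This is a minimal illustration built on the JDK's DeflaterOutputStream, not necessarily the code from the linked repository: the class name ReusableGzipOutputStream, the 512-byte buffer size, and the choice to make close() only finish the current run are assumptions made for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical name; a sketch of a GZIP stream that can be reused across runs.
class ReusableGzipOutputStream extends DeflaterOutputStream {

    // Fixed 10-byte GZIP header (RFC 1952): magic bytes, deflate method,
    // no flags, no modification time, no extra flags, OS byte left at 0.
    private static final byte[] HEADER = {
        (byte) 0x1f, (byte) 0x8b, (byte) Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0
    };

    private final CRC32 crc = new CRC32();

    ReusableGzipOutputStream(OutputStream out) {
        // nowrap = true produces raw deflate data; header and trailer are
        // written by this class. The buffer size of 512 is an arbitrary choice.
        super(out, new Deflater(Deflater.DEFAULT_COMPRESSION, true), 512);
    }

    /** Writes the GZIP header on demand instead of in the constructor. */
    void writeHeader() throws IOException {
        out.write(HEADER);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len);
        crc.update(b, off, len); // checksum of the uncompressed data
    }

    /** Flushes the remaining compressed data and writes the GZIP trailer once per run. */
    @Override
    public void finish() throws IOException {
        if (def.finished()) {
            return; // trailer was already written for this run
        }
        super.finish();
        writeIntLittleEndian((int) crc.getValue());    // CRC-32 of the input
        writeIntLittleEndian((int) def.getBytesRead()); // input size modulo 2^32
    }

    /** Resets deflater and checksum so the instance can compress another payload. */
    void reset() {
        def.reset();
        crc.reset();
    }

    /** Assumption for this sketch: close() finishes the run but keeps everything open. */
    @Override
    public void close() throws IOException {
        finish();
    }

    private void writeIntLittleEndian(int value) throws IOException {
        out.write(value & 0xff);
        out.write((value >>> 8) & 0xff);
        out.write((value >>> 16) & 0xff);
        out.write((value >>> 24) & 0xff);
    }
}
```

One compression run with this sketch is reset(), writeHeader(), one or more write(...) calls, then finish(). Because the Deflater is never ended by close() here, a real pool would have to call Deflater.end() itself when it discards an instance.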

Java streams are conventionally designed so that the target is set at construction time. That limits reuse, but it can be mitigated by constructing the pooled OutputStream instance with a custom OutputStream that lets you switch the target.
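
A switchable target could look roughly like the sketch below. The class name SwitchableOutputStream and its setTarget() method are hypothetical names chosen for this illustration; the idea is only that the pooled stream is constructed once around this delegate, and the real destination is swapped in before each run.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical name; forwards all writes to a target that can be replaced at any time.
class SwitchableOutputStream extends OutputStream {

    private OutputStream target;

    /** Points this stream at a new destination; call it before each run. */
    void setTarget(OutputStream target) {
        this.target = target;
    }

    @Override
    public void write(int b) throws IOException {
        requireTarget().write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        requireTarget().write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        if (target != null) {
            target.flush();
        }
    }

    /** Detaches the current target without closing it, so its owner stays in control. */
    @Override
    public void close() {
        target = null;
    }

    private OutputStream requireTarget() throws IOException {
        if (target == null) {
            throw new IOException("No target set");
        }
        return target;
    }
}
```

The wrapping stream keeps a reference to the same SwitchableOutputStream for its whole lifetime. Before each run, the caller invokes setTarget(...) with the actual destination; any data buffered by the wrapping stream has to be finished or flushed before the target is switched again.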

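A benchmark comparing the unpooled and pooled streams shows the difference: the pooled variant averages about 18,165 ns per operation, against about 184,431 ns for the unpooled one, roughly a tenfold improvement. Producing less garbage per compression run also lowers the absolute deviation.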

Benchmark                                  Mode  Cnt       Score       Error  Units
GelfMessageAssemblerPerf.compressPooled    avgt    5   18164,796 ±  5717,793  ns/op
GelfMessageAssemblerPerf.compressUnpooled  avgt    5  184431,045 ± 46292,939  ns/op

Find the code on GitHub: https://github.com/mp911de/reusing-gzip-streams