This post explains the use of a pooled GZIP OutputStream. The initial
motivation was a set of performance measurements of the GZIP compression inside logstash-gelf.
Creating a new GZIPOutputStream instance for every compression run is costly: each instance
allocates several objects, among them a byte buffer and a Deflater.
A reusable GZIPOutputStream differs from the JDK-provided class in a few points:
- Do not write the GZIP header upon instance creation but expose a writeHeader() method
- Expose a reset() method to reset the deflater state
- Do not close the stream to keep it reusable
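The three points above can be sketched roughly as follows. This is a minimal illustration and not the code from the linked repository: the class names ReusableGzipDemo and ReusableGzipOutputStream, the 512-byte buffer size and the no-op close() are assumptions, and only writeHeader() and reset() are named in the text. The sketch builds on the JDK's DeflaterOutputStream with a raw Deflater and writes the GZIP header and trailer itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;

class ReusableGzipDemo {

    /** Hypothetical reusable GZIP stream; the name is illustrative only. */
    static class ReusableGzipOutputStream extends DeflaterOutputStream {

        // Fixed 10-byte GZIP header (RFC 1952): magic bytes, deflate method,
        // no flags, no modification time, no extra flags, OS "unknown".
        private static final byte[] HEADER = {
            (byte) 0x1f, (byte) 0x8b, (byte) Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, (byte) 0xff
        };

        private final CRC32 crc = new CRC32();

        ReusableGzipOutputStream(OutputStream out) {
            // nowrap = true: raw deflate data, the GZIP framing is written by this class.
            super(out, new Deflater(Deflater.DEFAULT_COMPRESSION, true), 512);
        }

        /** Writes the header on demand instead of in the constructor. */
        void writeHeader() throws IOException {
            out.write(HEADER);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            super.write(b, off, len);
            crc.update(b, off, len);
        }

        /** Completes the deflate data and appends the CRC-32 and length trailer. */
        @Override
        public void finish() throws IOException {
            if (!def.finished()) {
                super.finish();
                writeIntLE((int) crc.getValue());
                writeIntLE((int) def.getBytesRead());
            }
        }

        /** Clears deflater and checksum state so the next run starts fresh. */
        void reset() {
            def.reset();
            crc.reset();
        }

        /** Deliberately a no-op (an assumption of this sketch) so the instance stays usable. */
        @Override
        public void close() {
        }

        private void writeIntLE(int v) throws IOException {
            out.write(v);
            out.write(v >>> 8);
            out.write(v >>> 16);
            out.write(v >>> 24);
        }
    }

    /** One compression run: reset, header, payload, trailer. */
    static byte[] compress(ReusableGzipOutputStream gzip, ByteArrayOutputStream sink, String text)
            throws IOException {
        sink.reset();
        gzip.reset();
        gzip.writeHeader();
        gzip.write(text.getBytes(StandardCharsets.UTF_8));
        gzip.finish();
        return sink.toByteArray();
    }

    /** Reads the data back with the JDK's GZIPInputStream to check the framing. */
    static String decompress(byte[] data) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[256];
            for (int n; (n = in.read(buffer)) != -1; ) {
                result.write(buffer, 0, n);
            }
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ReusableGzipOutputStream gzip = new ReusableGzipOutputStream(sink);
        // The same stream instance serves two independent compression runs.
        System.out.println(decompress(compress(gzip, sink, "first message")));
        System.out.println(decompress(compress(gzip, sink, "second message")));
    }
}
```

Because close() does nothing in this sketch, the Deflater's native resources are only released when the instance is garbage collected; a real pool would probably want an explicit disposal method that calls end() on the Deflater.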
Java streams are designed so that the target must be set at construction time. That limits
reuse, but it can be worked around by constructing the pooled OutputStream
instance with a custom OutputStream
that allows you to switch the target.
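A switchable target might look like the sketch below. All names here (SwitchableTargetDemo, SwitchableOutputStream, PooledCompressor, setTarget) are hypothetical and chosen for illustration. To keep the example self-contained it wraps a plain JDK DeflaterOutputStream rather than a GZIP stream, but the idea is the same: the compressing stream is constructed once, and only the delegate behind it changes per run.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class SwitchableTargetDemo {

    /** Hypothetical delegating stream whose target can be swapped between runs. */
    static class SwitchableOutputStream extends OutputStream {
        private OutputStream target;

        void setTarget(OutputStream target) {
            this.target = target;
        }

        @Override
        public void write(int b) throws IOException {
            target().write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            target().write(b, off, len);
        }

        @Override
        public void flush() throws IOException {
            target().flush();
        }

        private OutputStream target() throws IOException {
            if (target == null) {
                throw new IOException("No target set");
            }
            return target;
        }
    }

    /** Bundles one long-lived compressing stream with its switchable sink (illustrative only). */
    static class PooledCompressor {
        private final SwitchableOutputStream sink = new SwitchableOutputStream();
        private final Deflater deflater = new Deflater();
        private final DeflaterOutputStream stream = new DeflaterOutputStream(sink, deflater);

        void compress(byte[] payload, OutputStream target) throws IOException {
            sink.setTarget(target);
            try {
                deflater.reset();      // forget the previous run
                stream.write(payload);
                stream.finish();       // flush remaining compressed bytes to the target
            } finally {
                sink.setTarget(null);  // do not hold on to the caller's stream
            }
        }
    }

    /** Inflates zlib-wrapped data back into a string to verify a run. */
    static String inflate(byte[] data) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[256];
            for (int n; (n = in.read(buffer)) != -1; ) {
                result.write(buffer, 0, n);
            }
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        PooledCompressor compressor = new PooledCompressor();
        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        // One compressor instance, two different targets.
        compressor.compress("first".getBytes(StandardCharsets.UTF_8), a);
        compressor.compress("second".getBytes(StandardCharsets.UTF_8), b);
        System.out.println(inflate(a.toByteArray()));
        System.out.println(inflate(b.toByteArray()));
    }
}
```

Clearing the target after each run is a design choice of this sketch: it makes an accidental write between runs fail fast instead of leaking bytes into the previous caller's stream.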
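A benchmark comparing the unpooled and pooled streams shows the difference: the pooled variant averages about
18 µs per operation versus about 184 µs unpooled, roughly a tenfold improvement. The reduced garbage per
compression run also lowers the absolute deviation between runs.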
Benchmark                                   Mode  Cnt       Score       Error  Units
GelfMessageAssemblerPerf.compressPooled     avgt    5   18164,796 ±  5717,793  ns/op
GelfMessageAssemblerPerf.compressUnpooled   avgt    5  184431,045 ± 46292,939  ns/op
Find the code on GitHub: https://github.com/mp911de/reusing-gzip-streams