This post explains the use of a pooled GZIP OutputStream. The initial motivation was a set of performance measurements of the GZIP compression inside logstash-gelf.
Creating a new GZIPOutputStream instance for every compression run is costly: each instance allocates several objects, including a byte buffer and a Deflater.
The reusable GZIPOutputStream differs from the JDK-provided class in a few points:
- Do not write the GZIP header upon instance creation but expose a method to write it on demand
- Expose a reset() method to reset the deflater state
- Do not close the stream to keep it reusable
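The three points above can be illustrated with a minimal sketch. It is not the class from the linked repository: the names `ReusableGzipSketch`, `ReusableGzipDemo` and `writeHeader()` are hypothetical, and the sketch assumes that building on the JDK's `DeflaterOutputStream` with a raw (`nowrap`) `Deflater` is an acceptable approach. To keep it self-contained, the target is a `ByteArrayOutputStream` that is cleared between runs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical reusable GZIP stream (illustration only). */
class ReusableGzipSketch extends DeflaterOutputStream {

    // 10-byte GZIP header: magic bytes, deflate method, no flags,
    // zero mtime, no extra flags, OS byte 0.
    private static final byte[] HEADER =
            {0x1f, (byte) 0x8b, Deflater.DEFLATED, 0, 0, 0, 0, 0, 0, 0};

    private final CRC32 crc = new CRC32();

    ReusableGzipSketch(OutputStream target) {
        // nowrap = true: the Deflater emits raw deflate data,
        // the GZIP framing is written by this class.
        super(target, new Deflater(Deflater.DEFAULT_COMPRESSION, true), 512);
    }

    /** Writes the header on demand instead of in the constructor. */
    void writeHeader() throws IOException {
        out.write(HEADER);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len);
        crc.update(b, off, len);
    }

    /** Flushes the remaining compressed data and writes the GZIP trailer. */
    @Override
    public void finish() throws IOException {
        super.finish();
        writeIntLE((int) crc.getValue()); // CRC-32 of the uncompressed data
        writeIntLE(def.getTotalIn());     // uncompressed size
    }

    /** Resets deflater and checksum so the instance can be used again. */
    void reset() {
        def.reset();
        crc.reset();
    }

    /** Intentionally a no-op so the instance stays reusable. */
    @Override
    public void close() {
    }

    private void writeIntLE(int v) throws IOException {
        out.write(v & 0xff);
        out.write((v >>> 8) & 0xff);
        out.write((v >>> 16) & 0xff);
        out.write((v >>> 24) & 0xff);
    }
}

/** Hypothetical helpers showing one compression run and a round trip. */
class ReusableGzipDemo {

    static byte[] compress(ReusableGzipSketch gzip, ByteArrayOutputStream target, byte[] data) {
        try {
            target.reset();     // clear the previous run's output
            gzip.reset();       // clear the previous run's deflater state
            gzip.writeHeader();
            gzip.write(data);
            gzip.finish();
            return target.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] decompress(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return in.readAllBytes(); // requires Java 9 or later
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A single `ReusableGzipSketch` can then be passed to `ReusableGzipDemo.compress(...)` repeatedly, and every result should be readable by the JDK's `GZIPInputStream`. The sketch writes a trailer on each call to `finish()`, so `reset()` has to be called before the next run.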
Java streams are designed so that the target must be set at construction time. That limits reuse, but it can be mitigated by constructing the pooled OutputStream instance with a custom OutputStream that allows you to switch the target.
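A switchable target could look roughly like the following sketch. The names `SwitchableOutputStream`, `setTarget` and `SwitchableDemo` are hypothetical and not taken from the linked repository; the sketch only assumes that a thin delegating `OutputStream` is enough to decouple a long-lived stream from its destination.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical delegating stream whose destination can be swapped at runtime. */
class SwitchableOutputStream extends OutputStream {

    private OutputStream target;

    /** Points subsequent writes at a new destination (null detaches it). */
    void setTarget(OutputStream target) {
        this.target = target;
    }

    @Override
    public void write(int b) throws IOException {
        requireTarget().write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        requireTarget().write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        if (target != null) {
            target.flush();
        }
    }

    // close() is inherited from OutputStream and does nothing,
    // so the current target is never closed by this wrapper.

    private OutputStream requireTarget() throws IOException {
        if (target == null) {
            throw new IOException("No target set");
        }
        return target;
    }
}

/** Hypothetical helper: one write through a long-lived stream into a fresh sink. */
class SwitchableDemo {

    static byte[] writeVia(OutputStream pooled, SwitchableOutputStream switchable, byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        switchable.setTarget(sink);
        try {
            pooled.write(data);
            pooled.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            switchable.setTarget(null); // detach so the sink is not retained
        }
        return sink.toByteArray();
    }
}
```

In the setting of this post, the long-lived stream would be the pooled GZIP stream, constructed once on top of the switchable instance. Any other wrapping stream, such as a `BufferedOutputStream`, can stand in to try out the idea.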
A benchmark of the unpooled and pooled streams shows the difference: the pooled variant needs roughly a tenth of the time per operation (about 18 µs versus about 184 µs). The reduced garbage per compression run also narrows the measurement error.
    Benchmark                                  Mode  Cnt       Score       Error  Units
    GelfMessageAssemblerPerf.compressPooled    avgt    5   18164,796 ±  5717,793  ns/op
    GelfMessageAssemblerPerf.compressUnpooled  avgt    5  184431,045 ± 46292,939  ns/op
Find the code on GitHub: https://github.com/mp911de/reusing-gzip-streams