hadoop-common-dev mailing list archives

From Tim Broberg <tbrob...@yahoo.com>
Subject Compressor setInput input permanence
Date Sat, 03 Dec 2011 08:18:42 GMT
The question is, how long can a Compressor count on the user buffer to stick around after a
call to setInput()?
 
The Compressor interface has a method, setInput(), whose arguments are an array reference,
an offset, and a length.
 
I would expect that the contents of this input are no longer guaranteed to persist once
setInput() returns.
 
...but in ZlibCompressor and SnappyCompressor, when there is no buffer room for len bytes,
the Compressor saves the reference, offset, and length (not the data itself), clears the
needsInput condition, and returns, waiting for a call to compress() to push the buffers
through the compressor. These Compressor implementations count on the data persisting after
setInput() returns, until compress() is called.
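
To make the concern concrete, here is a minimal sketch of the behavior described above. It is a toy model, not the Hadoop source: the class name DeferredInputSketch, its fields, and the pass-through "compression" are all hypothetical, and it does not implement the real Compressor interface. It only shows that when setInput() saves a reference instead of copying, whatever the caller writes into the array before compress() runs is what gets consumed.

```java
import java.nio.ByteBuffer;

// Hypothetical toy model of the deferred-input pattern; NOT the Hadoop ZlibCompressor.
class DeferredInputSketch {
    private final ByteBuffer internal; // stands in for the compressor's own staging buffer
    private byte[] userBuf;            // saved reference to the caller's array (no copy made)
    private int userBufOff;
    private int userBufLen;

    DeferredInputSketch(int capacity) {
        internal = ByteBuffer.allocate(capacity);
    }

    void setInput(byte[] b, int off, int len) {
        if (len > internal.remaining()) {
            // No room: remember where the data lives and read it later.
            userBuf = b;
            userBufOff = off;
            userBufLen = len;
        } else {
            // Room available: the bytes are copied now, so the caller may reuse b.
            internal.put(b, off, len);
        }
    }

    // "Compression" is a plain pass-through here: staged bytes first, then the saved user buffer.
    byte[] compress() {
        internal.flip();
        int staged = internal.remaining();
        byte[] out = new byte[staged + userBufLen];
        internal.get(out, 0, staged);
        internal.clear();
        if (userBuf != null) {
            // The caller's array is read only now, long after setInput() returned.
            System.arraycopy(userBuf, userBufOff, out, staged, userBufLen);
            userBuf = null; // drop the reference so the array can be collected
            userBufLen = 0;
        }
        return out;
    }
}
```

In this model, an 8-byte input handed to a compressor with a 4-byte staging buffer is read at compress() time, so a caller that overwrites the array in between changes the output. A 2-byte input given to a compressor with room for it is copied immediately and is immune to later changes.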
 
So, at least in that case, the data is assumed to persist after the call. Is all such data
guaranteed to persist?
 
In theory, could a Compressor avoid a copy by just collecting references to each input user
buffer passed in and then sending all these references to the compression library when compress()
is called?
 
...or do these user buffers get reused before that time?
 
By keeping references to these buffers, am I preventing them from getting garbage collected
and potentially soaking up large amounts of memory?
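
As an illustration of the copy-free idea and its two risks, here is a rough sketch under stated assumptions. The class ReferenceCollectingSketch, the Slice helper, and retainedBuffers() are hypothetical names invented for this example, and the "compression library" is replaced by a simple concatenation. The sketch is not a proposal for the real API; it only shows what happens if the caller reuses a buffer, and how long the references are held.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: collect references in setInput(), touch the data only in compress().
class ReferenceCollectingSketch {
    // One (array, offset, length) triple per setInput() call.
    private static final class Slice {
        final byte[] buf;
        final int off;
        final int len;

        Slice(byte[] buf, int off, int len) {
            this.buf = buf;
            this.off = off;
            this.len = len;
        }
    }

    private final List<Slice> pending = new ArrayList<>();
    private int pendingBytes;

    void setInput(byte[] b, int off, int len) {
        pending.add(new Slice(b, off, len)); // keeps b reachable until compress()
        pendingBytes += len;
    }

    // Stand-in for handing all slices to a compression library: concatenates them.
    byte[] compress() {
        byte[] out = new byte[pendingBytes];
        int pos = 0;
        for (Slice s : pending) {
            System.arraycopy(s.buf, s.off, out, pos, s.len);
            pos += s.len;
        }
        pending.clear(); // release the references so the arrays can be collected
        pendingBytes = 0;
        return out;
    }

    // Number of caller arrays currently pinned by this object.
    int retainedBuffers() {
        return pending.size();
    }
}
```

If the caller passes the same array twice with different contents, both slices point at the same memory and the earlier contents are lost. Every array passed in also stays reachable until compress() clears the list, which is the memory cost raised above.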
 
Where is the persistence of the contents of these user buffers supposed to be documented?
 
TIA,
    - Tim.