commons-dev mailing list archives

From "Niall Pemberton" <>
Subject Re: RereadableInputStream
Date Sun, 14 Oct 2007 02:35:27 GMT
On 10/11/07, Keith R. Bennett <> wrote:
> Hello, all.  I am working with the Apache Tika project.  We found the need to
> get a newly opened input stream from the user, and possibly read it multiple
> times.  I am aware of the mark and reset methods, but we needed to support
> streams of arbitrary length, so I thought we'd have to figure something else
> out.

I don't see anything in the javadocs for the mark/reset methods in
InputStream that prevents them from being used for streams of arbitrary
length. Is this an assumption based on the fact that the mark method
takes a "readLimit" parameter? The reset method javadocs only say that
an IOException "might" be thrown if the readLimit has been exceeded, so
my take is that it would not be inconsistent to create an
implementation that ignores that parameter. Better IMO to use these
than to invent a new "rewind" method.
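
A minimal sketch of that point, using only the JDK: ByteArrayInputStream
is one existing implementation whose mark() ignores the limit it is
given, so reset() still works after reading past it. The class and
method names (MarkResetDemo, readTwice) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class MarkResetDemo {
    /** Reads the whole stream twice using nothing but mark/reset. */
    static String readTwice(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        // Ask for a limit of 1 byte, deliberately smaller than the data.
        // ByteArrayInputStream documents that this argument has no meaning.
        in.mark(1);
        StringBuilder sb = new StringBuilder();
        for (int pass = 0; pass < 2; pass++) {
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
            }
            in.reset(); // rewinds to the marked position (the start)
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readTwice(data));
    }
}
```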

> I created a class, and I'd like your feedback on it.  If you'd like to
> include it, or something based on it, in a future version of your project,
> feel free.  Or, if it's a bad idea, or you can suggest modifications or a
> totally different approach that would fulfill the need more wisely, please
> let me know.

I have a couple of comments. Firstly, although the default InputStream
byte array read methods delegate to the single byte read method, not
all implementations do (FileInputStream doesn't appear to). Funnelling
all of RereadableInputStream's reads (and writes) through the single
byte read method could therefore stop this impl. from taking advantage
of any performance benefits that the streams it delegates to might
have when processing an array of bytes. So my suggestion would be to
also implement read(byte[], int, int).
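
As a rough sketch of that suggestion (not the attached class): a proxy
that records what it reads and overrides the array read, so the
delegate gets one bulk call per chunk. RecordingInputStream, the
in-memory buffer and the drain helper are all hypothetical names chosen
for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical proxy: remembers every byte it reads from the delegate. */
class RecordingInputStream extends InputStream {
    private final InputStream delegate;
    private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();

    RecordingInputStream(InputStream delegate) {
        this.delegate = delegate;
    }

    @Override
    public int read() throws IOException {
        int b = delegate.read();
        if (b != -1) {
            recorded.write(b);
        }
        return b;
    }

    /** Bulk read: one call to the delegate instead of len single-byte calls. */
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = delegate.read(buf, off, len);
        if (n > 0) {
            recorded.write(buf, off, n);
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    byte[] recordedBytes() {
        return recorded.toByteArray();
    }

    /** Demo helper: drains the data through the proxy in 4-byte chunks. */
    static byte[] drain(byte[] data) {
        try (RecordingInputStream in =
                 new RecordingInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4];
            while (in.read(chunk, 0, chunk.length) != -1) {
                // nothing to do: the proxy records as it reads
            }
            return in.recordedBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

read(byte[]) needs no override of its own, since InputStream implements
it in terms of read(byte[], int, int).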

Secondly, from a Commons IO perspective, we already have some of the
functionality for parts of what you're trying to achieve:

1) DeferredFileOutputStream - writes to a byte array until a threshold
is reached and then switches to a file. Doesn't currently support
temporary files, but could be easily (IMO) enhanced to do so.

2) TeeInputStream - as it reads an InputStream it also writes to an
OutputStream

So to achieve something like the functionality in your "first pass",
you could do something like

File tempFile = new File("tikka.tmp");
DeferredFileOutputStream deferred = new DeferredFileOutputStream(1024, tempFile);
InputStream currentInput = new TeeInputStream(origInput, deferred);

After the stream has been processed through once, you could switch
the current input stream:

if (deferred.isInMemory()) {
    currentInput = new ByteArrayInputStream(deferred.getData());
} else {
    currentInput = new RereadableFileInputStream(deferred.getFile());
}

RereadableFileInputStream doesn't yet exist, but it would be a proxy
that supports mark/reset and closes/re-creates an underlying
FileInputStream on reset. AIUI ByteArrayInputStream already supports
mark/reset, so wherever the stream is cached, the standard mark/reset
could be used.
The main advantage of this is that if the different pieces of
functionality that make up your RereadableInputStream are broken down
into smaller/simpler components, it becomes much easier to test those
individually and then compose them together to create the more complex
behaviour you require.


> It's called RereadableInputStream.  It saves the bytes read from the
> original stream in a byte [], until a user-specified threshold is reached,
> then it moves the buffer to a temporary file.
> I'm attaching the file and a basic unit test class to this message.  This
> version is newer than the one currently in Tika's subversion repository.
> For reasons that I won't bore you with, this version is not yet committed.
> Thanks for any help you can offer.
> Regards,
> Keith Bennett