commons-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: [commons-io] TeeInputStream that ignores skip/reset?
Date Thu, 17 Dec 2015 11:57:36 GMT
Right, that's the use case.  In Tika, we have no control over what our dependencies are doing
to the stream.  

The current implementation does a mark/reset for digesting then parsing... up to a certain
limit, after which we cache to disk and then digest and parse the tmp file separately.  
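
For anyone following along, the mark/reset step can be sketched roughly as below. This is a minimal illustration, not Tika's actual code: the class name MarkResetDigester, the method digestAndRewind, and the null return used to signal "too big, spool to disk" are all made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

// Hypothetical helper, not Tika's implementation.
final class MarkResetDigester {

    /**
     * Digests the stream if it holds at most `limit` bytes, then rewinds it
     * so a parser can read it again from the start.
     * Returns the digest, or null if the stream is longer than `limit`
     * (the caller would then fall back to a temp file). Assumes limit < Integer.MAX_VALUE.
     */
    static byte[] digestAndRewind(InputStream in, MessageDigest md, int limit)
            throws IOException {
        if (!in.markSupported()) {
            throw new IOException("mark/reset not supported");
        }
        // One extra byte lets us tell "exactly limit bytes" from "more than limit".
        in.mark(limit + 1);
        byte[] buf = new byte[4096];
        int total = 0;
        boolean eof = false;
        while (total <= limit) {
            int want = Math.min(buf.length, limit + 1 - total);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                eof = true;
                break;
            }
            md.update(buf, 0, n);
            total += n;
        }
        // We never read more than limit + 1 bytes, so the mark is still valid.
        in.reset();
        if (!eof) {
            md.reset(); // discard the partial digest
            return null;
        }
        return md.digest();
    }
}
```

Note that the digester consumes the stream to its end (or to the limit) before the parser sees a single byte.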

The downside to this (TIKA-1701) is that for truncated zip/package files, the digester reads
to the end of the stream for an embedded file and hits the zip exception. The parser then
fails to extract the contents of as many files as it would have if it had just been parsing
the file without the digester.

If skip/reset don't make any sense for a DigestingInputStream generally, I'll keep our modified
TeeInputStream over in Tika land.
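
To make the idea concrete, here is a rough sketch of a stream whose digest only ever sees each byte once, in order, whatever the consumer does with skip/mark/reset. It is not the modified TeeInputStream from Tika and not anything in commons-io; the class name SequentialDigestInputStream and its fields are invented for this example. It assumes the wrapped stream handles mark/reset itself, and that reset() is only called after mark().

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

// Hypothetical class for illustration only.
class SequentialDigestInputStream extends FilterInputStream {
    private final MessageDigest digest;
    private long position = 0;     // current offset in the underlying stream
    private long digested = 0;     // how many leading bytes the digest has seen
    private long markPosition = 0; // offset recorded by the last mark()

    SequentialDigestInputStream(InputStream in, MessageDigest digest) {
        super(in);
        this.digest = digest;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) {
            if (position >= digested) { // only digest bytes not seen before
                digest.update((byte) b);
                digested++;
            }
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            long end = position + n;
            if (end > digested) {
                // Feed only the tail of the buffer that lies past the high-water mark.
                int fresh = (int) (end - digested);
                digest.update(buf, off + (n - fresh), fresh);
                digested = end;
            }
            position = end;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Read and discard instead of skipping, so the digest still sees the bytes.
        byte[] scratch = new byte[4096];
        long remaining = n;
        while (remaining > 0) {
            int r = read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (r < 0) {
                break;
            }
            remaining -= r;
        }
        return n <= 0 ? 0 : n - remaining;
    }

    @Override
    public synchronized void mark(int readlimit) {
        in.mark(readlimit);
        markPosition = position;
    }

    @Override
    public synchronized void reset() throws IOException {
        in.reset(); // throws if the wrapped stream cannot reset
        position = markPosition;
    }
}
```

With something like this the caller keeps the MessageDigest and calls digest() once the parser is done; the digest then covers only the bytes the parser actually got through, so a truncated zip fails in the parser rather than in a separate digesting pass.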

If there are other recommendations for handling this, let me know.

Thank you!

Best,

          Tim

-----Original Message-----
From: sebb [mailto:sebbaz@gmail.com] 
Sent: Wednesday, December 16, 2015 1:07 PM
To: Commons Users List <user@commons.apache.org>
Subject: Re: [commons-io] TeeInputStream that ignores skip/reset?

I'm not sure what the use case for this is, apart from avoiding the bug in DigestingInputStream.
Which can be avoided by not using skip/reset.

I'm not sure that skip/reset make any sense for a DigestingInputStream anyway.


On 16 December 2015 at 12:19, Allison, Timothy B. <tallison@mitre.org> wrote:
> All,
>   Over on Tika, we'd like a DigestingInputStream that ignores skip/reset (unlike Java's
> v <= 1.8 [0]).  Before we reinvent the wheel, is there an InputStream similar to TeeInputStream
> that ignores skip/reset, so that the Digester would only see the stream as if it were read
> sequentially without skip/reset?
>   If we do reinvent the wheel, should we contribute this InputStream to commons-io as
> an alternate to TeeInputStream?
>   Or, even more generally, are there other recommendations for handling this?  Thank
> you!
>
>          Best,
>
>                  Tim
>
> [0] http://mail-archives.apache.org/mod_mbox/commons-user/201508.mbox/%3CDM2PR09MB07135F86C7AC6981F1BB216BC78A0%40DM2PR09MB0713.namprd09.prod.outlook.com%3E

