commons-issues mailing list archives

From "Stefan Bodewig (JIRA)" <>
Subject [jira] [Commented] (COMPRESS-234) Patch: TAR InputStream Huge Speed Improvements
Date Sun, 21 Jul 2013 08:00:52 GMT


Stefan Bodewig commented on COMPRESS-234:

First of all, many thanks for the work you are putting into Compress.

I'm a bit reluctant about explicitly adding a Buffered*Stream ourselves:

* we tell people to wrap their streams in a Buffered*Stream because Compress doesn't do so itself:
- so for users who follow that advice we would end up layering two buffers on top of each other
* the decision not to use Buffered*Streams internally was deliberate, to give the user more
control.  If I have a stream containing two consecutive TAR archives for whatever reason, will
the stream be in a state where I can read the second archive cleanly after the first one is
done, or will the (now discarded) BufferedInputStream have read more bytes than it needed and
leave the inner stream positioned past the start of the second archive?
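The read-ahead concern in the second point can be reproduced with plain JDK classes. This is an illustrative sketch only: `ReadAheadDemo` and `remainingInInner` are hypothetical names, and a 2048-byte array stands in for a stream holding two 1024-byte archives back to back.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadAheadDemo {
    // Consumes exactly `n` bytes from a stream over `data`, either directly or
    // through a BufferedInputStream, then reports how many bytes the inner
    // stream still holds for whoever wants to read "the next archive".
    static int remainingInInner(byte[] data, int n, int bufferSize, boolean wrap) {
        ByteArrayInputStream inner = new ByteArrayInputStream(data);
        InputStream in = wrap ? new BufferedInputStream(inner, bufferSize) : inner;
        byte[] consumed = new byte[n];
        try {
            int off = 0;
            while (off < n) {
                int r = in.read(consumed, off, n - off);
                if (r < 0) {
                    break; // input ended early
                }
                off += r;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The BufferedInputStream (if any) is simply dropped here; bytes it
        // pre-fetched into its buffer are not pushed back into `inner`.
        return inner.available();
    }

    public static void main(String[] args) {
        // Hypothetical layout: archive A = bytes 0..1023, archive B = 1024..2047.
        byte[] twoArchives = new byte[2048];
        System.out.println("unbuffered: " + remainingInInner(twoArchives, 1024, 8192, false));
        System.out.println("buffered:   " + remainingInInner(twoArchives, 1024, 8192, true));
    }
}
```

Reading the first 1024 bytes directly leaves 1024 bytes in the inner stream. Reading the same 1024 bytes through an 8 KiB buffer pulls the whole input into that buffer, so once the buffer is discarded nothing is left for the second archive.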

That being said, I'm sure we could remove TarBuffer the way you've done.  In the output case
I don't think we actually need the explicit BufferedOutputStream at all, and I haven't looked
closely enough at the input case to see whether the code actually requires it there.
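Without TarBuffer, the input side essentially has to read one full 512-byte TAR record at a time straight from the wrapped stream. It has to loop, because a single `read` call may return fewer bytes than requested. Below is a minimal sketch under that assumption. `RecordReader`, `readRecord` and `countRecords` are hypothetical helpers, not code from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class RecordReader {
    static final int RECORD_SIZE = 512; // size of a TAR header/data record

    // Fills `record` completely from `in`. Returns false if the stream is
    // already at EOF; if it ends mid-record, fails with an
    // UncheckedIOException wrapping an EOFException.
    static boolean readRecord(InputStream in, byte[] record) {
        try {
            int off = 0;
            while (off < record.length) {
                int r = in.read(record, off, record.length - off);
                if (r < 0) {
                    if (off == 0) {
                        return false; // clean EOF on a record boundary
                    }
                    throw new EOFException("truncated record: " + off + " of " + record.length + " bytes");
                }
                off += r;
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts the complete records in `data` by reading them one at a time.
    static int countRecords(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        byte[] record = new byte[RECORD_SIZE];
        int n = 0;
        while (readRecord(in, record)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countRecords(new byte[3 * RECORD_SIZE])); // three records
    }
}
```

Returning `false` only when EOF falls exactly on a record boundary keeps a clean end of input distinguishable from a truncated archive.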

I'd be interested in your benchmark code.  Have you tried the original code with the
original stream wrapped in a Buffered*Stream from outside of Compress' code base?
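You can see what wrapping from outside buys without a real archive. Count how many read calls reach the underlying stream when the consumer issues single-byte reads. This is a sketch, not the reporter's benchmark. `CallCountingInputStream` and `underlyingCalls` are hypothetical helpers.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class WrapFromOutside {
    // Hypothetical helper: counts read calls that reach the wrapped stream.
    static class CallCountingInputStream extends FilterInputStream {
        int calls;

        CallCountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads `data` one byte at a time, optionally through a BufferedInputStream
    // added by the caller, and returns how many reads hit the underlying stream.
    static int underlyingCalls(byte[] data, boolean wrap) {
        CallCountingInputStream counting = new CallCountingInputStream(new ByteArrayInputStream(data));
        InputStream in = wrap ? new BufferedInputStream(counting, 8192) : counting;
        try {
            while (in.read() != -1) {
                // consume byte by byte, like a naive parser would
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10240];
        System.out.println("unwrapped: " + underlyingCalls(data, false));
        System.out.println("wrapped:   " + underlyingCalls(data, true));
    }
}
```

For 10240 bytes read one byte at a time, the unwrapped stream sees 10241 calls (the last one returns -1). The wrapped stream sees only a few bulk reads. Wrapping the stream handed to Compress the same way, from the caller's side, is the configuration the question above asks about.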

Finally, re: skip - I think FindBugs detects cases where you don't use the return value of
skip, and it found a few places in the skip package where we implemented something similar
to the skipFully method you've added.  It may be worth looking through the other packages
to see whether they could benefit from the new method - not in the scope of this ticket, of course.
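`InputStream.skip` may skip fewer bytes than requested, even zero, so ignoring its return value is risky. A helper along the lines of the `skipFully` method mentioned above might look like the following. This is a hypothetical sketch, not the code from the patch. `StingySkipStream` is a test stream invented here to mimic a stream that skips at most a few bytes per call, and `skipFromArray` is a hypothetical driver.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class SkipFullyDemo {
    // Keeps calling skip() until n bytes are skipped or EOF is reached, and
    // returns the number of bytes actually skipped.
    static long skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF; probe with read().
                if (in.read() == -1) {
                    break;
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        return n - remaining;
    }

    // Test stream: skips at most `max` bytes per call.
    static class StingySkipStream extends FilterInputStream {
        private final long max;

        StingySkipStream(InputStream in, long max) {
            super(in);
            this.max = max;
        }

        @Override
        public long skip(long n) throws IOException {
            return super.skip(Math.min(n, max));
        }
    }

    static long skipFromArray(byte[] data, long n, long maxPerCall) {
        try {
            return skipFully(new StingySkipStream(new ByteArrayInputStream(data), maxPerCall), n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(skipFromArray(new byte[1000], 600, 10));  // all 600 requested bytes
        System.out.println(skipFromArray(new byte[1000], 2000, 10)); // stops at EOF
    }
}
```

A single `skip(600)` on the stingy stream would report only 10 bytes. The loop keeps going until the request is satisfied or the input runs out.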

> Patch: TAR InputStream Huge Speed Improvements
> ----------------------------------------------
>                 Key: COMPRESS-234
>                 URL:
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Archivers
>            Reporter: BELUGA BEHR
>         Attachments: Archiver_Tar.patch,,
> I have looked over TarBuffer and TarArchiveInputStream and found some ways to improve
> performance by orders of magnitude.
> I used a 1 GB TAR archive file (no compression).
> Times for reading all entry file names:
> Current - 630ms
> Mine - 17ms
> Times for extracting all entry files:
> Current - 2446ms
> Mine - 2214ms
> As you can see, I have greatly enhanced the "skip" methods.  Actual extraction was within
> the margin of error, and the timings bounce around a lot.
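Skipping can be much cheaper than extracting because a TAR entry's data is padded to a whole number of 512-byte records. A reader can therefore compute exactly how far away the next header is and pass that distance to `skip` instead of reading and discarding the data. The arithmetic is sketched below. `TarPadding` and `bytesToNextHeader` are hypothetical names, and this is not necessarily how the patch does it.

```java
class TarPadding {
    static final int RECORD_SIZE = 512; // TAR record size in bytes

    // Distance from the end of an entry's header to the next header:
    // the entry's data size rounded up to a whole number of records.
    static long bytesToNextHeader(long entrySize) {
        long remainder = entrySize % RECORD_SIZE;
        long padding = remainder == 0 ? 0 : RECORD_SIZE - remainder;
        return entrySize + padding;
    }

    public static void main(String[] args) {
        for (long size : new long[] {0, 1, 512, 513, 1000}) {
            System.out.println(size + " -> " + bytesToNextHeader(size));
        }
    }
}
```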
