commons-issues mailing list archives

From "Stefan Bodewig (JIRA)" <>
Subject [jira] [Commented] (COMPRESS-234) Patch: TAR InputStream Huge Speed Improvements
Date Thu, 01 Aug 2013 18:13:50 GMT


Stefan Bodewig commented on COMPRESS-234:

Sorry, my net-time is currently a bit flaky.

I've run BigFilesIT to get an idea of how much improvement we are talking about.  Initially
the test didn't use any external buffering, so I added some and ran a few tests.  On my machine
I get the following runtimes:

current code, no buffering
readFileHeadersOfArchiveBiggerThan8GByte: 26.4s
readFileBiggerThan8GBytePosix: 24.0s

current code, external buffering
readFileHeadersOfArchiveBiggerThan8GByte: 23.6s
readFileBiggerThan8GBytePosix: 23.9s

"latest" patch, no buffering
readFileHeadersOfArchiveBiggerThan8GByte: 22.7s
readFileBiggerThan8GBytePosix: 21.8s

"latest" patch, external buffering
readFileHeadersOfArchiveBiggerThan8GByte: 22.6s
readFileBiggerThan8GBytePosix: 21.4s
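The relative gains can be read straight off the runtimes above. A minimal Java sketch (the class and method names are made up for illustration and are not part of BigFilesIT) that turns each before/after pair of reported runtimes into a percentage reduction:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RuntimeGains {
    /** Relative reduction in runtime, in percent: (before - after) / before * 100. */
    static double gainPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Runtimes in seconds as reported above: {current code, "latest" patch}
        Map<String, double[]> runs = new LinkedHashMap<>();
        runs.put("headers, no buffering", new double[] {26.4, 22.7});
        runs.put("posix file, no buffering", new double[] {24.0, 21.8});
        runs.put("headers, external buffering", new double[] {23.6, 22.6});
        runs.put("posix file, external buffering", new double[] {23.9, 21.4});

        for (Map.Entry<String, double[]> e : runs.entrySet()) {
            double[] t = e.getValue();
            System.out.printf("%-32s %5.1f%%%n", e.getKey(), gainPercent(t[0], t[1]));
        }
    }
}
```

Run as is, it prints about 14.0% and 9.2% for the two unbuffered runs and 4.2% and 10.5% for the two externally buffered ones.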

I was surprised to see that even the test that actually reads the entry data benefits from the
patch; the readFileHeaders test, by contrast, hits skip internally a lot.  Either way, I see a ~10%
gain where there is no external buffer and still ~5% with an external BufferedInputStream.  With
a bit of fiddling to make the "end of archive" case behave as it used to, I really think dropping
TarBuffer is worth it, and I intend to make it work.
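"External buffering" here presumably means wrapping the raw stream in a java.io.BufferedInputStream before handing it to TarArchiveInputStream. The JDK-only sketch below illustrates why that helps a reader that pulls small chunks: it counts how often the underlying stream is actually asked for data. The counting wrapper and class names are hypothetical, and requesting one 512-byte tar record per call is an assumption made for illustration, not a description of how TarBuffer reads.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BufferingDemo {
    static final int RECORD_SIZE = 512; // tar record size in bytes

    /** Hypothetical wrapper that counts calls reaching the underlying stream. */
    static class CountingInputStream extends FilterInputStream {
        int reads;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            reads++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return super.read(b, off, len);
        }
    }

    /** Reads the whole stream one 512-byte record at a time; returns the underlying read count. */
    static int countUnderlyingReads(byte[] data, boolean buffered) {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream in = buffered ? new BufferedInputStream(counter, 8192) : counter;
        byte[] record = new byte[RECORD_SIZE];
        try {
            while (in.read(record, 0, record.length) != -1) {
                // discard the data; only the number of underlying calls matters here
            }
        } catch (IOException e) {
            // ByteArrayInputStream does not fail in practice; wrap to keep callers simple
            throw new UncheckedIOException(e);
        }
        return counter.reads;
    }

    public static void main(String[] args) {
        byte[] archive = new byte[64 * RECORD_SIZE]; // 32 KiB of zeroed "records"
        System.out.println("unbuffered: " + countUnderlyingReads(archive, false));
        System.out.println("buffered:   " + countUnderlyingReads(archive, true));
    }
}
```

Without buffering, every 512-byte request (plus the final end-of-stream call) reaches the underlying stream; with an 8 KiB buffer most requests are served from memory. This only shows the call pattern and says nothing about the timings measured above.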

As for NullPointerExceptions - I'm sure they'd be worth a bug report nevertheless :-)
> Patch: TAR InputStream Huge Speed Improvements
> ----------------------------------------------
>                 Key: COMPRESS-234
>                 URL:
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Archivers
>            Reporter: BELUGA BEHR
>         Attachments: Archiver_Tar.2.patch, Archiver_Tar.3.patch, Archiver_Tar.patch
> I have looked over TarBuffer and TarArchiveInputStream and found some ways to improve
> performance by orders of magnitude.
> I used a 1 GB TAR archive file (no compression).
> Times for reading all entry file names:
> Current - 630ms
> Mine - 17ms
> Times for extracting all entry files:
> Current - 2446ms
> Mine - 2214ms
> As you can see, I have enhanced the "skip" methods greatly.  Actual extraction was within
> a margin of error, and the timings bounce around a lot.
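The patch itself is not reproduced in this message, so the following is only a sketch of the general idea behind a faster skip, under stated assumptions: tar stores each entry's data padded to whole 512-byte records, and skipping an entry can be delegated to the underlying stream's InputStream.skip (which may seek) rather than, say, reading and discarding the data record by record. The names TarSkipSketch, paddedSize and skipFully are hypothetical and not Commons Compress API.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class TarSkipSketch {
    static final int RECORD_SIZE = 512; // tar pads entry data to whole 512-byte records

    /** Size of an entry's data including padding up to the next record boundary. */
    static long paddedSize(long entrySize) {
        long remainder = entrySize % RECORD_SIZE;
        return remainder == 0 ? entrySize : entrySize + (RECORD_SIZE - remainder);
    }

    /**
     * Skips exactly n bytes using InputStream.skip. skip() may skip fewer bytes than
     * requested, so loop; when it makes no progress, read one byte to tell EOF apart
     * from a stream that simply declined to skip.
     */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                throw new EOFException(remaining + " bytes left to skip at end of stream");
            } else {
                remaining--;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake stream: one 1000-byte entry (padded to 1024 bytes) followed by one more record.
        byte[] data = new byte[1024 + RECORD_SIZE];
        data[1024] = 42; // first byte after the padded entry
        InputStream in = new ByteArrayInputStream(data);
        skipFully(in, paddedSize(1000));
        System.out.println(in.read()); // prints 42
    }
}
```

Skipping by byte count lets a file-backed stream avoid copying entry data it will never use, which is where a header-only pass like the one timed above would spend most of its time.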
