commons-issues mailing list archives

From "Dominique De Munck (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-381) performance issue when using default Wiki/docs bzip2 compression Factory methods
Date Sat, 04 Feb 2017 15:37:51 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852829#comment-15852829
] 

Dominique De Munck commented on COMPRESS-381:
---------------------------------------------

I'm not sure which 'additional note' you are talking about. Is it this one?

"When (de)compressing smaller files you may even benefit from reading the whole file to uncompress
into memory before decompressing it or compressing to a ByteArrayOutputStream so all operations
happen in memory. "

My last finding did not show any speed difference between using the buffer and reading the
file entirely into memory; the difference was between the factory method and the buffered
method, regardless of file size.
Maybe add some timing info here, for example from my tests.

I would also add the fast "compress" examples using BufferedOutputStream, because that is
the documentation that 90% of users will look at first, and it is not that
trivial to modify the decompress examples to compress.
-> https://commons.apache.org/proper/commons-compress/examples.html
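
As a rough sketch of what such a buffered "compress" example might look like (not the
project's official example): the names BufferedCompressSketch, CompressorWrapper and
compress are hypothetical, and GZIPOutputStream from the JDK is used only as a stand-in so
the snippet runs without extra dependencies. With Commons Compress on the classpath, the
assumption is that BZip2CompressorOutputStream::new can be passed instead.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

class BufferedCompressSketch {

    /** Hypothetical helper type: turns a raw output stream into a compressing one. */
    @FunctionalInterface
    interface CompressorWrapper {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /**
     * Copies infile to outfile through the given compressor, with buffering on
     * both the file input and the file output. Returns the number of
     * uncompressed bytes that were read.
     */
    static long compress(Path infile, Path outfile, CompressorWrapper wrapper)
            throws IOException {
        long copied = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(infile));
             OutputStream fileOut = new BufferedOutputStream(Files.newOutputStream(outfile));
             OutputStream compressed = wrapper.wrap(fileOut)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                compressed.write(buffer, 0, n);
                copied += n;
            }
        } // closing the compressor first writes the trailer, then the file stream closes
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("sketch-in", ".bin");
        Path out = Files.createTempFile("sketch-out", ".gz");
        Files.write(in, new byte[2 * 1024 * 1024]); // 2 MB of zeros as dummy input

        long start = System.nanoTime();
        // Stand-in compressor; assumption: swap in BZip2CompressorOutputStream::new for bzip2.
        long bytes = compress(in, out, GZIPOutputStream::new);
        long millis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("compressed " + bytes + " bytes in " + millis + " ms");
    }
}
```

The timing printed by main is only meant as a starting point for comparing variants on your
own files; it is not a benchmark.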

And then in the Wiki, also warn that the given general factory examples are, at least for bzip2,
about 3x slower than buffered stream compress/decompress:
https://wiki.apache.org/commons/Compress

> performance issue when using default Wiki/docs bzip2 compression Factory methods
> --------------------------------------------------------------------------------
>
>                 Key: COMPRESS-381
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-381
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 1.13
>         Environment: Windows/All
>            Reporter: Dominique De Munck
>            Priority: Minor
>              Labels: documentation, easyfix, performance
>
> Hello
> We are going to use this project's bzip2 implementation as it performed best for our
> use case (tested using https://github.com/ning/jvm-compressor-benchmark).
> However, when following the default examples on the wiki/example/javadoc pages (*),
> we hit a serious performance bottleneck.
> The reason: the default "compress" operation on a file, as suggested there, is very slow,
> maybe because of disk I/O and lack of caching.
> For a 2 MB tiff file, bzip2 compression takes about 3 seconds with code (A), whereas
> code (B) takes only about 0.5 seconds!
> So it would be good to adapt the documentation or take a look at the bottleneck.
> Kind regards
> Dominique
> A:
> <<<<<
> FileInputStream fin = new FileInputStream(infile);
> BufferedInputStream bufferin = new BufferedInputStream(fin);
> final FileOutputStream outStream = new FileOutputStream(outfile);
> CompressorOutputStream cos = new CompressorStreamFactory()
>         .createCompressorOutputStream(CompressorStreamFactory.BZIP2, outStream);
> IOUtils.copy(fin, cos);
> cos.close();
> >>>>
> B:
> <<<<<
> final byte[] uncompressed = Files.readAllBytes(infile.toPath());
> ByteArrayOutputStream rawOut = new ByteArrayOutputStream(uncompressed.length);
> 		
> BZip2CompressorOutputStream out = new BZip2CompressorOutputStream(rawOut, COMPRESSION_LEVEL);
> out.write(uncompressed);
> out.close();
> FileOutputStream fos = new FileOutputStream(outfile);
> rawOut.writeTo(fos);
> fos.close();
> >>>>
> (*)
> Pages with documentation:
> https://wiki.apache.org/commons/Compress
> https://commons.apache.org/proper/commons-compress/examples.html
> https://commons.apache.org/proper/commons-compress/javadocs/api-release/index.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
