commons-issues mailing list archives

From "Dominique De Munck (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-381) performance issue when using default Wiki/docs bzip2 compression Factory methods
Date Fri, 03 Feb 2017 14:45:52 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851574#comment-15851574
] 

Dominique De Munck commented on COMPRESS-381:
---------------------------------------------

I got some time to figure this one out.
The following code works for 4GB+ files and on any Java version. Wrapping the file output
in a BufferedOutputStream makes a huge difference.
It turns a 1.4GB dbf file into a 63MB bzip2 file in 1min25sec on my laptop, whereas the
default tutorial code needs about 4min41sec.
For comparison, 7zip needs about 3min for an 80MB result, and Rar 40 seconds, also for 80MB.

So my suggestion is to add the following code (or a variant) to the examples; this can make
a huge difference for users!

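>>>
// import java.io.BufferedOutputStream;
// import java.io.FileInputStream;
// import java.io.FileOutputStream;
// import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

int COMPRESSION_LEVEL = 2; // bzip2 block size in units of 100k (valid range 1..9)
int buffersize = 4000;

FileInputStream fin = new FileInputStream(infile);
FileOutputStream fos = new FileOutputStream(outfile);
// Buffer the compressed output so it reaches the file in larger chunks.
BufferedOutputStream bufferout = new BufferedOutputStream(fos, buffersize);
BZip2CompressorOutputStream bzOut = new BZip2CompressorOutputStream(bufferout, COMPRESSION_LEVEL);
try {
	final byte[] buffer = new byte[buffersize];
	int n = 0;
	// Stream the input in fixed-size chunks, so the file size is not limited by memory.
	while (-1 != (n = fin.read(buffer))) {
		bzOut.write(buffer, 0, n);
	}
}
finally {
	try {
		bzOut.close(); // finishes the bzip2 stream and closes the wrapped streams
	}
	finally {
		fin.close(); // runs even if closing the compressor fails
	}
}
>>>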
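
To illustrate why the buffer can help, here is a minimal JDK-only sketch. It assumes the compressor hands its output to the wrapped stream in many small writes; I have not measured that this is the only cause of the slowdown. CountingSink and BufferingDemo are hypothetical names made up for this illustration, not part of Commons Compress. The sketch counts how many write calls reach the underlying sink with and without a BufferedOutputStream in between.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a file: counts the write calls that reach it. */
class CountingSink extends OutputStream {
    long calls = 0; // number of write(...) invocations received
    long bytes = 0; // total number of bytes received

    @Override
    public void write(int b) {
        calls++;
        bytes++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        calls++;
        bytes += len;
    }
}

class BufferingDemo {
    /**
     * Emits `total` single-byte writes, as a byte-oriented encoder might.
     * A bufferSize of 0 means the sink is written to directly.
     */
    static CountingSink emit(int total, int bufferSize) {
        CountingSink sink = new CountingSink();
        OutputStream out = bufferSize > 0
                ? new BufferedOutputStream(sink, bufferSize)
                : sink;
        try {
            for (int i = 0; i < total; i++) {
                out.write(i & 0xFF);
            }
            out.close(); // flushes any bytes still held in the buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink;
    }

    public static void main(String[] args) {
        CountingSink direct = emit(1000000, 0);
        CountingSink buffered = emit(1000000, 4000);
        System.out.println("direct:   " + direct.calls + " calls, " + direct.bytes + " bytes");
        System.out.println("buffered: " + buffered.calls + " calls, " + buffered.bytes + " bytes");
    }
}
```

Both runs deliver the same bytes, but the buffered run reaches the sink in far fewer calls. With a real FileOutputStream each of those calls can turn into a system call, which is where the time would go.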

> performance issue when using default Wiki/docs bzip2 compression Factory methods
> --------------------------------------------------------------------------------
>
>                 Key: COMPRESS-381
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-381
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 1.13
>         Environment: Windows/All
>            Reporter: Dominique De Munck
>            Priority: Minor
>              Labels: documentation, easyfix, performance
>
> Hello,
> We are going to use this project's bzip2 implementation, as it performed best for our
> use case (tested using https://github.com/ning/jvm-compressor-benchmark).
> However, when following the default examples on the wiki/example/javadoc pages (*),
> we hit a serious performance bottleneck.
> The reason: the suggested default "compress" operation on a file is very slow,
> maybe because of disk I/O and lack of caching.
> For a 2 MB TIFF file, bzip2 compression takes about 3 seconds with code (A), whereas
> code (B) takes only about 0.5 seconds!
> So it would be good to adapt the documentation or take a look at the bottleneck.
> Kind regards,
> Dominique
> A:
> >>>
> FileInputStream fin = new FileInputStream(infile);
> BufferedInputStream bufferin = new BufferedInputStream(fin);
> final FileOutputStream outStream = new FileOutputStream(outfile);
> CompressorOutputStream cos = new CompressorStreamFactory()
>         .createCompressorOutputStream(CompressorStreamFactory.BZIP2, outStream);
> IOUtils.copy(fin, cos);
> cos.close();
> >>>
> B:
> <<<<<
> final byte[] uncompressed = Files.readAllBytes(infile.toPath());
> ByteArrayOutputStream rawOut = new ByteArrayOutputStream(uncompressed.length);
> 		
> BZip2CompressorOutputStream out = new BZip2CompressorOutputStream(rawOut, COMPRESSION_LEVEL);
> out.write(uncompressed);
> out.close();
> FileOutputStream fos = new FileOutputStream(outfile);
> rawOut.writeTo(fos);
> fos.close();
> >>>>
> (*)
> Pages with documentation:
> https://wiki.apache.org/commons/Compress
> https://commons.apache.org/proper/commons-compress/examples.html
> https://commons.apache.org/proper/commons-compress/javadocs/api-release/index.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
