commons-issues mailing list archives

From "Stefan Bodewig (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-291) decompress .7z archive very very slow
Date Wed, 20 Jan 2016 05:06:39 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108019#comment-15108019
] 

Stefan Bodewig commented on COMPRESS-291:
-----------------------------------------

It would be good to know whether the issue is LZMA(2) itself or the way SevenZFile uses it - I'm
afraid it's the former. Commons Compress doesn't implement LZMA itself but uses the XZ for
Java library http://tukaani.org/xz/java.html . I'm not saying this to pass the blame but
rather to ensure people spend their energy where it is needed most. Lasse Collin, the author
of XZ for Java, is also the author of its C cousin.

When Commons Compress uses Deflate (the GZIP code or when ZIP or 7z use Deflate) then {{java.util.zip.Deflater}}/{{Inflater}}
are at work, which are JNI layers on top of zlib. Their performance had better be close to
that of zlib :-)
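
As a minimal sketch of the two JDK classes named above, the following round-trips a byte array through {{Deflater}} and {{Inflater}} directly. The class name {{ZlibRoundTrip}} and its helper methods are made up for illustration; this is not code from Commons Compress, only a way to exercise the zlib-backed path in isolation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only.
public class ZlibRoundTrip {

    // Compress the input with the JDK's zlib-backed Deflater.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state
        }
    }

    // Decompress data produced by deflate() above.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end(); // release the native zlib state
        }
    }

    public static void main(String[] args) {
        byte[] original = "abcabcabcabcabcabcabcabc".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(original);
        byte[] restored = inflate(compressed);
        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("round trip ok:    " + Arrays.equals(original, restored));
    }
}
```

Timing these two methods on a large buffer, next to the same data run through the native zlib tools, would be one way to check how much overhead the JNI layer adds.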

Personally I've spent quite a bit of time in our bzip2 code - which is a close port of Julian
Seward's C library - and can tell you that Java itself often is an obstacle to efficient compression.
The lack of unsigned types and the indirect memory access - including bounds checks - of byte[]s
produce very different code from what you can write in C. In the LZ77 family of compressors
you look for matching sequences of bytes. In C you simply cast the {{char*}} pointing to the
raw data to an {{int*}} and compare four bytes at once (sacrificing matches that are not aligned
at four byte boundaries). Using {{sun.misc.Unsafe}} is a frowned-upon option that we've not
chosen so far. LZMA is most probably even worse since it works at the bit level rather than
the byte level.
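
To make the points about unsigned types and match searching concrete, here is a small sketch in plain Java. It is not taken from Commons Compress or XZ for Java; the class {{MatchLength}} and its methods are hypothetical. It shows the {{& 0xFF}} mask needed to read a byte as an unsigned value, a byte-by-byte match length loop, and a variant that compares four bytes per step through {{java.nio.ByteBuffer.getInt}}, which is one way to approximate the C {{int*}} trick without {{sun.misc.Unsafe}}. Whether the second variant is actually faster depends on the JVM and is not claimed here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not code from any compression library.
public class MatchLength {

    // Java bytes are signed; masking is needed to get the 0..255 value.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    // Longest common run starting at offsets a and b, one byte per step.
    static int matchLengthBytewise(byte[] data, int a, int b, int limit) {
        int max = Math.min(limit, data.length - Math.max(a, b));
        int len = 0;
        while (len < max && data[a + len] == data[b + len]) {
            len++;
        }
        return len;
    }

    // Same result, but compares four bytes per step while possible,
    // then finishes the tail one byte at a time.
    static int matchLengthIntwise(byte[] data, int a, int b, int limit) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int max = Math.min(limit, data.length - Math.max(a, b));
        int len = 0;
        while (len + 4 <= max && buf.getInt(a + len) == buf.getInt(b + len)) {
            len += 4;
        }
        while (len < max && data[a + len] == data[b + len]) {
            len++;
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] data = "abcdefgh_abcdefgX".getBytes(StandardCharsets.US_ASCII);
        System.out.println(matchLengthBytewise(data, 0, 9, 100));
        System.out.println(matchLengthIntwise(data, 0, 9, 100));
        System.out.println(unsigned((byte) 0xFF));
    }
}
```

Every array access in both loops is still bounds-checked by the JVM unless the JIT can prove the check redundant, which is the overhead the paragraph above refers to.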

Enough of my rambling. To answer Robert's question: I'm not aware of anybody actively looking
into it.

> decompress .7z archive very very slow
> -------------------------------------
>
>                 Key: COMPRESS-291
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-291
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>    Affects Versions: 1.9
>         Environment: Windows 7 x64, jdk1.7.0_21 x64
>            Reporter: Robert Jansen
>            Priority: Minor
>
> I have 7z archives with one large image and many small files. The following code decompresses
> to a directory and returns the largest file. It is glacially slow and not usable for GB size
> files:
> public File unSevenZipToDir(File sevenZipFile, File outputDir) {
> 		
> 		File imgFile = null;
> 		// Make sure output dir exists
> 		outputDir.mkdirs();
> 		if (outputDir.exists()) {
> 			
> 			//FileInputStream stream;
> 			try {
> 			
> 				FileOutputStream output = null;
> 				SevenZFile f7z = new SevenZFile(sevenZipFile);
> 				SevenZArchiveEntry entry;
> 				long maxSize = 0;
> 				while ((entry = f7z.getNextEntry()) != null) {
> 					if (entry != null) {
> 						String s = entry.getName();
> 						if (s != null) {
> 							long sz = entry.getSize();
> 							
> 							if (sz > 0) {
> 								int count;
> 								byte data[] = new byte[4096];
> 								
> 								String outFileName = outputDir.getPath() + "/"
> 										+ new File(entry.getName()).getName();
> 								File outFile = new File(outFileName);
> 								
> 								// Extract only if it does not already exist		
> 								if (outFile.exists() == false) {
> 									System.out.println("Extracting " + s + " => size = " + sz);
> 									FileOutputStream fos = new FileOutputStream(
> 											outFile);
> 											
> 									BufferedOutputStream dest = new BufferedOutputStream(
> 											fos);
> 	
> 									while ((count = f7z.read(data)) != -1) {
> 										dest.write(data, 0, count);
> 									}
>                                   
> 									dest.flush();
> 									dest.close(); 
> 								
> 								} else {
> 									System.out.println("Using already Extracted " + s + " => size = " + sz);
> 								}
> 								// Parenthesized so the size check applies to every extension, not only ".jpg"
> 								if ((s.endsWith(".h5") || s.endsWith(".tif") ||
> 										s.endsWith(".cos") || s.endsWith(".nitf")
> 										|| s.endsWith(".ntf")
> 										|| s.endsWith(".jpg")) && sz > maxSize) {
> 									maxSize = sz;
> 									imgFile = new File(outFileName);
> 								}
> 							} // end sz > 0
> 						} // end s != null
> 					} // end if entry
> 				} // end while
> 				f7z.close();
> 			} catch (FileNotFoundException e) {
> 				// TODO Auto-generated catch block
> 				e.printStackTrace();
> 			} catch (IOException e) {
> 				// TODO Auto-generated catch block
> 				e.printStackTrace();
> 			}
> 		}
> 		return imgFile;
> 	}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
