commons-issues mailing list archives

From "Dawid Weiss (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-291) decompress .7z archive very very slow
Date Fri, 15 Jan 2016 07:35:39 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101375#comment-15101375 ]

Dawid Weiss commented on COMPRESS-291:
--------------------------------------

No problem at all, Stefan. I dug into the code; it actually explains what's going on in the
format a lot better than the "official" specification ({{7zFormat.txt}}) does.

bq. then "almost random access" to single entries should be possible

Yes, you'd basically have to decode "a bit more" if the file you need is nested somewhere
inside a compressed block. This is not uncommon -- "solid" archives in RAR have this
property too. The gain comes with lots of small (or very similar) files, where the encoder's
compression dictionary is reused across multiple files.
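
To make the trade-off concrete, here is a minimal sketch of the idea. It is not the 7z format and not Commons Compress code: it uses DEFLATE from {{java.util.zip}} as a stand-in for LZMA, and the class and method names ({{SolidBlockSketch}}, {{packSolid}}, {{extract}}, {{sampleFiles}}) are hypothetical. Several similar files are compressed as one stream, and reaching a later entry means decoding and discarding everything stored before it in the block.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class SolidBlockSketch {

    /** Compresses all files back to back as ONE stream, so later files can reuse earlier data. */
    static byte[] packSolid(byte[][] files) throws IOException {
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(packed)) {
            for (byte[] file : files) {
                out.write(file);
            }
        }
        return packed.toByteArray();
    }

    /** Compresses each file as its own stream and returns the total compressed size. */
    static int packSeparatelySize(byte[][] files) throws IOException {
        int total = 0;
        for (byte[] file : files) {
            total += packSolid(new byte[][] { file }).length;
        }
        return total;
    }

    /** Extracts entry number index from a solid block, given the uncompressed sizes of all entries. */
    static byte[] extract(byte[] block, int[] sizes, int index) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(block))) {
            // The "a bit more" part: every preceding entry must be decoded and thrown away.
            long toSkip = 0;
            for (int i = 0; i < index; i++) {
                toSkip += sizes[i];
            }
            byte[] scratch = new byte[4096];
            while (toSkip > 0) {
                int n = in.read(scratch, 0, (int) Math.min(scratch.length, toSkip));
                if (n < 0) {
                    throw new IOException("Block ended before entry " + index);
                }
                toSkip -= n;
            }
            // Now the stream is positioned at the requested entry.
            byte[] result = new byte[sizes[index]];
            int off = 0;
            while (off < result.length) {
                int n = in.read(result, off, result.length - off);
                if (n < 0) {
                    throw new IOException("Block ended inside entry " + index);
                }
                off += n;
            }
            return result;
        }
    }

    /** Three nearly identical 1000-byte files (illustrative data only). */
    static byte[][] sampleFiles() {
        byte[] base = new byte[1000];
        new Random(42).nextBytes(base);
        byte[][] files = new byte[3][];
        for (int i = 0; i < files.length; i++) {
            files[i] = base.clone();
            files[i][0] = (byte) i; // make each file slightly different
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        byte[][] files = sampleFiles();
        int[] sizes = { files[0].length, files[1].length, files[2].length };
        byte[] block = packSolid(files);
        System.out.println("solid block: " + block.length + " bytes, separate streams: "
                + packSeparatelySize(files) + " bytes");
        System.out.println("entry 2 recovered: " + Arrays.equals(files[2], extract(block, sizes, 2)));
    }
}
```

The cost of extracting a single entry grows with its offset inside the block, which is why access is only "almost" random. An extractor that restarts decoding from the beginning of the block for every entry would pay that cost repeatedly.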

Like I said, I'll try to fix it for our own purposes -- I'll provide a patch if I get it working.



> decompress .7z archive very very slow
> -------------------------------------
>
>                 Key: COMPRESS-291
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-291
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>    Affects Versions: 1.9
>         Environment: Windows 7 x64, jdk1.7.0_21 x64
>            Reporter: Robert Jansen
>            Priority: Minor
>
> I have 7z archives with one large image and many small files. The following code decompresses
> to a directory and returns the largest file. It is glacially slow and not usable for GB size
> files:
> public File unSevenZipToDir(File sevenZipFile, File outputDir) {
> 		
> 		File imgFile = null;
> 		// Make sure output dir exists
> 		outputDir.mkdirs();
> 		if (outputDir.exists()) {
> 			
> 			//FileInputStream stream;
> 			try {
> 			
> 				FileOutputStream output = null;
> 				SevenZFile f7z = new SevenZFile(sevenZipFile);
> 				SevenZArchiveEntry entry;
> 				long maxSize = 0;
> 				while ((entry = f7z.getNextEntry()) != null) {
> 					if (entry != null) {
> 						String s = entry.getName();
> 						if (s != null) {
> 							long sz = entry.getSize();
> 							
> 							if (sz > 0) {
> 								int count;
> 								byte data[] = new byte[4096];
> 								
> 								String outFileName = outputDir.getPath() + "/"
> 										+ new File(entry.getName()).getName();
> 								File outFile = new File(outFileName);
> 								
> 								// Extract only if it does not already exist		
> 								if (outFile.exists() == false) {
> 									System.out.println("Extracting " + s + " => size = " + sz);
> 									// try-with-resources flushes and closes the stream even if a read fails
> 									try (BufferedOutputStream dest = new BufferedOutputStream(
> 											new FileOutputStream(outFile))) {
> 										while ((count = f7z.read(data)) != -1) {
> 											dest.write(data, 0, count);
> 										}
> 									}
> 								} else {
> 									System.out.println("Using already Extracted " + s + " => size = " + sz);
> 								}
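> 								// Group the extension checks so the size test applies to all of
> 								// them (&& binds tighter than ||).
> 								if ((s.endsWith(".h5") || s.endsWith(".tif") ||
> 										s.endsWith(".cos") || s.endsWith(".nitf")
> 										|| s.endsWith(".ntf")
> 										|| s.endsWith(".jpg")) && sz > maxSize) {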
> 									maxSize = sz;
> 									imgFile = new File(outFileName);
> 								}
> 							} // end sz > 0
> 						} // end s != null
> 					} // end if entry
> 				} // end while
> 				f7z.close();
> 			} catch (FileNotFoundException e) {
> 				// TODO Auto-generated catch block
> 				e.printStackTrace();
> 			} catch (IOException e) {
> 				// TODO Auto-generated catch block
> 				e.printStackTrace();
> 			}
> 		}
> 		return imgFile;
> 	}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
