commons-issues mailing list archives

From "Inspico (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-284) Multi Thread Uncompress TGZ - CRC32 ERROR
Date Mon, 16 Jun 2014 09:49:02 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032274#comment-14032274 ]

Inspico commented on COMPRESS-284:
----------------------------------

We use common code to uncompress the archives handed to our EAI.
Several file-polling processes look for archives and launch threads to extract their files.
Each archive is processed by only one thread.

Although each archive has its own thread, all threads run in the same JVM.
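The setup described above (one thread per archive, all threads in one JVM) can be sketched with the JDK alone. This is a hypothetical harness, not our production code: it uses `java.util.zip.GZIPInputStream` in place of Commons Compress so it runs without extra dependencies, and the names `PerThreadGunzip`, `gzip`, `gunzip` and `gunzipAll` are made up for illustration. What it shows is thread confinement: every stream is created, read and closed by a single task, so no stream object is shared between threads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PerThreadGunzip {

    // Builds an in-memory gzip payload; stands in for one polled archive file.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Decompresses one payload. The stream is created, read and closed inside
    // this call, so it is never visible to any other thread.
    static String gunzip(byte[] payload) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // One task per payload on a fixed pool; results come back in submission order.
    static List<String> gunzipAll(List<byte[]> payloads, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (byte[] payload : payloads) {
                Callable<String> task = () -> gunzip(payload);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                // get() rethrows a failure from the task wrapped in an ExecutionException
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> payloads = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            payloads.add(gzip("archive-" + i));
        }
        System.out.println(gunzipAll(payloads, 4));
    }
}
```

The fixed pool bounds how many archives are decompressed at once; an exception thrown inside a task is rethrown from `Future.get()` wrapped in an `ExecutionException`.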

Here is the code we use:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Vector;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

	public static Vector<String> extractFilesFromTarGzData(InputStream archiveData, String extractFolder, String fileNamePattern) throws Exception {
		Vector<String> entryVector = new Vector<String>();
		// Keep only entries whose base name starts with one of the comma-separated prefixes
		String sPatternRegex = "^(" + fileNamePattern.replace(',', '|') + ").*";

		// try-with-resources closes the tar, gzip and buffered streams even if extraction fails
		try (TarArchiveInputStream tarIn = new TarArchiveInputStream(
				new GzipCompressorInputStream(new BufferedInputStream(archiveData)))) {
			TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
			while (tarEntry != null) {
				String entryName = tarEntry.getName();
				// Strip any directory part of the entry name
				entryName = entryName.substring(entryName.lastIndexOf('/') + 1);
				if (entryName.matches(sPatternRegex)) {
					byte[] btoRead = new byte[1024];
					// One output stream per entry, closed before moving on to the next entry
					try (OutputStream bout = new BufferedOutputStream(new FileOutputStream(extractFolder + entryName))) {
						int len;
						while ((len = tarIn.read(btoRead, 0, btoRead.length)) != -1) {
							bout.write(btoRead, 0, len);
						}
					}
					entryVector.add(entryName);
				}
				tarEntry = tarIn.getNextTarEntry();
			}
			return entryVector;
		} catch (Exception e) {
			// Same message as before, but the original exception is kept as the cause
			throw new Exception("Error while extracting list of files from Archive : " + e, e);
		}
	}
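The file-name filter in the method turns the comma-separated `fileNamePattern` into a regex of alternatives anchored at the start of the base name. A small sketch of that construction (class and method names are hypothetical) also shows a side effect: characters such as `.` keep their regex meaning, so a prefix like `data.csv` matches more names than intended. `literalPrefixes` is an assumed alternative built on `Pattern.quote`, not part of the code above.

```java
import java.util.regex.Pattern;

public class PrefixFilter {

    // Same construction as in the method above: "a,b" becomes "^(a|b).*".
    // '.' and other metacharacters are not escaped, so "data.csv" also matches "dataXcsv".
    static String rawRegex(String fileNamePattern) {
        return "^(" + fileNamePattern.replace(',', '|') + ").*";
    }

    // Hypothetical variant that treats each comma-separated prefix literally.
    static Pattern literalPrefixes(String fileNamePattern) {
        StringBuilder sb = new StringBuilder("^(");
        String[] prefixes = fileNamePattern.split(",");
        for (int i = 0; i < prefixes.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(Pattern.quote(prefixes[i]));
        }
        return Pattern.compile(sb.append(").*").toString());
    }

    public static void main(String[] args) {
        String pattern = "data.csv,report_";
        System.out.println("dataXcsv".matches(rawRegex(pattern)));                        // true
        System.out.println(literalPrefixes(pattern).matcher("dataXcsv").matches());        // false
        System.out.println(literalPrefixes(pattern).matcher("report_2014.txt").matches()); // true
    }
}
```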


> Multi Thread Uncompress TGZ - CRC32 ERROR
> -----------------------------------------
>
>                 Key: COMPRESS-284
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-284
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.8.1
>         Environment: Linux
>            Reporter: Inspico
>
> We have to uncompress ".tar.gz".
> So we use a "TarArchiveInputStream(GzipCompressorInputStream)".
> An archive extracted alone works perfectly.
> But when we launch parallel threads to extract many archives at the same time, we get the same error in each thread:
> java.lang.Exception: Error while extracting list of files from Archive : java.io.IOException: Gzip-compressed data is corrupt (CRC32 error)
> Sometimes one archive among all the failures is extracted successfully.
> Are there any problems with using TarInputStream or GZIPInputStream in multiple threads?
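For context on the error message: in the gzip format (RFC 1952) each member ends with an 8-byte trailer holding the CRC32 and the length (mod 2^32) of the uncompressed data, both little-endian, and a decompressor recomputes the CRC32 while inflating and fails when the two differ. The sketch below shows that check with the JDK's `java.util.zip` classes rather than Commons Compress; the class and method names are hypothetical, it assumes a single-member gzip held in memory, and it does not diagnose the multi-threading problem reported here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCrcDemo {

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    // Reads an unsigned 32-bit little-endian value, the byte order used by the gzip trailer.
    static long readLE32(byte[] b, int off) {
        return (b[off] & 0xFFL)
                | (b[off + 1] & 0xFFL) << 8
                | (b[off + 2] & 0xFFL) << 16
                | (b[off + 3] & 0xFFL) << 24;
    }

    // Trailer of a single-member gzip file: CRC32, then ISIZE (length mod 2^32).
    static long storedCrc(byte[] gz) {
        return readLE32(gz, gz.length - 8);
    }

    static long storedSize(byte[] gz) {
        return readLE32(gz, gz.length - 4);
    }

    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // GZIPInputStream recomputes the CRC32 while inflating and throws a
    // ZipException (an IOException) if it differs from the stored value.
    static boolean decompresses(byte[] gz) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[1024];
            while (in.read(buf) != -1) {
                // discard the output; only the integrity check matters here
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello tar.gz".getBytes(StandardCharsets.UTF_8);
        byte[] gz = gzip(data);
        System.out.println(storedCrc(gz) == crcOf(data));  // true
        System.out.println(storedSize(gz) == data.length); // true
        System.out.println(decompresses(gz));              // true
        gz[gz.length - 8] ^= 1; // flip one bit of the stored CRC32
        System.out.println(decompresses(gz));              // false
    }
}
```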



--
This message was sent by Atlassian JIRA
(v6.2#6252)
