commons-issues mailing list archives

From "Torsten Curdt (JIRA)" <>
Subject [jira] Commented: (SANDBOX-168) TAR extraction fails with FileNotFoundException (directories not being created)
Date Wed, 07 Jan 2009 13:56:44 GMT


Torsten Curdt commented on SANDBOX-168:

I guess this is probably obsolete?

> TAR extraction fails with FileNotFoundException (directories not being created)
> -------------------------------------------------------------------------------
>                 Key: SANDBOX-168
>                 URL:
>             Project: Commons Sandbox
>          Issue Type: Bug
>          Components: Compress
>    Affects Versions: Nightly Builds
>         Environment: Probably irrelevant, but am using JDK 1.5.0_07 on a win xp sp2 box.
>            Reporter: Sam Smith
> --------------------------------------------------
> Summary
> --------------------------------------------------
> I am able to create TAR archive files using the org.apache.commons.compress code; however, when I go to extract the contents of the TAR archive using that same code, it fails.
> I think that there must be a bug in org.apache.commons.compress, because I can use the program 7-zip to successfully extract the contents of the archive.
> --------------------------------------------------
> Background
> --------------------------------------------------
> I need Java TAR support for archiving purposes; see this forum thread if you want to know why:
> The library
> proved inadequate because it does not support long paths reliably (the GNU TAR extensions are essential).
> So, I am turning to this apache code, which does handle long paths and seems to be actively maintained.
> --------------------------------------------------
> Details of how the TAR archive was created
> --------------------------------------------------
> Because there appears to be no stable release of the org.apache.commons.compress code, I just grabbed the latest nightly build, commons-compress-20060814.  MAYBE THIS IS THE PROBLEM: if this is a known bad build and there is a better one, by all means please let me know what build to use.  Also, this information should really be posted as a comment alongside each nightly build.
> Assuming that the above is not the case, and that this is a new bug, here is how I stumbled
across it.
> First, I construct a new TAR archive with code that ultimately boils down to this:
> 		String path = fileParent.getRelativePath(file);	// Note: getRelativePath will ensure that directories end with a separator
> 		if (File.separatorChar != '/') path = path.replace(File.separatorChar, '/');	// CRITICAL: the TAR format requires '/' for directory separation, so convert on systems (like Windows) that use a different separator char
> 		TarEntry entry = new TarEntry( file );
> 		entry.setName( path );
> 		out.putNextEntry( entry );
> 		writeFileData(file, out);
> 		out.closeEntry();
> 		if ( file.isDirectory() ) {
> 			for (File fileChild : DirUtil.getContents(file, null)) {	// supply null, since we test at the beginning of this method (supplying the filter here would just add a redundant test)
> 				archive( fileChild, fileParent, out, filter );
> 			}
> 		}
> Note that FileParent is my own class that I originally wrote for a ZIP archiver.  This
class keeps track of the root directory that is being TARed because I want all of my paths
to be stored as relative offsets from this root; I do NOT want any path elements above that
root directory to be included.  The apache TarEntry class appears to me to include a lot of extraneous path elements (although it will strip off drive letters or an initial '/' char).
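The relative-path bookkeeping described above can be sketched in plain Java. The `relativeEntryName` method below is a hypothetical stand-in for the reporter's `FileParent.getRelativePath` (which is not part of commons-compress), assuming the file lies under the chosen root:

```java
import java.io.File;

public class EntryNames {
    // Hypothetical sketch: compute an archive entry name relative to a chosen
    // root, with '/' separators and a trailing '/' for directories, as the
    // TAR format expects. Assumes file is located under root.
    static String relativeEntryName(File root, File file) {
        String rootPath = root.getAbsolutePath();
        String rel = file.getAbsolutePath().substring(rootPath.length());
        if (rel.startsWith(File.separator)) rel = rel.substring(1);       // drop leading separator
        if (File.separatorChar != '/') rel = rel.replace(File.separatorChar, '/');
        if (file.isDirectory() && !rel.endsWith("/")) rel += "/";          // mark directories
        return rel;
    }

    public static void main(String[] args) {
        File root = new File(System.getProperty("java.io.tmpdir"));
        File child = new File(root, "a" + File.separator + "b.txt");
        System.out.println(relativeEntryName(root, child)); // a/b.txt
    }
}
```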
> In addition to controlling the paths, I also need to use low level classes like TarOutputStream
to force the use of GNU long paths via a call like
> 	tarOutputStream.setLongFileMode(TarOutputStream.LONGFILE_GNU);
> If I were to use the high level Archiver functionality that you document here
> (for ZIPs) or
> (for TARs), then I would have no such control over relative paths or GNU TAR extensions.  There is also an efficient file-filtering technique that I use that would not be supported if I used an Archiver.
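For context on why the GNU extension matters here: the classic ustar header stores the entry name in a 100-byte field, so longer names need an extension such as the GNU long-name ('L' typeflag) entry. A minimal check, as an illustration rather than actual commons-compress API:

```java
import java.nio.charset.StandardCharsets;

public class NameLimit {
    // Classic ustar header: the name field is 100 bytes (a separate 155-byte
    // "prefix" field can extend it in some layouts). Names longer than this
    // need an extension such as the GNU long-name ('L' typeflag) entry.
    static final int USTAR_NAME_LENGTH = 100;

    static boolean needsLongNameExtension(String entryName) {
        return entryName.getBytes(StandardCharsets.US_ASCII).length > USTAR_NAME_LENGTH;
    }

    public static void main(String[] args) {
        String longName = new String(new char[150]).replace('\0', 'a'); // 150-byte name
        System.out.println(needsLongNameExtension("short.txt")); // false
        System.out.println(needsLongNameExtension(longName));    // true
    }
}
```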
> --------------------------------------------------
> Error when extracting the TAR archive with org.apache.commons.compress
> --------------------------------------------------
> I think that the archive produced by the above code is legitimate, because I can successfully
extract it using the program 7-zip.  As proof, I have a program called DirectoryComparer which
compares 2 directories, notes any paths which are not in common, and for common paths examines
every normal file byte-for-byte to find any discrepancies.  Running that program on the original
directory and the archived/extracted one found zero differences.
> But, when I tried extracting the archive using the org.apache.commons.compress code,
I got the following error:
> Exception in thread "main" org.apache.commons.compress.UnpackException: Exception while
>         at org.apache.commons.compress.archivers.tar.TarArchive.doUnpack(
>         at org.apache.commons.compress.AbstractArchive.unpack(
>         at
>         at$Test.test_archive_extract_pathLengthLimit(
>         at$Test.main(
> Caused by: F:\longPaths\2B6vLVrp4c (The system cannot
find the path specified)
>         at Method)
>         at<init>(
>         at<init>(
>         at org.apache.commons.compress.archivers.tar.TarArchive.doUnpack(
>         ... 4 more
> --------------------------------------------------
> Details of how the TAR archive was extracted
> --------------------------------------------------
> The code that I used to do the extraction is
> 		Archive archiver = null;
> 		try {
> 			archiver = ArchiverFactory.getInstance(tarFile);
> 			archiver.unpack(directoryToExtractInto);
> 		}
> 		finally {
> 			close(archiver);
> 		}
> Here, unlike archiving, I went ahead and used the convenient Archiver functionality because
no low level control was needed.
> Also, the original target directory being archived is named longPaths and, as its name indicates, it has all kinds of super long path elements inside it.  (I wrote a program to auto-generate really long subdirectory structures like this for torture testing my archiving code.)
> --------------------------------------------------
> Where the bug lies
> --------------------------------------------------
> The bug appears to be in the extraction code's directory creation.  I say this because there is a normal file left on my filesystem after doing the above that is named longPaths.  But longPaths should be a directory; since it was actually miscreated by the apache code as a file, of course the subdirectory
> 	longPaths\2B6vLVrp4c
> cannot be created, as reported by the stack trace above.
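The defensive step the extractor appears to be missing can be sketched with plain java.io calls. The `writeEntry` helper below is hypothetical (not commons-compress code): it creates directory entries with `mkdirs()` instead of opening them as files, and ensures a regular file's parent directories exist before writing.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SafeExtract {
    // Hypothetical sketch of a safe per-entry extraction step: a TAR entry
    // whose name ends in '/' is a directory and must be created with mkdirs();
    // for a regular file, the parent directories must exist before opening
    // the FileOutputStream, or a FileNotFoundException results.
    static void writeEntry(File destDir, String entryName, byte[] data) throws IOException {
        File target = new File(destDir, entryName);
        if (entryName.endsWith("/")) {
            if (!target.isDirectory() && !target.mkdirs())
                throw new IOException("could not create directory " + target);
            return;
        }
        File parent = target.getParentFile();
        if (parent != null && !parent.isDirectory() && !parent.mkdirs())
            throw new IOException("could not create directory " + parent);
        FileOutputStream out = new FileOutputStream(target);
        try {
            out.write(data);
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File dest = new File(System.getProperty("java.io.tmpdir"), "untar-demo-" + System.nanoTime());
        writeEntry(dest, "longPaths/", new byte[0]);                      // directory entry
        writeEntry(dest, "longPaths/2B6vLVrp4c/x.txt", new byte[] {42});  // nested file
        System.out.println(new File(dest, "longPaths/2B6vLVrp4c/x.txt").isFile()); // true
    }
}
```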
> Again, let me mention that 7-zip successfully extracted the complicated contents of longPaths, correctly recreating all of the subdirectories etc., so I do not suspect that my code for creating the TAR archive is wrong.
> Furthermore, when I tried abandoning the above TAR creation code and used your Archiver
technique with code like
> 	Archive archiver = ArchiverFactory.getInstance("tar");
> 	for (File file : files) {
> 		archive(file, archiver, filter);
> 	}
> 		// this is the relevant code snippet from the archive method:
> 	archiver.add( file );
> 	if ( file.isDirectory() ) {
> 		for (File fileChild : DirUtil.getContents(file, null)) {
> 			archive( fileChild, archiver, filter );
> 		}
> 	}
> then I still get an error:
> Exception in thread "main" Z:\longPaths (Access is denied)
>         at Method)
>         at<init>(
>         at org.apache.commons.compress.AbstractArchive.add(
>         at
>         at
>         at$Test.test_archive_extract_pathLengthLimit(
>         at$Test.main(
> --------------------------------------------------
> Misc issues
> --------------------------------------------------
> 1) I am sorry if this is a known issue that has been beaten to death on the mailing list.
 But I am a newcomer, and I was unable to figure out how to search the mailing list archives!
> Clicking on the "Search the mailing list archive" link on
> brought me to
> which only seems to offer manual browsing, which is a tedious and inefficient way to
find issues with the compress code, especially as the mailing list seems to discuss every
commons project.
> Is there a better way?
> 2) There seem to be redundant TAR packages:
> 	older one?:
> 	newer one?:
> Which one am I supposed to use?
> 3) GNU tar apparently supports unlimited path lengths, but what about file sizes?  Traditional TAR only supports files up to 8 GB in size.  Does the org.apache.commons.compress TAR code have any file size limits?  Please add documentation about this.
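On the 8 GB figure: the classic tar header stores a file's size as 11 octal digits, so the largest representable size is 8^11 - 1 bytes, one byte short of 8 GiB (GNU tar works around this with a base-256 size encoding). The arithmetic:

```java
public class TarSizeLimit {
    public static void main(String[] args) {
        // Classic tar size field: 11 octal digits (12 bytes with terminator),
        // so the maximum value is 0x1FFFFFFFF = 2^33 - 1 bytes.
        long maxOctalSize = Long.parseLong("77777777777", 8); // eleven 7s
        System.out.println(maxOctalSize);                     // 8589934591
        System.out.println(maxOctalSize == (1L << 33) - 1);   // true: 8 GiB - 1
    }
}
```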

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
