commons-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Commons Wiki] Update of "ExtractAndDecompressGzipFiles" by KenTanaka
Date Wed, 07 Nov 2007 19:57:44 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Commons Wiki" for change notification.

The following page has been changed by KenTanaka:
http://wiki.apache.org/jakarta-commons/ExtractAndDecompressGzipFiles

The comment on the change is:
Found a more direct method of extracting files and simplified the code example

------------------------------------------------------------------------------
  = Overview =
  This example uses VFS to read the content of a compressed (gz) file inside a tar file. It
extracts the objects in the tar file and decompresses any that are gzip files. Directory structure
in the tar file is not preserved: the contents are written to the same location regardless
of directory hierarchy (for the purposes of this example, all objects in the tar file have
unique names, so there are no file name conflicts).
  
+ Use a two phase approach.
+  1. look at each of the files in the tar file
+  2. if it's a directory, recursively process it, otherwise
+     * if it's a non-gzipped file, extract it to a file
+     * if it's a gzipped file, decompress gzipped content to file
- Use a multiple step approach.
-  1. extract gzipped file from tar file
-  2. decompress gzipped content to a temporary directory
-  3. move decompressed content to desired destination
-  4. remove temporary directory
-  5. remove gzipped file
- 
- There should be a cleaner, more direct route. Maybe someone more familiar with VFS can post
better code.
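
The two-phase walk described above can be sketched without VFS at all. The following is only an illustration under stated assumptions: {{{TwoPhaseSketch}}}, {{{Entry}}} and {{{plan}}} are hypothetical names that do not appear on the wiki page, and {{{Entry}}} is a stand-in for a tar entry, not a VFS class. It records which action would be taken for each file instead of touching the disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: models the directory/gzip/plain-file decision, not real tar access. */
final class TwoPhaseSketch {

    /** Hypothetical stand-in for a tar entry; children is null for a plain file. */
    static final class Entry {
        final String baseName;
        final List<Entry> children;

        private Entry(String baseName, List<Entry> children) {
            this.baseName = baseName;
            this.children = children;
        }

        static Entry file(String baseName) {
            return new Entry(baseName, null);
        }

        static Entry dir(String baseName, Entry... children) {
            return new Entry(baseName, Arrays.asList(children));
        }

        boolean isDirectory() {
            return children != null;
        }
    }

    /** Walks the entry tree and returns the action chosen for each file. */
    static List<String> plan(Entry root) {
        List<String> actions = new ArrayList<>();
        collect(root, actions);
        return actions;
    }

    private static void collect(Entry entry, List<String> actions) {
        if (entry.isDirectory()) {
            // Directories are processed recursively; their own names are dropped.
            for (Entry child : entry.children) {
                collect(child, actions);
            }
        } else if (entry.baseName.endsWith(".gz")) {
            // Gzipped file: the target name is the base name without ".gz".
            String target = entry.baseName.substring(0, entry.baseName.length() - ".gz".length());
            actions.add("decompress " + entry.baseName + " -> " + target);
        } else {
            // Any other file is extracted unchanged.
            actions.add("extract " + entry.baseName);
        }
    }

    public static void main(String[] args) {
        Entry archive = Entry.dir("archive.tar",
                Entry.dir("tardir",
                        Entry.file("content.txt.gz"),
                        Entry.file("non-gzip.txt")));
        for (String action : plan(archive)) {
            System.out.println(action);
        }
    }
}
```

Because directory names are discarded during the walk, every file ends up in one flat destination, which is why the example relies on all names in the archive being unique.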
  
  Conceptually there is a tar file:
  {{{
  archive.tar
   +- tardir/
       +- content.txt.gz
+      +- non-gzip.txt
  }}}
- I'd like to end up with an uncompressed file "content.txt". 
+ I'd like to end up with the uncompressed files "content.txt" and "non-gzip.txt".
  
  = Sample data file =
  Create this sample {{{archive.tar}}} file with some (unix) commands along the lines of:
  {{{
  ls -l > content.txt
  gzip content.txt
+ ls -l > non-gzip.txt
  mkdir tardir
- mv content.txt.gz tardir
+ mv content.txt.gz non-gzip.txt tardir
  tar cvf archive.tar tardir
  rm -r tardir
  }}}
- The content of the {{{content.txt}}} file is just a directory listing, dump in anything
you want here.
+ The contents of the {{{content.txt}}} and {{{non-gzip.txt}}} files are just directory
listings; put in anything you want here.
- For this example the sample {{{archive.tar}}} is located in the {{{/extra/data/tryVfs}}}
directory. You can see that hardcoded in the java example below. The {{{content.txt}}} file
will be extracted into the same location.
+ For this example the sample {{{archive.tar}}} is located in the {{{/extra/data/tryVfs}}}
directory. You can see that hardcoded in the java example below. The {{{content.txt}}} and
{{{non-gzip.txt}}} files will be extracted into the same location.
  
  = pom.xml Project file =
  This example uses Maven2. There is a '''{{{pom.xml}}}''' to define the project
@@ -67, +66 @@

                      </descriptorRefs>
                      <archive>
                          <manifest>
-                             <mainClass>gov.noaa.eds.tryVfs.MultiStep</mainClass>
+                             <mainClass>gov.noaa.eds.tryVfs.ExtractFromGzipInTar</mainClass>
                          </manifest>
                      </archive>
                  </configuration>
@@ -91, +90 @@

  }}}
  
  = Source Code =
- Content of '''{{{src/main/java/gov/noaa/eds/tryVfs/MultiStep.java}}}'''
+ Content of '''{{{src/main/java/gov/noaa/eds/tryVfs/ExtractFromGzipInTar.java}}}'''
  {{{
  /*
-  * MultiStep.java
+  * ExtractFromGzipInTar.java
   */
  package gov.noaa.eds.tryVfs;
  
@@ -116, +115 @@

   * the purposes of this example, all objects in the tar file have unique names,
   * so there are no file name conflicts).
   *
-  * Use a multiple step approach.
-  * 1. extract gzipped file from tar file
-  * 2. decompress gzipped content to a temporary directory
-  * 3. move decompressed content to desired destination
-  * 4. remove temporary directory
-  * 5. remove gzipped file
-  *
-  * There should be a cleaner more direct route, but I haven't discovered it yet.
-  * 
-  * @author ktanaka
+  * @author Ken Tanaka
   */
- public class MultiStep {
+ public class ExtractFromGzipInTar 
+ {
      FileSystemManager fsManager = null;
      static String extractDirname = "/extra/data/tryVfs";
-     LocalFile extractDir = null;
      
      /**
       * Extract files from a tar file. If the file extracted is gzipped,
       * decompress it and remove the gzipped version.
       * @param args command line arguments are currently not used
       */
-     public static void main( String[] args ) {
+     public static void main( String[] args )
-         MultiStep msExtract = new MultiStep();
-         
+     {
+         ExtractFromGzipInTar extract = new ExtractFromGzipInTar();
-         try {
+         
+         try {
-             msExtract.fsManager = VFS.getManager();
+             extract.fsManager = VFS.getManager();
          } catch (FileSystemException ex) {
              throw new RuntimeException("failed to get fsManager from VFS", ex);
          }
          
-         try {
-             msExtract.extractDir = (LocalFile) msExtract.fsManager.resolveFile("file://"
-                     + extractDirname);
-             if (! msExtract.extractDir.exists()) {
-                 msExtract.extractDir.createFolder();
-             }
-         } catch (FileSystemException ex) {
-             throw new RuntimeException("failed to prepare extract directory " 
-                     + extractDirname, ex);
-         }
+         
+         /* Create a tarFile FileObject to connect to the tarfile on disk */
-         
-         
-         /* Create a tarFile object */
          FileObject tarFile;
          try {
+             String tarName = new String("tar:file://" + extractDirname + "/archive.tar");
-             System.out.println("Resolve tar file:");
+             System.out.println("Resolve " + tarName);
-             tarFile = msExtract.fsManager.resolveFile(
+             tarFile = extract.fsManager.resolveFile(tarName);
-                     "tar:/extra/data/tryVfs/archive.tar");
              
              FileName tarFileName = tarFile.getName();
              System.out.println("  Path     : " + tarFileName.getPath());
@@ -181, +161 @@

          }
          
          for (FileObject f : children) {
-             msExtract.processChild(f);
+             extract.processChild(f);
-         }
-         
+         }
      } // main( String[] args )
      
      private void processChild(FileObject f) {
@@ -196, +175 @@

                  }
              } else {
                  FileName fname = f.getName();
-                 String extractName = new String(this.extractDir.getName() + "/"
+                 String extractName = new String("file://" + extractDirname + "/"
                          + fname.getBaseName());
                  System.out.println("Extracting " + extractName);
                  LocalFile extractFile = (LocalFile) this.fsManager.resolveFile(extractName);
-                 extractFile.copyFrom(f, new AllFileSelector());
                  
                  // if the file is gzipped, decompress it
                  if (extractFile.getName().getExtension().equals("gz")) {
                      System.out.println("Decompressing " + extractName);
+                     
+                     // The uncompressed filename we seek
+                     // content.txt
+                     String fileName = extractFile.getName().getBaseName().replaceAll("\\.gz$",
"");
+                     
+                     // Build the direct path to the uncompressed content of the 
+                     // gzip file in the tar file.
+                     // gz:tar:file:///archive.tar!/tardir/content.txt.gz!content.txt
-                     String gzName = new String("gz://" + extractFile.getName().getPath());
+                     String gzName = new String("gz:" + fname.getURI() + "!" + fileName);
-                     System.out.println("gzName=" + gzName);
                      FileObject gzFile = this.fsManager.resolveFile(gzName);
-                     String fileName = extractFile.getName().getBaseName().replaceAll(".gz$",
"");
                      
                      // The decompressed path we want
-                     String decompName = new String(this.extractDir.getName() + "/" 
+                     String decompName = new String("file://" + extractDirname + "/" 
                              + fileName);
+                     LocalFile decompFile = (LocalFile) this.fsManager.resolveFile(decompName);
-                     
-                     // A temporary Directory
-                     String tmpDirname = new String(this.extractDir.getName() + "/" 
-                             + fileName + ".tmp");
-                     
-                     // A temporary file path
-                     String tmpFilename = new String(tmpDirname + "/" + fileName);
                      
                      // Some debug lines
                      System.out.println("fileName   =" + fileName);
                      System.out.println("decompName =" + decompName);
-                     System.out.println("tmpDirname =" + tmpDirname);
+                     System.out.println("gzName=" + gzName);
-                     System.out.println("tmpFilename=" + tmpFilename);
-                     
-                     // Extracting from gzip file ends up with a directory containing what
-                     // we want.
+                     
-                     LocalFile tmpDir = (LocalFile) this.fsManager.resolveFile(tmpDirname);
+                     // Extracting
-                     tmpDir.copyFrom(gzFile, new FileTypeSelector(FileType.FILE));
+                     decompFile.copyFrom(gzFile, new FileTypeSelector(FileType.FILE));
-                     
+                 } else {
+                     // just extract the non-gzip file
-                     // Move the uncompressed file to the location desired.
-                     LocalFile tmpFile = (LocalFile) this.fsManager.resolveFile(tmpFilename);
-                     LocalFile decompFile = (LocalFile) this.fsManager.resolveFile(decompName);
-                     tmpFile.moveTo(decompFile);
-                     
-                     // Delete the temporary directory.
-                     tmpDir.delete(new AllFileSelector());
-                     
-                     // Delete the gzip file now that we have the uncompressed version.
-                     // Note that the plain file FileObject (extractFile) is used 
-                     // for deleting instead of the gzip FileObject (gzFile).
-                     extractFile.delete(new AllFileSelector());
+                     extractFile.copyFrom(f, new AllFileSelector());
                  }
              }
          } catch (FileSystemException ex) {
@@ -269, +234 @@

  
  == Sample Output ==
  {{{
- Nov 6, 2007 2:38:56 PM org.apache.commons.vfs.VfsLog info
+ Nov 7, 2007 12:22:01 PM org.apache.commons.vfs.VfsLog info
  INFO: Using "/tmp/vfs_cache" as temporary files store.
- Resolve tar file:
+ Resolve tar:file:///extra/data/tryVfs/archive.tar
    Path     : /
    URI      : tar:file:///extra/data/tryVfs/archive.tar!/
+ Extracting file:///extra/data/tryVfs/non-gzip.txt
  Extracting file:///extra/data/tryVfs/content.txt.gz
  Decompressing file:///extra/data/tryVfs/content.txt.gz
- gzName=gz:///extra/data/tryVfs/content.txt.gz
  fileName   =content.txt
  decompName =file:///extra/data/tryVfs/content.txt
+ gzName=gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt
- tmpDirname =file:///extra/data/tryVfs/content.txt.tmp
- tmpFilename=file:///extra/data/tryVfs/content.txt.tmp/content.txt
  }}}
- In addition to the {{{archive.tar}}} file, there should now be a {{{content.txt}}} file
in the same location.
+ In addition to the {{{archive.tar}}} file, there should now be {{{content.txt}}} and {{{non-gzip.txt}}}
files in the same location.
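
The layered {{{gzName}}} printed in the sample output is the entry's tar URI wrapped in a {{{gz:}}} prefix, followed by {{{!}}} and the uncompressed file name. A minimal sketch of that string assembly is shown here, using plain string handling only. The names {{{LayeredUriSketch}}}, {{{gzUri}}} and {{{stripGzSuffix}}} are hypothetical and not part of VFS; the sketch only builds the string and does not resolve it.

```java
/** Sketch only: builds the layered URI string; it does not call VFS. */
final class LayeredUriSketch {

    /** Returns the name without a trailing ".gz", or the name unchanged. */
    static String stripGzSuffix(String baseName) {
        return baseName.endsWith(".gz")
                ? baseName.substring(0, baseName.length() - ".gz".length())
                : baseName;
    }

    /**
     * Wraps the URI of a gzipped tar entry so that it points at the
     * uncompressed content, following the "gz:" + uri + "!" + name
     * pattern used in the example code.
     */
    static String gzUri(String tarEntryUri) {
        // The base name is everything after the last '/' in the entry URI.
        String baseName = tarEntryUri.substring(tarEntryUri.lastIndexOf('/') + 1);
        return "gz:" + tarEntryUri + "!" + stripGzSuffix(baseName);
    }

    public static void main(String[] args) {
        System.out.println(gzUri(
                "tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz"));
    }
}
```

For the sample archive this prints the same string as the {{{gzName=}}} line in the output above.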
  


