commons-issues mailing list archives

From "Peter Karich (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (COMPRESS-224) Cannot uncompress very large bzip2 files
Date Tue, 30 Apr 2013 07:58:16 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644645#comment-13644645 ]

Peter Karich edited comment on COMPRESS-224 at 4/30/13 7:56 AM:
----------------------------------------------------------------

Compared with bzip2, it looks like apache-compress works fine now (I'll report back in a few days
on whether parsing the resulting XML is ok)! Can I set this as the default setting, or will certain
bz2 files fail?

BTW: It looks like apache-compress is only 2 times slower than bzip2, which is quite good IMO :)!
But do you know how it could be improved further? (It would save 3 hours on such big beasts :))
                
      was (Author: peathal):
    Compared with bzip2, it looks like apache-compress works fine now (I'll report back in a few
days on whether parsing the resulting XML is ok)! Can I set this as the default setting, or will
certain bz2 files fail?

BTW: It looks like apache-compress is only 2.4 times slower than bzip2, which is quite good
IMO :)! Or do you think there is room for improvement?
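On the speed question, one low-risk change to the reporter's program is to wrap the `FileInputStream` in a `java.io.BufferedInputStream` before passing it to `BZip2CompressorInputStream`. If the decompressor pulls its compressed input one byte at a time (an assumption about this version of Commons Compress, not something measured in the thread), each of those calls on an unbuffered `FileInputStream` goes to the operating system. The JDK-only sketch below (the class `BufferingDemo` and its helpers are hypothetical) counts how many read calls reach an underlying stream when a consumer reads byte by byte, with and without a buffer in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferingDemo {
    // Counts every read call that reaches the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        // FilterInputStream.read(byte[]) delegates here, so it is counted once as well.
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads 'data' one byte at a time, optionally through an 8 KiB buffer,
    // and returns how many calls reached the underlying stream.
    public static long underlyingCalls(byte[] data, boolean buffered) throws IOException {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream in = buffered ? new BufferedInputStream(counting, 8192) : counting;
        while (in.read() != -1) {
            // discard the byte; only the call count matters here
        }
        return counting.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1_000_000];
        System.out.println("unbuffered: " + underlyingCalls(data, false) + " calls");
        System.out.println("buffered:   " + underlyingCalls(data, true) + " calls");
    }
}
```

Applied to the reporter's program in the issue description, this would mean `new BZip2CompressorInputStream(new BufferedInputStream(in))`. How much of the gap to native bzip2 it actually closes would need to be measured on the real file.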
                  
> Cannot uncompress very large bzip2 files
> ----------------------------------------
>
>                 Key: COMPRESS-224
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-224
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.5
>         Environment: Java 1.7.0_03
>            Reporter: Peter Karich
>            Priority: Blocker
>
> apache-compress works nicely when extracting big files like
> http://download.geofabrik.de/europe/germany/bayern-latest.osm.bz2
> but when trying the same with, e.g.,
> http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/planet/planet-latest.osm.bz2
> it stops without an error after exactly 900000 bits.
> I'm using the following code:
> {code:title=App.java|borderStyle=solid}
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
>
> import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
>
> public class App {
>     public static void main(String[] args) throws IOException {
>         if (args.length == 0)
>             throw new IllegalArgumentException("You need to specify the bz2 file!");
>         String fromFile = args[0];
>         if (!fromFile.endsWith(".bz2"))
>             throw new IllegalArgumentException("You need to specify a bz2 file! But was: " + fromFile);
>         String toFile = pruneFileEnd(fromFile);
>         // try-with-resources closes every stream, even if a constructor throws
>         try (InputStream in = new FileInputStream(fromFile);
>              BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
>              OutputStream out = new FileOutputStream(toFile)) {
>             final byte[] buffer = new byte[1024 * 8];
>             int n;
>             // copy decompressed data until the decompressor reports end of stream (-1)
>             while (-1 != (n = bzIn.read(buffer))) {
>                 out.write(buffer, 0, n);
>             }
>         }
>     }
>
>     // Strips the last extension, e.g. "planet-latest.osm.bz2" -> "planet-latest.osm"
>     public static String pruneFileEnd(String file) {
>         int index = file.lastIndexOf(".");
>         if (index < 0)
>             return file;
>         return file.substring(0, index);
>     }
> }
> {code}
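One plausible cause of a silent stop like this, an assumption here since the issue text does not spell it out, is that the file consists of several complete bzip2 streams written back to back, as parallel compressors such as pbzip2 produce. A decoder that stops at the end of the first stream then finishes cleanly with no error. Commons Compress provides a `BZip2CompressorInputStream(InputStream, boolean decompressConcatenated)` constructor that, with `true`, keeps reading the following streams. The JDK-only sketch below (the class `Bz2StreamCounter` is hypothetical) heuristically counts stream starts in a file: every bzip2 stream begins with the bytes `BZh` plus a block-size digit `1` to `9`, and in a non-empty stream the first block then starts with the magic `0x314159265359`.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Bz2StreamCounter {
    // First-block header magic (0x314159265359) that follows a stream header.
    private static final int[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
    private static final int WINDOW = 10; // "BZh" + level digit + 6 magic bytes

    // Heuristic: counts positions where "BZh" + '1'..'9' is immediately
    // followed by the block magic. Each match is assumed to start one stream.
    public static long countStreams(InputStream in) throws IOException {
        int[] ring = new int[WINDOW]; // last WINDOW bytes; after pos++ the oldest is at pos % WINDOW
        long pos = 0;
        long count = 0;
        int b;
        while ((b = in.read()) != -1) {
            ring[(int) (pos % WINDOW)] = b;
            pos++;
            if (pos >= WINDOW && matches(ring, (int) (pos % WINDOW))) {
                count++;
            }
        }
        return count;
    }

    // Checks the WINDOW bytes starting at ring index 'start' (wrapping around).
    private static boolean matches(int[] ring, int start) {
        if (at(ring, start, 0) != 'B' || at(ring, start, 1) != 'Z' || at(ring, start, 2) != 'h') {
            return false;
        }
        int level = at(ring, start, 3);
        if (level < '1' || level > '9') {
            return false;
        }
        for (int i = 0; i < BLOCK_MAGIC.length; i++) {
            if (at(ring, start, 4 + i) != BLOCK_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    private static int at(int[] ring, int start, int offset) {
        return ring[(start + offset) % WINDOW];
    }

    public static void main(String[] args) throws IOException {
        // Buffer the file so the byte-at-a-time scan does not hit the OS per byte.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(countStreams(in) + " bzip2 stream start(s) found");
        }
    }
}
```

A count greater than 1 suggests a multi-stream file. The check is only a heuristic: it skips empty streams (whose header is followed by the end-of-stream magic instead) and could in principle be fooled by the same ten bytes occurring by chance inside compressed data.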

