commons-issues mailing list archives

From Frédérik Bilhaut (JIRA) <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-325) Unable to uncompress bzip2 dbPedia files
Date Tue, 13 Oct 2015 06:49:05 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954501#comment-14954501 ]

Frédérik Bilhaut commented on COMPRESS-325:
-------------------------------------------

OK, it works that way!

Thank you very much, and sorry for the irrelevant ticket! My fault entirely; I should have
read the docs more carefully.

However, it may not be possible to know in advance whether a given bzip2 file is concatenated, and I
suppose that by default one expects to get the full content of the compressed file, as happens
with most tools/APIs I know. So:

- Either there is no circumstance where setting this parameter to {{true}} is inappropriate,
in which case why not make it {{true}} by default?

- Or setting it to {{true}} may be harmful in some circumstances, in which case shouldn't there
be a way to detect that the stream is concatenated?
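In Commons Compress the behaviour is chosen by the second constructor argument, `new BZip2CompressorInputStream(in, true)` (`decompressConcatenated`). On the detection question, here is a minimal sketch of one possible heuristic. It is not something Commons Compress provides, and the class and method names are hypothetical. Every bzip2 stream begins on a byte boundary with `BZh`, a block-size digit `'1'`..`'9'` and, for a non-empty stream, the 48-bit block magic `0x314159265359`. Counting those 10-byte patterns in the raw compressed bytes therefore suggests whether more than one stream is present. This is an assumption-laden heuristic with three limits. It needs the bytes up front, it misses an empty concatenated stream, and a chance match inside compressed data is possible, though unlikely.

```java
/**
 * Hypothetical helper, not part of Commons Compress: counts byte-aligned
 * bzip2 stream starts ("BZh" + block-size digit + first block magic).
 * A heuristic only; it cannot see empty streams.
 */
public class Bzip2StreamCounter {

    // Magic number that opens every compressed block (BCD digits of pi).
    static final byte[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};

    /** True if a stream header followed by the block magic starts at offset i. */
    static boolean streamStartsAt(byte[] data, int i) {
        if (i + 4 + BLOCK_MAGIC.length > data.length) return false;
        if (data[i] != 'B' || data[i + 1] != 'Z' || data[i + 2] != 'h') return false;
        if (data[i + 3] < '1' || data[i + 3] > '9') return false;
        for (int k = 0; k < BLOCK_MAGIC.length; k++) {
            if (data[i + 4 + k] != BLOCK_MAGIC[k]) return false;
        }
        return true;
    }

    /** Number of stream starts found; a result above 1 suggests concatenation. */
    static int countStreamStarts(byte[] data) {
        int count = 0;
        for (int i = 0; i < data.length; i++) {
            if (streamStartsAt(data, i)) count++;
        }
        return count;
    }

    // Fake test input: a stream header plus zero filler. NOT decodable bzip2;
    // only the header bytes matter to the scan.
    static byte[] fakeStream(char level, int fillerBytes) {
        byte[] out = new byte[4 + BLOCK_MAGIC.length + fillerBytes];
        out[0] = 'B'; out[1] = 'Z'; out[2] = 'h'; out[3] = (byte) level;
        System.arraycopy(BLOCK_MAGIC, 0, out, 4, BLOCK_MAGIC.length);
        return out;
    }

    // Byte-level concatenation, like `cat a.bz2 b.bz2 > ab.bz2`.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = java.util.Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] single = fakeStream('9', 16);
        byte[] doubled = concat(single, fakeStream('9', 16));
        System.out.println("single: " + countStreamStarts(single));        // prints "single: 1"
        System.out.println("concatenated: " + countStreamStarts(doubled)); // prints "concatenated: 2"
    }
}
```

For a file read once from a URL, as in the report, simply passing {{true}} is likely simpler than scanning ahead.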

I understand the backward-compatibility concerns, but I think changing the default behavior
would make it more consistent with the general InputStream contract, where there is only one
EOF. Just my two cents; maybe there are other problems I haven't thought of.
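For comparison, the JDK's `java.util.zip.GZIPInputStream` already reads across concatenated members by default. It handles gzip, a different format from bzip2, and is swapped in here only because it ships with the JDK. A caller therefore sees a single EOF after all the data, which is the behaviour the comment argues for. This is a minimal sketch with hypothetical class and method names, using an in-memory buffer. Its check for a following member partly relies on `available()`, so behaviour over a network stream may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedGzipDemo {

    // Compresses text into one complete, self-contained gzip member.
    static byte[] gzipMember(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.US_ASCII));
        }
        return bytes.toByteArray();
    }

    // Reads until read() reports the stream's single EOF (-1).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        return out.toString("US-ASCII");
    }

    // Joins two members byte-for-byte (like `cat a.gz b.gz`) and decodes the result.
    static String decodeConcatenated(String first, String second) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(gzipMember(first));
        file.write(gzipMember(second));
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(file.toByteArray()))) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // prints "line 1" and "line 2" on separate lines: both members are decoded
        System.out.print(decodeConcatenated("line 1\n", "line 2\n"));
    }
}
```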

Anyway, thanks again for your help!

> Unable to uncompress bzip2 dbPedia files
> ----------------------------------------
>
>                 Key: COMPRESS-325
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-325
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.10
>            Reporter: Frédérik Bilhaut
>
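> Sample code:
> {code:java}
> import java.io.BufferedReader;
> import java.io.InputStream;
> import java.io.InputStreamReader;
> import java.net.URL;
> import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
>
> URL url = new URL("http://downloads.dbpedia.org/current/core-i18n/en/labels_en.nt.bz2");
> // Single-argument constructor: decompressConcatenated is false, so reading stops after the first bzip2 stream
> InputStream input = new BZip2CompressorInputStream(url.openConnection().getInputStream());
> BufferedReader reader = new BufferedReader(new InputStreamReader(input, "US-ASCII"));
>
> int count = 0;
> // Print at most the first 10000 lines
> for (String line = reader.readLine(); line != null; line = reader.readLine()) {
>     if (++count > 10000) break;
>     else System.out.println(count + ": " + line);
> }
> {code}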
> It stops at line 7801 (EOF):
> {code}
> 7799: <http://dbpedia.org/resource/Gamemaster> <http://www.w3.org/2000/01/rdf-schema#label> "Gamemaster"@en .
> 7800: <http://dbpedia.org/resource/Genetic_engineering> <http://www.w3.org/2000/01/rdf-schema#label> "Genetic engineering"@en .
> 7801: <http://dbpedia.org/resource/Gradius_(video_game)> <http://www.w3.org/2000/01/rdf-s
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
