commons-issues mailing list archives

From Frédérik Bilhaut (JIRA) <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-325) Unable to uncompress bzip2 dbPedia files
Date Mon, 12 Oct 2015 12:33:05 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953059#comment-14953059 ]

Frédérik Bilhaut commented on COMPRESS-325:
-------------------------------------------

Actually, re-compressing the same file with another tool (the "bzip2" command on macOS) produces
a file that BZip2CompressorInputStream reads correctly.
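
The comment above does not say why the original file fails. One possible explanation, offered here only as an assumption, is that the dbPedia archive consists of several concatenated bzip2 streams, and that re-compressing it with the "bzip2" command produces a single stream. BZip2CompressorInputStream has a two-argument constructor whose boolean decompressConcatenated flag controls whether reading continues past the end of the first stream. The sketch below builds such a concatenated input in memory and reads it both ways. The class name ConcatenatedBzip2Demo and its helper methods are hypothetical, and nothing here has been checked against the actual dbPedia file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

// Hypothetical demo class; requires commons-compress on the classpath.
public class ConcatenatedBzip2Demo {

    // Compress one string into a complete, standalone bzip2 stream.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (BZip2CompressorOutputStream out = new BZip2CompressorOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.US_ASCII));
        }
        return buffer.toByteArray();
    }

    // Join two independent bzip2 streams back to back.
    // ASSUMPTION: this mimics how the failing file might be laid out.
    static byte[] twoStreams(String first, String second) throws IOException {
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        joined.write(compress(first));
        joined.write(compress(second));
        return joined.toByteArray();
    }

    // Read everything the decompressor yields, with or without
    // support for concatenated streams.
    static String decode(byte[] data, boolean decompressConcatenated) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new BZip2CompressorInputStream(
                new ByteArrayInputStream(data), decompressConcatenated)) {
            byte[] chunk = new byte[4096];
            for (int n = in.read(chunk); n != -1; n = in.read(chunk)) {
                result.write(chunk, 0, n);
            }
        }
        return result.toString("US-ASCII");
    }

    public static void main(String[] args) throws IOException {
        byte[] data = twoStreams("first\n", "second\n");
        // false: reading ends after the first stream.
        System.out.print(decode(data, false));
        // true: reading continues into the following stream.
        System.out.print(decode(data, true));
    }
}
```

If the assumption holds, passing true as the second constructor argument in the sample code quoted below might let it read past line 7801.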

> Unable to uncompress bzip2 dbPedia files
> ----------------------------------------
>
>                 Key: COMPRESS-325
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-325
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.10
>            Reporter: Frédérik Bilhaut
>
> Sample code:
> {code:java}
> URL url = new URL("http://downloads.dbpedia.org/current/core-i18n/en/labels_en.nt.bz2");
> InputStream input = new BZip2CompressorInputStream(url.openConnection().getInputStream());
> BufferedReader reader = new BufferedReader(new InputStreamReader(input, "US-ASCII"));
> 			
> int count = 0;
> for(String line = reader.readLine(); line != null; line = reader.readLine()) {
> 	if(++count > 10000) break;
> 	else System.out.println(count + ": " + line);
> }
> {code}
> It stops at line 7801 (EOF):
> {code}
> 7799: <http://dbpedia.org/resource/Gamemaster> <http://www.w3.org/2000/01/rdf-schema#label> "Gamemaster"@en .
> 7800: <http://dbpedia.org/resource/Genetic_engineering> <http://www.w3.org/2000/01/rdf-schema#label> "Genetic engineering"@en .
> 7801: <http://dbpedia.org/resource/Gradius_(video_game)> <http://www.w3.org/2000/01/rdf-s
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
