hadoop-common-user mailing list archives

From Vadim Zaliva <kroko...@gmail.com>
Subject broken gzip file
Date Tue, 29 Jan 2008 18:33:40 GMT
I have a bunch of gzip files which I am trying to process with a Hadoop
task. The task fails with an exception:
java.io.EOFException: Unexpected end of ZLIB input stream
    at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:223)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
    at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.read(GzipCodec.java:124)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:136)
    at org.apache.hadoop.mapred.LineRecordReader.readLine(LineRecordReader.java:128)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:117)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:147)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2016)
I guess some of the files are invalid. However, I could not find the
name of the file causing this exception anywhere in the logs. Because
the dataset is huge, I would rather not extract the files from DFS and
verify them with gzip one by one. Any suggestions? Thanks!
Sincerely,
Vadim
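
For illustration, the kind of per-file check described above can be sketched in plain Java using only java.util.zip. This is a minimal sketch, not part of the original message: the class name GzipCheck and the helpers isValidGzip and gzip are hypothetical, and the data is built in memory so that the example is self-contained. A stream truncated in the middle of the compressed data is the sort of input that produces "Unexpected end of ZLIB input stream".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, named for this example only.
final class GzipCheck {

    // Returns true if the whole stream decompresses without error.
    // EOFException (truncated input) and ZipException (bad header or
    // checksum) are both subclasses of IOException.
    static boolean isValidGzip(InputStream raw) {
        byte[] buf = new byte[64 * 1024];
        try (GZIPInputStream in = new GZIPInputStream(raw)) {
            while (in.read(buf) != -1) {
                // Discard the decompressed bytes; only errors matter here.
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Compresses a byte array in memory, used only to build test input.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] good = gzip("line one\nline two\nline three\n"
                .getBytes(StandardCharsets.UTF_8));
        // Simulate a broken file by cutting the compressed bytes in half.
        byte[] truncated = Arrays.copyOf(good, good.length / 2);

        System.out.println("good:      " + isValidGzip(new ByteArrayInputStream(good)));
        System.out.println("truncated: " + isValidGzip(new ByteArrayInputStream(truncated)));
    }
}
```

Since isValidGzip takes any InputStream, the same check could presumably be applied to a stream opened directly on DFS (for example via FileSystem.open), printing the path of each file that fails, so that nothing has to be copied out first. That wiring is an assumption and is not shown here.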


