hadoop-common-dev mailing list archives

From "Wenjun Huang (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-6817) SequenceFile.Reader can't read a gzip-compressed sequence file produced by a MapReduce job without the native compression library
Date Thu, 10 Jun 2010 07:19:13 GMT
SequenceFile.Reader can't read a gzip-compressed sequence file produced by a MapReduce job
without the native compression library
---------------------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-6817
                 URL: https://issues.apache.org/jira/browse/HADOOP-6817
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 0.20.2
         Environment: Cluster: CentOS 5, jdk1.6.0_20
Client: Mac OS X Snow Leopard, jdk1.6.0_20
            Reporter: Wenjun Huang


A Hadoop job outputs a gzip-compressed sequence file (either record-compressed or block-compressed). A client program uses SequenceFile.Reader to read this sequence file; while reading, the client program throws the following exception:

2090 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
2091 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
Exception in thread "main" java.io.EOFException
	at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
	at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
	at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
	at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
	at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
	at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
	at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
	at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:170)
	at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:180)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
	at com.shiningware.intelligenceonline.taobao.mapreduce.HtmlContentSeqOutputView.main(HtmlContentSeqOutputView.java:28)
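
For reference, a minimal client sketch of the read path that hits this exception (the class name, input path handling, and Text key/value types are assumptions; the original HtmlContentSeqOutputView source is not shown) could look like this:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqGzipReadRepro {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // e.g. one part file of the gzip-compressed job output; the actual path is an assumption
    Path path = new Path(args[0]);

    // On a client without the native-hadoop library, GzipCodec falls back to
    // java.util.zip, and this constructor already fails with the EOFException
    // above, inside SequenceFile.Reader.init().
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);

    Text key = new Text();
    Text value = new Text();
    while (reader.next(key, value)) {
      System.out.println(key + "\t" + value);
    }
    reader.close();
  }
}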

I studied the code in the org.apache.hadoop.io.SequenceFile.Reader.init method and found the following:
      // Initialize... *not* if this we are constructing a temporary Reader
      if (!tempReader) {
        valBuffer = new DataInputBuffer();
        if (decompress) {
          valDecompressor = CodecPool.getDecompressor(codec);
          valInFilter = codec.createInputStream(valBuffer, valDecompressor);
          valIn = new DataInputStream(valInFilter);
        } else {
          valIn = valBuffer;
        }
The problem seems to be caused by "valBuffer = new DataInputBuffer();": GzipCodec.createInputStream
creates an instance of GzipInputStream, whose constructor creates an instance of the
ResetableGZIPInputStream class. When ResetableGZIPInputStream's constructor calls its base class
java.util.zip.GZIPInputStream's constructor, that constructor tries to read the gzip header from the
still-empty valBuffer, gets no content, and therefore throws an EOFException.
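
A minimal sketch of just this failure mode, outside of SequenceFile but using the same classes the stack trace goes through, would be:

import java.io.EOFException;
import java.util.zip.GZIPInputStream;

import org.apache.hadoop.io.DataInputBuffer;

public class EmptyBufferGzipDemo {
  public static void main(String[] args) throws Exception {
    // A freshly constructed DataInputBuffer holds no bytes yet, just like
    // valBuffer at the point where SequenceFile.Reader.init() wraps it.
    DataInputBuffer emptyBuffer = new DataInputBuffer();

    try {
      // java.util.zip.GZIPInputStream reads the gzip header eagerly in its
      // constructor, so wrapping a still-empty stream fails immediately.
      new GZIPInputStream(emptyBuffer);
    } catch (EOFException e) {
      System.out.println("EOFException, as in the report: " + e);
    }
  }
}

With the native library loaded, the zlib-based decompressor stream does not read a header in its constructor, which is presumably why the problem only shows up with the built-in Java codec.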

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

