hadoop-mapreduce-user mailing list archives

From Xuri Nagarin <secs...@gmail.com>
Subject Modifying Grep to read Sequence/Snappy files
Date Tue, 08 Oct 2013 17:52:08 GMT
Hi,

I am trying to get the Grep example bundled with CDH to read
Sequence/Snappy files.

By default, the program throws the following error when it reads Sequence/Snappy files:
java.io.EOFException: Unexpected end of block in input stream
    at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:121)
    at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:95)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
    at java.io.InputStream.read(InputStream.java:82)


So I edited the code to read Sequence files.

Changed:
FileInputFormat.setInputPaths(grepJob, args[0]);

To:
FileInputFormat.setInputPaths(grepJob, args[0]);
grepJob.setInputFormatClass(SequenceFileAsTextInputFormat.class);

But I still get the same error.
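
Since the error does not change, one thing that may be worth ruling out is that the input is not actually in SequenceFile format (for example, plain text compressed directly with Snappy). A SequenceFile starts with the ASCII bytes "SEQ" followed by a version byte, so the first bytes of a file can be checked without Hadoop. The sketch below is only an illustration: the class name SeqHeaderCheck and its methods are hypothetical, and it assumes a copy of one input file has been fetched to local disk (for instance with "hadoop fs -get").

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop or the Grep example.
public class SeqHeaderCheck {

    // True if the bytes start with 'S','E','Q' plus one more (version) byte.
    static boolean looksLikeSequenceFile(byte[] header) {
        return header != null
            && header.length >= 4
            && header[0] == 'S'
            && header[1] == 'E'
            && header[2] == 'Q';
    }

    // Reads up to the first 4 bytes of the stream; returns fewer if the stream is shorter.
    static byte[] readHeader(InputStream in) throws IOException {
        byte[] buf = new byte[4];
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                break; // end of stream before 4 bytes were read
            }
            off += n;
        }
        return Arrays.copyOf(buf, off);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SeqHeaderCheck <local-file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            byte[] header = readHeader(in);
            System.out.println(looksLikeSequenceFile(header)
                ? "starts with SEQ: looks like a SequenceFile"
                : "no SEQ header: probably not a SequenceFile");
        }
    }
}
```

If the header check fails, the files would need a different input format rather than a codec setting; if it passes, the problem is more likely in how the job is configured.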

1) Do I need to set the input compression codec manually? I thought the
SequenceFile reader detects compression automatically.
2) If I do need to set it manually, is that done through
"setInputFormatClass", or is it something I set on the "conf" object?

TIA,

Xuri
