From: Xuri Nagarin
To: user@hadoop.apache.org
Date: Tue, 8 Oct 2013 10:52:08 -0700
Subject: Modifying Grep to read Sequence/Snappy files

Hi,

I am trying to get the Grep example bundled with CDH to read Sequence/Snappy files. By default, the program throws errors trying to read Sequence/Snappy files:

java.io.EOFException: Unexpected end of block in input stream
        at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:121)
        at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:95)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
        at java.io.InputStream.read(InputStream.java:82)

So I edited the code to read Sequence files.

Changed:

FileInputFormat.setInputPaths(grepJob, args[0]);

To:

FileInputFormat.setInputPaths(grepJob, args[0]);
grepJob.setInputFormatClass(SequenceFileAsTextInputFormat.class);

But I still get the same error.

1) Do I need to manually set the input compression codec? I thought the SequenceFile reader automatically detects compression.
2) If I need to manually set compression, do I do it using "setInputFormatClass", or is it something I set in the "conf" object?

TIA,

Xuri
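For concreteness, here is a minimal sketch of a Grep-style driver wired up the way the change above describes, using the new-API (org.apache.hadoop.mapreduce) classes. It is only an illustration, not the CDH source: the class name SequenceGrep, the job name, and the argument layout (input dir, output dir, regex) are assumptions, and the second sort-by-count job that the bundled Grep example runs is omitted.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsTextInputFormat;
import org.apache.hadoop.mapreduce.lib.map.RegexMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;

public class SequenceGrep {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set(RegexMapper.PATTERN, args[2]);   // regex to grep for

    Job grepJob = Job.getInstance(conf, "grep-search");
    grepJob.setJarByClass(SequenceGrep.class);

    FileInputFormat.setInputPaths(grepJob, args[0]);
    // Read SequenceFiles and hand each record's value to the mapper as Text.
    // The block codec (e.g. Snappy) inside the SequenceFile is resolved by the
    // SequenceFile reader from the file header, not by this input format choice.
    grepJob.setInputFormatClass(SequenceFileAsTextInputFormat.class);

    grepJob.setMapperClass(RegexMapper.class);
    grepJob.setCombinerClass(LongSumReducer.class);
    grepJob.setReducerClass(LongSumReducer.class);

    grepJob.setOutputKeyClass(Text.class);
    grepJob.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(grepJob, new Path(args[1]));

    System.exit(grepJob.waitForCompletion(true) ? 0 : 1);
  }
}

If the error persists with a setup like this, it may be worth confirming (for example with "hadoop fs -cat <file> | head -c 4") that the inputs really start with the SEQ magic bytes, i.e. that they are SequenceFiles rather than raw Snappy-compressed files.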
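On the two questions above: a SequenceFile records the codec class name in its own header, and SequenceFile.Reader instantiates that class by reflection, so with a SequenceFile input format there should be nothing codec-specific to set through setInputFormatClass. What does have to be true is that the codec class is on the classpath and, for Snappy, that the native Hadoop library is loadable. A rough sketch of checking that on the conf object (io.compression.codecs is a standard Hadoop property, but whether it is populated depends on the cluster's core-site.xml; the CodecCheck class name is made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.NativeCodeLoader;

public class CodecCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // io.compression.codecs is the registration list used for extension-based
    // codec lookup (e.g. by TextInputFormat on .snappy files); a SequenceFile
    // reader instead loads the codec class named in the file's own header.
    System.out.println("io.compression.codecs = " + conf.get("io.compression.codecs"));

    // If SnappyCodec had to be registered explicitly, it would be a conf
    // setting rather than an input-format setting, e.g.:
    // conf.set("io.compression.codecs",
    //     conf.get("io.compression.codecs", "")
    //     + ",org.apache.hadoop.io.compress.SnappyCodec");

    // Snappy compression/decompression also needs the native hadoop library.
    System.out.println("native hadoop library loaded: "
        + NativeCodeLoader.isNativeCodeLoaded());
  }
}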