From: Xuri Nagarin
To: user@hadoop.apache.org
Date: Tue, 8 Oct 2013 10:52:08 -0700
Subject: Modifying Grep to read Sequence/Snappy files

Hi,

I am trying to get the Grep example bundled with CDH to read Sequence/Snappy files. By default, the program throws errors trying to read Sequence/Snappy files:

java.io.EOFException: Unexpected end of block in input stream
        at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:121)
        at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:95)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:83)
        at java.io.InputStream.read(InputStream.java:82)

So I edited the code to read Sequence files.

Changed:

FileInputFormat.setInputPaths(grepJob, args[0]);

To:

FileInputFormat.setInputPaths(grepJob, args[0]);
grepJob.setInputFormatClass(SequenceFileAsTextInputFormat.class);

But I still get the same error.

1) Do I need to manually set the input compression codec? I thought the SequenceFile reader automatically detects compression.
2) If I need to manually set compression, do I do it using "setInputFormatClass", or is it something I set in the "conf" object?

TIA,

Xuri
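For concreteness, here is a minimal sketch of a Grep-style driver wired up the way the change above describes, using the new-API (org.apache.hadoop.mapreduce) classes. It is only an illustration, not the CDH source: the class name SequenceGrep, the job name, and the argument layout (input dir, output dir, regex) are assumptions, and the second sort-by-count job that the bundled Grep example runs is omitted.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsTextInputFormat;
import org.apache.hadoop.mapreduce.lib.map.RegexMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;

public class SequenceGrep {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set(RegexMapper.PATTERN, args[2]);   // regex to grep for

    Job grepJob = Job.getInstance(conf, "grep-search");
    grepJob.setJarByClass(SequenceGrep.class);

    FileInputFormat.setInputPaths(grepJob, args[0]);
    // Read SequenceFiles and hand each record's value to the mapper as Text.
    // The block codec (e.g. Snappy) inside the SequenceFile is resolved by the
    // SequenceFile reader from the file header, not by this input format choice.
    grepJob.setInputFormatClass(SequenceFileAsTextInputFormat.class);

    grepJob.setMapperClass(RegexMapper.class);
    grepJob.setCombinerClass(LongSumReducer.class);
    grepJob.setReducerClass(LongSumReducer.class);

    grepJob.setOutputKeyClass(Text.class);
    grepJob.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(grepJob, new Path(args[1]));

    System.exit(grepJob.waitForCompletion(true) ? 0 : 1);
  }
}

If the error persists with a setup like this, it may be worth confirming (for example with "hadoop fs -cat <file> | head -c 4") that the inputs really start with the SEQ magic bytes, i.e. that they are SequenceFiles rather than raw Snappy-compressed files.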
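On the two questions above: a SequenceFile records the codec class name in its own header, and SequenceFile.Reader instantiates that class by reflection, so with a SequenceFile input format there should be nothing codec-specific to set through setInputFormatClass. What does have to be true is that the codec class is on the classpath and, for Snappy, that the native Hadoop library is loadable. A rough sketch of checking that on the conf object (io.compression.codecs is a standard Hadoop property, but whether it is populated depends on the cluster's core-site.xml; the CodecCheck class name is made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.NativeCodeLoader;

public class CodecCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // io.compression.codecs is the registration list used for extension-based
    // codec lookup (e.g. by TextInputFormat on .snappy files); a SequenceFile
    // reader instead loads the codec class named in the file's own header.
    System.out.println("io.compression.codecs = " + conf.get("io.compression.codecs"));

    // If SnappyCodec had to be registered explicitly, it would be a conf
    // setting rather than an input-format setting, e.g.:
    // conf.set("io.compression.codecs",
    //     conf.get("io.compression.codecs", "")
    //     + ",org.apache.hadoop.io.compress.SnappyCodec");

    // Snappy compression/decompression also needs the native hadoop library.
    System.out.println("native hadoop library loaded: "
        + NativeCodeLoader.isNativeCodeLoaded());
  }
}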