Date: Wed, 27 Feb 2008 12:30:53 -0800
From: "Riccardo Boscolo"
To: core-user@hadoop.apache.org
Subject: Re: Decompression Blues
Jeff,

if I were to debug this, I would write an M/R job that generates some random
data and uses Zlib for output compression. Then you can try to read the
output of the job again and see whether the issue is still there.

We use the Hadoop native libraries for compression all the time, and I have
never seen this exception before. One thing, though: we usually use
compressed SequenceFiles, but from your stack trace I can see that you are
working with a compressed text file. In your tests, you might want to try
both output formats and see whether you encounter the same problem.

RB

On 2/27/08, Jeff Eastman wrote:
>
> I unzipped and rezipped all the files using gzip 1.3.3 and uploaded the
> files again. I got the same exceptions.
>
> I set the hadoop.native.lib property to false and bounced the cloud,
> then ran my job. I still get the same exceptions.
>
> Any more suggestions?
>
> Jeff
>
> -----Original Message-----
> From: Arun C Murthy [mailto:acm@yahoo-inc.com]
> Sent: Tuesday, February 26, 2008 3:47 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Decompression Blues
>
> Jeff,
>
> On Feb 26, 2008, at 12:58 PM, Jeff Eastman wrote:
>
> > I'm processing a number of .gz-compressed Apache and other logs using
> > Hadoop 0.15.2 and encountering fatal decompression errors such as:
>
> How did you compress your input files? Could you share details on the
> version of your gzip and other tools?
>
> Try setting the "hadoop.native.lib" property to 'false' via
> NativeCodeLoader.setLoadNativeLibraries for your job and see how it
> works...
>
> Arun
>
> > 08/02/26 12:09:12 INFO mapred.JobClient: Task Id :
> > task_200802171116_0001_m_000005_0, Status : FAILED
> >
> > java.lang.InternalError
> >     at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(Native Method)
> >     at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.<init>(ZlibDecompressor.java:111)
> >     at org.apache.hadoop.io.compress.GzipCodec.createDecompressor(GzipCodec.java:188)
> >     at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:170)
> >     at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:75)
> >     at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:50)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:156)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1787)
> >
> > I looked in Jira but did not find any issues. Is this pilot error? Some
> > of the files work just fine. Is there a workaround besides unzipping all
> > the files in the DFS?
> >
> > Jeff
>

--
-------------------------------
Riccardo Boscolo, PhD
V.P. of Core Technology
Netseer Inc.
11943 Montana Ave, Suite 200
Los Angeles, CA 90049
T: 310-597-4482
F: 310-597-4489
Email: drboscolo@netseer.com
-------------------------------
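P.S. The round-trip test suggested above can also be approximated outside
Hadoop before writing a full M/R job. This is a minimal sketch using only the
plain JDK (no Hadoop dependencies; the class name GzipRoundTrip is just for
illustration): it generates random data, gzip-compresses it with the JDK's
zlib-backed streams, which is the same codec family GzipCodec wraps when
hadoop.native.lib is false, and verifies the data survives decompression.
If this passes but the same bytes fail inside Hadoop, the problem is more
likely in the native zlib loading than in the files themselves.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    public static void main(String[] args) throws IOException {
        // Generate some random data, as in the suggested test job.
        byte[] original = new byte[1 << 16];
        new Random(42).nextBytes(original);

        // Compress with the JDK's zlib-backed gzip stream.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(compressed);
        gz.write(original);
        gz.close();

        // Read the compressed bytes back and decompress them.
        GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed.toByteArray()));
        ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            decompressed.write(chunk, 0, n);
        }
        in.close();

        // Verify the round trip preserved every byte.
        boolean ok = Arrays.equals(original, decompressed.toByteArray());
        System.out.println(ok ? "round trip OK" : "round trip FAILED");
    }
}
```

For the files already in DFS, `gzip -t file.gz` on the originals is a quick
way to check whether any of them are actually corrupt before blaming the
decompressor.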