Message-ID: <47582BF5.3040303@yahoo-inc.com>
Date: Thu, 06 Dec 2007 09:05:57 -0800
From: Alan Gates
To: pig-user@incubator.apache.org
CC: marty.springer@gmail.com, sam990912@gmail.com
Subject: Re: error with pig job

Utkarsh,

I can submit a patch for this today. Do you know of a simple test case
that reproduces the error?

Alan.

Utkarsh Srivastava wrote:
> Alan, this is a problem with the combiner part (the problem of putting
> an indexed tuple directly into the bag, the first point in my comment
> about the combiner patch that was committed). Some of the mappers that
> spill their bags to disk, have a problem reading them back, because
> what was written out was an indexed tuple, while what is expected to
> be read is a regular Tuple.
>
> Utkarsh
>
> On Dec 5, 2007, at 3:50 PM, Andrew Hitchcock wrote:
>
>> Hi folks,
>>
>> I'm having a problem with a Pig job I wrote, it is throwing exceptions
>> in the map phase. I'm using the latest SVN of Pig, compiled against
>> the Hadoop15 jar included in SVN. My cluster is running Hadoop 0.15.1
>> on Java 1.6.0_03.
>> Here's the pig job (which I ran through grunt):
>>
>> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
>>     (movie,user,rating,date);
>> B = GROUP A BY movie;
>> C = FOREACH B GENERATE group, COUNT(A.user) as ratingcount,
>>     AVG(A.rating) as averagerating;
>> D = ORDER C BY averagerating;
>> STORE D INTO 'output/output.tsv';
>>
>> A large number of jobs fail (but not all, some succeed) with the
>> following exception:
>>
>> error: Error message from task (map) tip_200712051644_0002_m_000003
>> java.lang.RuntimeException: Unexpected data while reading tuple from binary file
>>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:81)
>>     at org.apache.pig.impl.io.DataBagFileReader$myIterator.next(DataBagFileReader.java:41)
>>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:89)
>>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
>>     at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:216)
>>     at org.apache.pig.impl.eval.FuncEvalSpec$1.add(FuncEvalSpec.java:105)
>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.(GenerateSpec.java:165)
>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:77)
>>     at org.apache.pig.impl.mapreduceExec.PigCombine.reduce(PigCombine.java:101)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:439)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:418)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:364)
>>     at org.apache.pig.impl.mapreduceExec.PigMapReduce$MapDataOutputCollector.add(PigMapReduce.java:309)
>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.add(GenerateSpec.java:242)
>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
>>     at org.apache.pig.impl.eval.collector.DataCollector.addToSuccessor(DataCollector.java:93)
>>     at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:35)
>>     at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:273)
>>     at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:86)
>>     at org.apache.pig.impl.eval.collector.UnflattenCollector.add(UnflattenCollector.java:56)
>>     at org.apache.pig.impl.mapreduceExec.PigMapReduce.run(PigMapReduce.java:113)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
>>
>> As a comparison, the following job runs successfully:
>>
>> A = LOAD 'netflix/netflix.csv' USING PigStorage(',') AS
>>     (movie,user,rating,date);
>> B = FILTER A BY movie == '8';
>> C = GROUP B BY movie;
>> D = FOREACH C GENERATE group, COUNT(B.user) as ratingcount,
>>     AVG(B.rating) as averagerating;
>> DUMP D;
>>
>> Any help in tracking this down would be greatly appreciated. So far,
>> Pig is looking really slick and I'd love to write more advanced
>> programs with it.
>>
>> Thanks,
>> Andrew Hitchcock
>
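
Utkarsh's diagnosis in the thread (an indexed tuple is written to the spill file, while a regular Tuple is expected when it is read back) can be illustrated with a minimal Python sketch. This is not Pig's code or its on-disk format: the one-byte markers, the field layout, and every function name below are assumptions made up for the illustration. Only the error message text is taken from the stack trace.

```python
import io
import struct

# Hypothetical one-byte type markers; Pig's real spill format may differ.
TUPLE_MARKER = b"T"
INDEXED_TUPLE_MARKER = b"I"


def _write_fields(out, fields):
    """Write a field count followed by length-prefixed UTF-8 fields."""
    out.write(struct.pack(">i", len(fields)))
    for field in fields:
        data = field.encode("utf-8")
        out.write(struct.pack(">i", len(data)))
        out.write(data)


def write_tuple(out, fields):
    """Serialize a regular tuple: marker, then the fields."""
    out.write(TUPLE_MARKER)
    _write_fields(out, fields)


def write_indexed_tuple(out, fields, index):
    """Serialize an indexed tuple: a different marker and an extra index."""
    out.write(INDEXED_TUPLE_MARKER)
    out.write(struct.pack(">i", index))
    _write_fields(out, fields)


def read_tuple(inp):
    """Read back a regular tuple; reject anything with another marker."""
    marker = inp.read(1)
    if marker != TUPLE_MARKER:
        raise RuntimeError("Unexpected data while reading tuple from binary file")
    (count,) = struct.unpack(">i", inp.read(4))
    fields = []
    for _ in range(count):
        (size,) = struct.unpack(">i", inp.read(4))
        fields.append(inp.read(size).decode("utf-8"))
    return fields


def spill_and_reload(write, *args):
    """Simulate a spill: write one record to a buffer, then read it back."""
    buf = io.BytesIO()
    write(buf, *args)
    buf.seek(0)
    return read_tuple(buf)
```

Under these assumptions, `spill_and_reload(write_tuple, ["8", "user1", "5"])` round-trips the sample fields, while `spill_and_reload(write_indexed_tuple, ["8", "user1", "5"], 0)` raises the `RuntimeError`, because the reader sees the indexed-tuple marker where it expects a regular one. That would also be consistent with only some map tasks failing: in this model the mismatch only surfaces for records that are actually spilled and read back.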