Subject: Re: HDFS "file" missing a part-file
From: Björn-Elmar Macek <ema@cs.uni-kassel.de>
To: user@hadoop.apache.org
Date: Mon, 01 Oct 2012 22:36:32 +0200

The script I now want to execute looks like this:

    x = load 'tag_count_ts_pro_userpair' as (group:tuple(), cnt:int, times:bag{t:tuple(c:chararray)});
    y = foreach x generate *, moins.daysFromStart('2011-06-01 00:00:00', times);
    store y into 'test_daysFromStart';

The problem is that I do not have the logs anymore due to space constraints within the cluster. But I think I can explain the important parts: the script that created this data was a GROUP statement followed by a FOREACH calculating a COUNT on the bag mentioned above as "times"; that count is the second column, named "cnt". The results were stored via a simple "store". (A rough sketch of that script is below.)

The resulting Pig calculation started as expected, but stopped showing me progress at a certain percentage. A "tail -f" on the hadoop/logs dir revealed that the Hadoop calculation progressed nonetheless, although some of the tasktrackers permanently vanished during the shuffle phase with the committed/eof/mortbay exception and at the very least stopped producing any more log output.
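For completeness, here is the rough shape of the script that created the data, reconstructed from memory since the original is gone; the input file and all field names except "cnt" and "times" are placeholders:

    -- sketch from memory: 'userpair_tags' and the loaded field names are placeholders
    a = load 'userpair_tags' as (user1:chararray, user2:chararray, time:chararray);
    b = group a by (user1, user2);                            -- 'group' becomes a tuple
    c = foreach b generate group, COUNT(a) as cnt, a.time as times;
    store c into 'tag_count_ts_pro_userpair';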
As I was watching the log really continuously, I could see that those work packages were handled by the remaining servers after some of them had already finished packages at progress 1.0. Even the cleanup phase at the end completed, ALTHOUGH(!) the pig log didn't reflect the calculations of the cluster. And since I found the file as output in HDFS, I supposed the missing pig progress log entries were simply pig problems. Maybe I'm wrong with that. But I ran the calculations several times, and this happened during every execution.

Is there something wrong with the data or the calculations?
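In case it helps with the diagnosis, here is a minimal way to inspect the stored records without triggering the tuple cast that the exception below points at (the relation names here are just for illustration):

    raw = load 'tag_count_ts_pro_userpair';   -- no schema: fields stay bytearray, so no Utf8StorageConverter cast runs
    few = limit raw 10;
    dump few;

Viewing the data this way works; the failure only appears once the declared schema's tuple cast kicks in (POCast -> Utf8StorageConverter in the trace below).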
On Mon, 1 Oct 2012 13:01:41 -0700, Robert Molina wrote:
> It seems that maybe the previous pig script didn't generate the output
> data or write it correctly on HDFS. Can you provide the pig script you
> are trying to run? Also, for the original script that ran and
> generated the file, can you verify if that job had any failed tasks?
>
> On Mon, Oct 1, 2012 at 10:31 AM, Björn-Elmar Macek wrote:
>> Hi Robert,
>>
>> the exception I see in the output of the grunt shell and in the pig
>> log, respectively, is:
>>
>> Backend error message
>> ---------------------
>> java.util.EmptyStackException
>>         at java.util.Stack.peek(Stack.java:102)
>>         at org.apache.pig.builtin.Utf8StorageConverter.consumeTuple(Utf8StorageConverter.java:182)
>>         at org.apache.pig.builtin.Utf8StorageConverter.bytesToTuple(Utf8StorageConverter.java:501)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:905)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>> On Mon, 1 Oct 2012 10:12:22 -0700, Robert Molina wrote:
>>> Hi Bjorn,
>>> can you post the exception you are getting during the map phase?
>>>
>>> On Mon, Oct 1, 2012 at 9:11 AM, Björn-Elmar Macek wrote:
>>>> Hi,
>>>>
>>>> I am kind of unsure where to post this problem, but I think it is
>>>> more related to Hadoop than to Pig.
>>>>
>>>> By successfully executing a pig script I created a new file in my
>>>> HDFS. Sadly though, I cannot use it for further processing except
>>>> for "dump"ing and viewing the data: every data-manipulating script
>>>> command such as "foreach" gives exceptions during the map phase.
>>>> Since there was no problem executing the same script on the first
>>>> 100 lines of my data (LIMIT statement), I copied it to my local fs
>>>> folder. What I realized is that one of the files, namely
>>>> part-r-000001, was empty and contained within the _temporary folder.
>>>>
>>>> Is there any reason for this? How can I fix this issue? Did the job
>>>> (which created the file we are talking about) NOT run properly until
>>>> its end, although the tasktracker worked until the very end and the
>>>> file was created?
>>>>
>>>> Best regards,
>>>> Björn