From user-return-1779-apmail-hadoop-user-archive=hadoop.apache.org@hadoop.apache.org Wed Oct 3 00:26:53 2012
Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Reply-To: user@hadoop.apache.org
In-Reply-To: <506ADD7C.1070800@cs.uni-kassel.de>
References: <5069C0B0.2090201@cs.uni-kassel.de> <03503a442b790fbb48303b88f2f05855@cs.uni-kassel.de> <506ADD7C.1070800@cs.uni-kassel.de>
Date: Tue, 2 Oct 2012 17:26:22 -0700
Subject: Re: HDFS "file" missing a part-file
From: Robert Molina
To: user@hadoop.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=20cf300facef30e19004cb1cb01c

--20cf300facef30e19004cb1cb01c
Content-Type: text/plain; charset=ISO-8859-1

What I guess might be happening is that your data may contain some text data that pig is not fully parsing because the data contains characters that pig uses as delimiters (i.e. commas and curly brackets). Thus, you can probably take a look at the data and see if you can find any of the characters used by pig to distinguish values, bags, tuples. You also might want to move this topic to the pig forums to see if anyone else has faced a similar issue.
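
A minimal sketch of one way to look for such characters, assuming the stored file is plain text in which tuples are written in parentheses and bags in curly brackets. The function name `find_unbalanced` and the sample lines are hypothetical and not taken from the thread. It only reports lines whose bracket nesting is broken, which would be consistent with the EmptyStackException raised in `Utf8StorageConverter.consumeTuple`; it does not detect stray commas inside a value.

```python
def find_unbalanced(lines):
    """Return (line_number, reason) for lines whose () / {} nesting is broken."""
    pairs = {")": "(", "}": "{"}
    openers = set(pairs.values())
    problems = []
    for number, line in enumerate(lines, start=1):
        stack = []
        reason = None
        for ch in line:
            if ch in openers:
                stack.append(ch)
            elif ch in pairs:
                if not stack:
                    # A closer with nothing open: a parser peeking at its stack would fail here.
                    reason = f"unexpected '{ch}'"
                    break
                if stack[-1] != pairs[ch]:
                    reason = f"'{ch}' closes '{stack[-1]}'"
                    break
                stack.pop()
        if reason is None and stack:
            reason = f"unclosed '{stack[-1]}'"
        if reason is not None:
            problems.append((number, reason))
    return problems


# Hypothetical sample rows: (tuple) <tab> count <tab> {bag of tuples}
sample = [
    "(u1,u2)\t2\t{(2011-06-02 10:00:00),(2011-06-03 11:30:00)}",
    "(u1,u3)\t1\t{(2011-06-05 08:15:00)})",
]
print(find_unbalanced(sample))
```

To run it against a local copy of a part file, an open file handle can be passed in place of `sample`, since the function only iterates over lines.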

On Tue, Oct 2, 2012 at 5:26 AM, Björn-Elmar Macek <macek@cs.uni-kassel.de> wrote:
Hi again,

I executed a slightly different script again, which included some more operations. The logs look similar, but this time I have 2 attempt files for the same job-package:
(1) _temporary/_attempt_201210021204_0001_r_000001_0/part-r-00001
(2) _temporary/_attempt_201210021204_0001_r_000001_1/part-r-00001

To me it looks like 2 results of the same job-package. This time both are non-empty, unlike before, and each is about one blocksize, which is about 700 MB. I had hoped that both files contained the same content, but "diff" showed me that this was not the case. I merged both files with a combination of "cat" and "sort -u": the result is a file of about 1.2 GB, which indicates to me that there were many different lines. I suppose that the cluster didn't manage to compute this part-file, though I have no idea what makes this file so special that it is always this one which is corrupt(?).

The worst solution for me would be to simply ignore this error and continue working with the merged file. Is there anybody who has experienced similar things?
If there is a way to fix this, I would love to know how. Possible reasons for the problems are also very much appreciated! :)
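
The comparison described above ("diff", then "cat" plus "sort -u") can also be sketched in Python to see how much the two attempt files overlap. This is only an illustration: the function name `compare_line_sets` is hypothetical, and holding all distinct lines of two files of roughly 700 MB in memory assumes a machine with enough RAM.

```python
def compare_line_sets(lines_a, lines_b):
    """Count distinct lines that are unique to each input or shared by both."""
    set_a = {line.rstrip("\n") for line in lines_a}
    set_b = {line.rstrip("\n") for line in lines_b}
    return {
        "only_in_a": len(set_a - set_b),
        "only_in_b": len(set_b - set_a),
        "in_both": len(set_a & set_b),
    }


# Tiny in-memory example; open file handles can be passed the same way.
print(compare_line_sets(["a\n", "b\n", "b\n"], ["b\n", "c\n"]))
```

A large "in_both" count with small unique counts would suggest two mostly identical attempts, while large unique counts on both sides would match the observation that the merged file is much bigger than either input.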


On 01.10.2012 22:36, Björn-Elmar Macek wrote:


The script I now want to execute looks like this:

x = load 'tag_count_ts_pro_userpair' as (group:tuple(),cnt:int,times:bag{t:tuple(c:chararray)});
y = foreach x generate *, moins.daysFromStart('2011-06-01 00:00:00', times);
store y into 'test_daysFromStart';


The problem is that I do not have the logs anymore due to space constraints within the cluster. But I think I can explain the important parts:
The script that created this data was a GROUP statement followed by a FOREACH calculating a COUNT on the bag mentioned above as "times", which is represented in the 2nd column named "cnt". The results were stored via a simple "store".
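
As a rough illustration of what such a GROUP followed by a COUNT produces, here is a small Python sketch. The function name `group_and_count` and the input shape (pairs of a key and a timestamp string) are assumptions made for the example; the original Pig script is not shown in this thread.

```python
from collections import defaultdict


def group_and_count(records):
    """Group (key, timestamp) pairs by key and return (key, cnt, times) rows."""
    bags = defaultdict(list)
    for key, timestamp in records:
        bags[key].append(timestamp)
    # cnt is the size of the bag, mirroring COUNT over the grouped values.
    return [(key, len(times), times) for key, times in bags.items()]


rows = group_and_count([("p1", "t1"), ("p2", "t2"), ("p1", "t3")])
print(rows)
```

Each output row has the same three columns as the schema used in the load statement above: the group key, the count, and the bag of values.
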
The resulting pig calculation started as expected, but stopped showing me progress at a certain percentage. A "tail -f" on the hadoop/logs dir revealed that the hadoop calculation progressed nonetheless, although some of the tasktrackers permanently vanished during the shuffle phase with the committed/eof/mortbay exception and, at the least, stopped producing any more log output. As I watched the log continuously, I could see that those work packages were handled by the remaining servers after some of them had already calculated packages with progress 1.0. Even the cleanup phase at the end was done, ALTHOUGH(!) the pig log didn't reflect the calculations of the cluster. And since I found the file as output in hdfs, I supposed the missing pig progress log entries were simply pig problems. Maybe I'm wrong about that.

But I did the calculations several times and this happened during every execution.

Is there something wrong with the data or the calculations?


On Mon, 1 Oct 2012 13:01:41 -0700, Robert Molina <rmolina@hortonworks.com> wrote:
It seems that maybe the previous pig script didn't generate the output
data or write correctly on hdfs. Can you provide the pig script you
are trying to run? Also, for the original script that ran and
generated the file, can you verify if that job had any failed tasks?

On Mon, Oct 1, 2012 at 10:31 AM, Björn-Elmar Macek wrote:

Hi Robert,

the exception I see in the output of the grunt shell and in the pig log, respectively, is:

Backend error message
---------------------
java.util.EmptyStackException
        at java.util.Stack.peek(Stack.java:102)
        at org.apache.pig.builtin.Utf8StorageConverter.consumeTuple(Utf8StorageConverter.java:182)
        at org.apache.pig.builtin.Utf8StorageConverter.bytesToTuple(Utf8StorageConverter.java:501)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:905)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:233)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:271)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

On Mon, 1 Oct 2012 10:12:22 -0700, Robert Molina wrote:

Hi Bjorn,
Can you post the exception you are getting during the map phase?

On Mon, Oct 1, 2012 at 9:11 AM, Björn-Elmar Macek wrote:

Hi,

I am kind of unsure where to post this problem, but I think it is more related to hadoop than to pig.

By successfully executing a pig script I created a new file in my hdfs. Sadly though, I cannot use it for further processing except for "dump"ing and viewing the data: every data-manipulation script command such as "foreach" gives exceptions during the map phase. Since there was no problem executing the same script on the first 100 lines of my data (LIMIT statement), I copied it to my local fs folder. What I realized is that one of the files, namely part-r-000001, was empty and contained within the _temporary folder.

Is there any reason for this? How can I fix this issue? Did the job (which created the file we are talking about) NOT run properly until its end, although the tasktracker worked until the very end and the file was created?

Best regards,
Björn
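
A small sketch of how a local copy of the output directory could be scanned for this kind of problem, i.e. part files that are empty or that still sit under a `_temporary` directory instead of the top level of the output. The function name `suspicious_part_files` is hypothetical, and the assumption that every output file name starts with `part-` is taken from the file names mentioned in this thread.

```python
import os


def suspicious_part_files(root):
    """Return (relative_path, size, reason) for part files that look uncommitted."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # True when any path component below root is named _temporary.
        in_temporary = "_temporary" in os.path.relpath(dirpath, root).split(os.sep)
        for name in filenames:
            if not name.startswith("part-"):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            reasons = []
            if size == 0:
                reasons.append("empty")
            if in_temporary:
                reasons.append("under _temporary")
            if reasons:
                findings.append((os.path.relpath(path, root), size, ", ".join(reasons)))
    return sorted(findings)
```

Calling `suspicious_part_files("/path/to/local/copy")` (the path is a placeholder) returns an empty list when every part file is non-empty and sits outside `_temporary`.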





--20cf300facef30e19004cb1cb01c--