hadoop-common-user mailing list archives

From Andrey Pankov <apan...@iponweb.net>
Subject Re: Streaming and subprocess error code
Date Thu, 15 May 2008 08:42:04 GMT
Hi Rick,

I double-checked my test. The syslog output contains a message about the 
non-zero exit code (in this case the mapper finished with a segfault):

2008-05-14 18:12:04,473 INFO org.apache.hadoop.streaming.PipeMapRed: 
PipeMapRed.waitOutputThreads(): subprocess exited with code 134 in 
org.apache.hadoop.streaming.PipeMapRed

stderr contains a message with a dump or something about the segfault.

The reducer also finished with an error:

2008-05-14 20:28:34,128 INFO org.apache.hadoop.streaming.PipeMapRed: 
PipeMapRed.waitOutputThreads(): subprocess exited with code 55 in 
org.apache.hadoop.streaming.PipeMapRed

Nevertheless, the entire job is reported as successful:

08/05/14 18:12:03 INFO streaming.StreamJob:  map 0%  reduce 0%
08/05/14 18:12:05 INFO streaming.StreamJob:  map 100%  reduce 0%
08/05/14 18:12:06 INFO streaming.StreamJob:  map 100%  reduce 100%
08/05/14 18:12:06 INFO streaming.StreamJob: Job complete: 
job_200805131958_0020
08/05/14 18:12:06 INFO streaming.StreamJob: Output: 
/user/hadoop/data1_result
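
For reference, the kind of streaming invocation being tested here would look 
roughly like the sketch below. The jar path, input path and the mapper/reducer 
names are placeholders, not the actual job from the logs above, and the thread 
uses two spellings of the exit-status property 
(stream.non.zero.exit.is.failure and stream.non.zero.exit.status.is.failure), 
so the exact name may depend on the Hadoop version:

  # Hypothetical invocation; only the output path is taken from the logs above.
  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
      -input   /user/hadoop/data1 \
      -output  /user/hadoop/data1_result \
      -mapper  ./my_mapper \
      -reducer ./my_reducer \
      -jobconf stream.non.zero.exit.status.is.failure=true \
      -jobconf mapred.map.max.attempts=1 \
      -jobconf mapred.reduce.max.attempts=1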




Rick Cox wrote:
> Hi,
> 
> Thanks: that message indicates the stream.non.zero.exit.is.failure
> feature isn't enabled for this task; the log is just reporting the
> exit status, but not raising the RuntimeException that it would if the
> feature were turned on.
> 
> I've had problems getting this parameter through from the command line
> before. If you've got access, you could try setting it in the
> hadoop-site.xml instead (I think it should be the tasktrackers that
> read that parameter).
> 
> (Sorry about the confusion here, we've been using that patch for so
> long I had forgotten it wasn't yet released, and I'm not exactly sure
> where we stand with these other bugs.)
> 
> rick
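
For the hadoop-site.xml route Rick suggests above, a minimal fragment would 
presumably look something like this (an assumption, not a tested config; the 
thread uses both stream.non.zero.exit.is.failure and 
stream.non.zero.exit.status.is.failure, so check which name your version 
actually reads):

  <!-- hypothetical hadoop-site.xml fragment -->
  <property>
    <name>stream.non.zero.exit.is.failure</name>
    <value>true</value>
  </property>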
> 
> On Wed, May 14, 2008 at 11:05 PM, Andrey Pankov <apankov@iponweb.net> wrote:
>> Hi Rick,
>>
>>  Double checked my test. The syslog output contains msg about non-zero exit
>> code (in this case mapper finished with segfault)
>>
>>  2008-05-14 18:12:04,473 INFO org.apache.hadoop.streaming.PipeMapRed:
>> PipeMapRed.waitOutputThreads(): subprocess exited with code 134 in
>> org.apache.hadoop.streaming.PipeMapRed
>>
>>  stderr contains message with dump or smth about segfault.
>>
>>  Reducer job also finished with error:
>>
>>  2008-05-14 20:28:34,128 INFO org.apache.hadoop.streaming.PipeMapRed:
>> PipeMapRed.waitOutputThreads(): subprocess exited with code 55 in
>> org.apache.hadoop.streaming.PipeMapRed
>>
>>  Hence entire job is successful
>>
>>  08/05/14 18:12:03 INFO streaming.StreamJob:  map 0%  reduce 0%
>>  08/05/14 18:12:05 INFO streaming.StreamJob:  map 100%  reduce 0%
>>  08/05/14 18:12:06 INFO streaming.StreamJob:  map 100%  reduce 100%
>>  08/05/14 18:12:06 INFO streaming.StreamJob: Job complete:
>> job_200805131958_0020
>>  08/05/14 18:12:06 INFO streaming.StreamJob: Output:
>> /user/hadoop/data1_result
>>
>>
>>
>>
>>
>>
>>  Rick Cox wrote:
>>
>>> Does the syslog output from a should-have-failed task contain
>>> something like this?
>>>
>>>    java.lang.RuntimeException: PipeMapRed.waitOutputThreads():
>>> subprocess failed with code 1
>>>
>>> (In particular, I'm curious if it mentions the RuntimeException.)
>>>
>>> Tasks that consume all their input and then exit non-zero are
>>> definitely supposed to be counted as failed, so there's either a
>>> problem with the setup or a bug somewhere.
>>>
>>> rick
>>>
>>> On Wed, May 14, 2008 at 8:49 PM, Andrey Pankov <apankov@iponweb.net> wrote:
>>>> Hi,
>>>>
>>>>  I've tested this new option "-jobconf
>>>> stream.non.zero.exit.status.is.failure=true". It seems to work, but it
>>>> is still not good enough for me. When the mapper/reducer program has
>>>> read all of its input successfully and fails after that, streaming
>>>> still finishes successfully, so there is no way to know about data
>>>> post-processing errors in the subprocesses :(
>>>>
>>>>
>>>>
>>>>  Andrey Pankov wrote:
>>>>
>>>>
>>>>> Hi Rick,
>>>>>
>>>>> Thank you for the quick response! I see this feature is in trunk and
>>>>> not available in the last stable release. Anyway, I will try whether
>>>>> it works for me from the trunk, and whether it catches segmentation
>>>>> faults too.
>>>>
>>>>> Rick Cox wrote:
>>>>>
>>>>>
>>>>>> Try "-jobconf stream.non.zero.exit.status.is.failure=true".
>>>>>>
>>>>>> That will tell streaming that a non-zero exit is a task failure. To
>>>>>> turn that into an immediate whole-job failure, I think configuring 0
>>>>>> task retries (mapred.map.max.attempts=1 and
>>>>>> mapred.reduce.max.attempts=1) will be sufficient.
>>>>>>
>>>>>> rick
>>>>>>
>>>>>> On Tue, May 13, 2008 at 8:15 PM, Andrey Pankov <apankov@iponweb.net>
>>>>>> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>>  I'm looking for a way to force Streaming to shut down the whole job
>>>>>>> when one of its subprocesses exits with a non-zero error code.
>>>>>>>
>>>>>>>  We have the following situation. Sometimes either the mapper or the
>>>>>>> reducer crashes; as a rule it returns some exit code. In this case
>>>>>>> the entire streaming job finishes successfully, but that's wrong.
>>>>>>> It's almost the same when a subprocess finishes with a segmentation
>>>>>>> fault.
>>>>>>>
>>>>>>>  It's possible to check automatically whether a subprocess crashed
>>>>>>> only via the logs, but that means you have to parse tons of
>>>>>>> outputs/logs/dirs/etc.
>>>>>>>  In order to find the logs of your job you have to know its jobid,
>>>>>>> e.g. job_200805130853_0016. I don't know an easy way to determine it
>>>>>>> - just scan stdout for the pattern. Then find the logs of each
>>>>>>> mapper and each reducer, find a way to parse them, etc., etc...
>>>>>>>
>>>>>>>  So, is there an easier way to get the correct status of the whole
>>>>>>> streaming job, or do I still have to build rather fragile parsing
>>>>>>> systems for such purposes?
>>>>>>>
>>>>>>>  Thanks in advance.
>>>>>>>
>>>>>>>  --
>>>>>>>  Andrey Pankov
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>  --
>>>>  Andrey Pankov
>>>>
>>>>
>>>>
>>>
>>
>>  --
>>  Andrey Pankov
>>
>>
> 
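
On the original question of getting the overall job status without parsing 
task logs: a rough sketch, assuming that the streaming launcher propagates 
job failure in its exit code once failing tasks actually fail the job, and 
that "hadoop job -status" is available in this version (both worth verifying):

  # Check the launcher's exit code from a wrapper script.
  hadoop jar $STREAMING_JAR ... ; rc=$?
  if [ $rc -ne 0 ]; then
      echo "streaming job failed (exit code $rc)" >&2
      exit $rc
  fi

  # Or capture the jobid printed on stdout (e.g. job_200805130853_0016, as
  # mentioned earlier in the thread) and ask the JobTracker directly:
  hadoop job -status job_200805130853_0016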


-- 
Andrey Pankov

