hadoop-common-user mailing list archives

From Arun C Murthy <ar...@yahoo-inc.com>
Subject Re: Streaming and subprocess error code
Date Tue, 13 May 2008 15:17:37 GMT

On May 13, 2008, at 8:09 AM, Rick Cox wrote:

> Try "-jobconf stream.non.zero.exit.status.is.failure=true".
>

Anyone willing to document this on
http://hadoop.apache.org/core/docs/current/streaming.html?
Ideally HADOOP-2057 should have documented this useful feature, but
better late than never... I've opened
http://issues.apache.org/jira/browse/HADOOP-3379 for this
documentation request.

Arun

> That will tell streaming that a non-zero exit is a task failure. To
> turn that into an immediate whole job failure, I think configuring 0
> task retries (mapred.map.max.attempts=1 and
> mapred.reduce.max.attempts=1) will be sufficient.
>
> rick
>
> On Tue, May 13, 2008 at 8:15 PM, Andrey Pankov  
> <apankov@iponweb.net> wrote:
>> Hi all,
>>
>>  I'm looking for a way to force Streaming to shut down the whole job
>> when one of its subprocesses exits with a non-zero exit code.
>>
>>  Here is our situation. Sometimes either the mapper or the reducer
>> crashes, and as a rule it returns some exit code. In this case the
>> entire streaming job still finishes successfully, which is wrong.
>> Almost the same happens when a subprocess terminates with a
>> segmentation fault.
>>
>>  The only way to check automatically whether a subprocess crashed is
>> via the logs, which means parsing tons of outputs/logs/dirs/etc.
>>  To find the logs of your job you have to know its jobid, e.g.
>> job_200805130853_0016. I don't know an easy way to determine it other
>> than scanning stdout for the pattern. Then you have to find the logs
>> of each mapper and each reducer, find a way to parse them, etc, etc...
>>
>>  So, is there an easier way to get the correct status of the whole
>> streaming job, or do I still have to build rather fragile parsing
>> systems for this purpose?
>>
>>  Thanks in advance.
>>
>>  --
>>  Andrey Pankov
>>
>>

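For reference, the three settings suggested in this thread can be passed together on the streaming command line. The following is only a rough sketch in Python of how a wrapper script might assemble and launch such a command. The jar location, the input/output paths, and the helper names (build_streaming_command, run_streaming_job) are hypothetical, and it assumes the hadoop client itself exits with a non-zero status when the job fails, which is worth verifying on your version.

```python
import subprocess

# Settings named in this thread: treat a non-zero subprocess exit as a task
# failure, and allow only one attempt per task so the job fails right away.
FAIL_FAST_JOBCONF = {
    "stream.non.zero.exit.status.is.failure": "true",
    "mapred.map.max.attempts": "1",
    "mapred.reduce.max.attempts": "1",
}


def build_streaming_command(streaming_jar, input_path, output_path,
                            mapper, reducer):
    """Return the argument list for a streaming job with fail-fast settings.

    All arguments are placeholders supplied by the caller; nothing here
    knows where the streaming jar lives on a given installation.
    """
    cmd = [
        "hadoop", "jar", streaming_jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
    # One "-jobconf key=value" pair per setting.
    for key, value in FAIL_FAST_JOBCONF.items():
        cmd += ["-jobconf", f"{key}={value}"]
    return cmd


def run_streaming_job(cmd):
    """Run the command and return the client's exit status.

    Assumption: the hadoop client exits non-zero when the job fails.
    """
    return subprocess.call(cmd)
```

A caller would build the list once, pass it to run_streaming_job, and treat any non-zero return value as a failed job instead of parsing the task logs.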
