hadoop-mapreduce-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (MAPREDUCE-613) Streaming should allow to re-start the command if it failed in the middle of input
Date Mon, 12 Jul 2010 11:14:49 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu resolved MAPREDUCE-613.
-----------------------------------------------

    Resolution: Duplicate

This can be achieved through the skipping bad records feature.
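
For readers unfamiliar with that feature, here is a minimal sketch of a streaming mapper that cooperates with it. It assumes the job enables skipping through properties such as mapred.skip.attempts.to.start.skipping and mapred.skip.map.max.skip.records, and sets mapred.skip.map.auto.incr.proc.count to false so the script reports processed records itself. Those property names, and the counter group and counter name used below, are assumptions to check against the Hadoop version in use. The function run_mapper and its transform parameter are hypothetical names for illustration.

```python
import sys

# Assumed counter identifiers for skip mode; verify against your Hadoop release.
COUNTER_GROUP = "SkippingTaskCounters"
COUNTER_NAME = "MapProcessedRecords"


def run_mapper(lines, out, err, transform=str.upper):
    """Process one record per line and report each processed record.

    Streaming scripts update counters by writing lines of the form
    reporter:counter:<group>,<counter>,<amount> to stderr.
    """
    for line in lines:
        record = line.rstrip("\n")
        out.write(transform(record) + "\n")
        # Tell the framework one more record has been fully processed, so a
        # later crash can be attributed to the records that follow it.
        err.write("reporter:counter:%s,%s,1\n" % (COUNTER_GROUP, COUNTER_NAME))
        err.flush()


# In a real mapper script this would be invoked as:
#   run_mapper(sys.stdin, sys.stdout, sys.stderr)
```

The counter is incremented only after the output for a record has been written, which is the point at which the record can be considered processed.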

> Streaming should allow to re-start the command if it failed in the middle of input
> ----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-613
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-613
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: arkady borkovsky
>
> Sometimes, we need to use imperfect programs to process data.
> Recently, I used a public domain program that did what I needed, but crashed after processing a few million records (in my case, more than half of the mappers would succeed, with the rest failing at different percentages).
> It would be nice to be able to tell the Streaming Framework:
>      if the streaming command fails at some input record (and you get "pipe broken" from it),
>      restart the command and continue feeding it the data.
>      Please log the failing record.
> In text mining, quite often, losing a few records of the input makes no difference at all.
> Of course this feature should be disabled by default, and should have some "are you really sure" provision (an expert feature).
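
The behaviour requested above can be illustrated outside Hadoop with a small sketch. It is not the Streaming framework's implementation: it reruns the command on smaller and smaller batches (a divide-and-retry approach) until the failing record is isolated, logs it, and keeps the output for everything else. It assumes the command treats records independently, fails deterministically on a bad record, and signals failure with a non-zero exit code. The function name run_skipping_bad is hypothetical.

```python
import subprocess
import sys


def run_skipping_bad(cmd, records, skipped):
    """Run cmd over records (newline-terminated strings), skipping bad ones.

    Returns the concatenated stdout of the successful runs and appends every
    record that makes the command fail to the skipped list.
    """
    if not records:
        return ""
    proc = subprocess.run(
        cmd, input="".join(records), capture_output=True, text=True
    )
    if proc.returncode == 0:
        return proc.stdout
    if len(records) == 1:
        # The failure is isolated to one record: log it and move on.
        sys.stderr.write("skipping bad record: %r\n" % records[0])
        skipped.append(records[0])
        return ""
    # The batch failed somewhere: discard its output and retry each half.
    mid = len(records) // 2
    return (run_skipping_bad(cmd, records[:mid], skipped)
            + run_skipping_bad(cmd, records[mid:], skipped))
```

Output from a failed batch is discarded and regenerated by the retries, so no record is emitted twice. The cost is that records near a bad one are processed more than once.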

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
