hadoop-common-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-153) skip records that throw exceptions
Date Mon, 28 Apr 2008 04:16:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592734#action_12592734 ]

Joydeep Sen Sarma commented on HADOOP-153:
------------------------------------------

Hey folks - we are having a discussion on a similar JIRA (HADOOP-3144, covering a smaller
subset of these issues). We are actually hitting this problem (corrupted records causing
OOMs) and have a simple workaround specific to our case.

But I am a little intrigued by the proposal here. For the RecordReader issues - why not
simply let the record reader skip the bad record(s)? As the discussion here mentions, there
have to be additional APIs in the record reader for it to be able to skip problematic
records. If the framework trusts record readers to skip bad records, why bother
re-executing? Why not let them detect and skip bad records on the very first try? If the
TT/JT want to keep track and impose a limit on the number of bad records skipped, they
could ask the record reader to report that count through an API, along the lines of the
sketch below.
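
(To make the idea concrete - a minimal sketch of such a skipping reader against the old
org.apache.hadoop.mapred.RecordReader interface. The getSkippedRecords() reporting hook is
hypothetical, and it assumes the delegate can resynchronize to the next record boundary
after a failed next():)

    import java.io.IOException;
    import org.apache.hadoop.mapred.RecordReader;

    // Sketch only: wraps any RecordReader, skips records whose next()
    // throws, and counts the skips so the TT/JT could poll a limit.
    public class SkippingRecordReader<K, V> implements RecordReader<K, V> {
      private final RecordReader<K, V> delegate;
      private long skipped = 0;

      public SkippingRecordReader(RecordReader<K, V> delegate) {
        this.delegate = delegate;
      }

      public boolean next(K key, V value) throws IOException {
        while (true) {
          try {
            return delegate.next(key, value);  // normal path
          } catch (IOException badRecord) {
            // Assumes the delegate has already advanced past the bad
            // record before throwing - that is exactly the extra
            // contract being discussed here.
            skipped++;
          }
        }
      }

      // Hypothetical reporting API - not an existing Hadoop method.
      public long getSkippedRecords() { return skipped; }

      public K createKey() { return delegate.createKey(); }
      public V createValue() { return delegate.createValue(); }
      public long getPos() throws IOException { return delegate.getPos(); }
      public float getProgress() throws IOException { return delegate.getProgress(); }
      public void close() throws IOException { delegate.close(); }
    }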

The exceptions from map/reduce functions are different - if they make the entire task
unstable due to OOM issues, then a re-execution makes sense. But if we separate the two
issues, we may have a more lightweight way of tolerating pure data corruption/validity
issues (as we are trying to do in HADOOP-3144) - something like the sketch below.
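
(Again just a sketch, against the old mapred API - the parse() helper and the failure mode
are illustrative, not part of Hadoop:)

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Lightweight tolerance of data-validity errors: catch the
    // exception around the per-record work and count it, instead of
    // failing the task and re-executing.
    public class TolerantMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

      enum Counters { BAD_RECORDS }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, LongWritable> out,
                      Reporter reporter) throws IOException {
        try {
          out.collect(parse(value), new LongWritable(1));
        } catch (RuntimeException badRecord) {
          // Pure data corruption: count and move on. OOM-style
          // instability still needs re-execution and is not handled here.
          reporter.incrCounter(Counters.BAD_RECORDS, 1);
        }
      }

      // Illustrative stand-in for whatever per-record parsing might
      // throw on malformed input.
      private Text parse(Text value) {
        return new Text(value.toString().trim());
      }
    }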

> skip records that throw exceptions
> ----------------------------------
>
>                 Key: HADOOP-153
>                 URL: https://issues.apache.org/jira/browse/HADOOP-153
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.2.0
>            Reporter: Doug Cutting
>            Assignee: Devaraj Das
>
> MapReduce should skip records that throw exceptions.
> If the exception is thrown under RecordReader.next() then RecordReader implementations
> should automatically skip to the start of a subsequent record.
> Exceptions in map and reduce implementations can simply be logged, unless they happen
> under RecordWriter.write().  Cancelling partial output could be hard.  So such output
> errors will still result in task failure.
> This behaviour should be optional, but enabled by default.  A count of errors per task
> and job should be maintained and displayed in the web ui.  Perhaps if some percentage
> of records (>50%?) result in exceptions then the task should fail.  This would stop
> jobs early that are misconfigured or have buggy code.
> Thoughts?
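
(For what it's worth, a minimal sketch of the >50% check suggested above - the threshold
and the minimum sample size before judging are both assumptions:)

    // Illustrative only: fail fast once most records have thrown,
    // but not before a reasonable sample has been seen.
    static boolean shouldFailTask(long processed, long errored) {
      final long MIN_RECORDS = 100;       // assumed minimum sample
      final double MAX_ERROR_RATE = 0.5;  // the >50% suggested above
      if (processed < MIN_RECORDS) {
        return false;
      }
      return (double) errored / processed > MAX_ERROR_RATE;
    }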

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

