hadoop-mapreduce-issues mailing list archives

From "Carlo Curino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5176) Preemptable annotations (to support preemption in MR)
Date Wed, 24 Apr 2013 00:03:18 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639864#comment-13639864 ]

Carlo Curino commented on MAPREDUCE-5176:

In this patch, we introduce an annotation used to express a property of user-defined classes
(such as Reducer and OutputCommitter). The annotation is @Preemptable, and the intended semantics
is that the tagged class is safe to be preempted between invocations. We use an annotation
rather than an interface so that subclasses do not automatically (and possibly unintentionally)
inherit the property.
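
To illustrate the point about inheritance, here is a minimal sketch of how such an annotation could be declared and checked. It is not the code from the patch: the retention and target settings and the helper and class names (PreemptableDemo, isPreemptable, PureReducer, StatefulReducer) are assumptions made for this example. The relevant detail is that the annotation is not meta-annotated with @Inherited, so a subclass of a tagged class is not itself considered tagged.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class PreemptableDemo {
    /** Returns true only if the class itself carries @Preemptable. */
    static boolean isPreemptable(Class<?> c) {
        return c.isAnnotationPresent(Preemptable.class);
    }

    public static void main(String[] args) {
        System.out.println(isPreemptable(PureReducer.class));     // tagged directly
        System.out.println(isPreemptable(StatefulReducer.class)); // subclass, not tagged
    }
}

// Assumed declaration: visible at runtime, applicable to types, and
// deliberately NOT marked @Inherited, so subclasses must opt in explicitly.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Preemptable {}

// Hypothetical stand-in for a "pure" reducer that opts in.
@Preemptable
class PureReducer {}

// Hypothetical subclass that adds state and does not repeat the annotation.
class StatefulReducer extends PureReducer {}
```

With an interface, StatefulReducer would have been preemptable simply by extending PureReducer; with a non-inherited annotation, its author has to state the property again.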

More concretely: 

# stateless operators: a simple use case for Reducers is when the user-defined function is
a "pure" reducer, i.e., a Reducer that does not maintain state across key groups (or, if it
does, only for performance and not for correctness). Note that the default class
Reducer.java is indeed a "pure" reducer and is therefore tagged with @Preemptable; a
user-supplied reducer, however, must explicitly declare this if it wants to be treated as
preemptable. If the @Preemptable annotation is present, the system can handle preemption
automatically by saving the output produced so far and subsequently restarting the task from
the next key group (this will be posted in separate patches/JIRAs).

# stateful operators: advanced users can also tag non-pure reducers (i.e., reducers that
accumulate non-trivial state across key boundaries) as @Preemptable; however, the default
preemption mechanism we provide will not be sufficient, and the user will be required to
override the default checkpoint/restart logic to include operator-specific state saving and
retrieval.
# for OutputCommitter, being @Preemptable means that the output committer can be used to commit
partial output from a given task. In order to handle failure scenarios, we also require the
OutputCommitter to provide a cleanupPartialOutput(TaskAttemptId tid) method that the system can
invoke to completely reset the execution of a given task. The simple case we show in the patch
is an extended version of FileOutputCommitter, in which we provide a simple mechanism to commit
partial output for a task (by including the task_attempt_id in the file name), and an
equivalent cleanup functionality.
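
The checkpoint/restart and partial-commit ideas above can be sketched on a local filesystem as follows. This is a toy model under stated assumptions, not the patch and not the Hadoop API: AttemptId is a hypothetical stand-in for TaskAttemptId, and the "part-<task>-<attempt>" file naming, the reduce, commitPartial, and listPartials helpers, and the summing reducer are all invented for illustration. Only the method name cleanupPartialOutput comes from the description above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartialOutputDemo {
    /** Hypothetical stand-in for TaskAttemptId: a task plus an attempt number. */
    static final class AttemptId {
        final String taskId;
        final int attempt;
        AttemptId(String taskId, int attempt) { this.taskId = taskId; this.attempt = attempt; }
    }

    /** "Pure" reduce over key groups [from, to): no state survives between groups. */
    static List<String> reduce(List<String> keys, List<int[]> values, int from, int to) {
        List<String> out = new ArrayList<>();
        for (int i = from; i < to; i++) {
            int sum = 0;
            for (int v : values.get(i)) sum += v;
            out.add(keys.get(i) + "\t" + sum);
        }
        return out;
    }

    /** Commits the output produced so far; the file name embeds the attempt number. */
    static Path commitPartial(Path dir, AttemptId tid, List<String> lines) {
        try {
            return Files.write(dir.resolve("part-" + tid.taskId + "-" + tid.attempt), lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lists every partial output file written by any attempt of the task. */
    static List<Path> listPartials(Path dir, String taskId) {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "part-" + taskId + "-*")) {
            for (Path f : files) found.add(f);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    /** Deletes all partial output of the attempt's task, resetting the task. */
    static int cleanupPartialOutput(Path dir, AttemptId tid) {
        List<Path> partials = listPartials(dir, tid.taskId);
        try {
            for (Path f : partials) Files.delete(f);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return partials.size();
    }

    /** Returns {files committed across attempts, files removed by cleanup}. */
    static int[] runDemo() {
        Path dir;
        try {
            dir = Files.createTempDirectory("preempt-demo");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        List<int[]> values = Arrays.asList(new int[]{1, 2}, new int[]{3}, new int[]{4, 5}, new int[]{6});
        // Attempt 0 is preempted after two key groups; its output so far is committed.
        commitPartial(dir, new AttemptId("r_000003", 0), reduce(keys, values, 0, 2));
        // Attempt 1 restarts from the next key group instead of from the beginning.
        commitPartial(dir, new AttemptId("r_000003", 1), reduce(keys, values, 2, 4));
        int committed = listPartials(dir, "r_000003").size();
        int removed = cleanupPartialOutput(dir, new AttemptId("r_000003", 1));
        return new int[]{committed, removed};
    }

    public static void main(String[] args) {
        int[] r = runDemo();
        System.out.println("committed=" + r[0] + " removed=" + r[1]);
    }
}
```

Restarting at key group 2 is only correct here because the reducer keeps no state across groups; a stateful reducer would also need to save and restore its own state, which is the case the second item describes.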

Note that this is a first use of annotations to describe properties of user-provided classes;
it is easy to imagine several other such use cases, e.g., @KeyPreserving, @OrderPreserving,
etc., which could be used to pipeline maps and reduces, or to leverage JVM reuse.

This is part of umbrella JIRA MAPREDUCE-4584, and is related to the preemption protocol changes
discussed in YARN-45, and supported in YARN-567, YARN-568, and YARN-569. 

> Preemptable annotations (to support preemption in MR)
> -----------------------------------------------------
>                 Key: MAPREDUCE-5176
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5176
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
> Proposing a patch that introduces a new annotation, @Preemptable, that expresses to the
framework a property of user-supplied classes (e.g., Reducer, OutputCommitter). The intended
semantics is that a tagged class is safe to be preempted between invocations. 
> (this is similar in spirit to the Output Contracts of [Nephele/PACT | https://stratosphere.eu/sites/default/files/papers/ComparingMapReduceAndPACTs_11.pdf])

