hbase-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9485) TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
Date Wed, 04 Dec 2013 00:25:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838382#comment-13838382 ]

Vinod Kumar Vavilapalli commented on HBASE-9485:

bq. That's it?
Things have changed with MRv2 in a way.
 - In Hadoop 1, if the JobTracker went down, users were responsible for cleaning up any
temporary data left over from before and for resubmitting jobs afresh. This also prevented
multiple incarnations of a job from running at the same time.
 - With Hadoop 2, the ResourceManager automatically restarts the per-job ApplicationMaster (AM)
in case of node/cluster failures and also lets one keep the completed work from the previous
incarnation of the job. So two things need to happen: promote outputs from the previous
incarnation, and make sure multiple ApplicationMasters of the same job don't conflict. We
designed the recoverTask() API for that reason: the second AM invokes this API for every
taskAttempt that succeeded, and the implementation can choose to promote output from the
previous AM in an implementation-specific manner.
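
The recovery flow described above can be sketched roughly as follows. This is a simplified model, not Hadoop code: the interface and class names are hypothetical stand-ins, and only the two hooks that exist on Hadoop's OutputCommitter (isRecoverySupported() and recoverTask()) are mirrored, with a plain String in place of the real TaskAttemptContext.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the two recovery hooks of an output committer.
interface RecoverableCommitterSketch {
    boolean isRecoverySupported();
    void recoverTask(String taskAttemptId);
}

// Models a committer whose output already lives in its final destination,
// so "promoting" previous output only needs to be acknowledged.
class NoOpRecoveryCommitter implements RecoverableCommitterSketch {
    final List<String> recovered = new ArrayList<>();

    public boolean isRecoverySupported() {
        return true;
    }

    public void recoverTask(String taskAttemptId) {
        recovered.add(taskAttemptId);
    }
}

// Hypothetical model of what a restarted (second) AM does with earlier work.
class SecondAttemptMasterSketch {
    // Returns the tasks that still have to run after recovery.
    static Set<String> tasksToRun(Set<String> allTasks,
                                  Set<String> succeededBefore,
                                  RecoverableCommitterSketch committer) {
        Set<String> remaining = new LinkedHashSet<>(allTasks);
        if (!committer.isRecoverySupported()) {
            return remaining; // no recovery: everything starts from scratch
        }
        for (String attempt : succeededBefore) {
            committer.recoverTask(attempt); // invoked once per succeeded attempt
            remaining.remove(attempt);
        }
        return remaining;
    }

    public static void main(String[] args) {
        Set<String> all = new LinkedHashSet<>(Arrays.asList("m0", "m1", "m2"));
        Set<String> done = new LinkedHashSet<>(Arrays.asList("m0", "m2"));
        System.out.println(tasksToRun(all, done, new NoOpRecoveryCommitter()));
    }
}
```

With recovery turned off, the same call returns every task, which corresponds to the "job starts from scratch" behaviour the issue describes.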

Seems like with this patch, all the old work that was already 'committed' into HBase is automatically
retained and any redone work will automatically replace old outputs because of HBase put-idempotency.
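
To illustrate the idempotency argument, here is a toy model in which a table is just a sorted map keyed by row and column. It is an assumption-laden sketch, not the HBase client API: the class, its put() method and the sample rows are all hypothetical, and cell versions/timestamps are ignored.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a table; not the HBase client API.
class IdempotentPutSketch {
    final Map<String, String> cells = new TreeMap<>();

    // Writing the same row/column again replaces the previous value.
    void put(String row, String column, String value) {
        cells.put(row + "/" + column, value);
    }

    // A deterministic task: it writes the same cells every time it runs.
    static void runTask(IdempotentPutSketch table) {
        table.put("row1", "cf:count", "3");
        table.put("row2", "cf:count", "7");
    }

    public static void main(String[] args) {
        IdempotentPutSketch table = new IdempotentPutSketch();
        runTask(table); // work done under the first AM
        Map<String, String> before = new TreeMap<>(table.cells);
        runTask(table); // the same task redone under the second AM
        System.out.println(before.equals(table.cells));
    }
}
```

The argument only holds when the redone task writes the same cells as before; a task with non-deterministic output would leave a mix of old and new data.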

It's this easy apparently because HBase OutputCommitter doesn't have a staging table to account
for job failures. So, if a job fails half-way through, the table is 'corrupted' and users
depend on external mechanisms to clean it up?

> TableOutputCommitter should implement recovery if we don't want jobs to start from 0 on RM restart
> --------------------------------------------------------------------------------------------------
>                 Key: HBASE-9485
>                 URL: https://issues.apache.org/jira/browse/HBASE-9485
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 9485-v2.txt
> HBase extends OutputCommitter, which turns recovery off, meaning all completed maps are
> lost on RM restart and the job starts from scratch. FileOutputCommitter implements recovery,
> so we should look at that to see what is potentially needed for recovery.

This message was sent by Atlassian JIRA
