hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-230) Make changes for RM restart phase 1
Date Sat, 24 Nov 2012 10:13:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503371#comment-13503371 ]

Bikas Saha commented on YARN-230:

Ignoring orphaned attempts was explicitly added because of the current implementation, which
deletes the app info first to act as a marker of app completion even if the attempts fail
to get deleted after that. The code should have discarded orphaned attempts from the store,
but I forgot about that. Good catch! Will fix.
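To illustrate the fix being described, here is a minimal sketch of discarding orphaned attempts at load time. The class and method names are hypothetical stand-ins, not the actual YARN API: an attempt whose parent application record is missing is an orphan left behind by a partially completed removal, and should be dropped (and deleted from the store) rather than recovered.

```java
import java.util.*;

// Hypothetical sketch: filter out attempts whose parent application record
// is gone. The app record is deleted first, so a missing app record marks
// the app as completed even if its attempt records were not all deleted.
public class OrphanFilter {
    public static Map<String, List<String>> filterOrphans(
            Set<String> appIds, Map<String, List<String>> attemptsByApp) {
        Map<String, List<String>> recovered = new HashMap<>();
        for (Map.Entry<String, List<String>> e : attemptsByApp.entrySet()) {
            if (appIds.contains(e.getKey())) {
                recovered.put(e.getKey(), e.getValue()); // parent app exists: keep
            }
            // else: orphaned attempts; a real store would also delete them here
        }
        return recovered;
    }
}
```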
The store and remove methods have been made mirrors of each other because that keeps the
operations symmetric and logically clear. An actual implementation could choose to remove the
entire app data, including attempts, in removeApplication(), making removeApplicationAttempt()
a no-op. The current interface does not preclude that alternative, so implementations retain
flexibility.
Also, a directory-based implementation is still possible, in which case removeApplication()
could call FS.delete(Path_to_dir) and removeApplicationAttempt() would remove the attempt file
under the app directory, or return success if the app directory has already been deleted.
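The directory layout just described could be sketched as follows. This is a hypothetical illustration using java.nio.file in place of Hadoop's FileSystem API: removeApplication() deletes the whole app directory, and removeApplicationAttempt() succeeds silently if the app directory is already gone.

```java
import java.io.IOException;
import java.nio.file.*;

// Hypothetical directory-per-app store: each app gets a directory under the
// root, with one file per attempt inside it.
public class DirStateStore {
    private final Path root;

    public DirStateStore(Path root) { this.root = root; }

    public void storeApplicationAttempt(String appId, String attemptId,
                                        byte[] data) throws IOException {
        Path appDir = root.resolve(appId);
        Files.createDirectories(appDir);
        Files.write(appDir.resolve(attemptId), data);
    }

    public void removeApplication(String appId) throws IOException {
        Path appDir = root.resolve(appId);
        if (!Files.exists(appDir)) return; // already removed: success
        try (DirectoryStream<Path> files = Files.newDirectoryStream(appDir)) {
            for (Path f : files) Files.delete(f); // attempt files first
        }
        Files.delete(appDir); // on HDFS this could be a single atomic rmdir
    }

    public void removeApplicationAttempt(String appId, String attemptId)
            throws IOException {
        // No-op if the app directory (or the attempt file) is already gone.
        Files.deleteIfExists(root.resolve(appId).resolve(attemptId));
    }
}
```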
I chose not to use directories for FileSystem because one could put a key-value store behind
a FileSystem interface, and I am not sure how directories would work there. Also, rmdir is
atomic on HDFS but may not be atomic on every file system.
For HDFS one could certainly write a directory-based file structure for the store, in which
apps would have their own directories. But IMO the best implementation might be a transaction-log
style one, similar to what HBase uses. It might also have better HA characteristics,
because HDFS guarantees a single writer per file. That, however, requires considerable investment
of time.
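For concreteness, the transaction-log idea could look roughly like this. It is a hypothetical in-memory sketch, not the HBase WAL itself: every mutation is appended as a record to a single log, and recovery replays the log so the last record for each key wins. A single appender per log file fits HDFS's single-writer guarantee.

```java
import java.util.*;

// Hypothetical write-ahead-log style store: the List stands in for an
// append-only file on HDFS.
public class LogStore {
    public static final class Record {
        final String key; final byte[] value; final boolean delete;
        Record(String key, byte[] value, boolean delete) {
            this.key = key; this.value = value; this.delete = delete;
        }
    }

    private final List<Record> log = new ArrayList<>();

    public void put(String key, byte[] value) { log.add(new Record(key, value, false)); }
    public void delete(String key) { log.add(new Record(key, null, true)); }

    // Replay the log to rebuild the current state after a restart.
    public Map<String, byte[]> replay() {
        Map<String, byte[]> state = new LinkedHashMap<>();
        for (Record r : log) {
            if (r.delete) state.remove(r.key);
            else state.put(r.key, r.value);
        }
        return state;
    }
}
```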
It is really hard to guarantee atomicity of removal when we don't know how the file system
is implemented. We could use log-structured implementations, or for HDFS we could use atomic
rmdir. Also, let's look at the following scenario: we cannot removeApplication() until we know
that the AM has exited and the job is really done. Just after the RM learns that the job is
done, it could die before updating state. So upon restart we can never guarantee that
a completed application was recorded as completed. This is one reason why I chose not to make
the state machine wait for removeApplication() to complete.
One improvement would be to update the store with an attempt's final state (failed/killed/succeeded)
and wait for it to be recorded before completing the state machine. This would allow us to
avoid counting killed attempts as failed, and also to complete the application state machine
if the last attempt had succeeded. This would implement the preferable solution in your second
point above. It would still be an optimization, since the RM could fail before storing the
attempt state (as above) and we would be back to square one. I would like to make this change
after YARN-218 is done so that all related changes can be made together.
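The proposed improvement could be sketched as below. The names are hypothetical: the attempt state machine would complete only after the final state is durably recorded, and on recovery an attempt with no recorded final state is conservatively treated as failed, exactly the fallback described above.

```java
import java.util.concurrent.*;

// Hypothetical sketch: record an attempt's final state before completing
// the state machine, so recovery can distinguish killed from failed.
public class AttemptFinalStateRecorder {
    public enum FinalState { SUCCEEDED, FAILED, KILLED }

    // Stand-in for the real state store.
    private final ConcurrentMap<String, FinalState> store =
        new ConcurrentHashMap<>();

    // The caller (the attempt state machine) completes only after this
    // returns, i.e. after the write is acknowledged.
    public boolean recordFinalState(String attemptId, FinalState state) {
        store.put(attemptId, state);
        return true;
    }

    // On recovery: a recorded KILLED attempt does not count against the
    // failure limit; an unrecorded attempt is assumed failed (back to
    // square one, as noted above).
    public boolean countsAsFailure(String attemptId) {
        FinalState s = store.get(attemptId);
        return s == FinalState.FAILED || s == null;
    }
}
```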
I consciously chose not to provide defaults for the store because I think it's important that
users understand and think about it when they enable a store. Changing the config helps
trigger important questions, like which store works for me, what permissions are needed, etc.
If you still feel strongly about it, then I could add defaults like you suggest. Could you
please help by providing a good system path? I am not quite familiar with the typical rules
used to determine one.
I have explained the temporary choice of Exception on YARN-231.
I will address the remaining comments in the next patch.

Thanks for all the feedback; this is a good discussion. I am sure there are improvements
to be made. Unless there are big issues with the current state of the work, it would be great
if we could commit it and address improvements in subsequent sub-tasks. That would help keep
the changes smaller and easier to manage. The current code refactors and puts the basic
interface/infrastructure in place. What do you think?

> Make changes for RM restart phase 1
> -----------------------------------
>                 Key: YARN-230
>                 URL: https://issues.apache.org/jira/browse/YARN-230
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: PB-impl.patch, Recovery.patch, Store.patch, Test.patch, YARN-230.1.patch
> As described in YARN-128, phase 1 of RM restart puts in place mechanisms to save application
state and read it back after restart. Upon restart, the NMs are asked to reboot and the
previously running AMs are restarted.
> After this is done, RM HA and work preserving restart can continue in parallel. For more
details please refer to the design document in YARN-128

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
