hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ravi Prakash (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand
Date Tue, 28 Feb 2012 01:15:49 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Ravi Prakash updated MAPREDUCE-3614:

    Attachment: MAPREDUCE-3614.patch

Thanks Daryn and Vinod for your careful reviews. Attaching a patch addressing comments from

Here are some clarifications
bq. There's some issues with the filesystem handling. It's creating a filesystem to get the
fs scheme so it can create the same filesystem again.
I chose not to modify the code which had already been checked in. For myself I'm now getting
the schema directly from the config.

bq. The code is a bit misleading about only running when a SIGTERM hits. Unless something
unregisters the shutdown hook, it runs on a clean exit too. You have to make sure that none
of the changes are going to be detrimental to a normal shutdown.
You are right! I have renamed the variables to more accurately say isSignalled etc. I reviewed
the code and it will run fine in case of a normal shutdown too and ran it on my 1 node cluster
a few times and checked the logs. For JHEH, all jobIds should have been closed in case of
a normal shutdown, so nothing will be different. In RMCommunicator there is the stopped variable
which makes sure it isn't run on stopped objects.

bq. Is there a compelling need to maintain a parallel set of JobIds? Why not check the fileMap
values to see if the writer is open? The event construction looks a bit odd, but I don't know
much about event injection.
You are right! Again! :) I'm checking for open eventWriters in fileMap now. I had to modify
EventWriter for that. But I like it better.

bq. I'd suggest reverting all the fs manipulation code, and try this:
I had already tried that approach. And failed :( The problem with using "fs.automatic.close"
was that a FileSystem object had already been opened with that parameter set to true. If you
see FileSystem.java:2080, it doesn't overwrite that value. As a result on SIGTERM, the FileSystem
object was being closed.

bq. FindBugs warning is still not fixed as Jenkins reports.
Sorry about that. Should be fixed now.
bq. Not sure why you need all the magic with the FileSystem objects, can you explain?
FileSystem registers its own Runtime shutdownhook (FileSystem.java:2076) This closes all instances
of FileSystem objects without the "magic". :( So when JHEH wants to write the JobUnsuccessfulCompletionEvent
to HDFS / move the files to done_intermediate, it would throw an IOException.
bq. You can make MRAppMaster.MRAppMasterShutdownHook extend CompositeServiceShutdownHook
Daryn pointed out I'd made another boo boo here =D I should not have been starting off another
thread in MRAppMasterShutdownHook for CompositeServiceShutdownHook. I just needed to call
stop() so there was no need for extending anymore
bq. RMCommunicator: It seems unnatural to set killed final state if the job was running. I
think you should set a flag with RMCommunicator similar to JobHistoryEventHandler.
Thanks for the suggestion. Done! :) 

bq. Trivial: isSigTermed == false can be written as !isSigTermed
bq. Again trivial: The sigterm flag will never go back to false from true. So the code snippet
"while (isSigTermed && jobIt.hasNext()) .." can be rewritten as "if (isSigTermed)
{ while .. }"
bq. For the same snippet, can you simply add a comment saying, we are doing this so as to
drain everything and only write finish-events to shutdown as fast as possible.
bq. The test seems very wired into the code. Can we instead do something like this:
Done! I've tried being as close as possible to the SIGTERM without the test becoming too onerous.
Please let me know if it needs some change still.

>  finalState UNDEFINED if AM is killed by hand
> ---------------------------------------------
>                 Key: MAPREDUCE-3614
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>             Fix For: 0.23.2
>         Attachments: MAPREDUCE-3614.branch-0.23.patch, MAPREDUCE-3614.patch, MAPREDUCE-3614.patch,
MAPREDUCE-3614.patch, MAPREDUCE-3614.patch, MAPREDUCE-3614.patch
> Courtesy [~dcapwell]
> {quote}
> If the AM is running and you kill the process (sudo kill #pid), the State in Yarn would
be FINISHED and FinalStatus is UNDEFINED.  The Tracking UI would say "History" and point to
the proxy url (which will redirect to the history server).
> The state should be more descriptive that the job failed and the tracker url shouldn't
point to the history server.
> {quote}

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message