hadoop-mapreduce-issues mailing list archives

From "Ivan Mitic (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
Date Mon, 01 Apr 2013 01:29:15 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated MAPREDUCE-5066:
----------------------------------

    Attachment: MAPREDUCE-5066.patch
                MAPREDUCE-5066.branch-1.patch
                MAPREDUCE-5066.branch-1-win.3.patch

Hi Arun,

I am attaching branch-1-win/branch-1 and branch-2 compatible patches.

A few notes on the patches:
 - Fixed a test verification issue in branch-1-win.3.patch.
 - The branch-1 and branch-1-win patches are fully compatible (and equivalent).
 - The branch-2 codebase has changed significantly, and I made a best effort to find the appropriate forward patch. There are two implementations of the JobEndNotifier: mapred#JobEndNotifier (based on the one from branch-1 but simplified) and mapreduce.v2.app#JobEndNotifier (a new implementation). The former is used in the LocalJobRunner and the latter in the MR AppMaster. In my patch I did the following:
    *a.* Applied the bug fixes to the current state of mapred#JobEndNotifier and included the corresponding unit tests.
    *b.* Given that mapreduce.v2.app#JobEndNotifier already sets the timeout to 5 seconds, I did the same in mapred#JobEndNotifier. In other words, I did not introduce a config knob that would make the timeout configurable. My reasoning was that in branch-1, people might see a 5-second timeout as a regression and might want to change it to a different value. In trunk, given that the timeout is already set to 5 seconds, a fixed value should be fine until proven otherwise. Please advise if you think a config knob is needed.
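
For readers who have not opened the patches, here is a minimal sketch of what a fixed 5-second timeout on a notification call could look like using plain java.net.HttpURLConnection. It is not the code from the attached patches: the class name NotificationTimeoutSketch, the methods open and notifyOnce, and the constant TIMEOUT_MS are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical illustration only; not taken from the MAPREDUCE-5066 patches. */
class NotificationTimeoutSketch {
    // Assumed value: 5 seconds, the timeout the message says
    // mapreduce.v2.app#JobEndNotifier already uses.
    static final int TIMEOUT_MS = 5 * 1000;

    /** Builds a connection with both timeouts set. No network I/O happens here. */
    static HttpURLConnection open(String notificationUrl, int timeoutMs) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(notificationUrl).toURL().openConnection();
        conn.setConnectTimeout(timeoutMs); // bound the wait for the TCP connect
        conn.setReadTimeout(timeoutMs);    // bound the wait for the server's response
        return conn;
    }

    /** Returns true only if the endpoint answers HTTP 200 before a timeout fires. */
    static boolean notifyOnce(String notificationUrl) {
        HttpURLConnection conn = null;
        try {
            conn = open(notificationUrl, TIMEOUT_MS);
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException e) {
            // SocketTimeoutException is an IOException, so a slow or silent
            // server ends up here instead of blocking the caller indefinitely.
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

With both timeouts set, a single unresponsive endpoint costs the notification thread a bounded wait per attempt rather than stalling every notification queued behind it.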
                
> JobTracker should set a timeout when calling into job.end.notification.url
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5066
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win, 2.0.3-alpha, 1.3.0
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: MAPREDUCE-5066.branch-1.patch, MAPREDUCE-5066.branch-1-win.2.patch, MAPREDUCE-5066.branch-1-win.3.patch, MAPREDUCE-5066.branch-1-win.patch, MAPREDUCE-5066.patch
>
>
> In current code, timeout is not specified when JobTracker (JobEndNotifier) calls into the notification URL. When the given URL points to a server that will not respond for a long time, job notifications are completely stuck (given that we have only a single thread processing all notifications). We've seen this cause noticeable delays in job execution in components that rely on job end notifications (like Oozie workflows).
> I propose we introduce a configurable timeout option and set a default to a reasonably small value.
> If we want, we can also introduce a configurable number of workers processing the notification queue (not sure if this is needed though at this point).
> I will prepare a patch soon. Please comment back.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
