hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-2529) Recognize Jetty bug 1342 and handle it
Date Sat, 04 Jun 2011 03:25:47 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Chris Douglas updated MAPREDUCE-2529:
-------------------------------------

    Attachment: M2529-1.patch
                M2529-1-20s.patch

Minor nits:
* When no regex is defined, always incrementing the metric probably makes more sense as the default
* {{null}} is probably a better default than the empty string
* There's a possible NPE if the exception message is {{null}}
* The unit test sets combinations of the stack/message regex, but it calls {{checkStackException}}
in a few places, which doesn't exercise that logic (I think it's covered, but that could be
clearer)
* While this will be useful while we work around bugs emerging from Jetty, we should probably
keep it as an undocumented config setting.
* The trunk patch updates {{MRJobConfig}}, which is for user jobs. Moved to {{JTConfig}}

This slight modification treats exceptions with {{null}} messages as matching no regexp.
Let me know if it looks OK to you.
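
For illustration only, the defaults described in the nits above could look roughly like the sketch below. This is not the code in the attached patches: the class {{ExceptionRegexCheck}} and its methods are hypothetical names, and matching each stack frame's string form against the regex is an assumption. It shows an undefined ({{null}}) regex counting every exception, and a {{null}} exception message matching no configured regex instead of causing an NPE.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the patch: names and stack-matching strategy are assumptions.
final class ExceptionRegexCheck {
    private final Pattern messagePattern; // null means "no message regex configured"
    private final Pattern stackPattern;   // null means "no stack regex configured"
    private long matched;                 // stands in for the metric being incremented

    ExceptionRegexCheck(String messageRegex, String stackRegex) {
        this.messagePattern = messageRegex == null ? null : Pattern.compile(messageRegex);
        this.stackPattern = stackRegex == null ? null : Pattern.compile(stackRegex);
    }

    private static boolean messageMatches(Pattern p, String message) {
        if (p == null) {
            return true;  // undefined regex: every exception counts by default
        }
        if (message == null) {
            return false; // null message matches no regex, and avoids an NPE
        }
        return p.matcher(message).matches();
    }

    private static boolean stackMatches(Pattern p, Throwable t) {
        if (p == null) {
            return true;
        }
        // Assumption: a single matching frame is enough.
        for (StackTraceElement frame : t.getStackTrace()) {
            if (p.matcher(frame.toString()).matches()) {
                return true;
            }
        }
        return false;
    }

    // Returns true and bumps the counter when both configured checks pass.
    boolean check(Throwable t) {
        boolean hit = messageMatches(messagePattern, t.getMessage())
                && stackMatches(stackPattern, t);
        if (hit) {
            matched++;
        }
        return hit;
    }

    long matchedCount() {
        return matched;
    }
}
```

Using {{null}} rather than the empty string for "not configured" keeps the unset case distinct from a regex that happens to match only empty input.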

> Recognize Jetty bug 1342 and handle it
> --------------------------------------
>
>                 Key: MAPREDUCE-2529
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2529
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.204.0, 0.23.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: M2529-1-20s.patch, M2529-1.patch, jetty1342-20security.patch, mapred2529-trunk.patch
>
>
> We are seeing many instances of Jetty bug 1342 (http://jira.codehaus.org/browse/JETTY-1342).
The bug doesn't cause Jetty to stop responding altogether: some fetches go through, but many
of them throw exceptions and eventually fail. The only way we have found to get the TT out
of this state is to restart the TT. This jira is to catch this particular exception (or perhaps
a configurable regex) and handle it in an automated way, either blacklisting or shutting down the
TT after seeing the exception a configurable number of times.
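
A minimal sketch of the "configurable number" idea from the description, for illustration only. The class name {{ExceptionThreshold}} and the convention that a limit of zero or less disables the check are assumptions, not taken from the issue or the patches.

```java
// Hypothetical sketch: counts recognized exceptions and signals when a limit is reached.
final class ExceptionThreshold {
    private final int limit; // assumption: a limit <= 0 disables the check
    private int seen;

    ExceptionThreshold(int limit) {
        this.limit = limit;
    }

    // Records one recognized exception; returns true once the limit has been reached,
    // at which point the caller could blacklist or shut down the TT.
    synchronized boolean record() {
        seen++;
        return limit > 0 && seen >= limit;
    }
}
```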

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
