hadoop-mapreduce-issues mailing list archives

From "Thomas Graves (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3851) Allow more aggressive action on detection of the jetty issue
Date Sat, 18 Feb 2012 00:49:59 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated MAPREDUCE-3851:
-------------------------------------

    Target Version/s: 1.0.1
        Release Note: 
Added new configuration variables to control when the TT aborts after seeing a certain
proportion of shuffle exceptions:

    // Percent of shuffle exceptions (out of the sample size) seen before it is
    // fatal. Acceptable values are from 0 to 1.0; 0 disables the check.
    // E.g. 0.3 means 30% of the last X requests matched the exception,
    // so abort.
    conf.getFloat(
        "mapreduce.reduce.shuffle.catch.exception.percent.limit.fatal", 0);

    // The number of trailing requests we track, used for the fatal
    // limit calculation.
    conf.getInt("mapreduce.reduce.shuffle.catch.exception.sample.size", 1000);
              Status: Patch Available  (was: Open)
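
For readers who want to see how the two settings interact, here is a minimal sketch of a trailing-window check. It is not the code from the attached patch: the class name ExceptionWindow, its methods, and the choice of ">=" for the threshold comparison are all assumptions made for illustration. Only the meaning of the two values (a sample size and a fatal fraction where 0 disables the check) comes from the release note above.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of a trailing-window exception check.
 * Not taken from the MAPREDUCE-3851 patch; names are illustrative only.
 */
final class ExceptionWindow {
    private final BitSet window;    // one bit per tracked request; set = matched the exception
    private final int sampleSize;   // plays the role of ...catch.exception.sample.size
    private final float fatalLimit; // plays the role of ...percent.limit.fatal (0 disables)
    private int next = 0;           // index of the slot to overwrite next
    private int matches = 0;        // number of set bits currently in the window

    ExceptionWindow(int sampleSize, float fatalLimit) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        if (fatalLimit < 0f || fatalLimit > 1f) {
            throw new IllegalArgumentException("fatalLimit must be in [0, 1]");
        }
        this.sampleSize = sampleSize;
        this.fatalLimit = fatalLimit;
        this.window = new BitSet(sampleSize);
    }

    /** Records the outcome of one request, evicting the oldest tracked one. */
    synchronized void record(boolean matchedException) {
        if (window.get(next)) {
            matches--; // the evicted request had matched the exception
        }
        window.set(next, matchedException);
        if (matchedException) {
            matches++;
        }
        next = (next + 1) % sampleSize;
    }

    /**
     * Returns true when the fraction of matching requests, measured against
     * the full sample size, reaches the limit. Assumption: ">=" is used here;
     * the release note does not say whether the bound is inclusive.
     */
    synchronized boolean isFatal() {
        if (fatalLimit <= 0f) {
            return false; // 0 disables the check
        }
        return (float) matches / sampleSize >= fatalLimit;
    }
}
```

In this sketch the caller would invoke record(...) once per shuffle request and take whatever abort action it has configured when isFatal() returns true. Dividing by the full sample size, rather than by the number of requests seen so far, follows the "out of sample size" wording in the release note.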
    
> Allow more aggressive action on detection of the jetty issue
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-3851
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3851
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Thomas Graves
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: MAPREDUCE-3851.patch, MAPREDUCE-3851.patch
>
>
> MAPREDUCE-2529 added the useful failure detection mechanism. In this jira, I propose
> we add a periodic check inside the TT and a configurable action to self-destruct. Blacklisting
> helps but is not enough. A hung jetty still accepts connections, and it takes a very long
> time for clients to fail out. Short jobs are delayed for hours because of this. This feature
> will be a nice companion to MAPREDUCE-3184.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
