hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2980) Fetch failures and other related issues in Jetty 6.1.26
Date Tue, 31 Jan 2012 18:46:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197107#comment-13197107 ]

Todd Lipcon commented on MAPREDUCE-2980:
----------------------------------------

Hey Kihwal. Jetty 6.1.27 still hasn't been released. We've been shipping a patched version of 6.1.26
with some fixes provided by Greg Wilkins. The tag is here: https://github.com/toddlipcon/jetty-hadoop-fix/tree/6.1.26.cloudera.1

The problems aren't 100% gone with this build, but they seem to be improved: nothing has
been escalated to me in a few months, which I'm taking as a good sign. The other patch we've
recently added is MAPREDUCE-3184, which is similar to the health check script approach: the
TaskTracker (TT) simply kills itself if it detects the problem.
                
> Fetch failures and other related issues in Jetty 6.1.26
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-2980
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2980
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> Since upgrading Jetty from 6.1.14 to 6.1.26 we've had a ton of HTTP-related issues, including:
> - Much higher incidence of fetch failures
> - A few strange file-descriptor related bugs (e.g. MAPREDUCE-2389)
> - A few unexplained issues where long "fsck"s on the NameNode drop out halfway through
>   with a ClosedChannelException
> Stress tests with 10000Map x 10000Reduce sleep jobs reliably reproduce fetch failures
> at a rate of about 1 per million on a 25 node test cluster. These problems are all new since
> the upgrade from 6.1.14.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
