hadoop-common-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2247) Mappers fail easily due to repeated failures
Date Thu, 13 Dec 2007 13:31:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12551499 ]

Hadoop QA commented on HADOOP-2247:

-1 overall.  Here are the results of testing the latest attachment 
against trunk revision r603824.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs -1.  The patch appears to introduce 1 new Findbugs warning.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1335/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1335/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1335/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1335/console

This message is automatically generated.

> Mappers fail easily due to repeated failures
> --------------------------------------------
>                 Key: HADOOP-2247
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2247
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.15.0
>         Environment: 1400 Node hadoop cluster
>            Reporter: Srikanth Kakani
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.15.2
>         Attachments: HADOOP-2220.patch
> Related to HADOOP-2220, problem introduced in HADOOP-1158
> At this scale, hardcoding the number of fetch failures to a static number (in this case
> 3) is never going to work. Although the jobs we are running do load the systems, 3 failures
> can occur randomly within the lifetime of a map. Even fetching the data can cause enough
> load for that many failures to occur.
> We believe that the number of tasks and the size of the cluster should be taken into account.
> On that basis, we believe the ratio between total failed attempts and total fetch attempts
> should be taken into consideration.
> In our experience, a task should be declared "Too many fetch failures" based on:
> failures > n /*could be 3*/ && (failures/total attempts) > k% /*could be
> Basically, the first factor gives the second factor a head start; the second factor
> then takes the cluster size and the task size into account.
> Additionally, we could take recency into account, say failures and attempts in the last
> hour. We do not want to make that window too small.
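
The two-factor rule proposed in the description above could be sketched in Java roughly as follows. This is only an illustration, not the attached patch: the class name FetchFailurePolicy, the method tooManyFetchFailures, and both parameters are hypothetical. The description suggests n could be 3 but its value for k is cut off, so the 0.5 used in the usage note is purely an assumed placeholder.

```java
/**
 * Hypothetical sketch of the rule from the issue description:
 *   failures > n  &&  (failures / total attempts) > k
 * Not part of Hadoop and not taken from the attached patch.
 */
final class FetchFailurePolicy {
    // n in the description (the reporter suggests it could be 3).
    private final int minFailures;
    // k expressed as a fraction in [0, 1]; the description's value is truncated.
    private final double maxFailureRatio;

    FetchFailurePolicy(int minFailures, double maxFailureRatio) {
        if (minFailures < 0) {
            throw new IllegalArgumentException("minFailures must be >= 0");
        }
        if (maxFailureRatio < 0.0 || maxFailureRatio > 1.0) {
            throw new IllegalArgumentException("maxFailureRatio must be in [0, 1]");
        }
        this.minFailures = minFailures;
        this.maxFailureRatio = maxFailureRatio;
    }

    /** Returns true only when both the absolute and the relative threshold are exceeded. */
    boolean tooManyFetchFailures(int failures, int totalAttempts) {
        if (totalAttempts <= 0) {
            return false; // nothing has been fetched yet, so no ratio can be computed
        }
        // Cast before dividing so the ratio is not truncated by integer division.
        double ratio = (double) failures / totalAttempts;
        return failures > minFailures && ratio > maxFailureRatio;
    }
}
```

With an assumed policy of new FetchFailurePolicy(3, 0.5), a map whose output failed 4 times out of 1000 fetch attempts would not be flagged, while 4 failures out of 5 attempts would. Taking recency into account, as the description also suggests, would mean counting only the failures and attempts that fall inside a time window before applying the same check.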

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
