hadoop-common-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3376) [HOD] HOD should have a way to detect and deal with clusters that violate/exceed resource manager limits
Date Mon, 19 May 2008 15:32:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597978#action_12597978 ]

Hadoop QA commented on HADOOP-3376:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12382301/HADOOP-3376.1
  against trunk revision 656939.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2498/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2498/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2498/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2498/console

This message is automatically generated.

> [HOD] HOD should have a way to detect and deal with clusters that violate/exceed resource manager limits
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3376
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3376
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: checklimits.sh, HADOOP-3376, HADOOP-3376.1
>
>
> Currently, if we set up resource manager/scheduler limits on submitted jobs, any HOD
> cluster that exceeds or violates these limits may 1) get queued indefinitely, or 2) be
> blocked until resources occupied by older clusters are freed. HOD should detect these
> scenarios and deal with them intelligently instead of waiting for a long time or forever.
> This means giving the submitter more, and more accurate, information.
> (Internal) Use Case:
>      If there are no resource limits, users can flood the resource manager queue, preventing
> other users from using it. To avoid this, we could set up various types of limits in either
> the resource manager or the scheduler: a max node limit in Torque (per job), a maxproc limit
> in Maui (per user/class), a maxjob limit in Maui (per user/class), etc. But there is one
> problem with the current setup. For example, if we set a maxproc limit in Maui to limit the
> aggregate number of nodes used by any user across all jobs, then 1) jobs get queued
> indefinitely if they exceed the max limit, and 2) a job is blocked if it asks for fewer nodes
> than the max limit but some of the resources are already used by jobs from the same user.
> This issue addresses how to deal with scenarios like these.
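
The two scenarios in the description above can be told apart with a simple comparison. The following is only a minimal sketch, not the logic of the attached patch: the function name, the enum, and all three inputs are hypothetical, and it assumes the per-user limit and the user's current usage have already been obtained from the resource manager or scheduler by some other means.

```python
from enum import Enum


class LimitStatus(Enum):
    OK = "ok"                            # request fits within the limit right now
    BLOCKED_UNTIL_FREED = "blocked"      # fits the limit, but only after the user's other jobs free nodes
    EXCEEDS_LIMIT = "exceeds"            # larger than the limit itself; would stay queued indefinitely


def classify_request(requested_nodes, nodes_in_use, max_nodes):
    """Classify a cluster request against a per-user aggregate node limit.

    All arguments are hypothetical inputs for illustration; this sketch does
    not query Torque or Maui itself.
    """
    if requested_nodes > max_nodes:
        # Scenario 1: the request alone exceeds the limit.
        return LimitStatus.EXCEEDS_LIMIT
    if requested_nodes + nodes_in_use > max_nodes:
        # Scenario 2: the request is within the limit, but the same user's
        # running jobs already occupy too many nodes.
        return LimitStatus.BLOCKED_UNTIL_FREED
    return LimitStatus.OK
```

With a status like this, the submitter could be told immediately whether the request can never run under the current limit or is only waiting for their own earlier clusters to finish.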

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

