hama-dev mailing list archives

From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HAMA-613) Scheduler kills job too silently when out of slots
Date Fri, 19 Oct 2012 18:07:11 GMT

     [ https://issues.apache.org/jira/browse/HAMA-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Jungblut updated HAMA-613:

    Comment: was deleted

(was: There is a small bug.

bq. if (maxTasks < job.getNumBspTask()) {

Say a job has 10 tasks, and maxTasks (the maximum number of tasks in the cluster
minus the number of running tasks) is 10 as well, for example because no job is
running and we have 10 slots. This will fail.

Proposing to change this to > instead of <.

A change in naming would also be better suited:

    int availableSlots = clusterStatus.getMaxTasks() - clusterStatus.getTasks();
    if (availableSlots > job.getNumBspTask()) {
      LOG.error("Job failed! No more task slots available");
    }
)

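For anyone reading along, here is a minimal sketch that evaluates both comparisons from the deleted comment at the boundary it describes (10 requested tasks, 10 free slots). The class and method names are hypothetical and not part of Hama; only the arithmetic is taken from the comment. With 10 available and 10 requested, both `<` and `>` evaluate to false.

```java
// Hypothetical helper for illustration only; this is not Hama code.
final class SlotComparison {

    /** True when fewer slots are free than the job requests. */
    static boolean rejectsWithLessThan(int availableSlots, int requestedTasks) {
        return availableSlots < requestedTasks;
    }

    /** The '>' comparison from the comment, shown side by side. */
    static boolean rejectsWithGreaterThan(int availableSlots, int requestedTasks) {
        return availableSlots > requestedTasks;
    }

    public static void main(String[] args) {
        int requestedTasks = 10;
        // Below, at, and above the boundary discussed in the comment.
        for (int availableSlots : new int[] {3, 10, 12}) {
            System.out.printf(
                "available=%d requested=%d  '<' rejects: %b  '>' rejects: %b%n",
                availableSlots,
                requestedTasks,
                rejectsWithLessThan(availableSlots, requestedTasks),
                rejectsWithGreaterThan(availableSlots, requestedTasks));
        }
    }
}
```

Running `main` prints one line per case, which makes it easy to see which comparison rejects a job that needs more slots than are free.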
> Scheduler kills job too silently when out of slots
> --------------------------------------------------
>                 Key: HAMA-613
>                 URL: https://issues.apache.org/jira/browse/HAMA-613
>             Project: Hama
>          Issue Type: Bug
>          Components: bsp core
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Yuesheng Hu
>            Priority: Blocker
>             Fix For: 0.6.0
>         Attachments: HAMA-613.patch
> If, for example, a user submits two text files as input, they will sometimes be split
> into 4 chunks.
> This usually exceeds the number of tasks available in the cluster (an out-of-the-box
> installation has just 3 tasks configured).
> Two main questions come to mind:
> - Why are two text files split into 4 tasks if the BSPJobClient should check whether
>   this exceeds the number of available task slots?
> - Why does the client schedule the job if it knows that there are not enough slots available?
> Of course, this should result in a less cryptic error message. Actually, there is currently
> no error message at all, which constantly confuses users.
> This is a blocker for 0.6.
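
To illustrate the "less cryptic error message" the issue asks for, here is a rough sketch of a submission-time check that fails loudly. It is not the attached HAMA-613.patch: the class `SlotGuard`, the method `ensureEnoughSlots`, and the message wording are all assumptions made for illustration. The inputs correspond to what `clusterStatus.getMaxTasks()`, `clusterStatus.getTasks()`, and `job.getNumBspTask()` would supply.

```java
// Hypothetical sketch; not the actual HAMA-613 patch.
final class SlotGuard {

    /**
     * Throws with a descriptive message when the job requests more tasks
     * than there are free slots; returns normally otherwise.
     */
    static void ensureEnoughSlots(int maxTasks, int runningTasks, int requestedTasks) {
        int availableSlots = maxTasks - runningTasks;
        if (requestedTasks > availableSlots) {
            throw new IllegalStateException(String.format(
                "Job requests %d tasks but only %d of %d task slots are free; "
                    + "reduce the number of tasks or add slots to the cluster.",
                requestedTasks, availableSlots, maxTasks));
        }
    }

    public static void main(String[] args) {
        // 3 slots, none in use, 3 tasks requested: accepted.
        ensureEnoughSlots(3, 0, 3);

        // The scenario from the issue: 4 splits on a default 3-slot setup.
        try {
            ensureEnoughSlots(3, 0, 4);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Putting the requested and available counts in the message tells the user why the job was rejected, instead of the job disappearing silently.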

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
