hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2847) [HOD] Idle cluster cleanup does not work if the JobTracker becomes unresponsive to RPC calls
Date Thu, 21 Feb 2008 18:31:19 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-2847:
-------------------------------------

    Attachment: hadoop-2847

This patch adds error handling around the code that calls the hadoop client to determine the
number of running jobs. If an exception is thrown here, typically due to a SocketTimeout or
SocketException, the error code from the hadoop client is captured and used to determine the
idleness time.
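
As a rough illustration of the idea (not the code in the attached patch), the sketch below wraps
the client call, captures its exit code, and feeds that into the idleness calculation. The names
count_running_jobs and IdleTracker are hypothetical, the parsing assumes the first line of the
client output begins with the job count, and treating a failed call as "no jobs running" is an
assumption about the intended behaviour.

```python
import subprocess
import time


def count_running_jobs(cmd, run=subprocess.run, timeout=60):
    """Run the hadoop client and return (exit_code, job_count).

    job_count is None whenever the count could not be determined.
    """
    try:
        proc = run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # The client hung or could not be started; report a generic failure.
        return 1, None
    if proc.returncode != 0:
        # Typical case from the issue: the JobTracker does not answer RPC
        # calls and the client exits with an error code.
        return proc.returncode, None
    lines = proc.stdout.strip().splitlines()
    try:
        # Assumed output shape: "<N> jobs currently running" on the first line.
        return 0, int(lines[0].split()[0])
    except (IndexError, ValueError):
        return 0, None


class IdleTracker:
    """Tracks how long the cluster has gone without a confirmed running job."""

    def __init__(self, now=time.time):
        self._now = now
        self.idle_since = None

    def update(self, exit_code, job_count):
        # Only a successful call that reports at least one job counts as busy.
        # A failed or unparsable call is treated as idle (assumption).
        busy = exit_code == 0 and job_count is not None and job_count > 0
        if busy:
            self.idle_since = None
        elif self.idle_since is None:
            self.idle_since = self._now()
        return self.idle_seconds()

    def idle_seconds(self):
        if self.idle_since is None:
            return 0.0
        return self._now() - self.idle_since
```

A caller would poll periodically, for example with
tracker.update(*count_running_jobs(["hadoop", "job", "-list"])), and deallocate the cluster once
the returned idle time exceeds the configured limit. The point of the design is that an
unresponsive JobTracker no longer stops the idle clock from advancing.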

> [HOD] Idle cluster cleanup does not work if the JobTracker becomes unresponsive to RPC calls
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2847
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>    Affects Versions: 0.16.0
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.16.1
>
>         Attachments: hadoop-2847
>
>
> In some error conditions, the Hadoop JobTracker becomes unresponsive to RPC calls
> (e.g. if a misconfiguration causes the JobTracker to run out of memory). In such cases,
> a cluster allocated by HOD no longer runs any jobs and wastefully holds on to nodes. The
> usual idle cluster cleaner should ideally deallocate the cluster, but it does not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

