hadoop-common-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3153) [HOD] Hod should deallocate cluster if there's a problem in writing information to the state file
Date Thu, 10 Apr 2008 09:22:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587549#action_12587549 ]

Hemanth Yamijala commented on HADOOP-3153:
------------------------------------------

- I am not sure we need the refactoring. Why can't we call hadoopCluster.deallocate from the
except block? Even if we cannot for some reason, I think we must refactor this differently:

{noformat}    
  def shutdown_job(self, ringClient=None):
    if ringClient is not None:
      self.__log.debug("Calling rm.stop()")
      ringClient.stopRM()
      self.__log.debug("Returning from rm.stop()")
      self.__log.info("Job Shutdown by informing ringmaster.")
    else:
      self.delete_job(self.jobId)
      self.__log.info("Job %s removed from queue directly." % self.jobId)
{noformat}
There must also be a way to get the ringClient from hadoopCluster in hodRunner.py.

- In checkStateFile: I think we should check that self.__store is writable. Alternatively,
can we check whether the file does not exist, using errno to tell that case apart from
permission errors?
- Provide an accessor for _hodState__stateFile in hodState and use that.
- testAllocateWithInvalidStateStore - we can add a test case where the directory has no write
permissions.
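
To illustrate the errno suggestion above, here is a minimal sketch. The function name classify_state_file and its return values are made up for this example and are not part of HOD; it only shows how a missing file can be told apart from a permission problem without creating or truncating anything.

```python
import errno
import os


def classify_state_file(path):
    """Return 'writable', 'creatable' or 'not_writable' for a state file path.

    Hypothetical helper for illustration only; not HOD's checkStateFile.
    """
    directory = os.path.dirname(os.path.abspath(path))
    try:
        # Open for appending without O_CREAT, so nothing is created or truncated.
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # The file (or its directory) is absent. It can be created only
            # if the directory exists and allows writing and traversal.
            if os.access(directory, os.W_OK | os.X_OK):
                return "creatable"
            return "not_writable"
        if e.errno in (errno.EACCES, errno.EPERM):
            # The file exists but we lack permission to write to it.
            return "not_writable"
        raise  # anything else (e.g. the path is a directory) is unexpected
    os.close(fd)
    return "writable"
```

A test along the lines of testAllocateWithInvalidStateStore could point such a check at a directory whose write permission has been removed. Note that a permission-based test behaves differently when run as root, since root bypasses these checks.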

> [HOD] Hod should deallocate cluster if there's a problem in writing information to the state file
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3153
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3153
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>    Affects Versions: 0.16.0
>            Reporter: Hemanth Yamijala
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-3153, HADOOP-3153.1
>
>
> Consider a scenario where hod runs allocate successfully, but isn't able to save the
> allocated information to the clusters.state file. In such a case, it gets an error and exits.
> But the cluster remains allocated, and unfortunately the user cannot deallocate the cluster
> now unless he knows the cluster directory.
> It is better if HOD can deallocate the cluster in such an error condition.
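
The behaviour requested in the description can be sketched roughly as follows. This is not HOD code: allocate_and_record, StateWriteError, and the allocate/deallocate/write method names on the cluster and state objects are assumptions chosen for the example. The idea is simply that a failure to record the allocation triggers a deallocation before the error is reported.

```python
class StateWriteError(Exception):
    """Raised when cluster information cannot be saved (hypothetical)."""


def allocate_and_record(cluster, state, cluster_dir):
    """Allocate a cluster, then record it; roll back the allocation on failure.

    `cluster` and `state` stand in for HOD's cluster and state objects.
    Their method names here are assumptions, not HOD's real API.
    """
    cluster.allocate(cluster_dir)
    try:
        state.write(cluster_dir)
    except (IOError, OSError) as e:
        # Without a state entry the user has no record of this cluster,
        # so release it now instead of leaving it allocated.
        cluster.deallocate(cluster_dir)
        raise StateWriteError(
            "could not save state for %s: %s" % (cluster_dir, e))
```

Doing the rollback inside the except block keeps allocation and cleanup in one place, which is the alternative to refactoring that the comment above raises.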

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

