hadoop-hdfs-issues mailing list archives

From "Hanisha Koneru (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDDS-609) On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
Date Thu, 11 Oct 2018 20:11:00 GMT

     [ https://issues.apache.org/jira/browse/HDDS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hanisha Koneru updated HDDS-609:
--------------------------------
    Status: Patch Available  (was: Open)

> On restart, SCM does not exit chill mode as it expects DNs to report containers in ALLOCATED state
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-609
>                 URL: https://issues.apache.org/jira/browse/HDDS-609
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Namit Maheshwari
>            Assignee: Hanisha Koneru
>            Priority: Major
>         Attachments: HDDS-609.001.patch
>
>
> Note: The description has been updated to describe the root cause of the bug, and the error logs have been moved to the comments.
> On restart, SCM can exit chill mode only after the DNs have reported 99% (the default threshold) of its containers.
> SCM counts containers in the ALLOCATED state when calculating the total number of containers. Because DNs do not report ALLOCATED containers, the calculated fraction of reported containers is skewed low.
> {code:java}
> For example, say we have 1 DN in the cluster and we restart SCM.
> Total number of containers in SCM ContainerMap = 20
> Containers in OPEN state = 2
> Containers in ALLOCATED state = 18
> Containers reported by DN on SCM restart = 2 
> Fraction of reported containers as calculated by SCMChillNodeManager = (2/20) = 0.10
>  {code}
> ALLOCATED containers should not be included in the total number of containers used by the chill mode exit rule. Otherwise, in scenarios such as the one above, SCM can never come out of chill mode.
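
The arithmetic in the description can be sketched in a few lines of Java. This is only an illustration of the proposed rule, not the code in HDDS-609.001.patch: the class `ChillModeExitSketch`, the `State` enum, and the `reportedFraction` method are hypothetical names, and the 0.99 threshold is the default mentioned above.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; these names do not come from the SCM code base.
class ChillModeExitSketch {

    // Assumed subset of container states, for illustration only.
    enum State { ALLOCATED, OPEN, CLOSED }

    // Default exit threshold quoted in the issue description.
    static final double EXIT_THRESHOLD = 0.99;

    /**
     * Fraction of containers reported by DNs, relative to the containers
     * SCM expects to hear about. When excludeAllocated is true, ALLOCATED
     * containers are left out of the denominator.
     */
    static double reportedFraction(Map<State, Integer> countsByState,
                                   int reported,
                                   boolean excludeAllocated) {
        int total = 0;
        for (Map.Entry<State, Integer> e : countsByState.entrySet()) {
            if (excludeAllocated && e.getKey() == State.ALLOCATED) {
                continue; // DNs never report these, so do not wait for them
            }
            total += e.getValue();
        }
        // Assumption: with nothing to wait for, the rule is trivially satisfied.
        return total == 0 ? 1.0 : (double) reported / total;
    }

    public static void main(String[] args) {
        // Numbers from the example in the description.
        Map<State, Integer> counts = new EnumMap<>(State.class);
        counts.put(State.OPEN, 2);
        counts.put(State.ALLOCATED, 18);

        double current = reportedFraction(counts, 2, false);  // 2 / 20
        double proposed = reportedFraction(counts, 2, true);  // 2 / 2

        System.out.println("current:  " + current
            + " exits=" + (current >= EXIT_THRESHOLD));
        System.out.println("proposed: " + proposed
            + " exits=" + (proposed >= EXIT_THRESHOLD));
    }
}
```

With the counts from the example, the current calculation stays at 2/20 and never reaches the threshold, while excluding ALLOCATED containers gives 2/2 and lets SCM leave chill mode.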



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

