hadoop-yarn-issues mailing list archives

From "Wilfred Spiegelenburg (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-7585) NodeManager should go unhealthy when state store throws DBException
Date Fri, 22 Dec 2017 06:48:02 GMT

     [ https://issues.apache.org/jira/browse/YARN-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilfred Spiegelenburg updated YARN-7585:
----------------------------------------
    Attachment: YARN-7585.003.patch

The isHealthy field must be accessible from test code, and the only way to get that to work is
by setting the protection level as is. To bring it more in line with how we access it inside the
class, I added a getter that is visible for testing and made the field private.
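
As a rough illustration of that pattern (not the code from the patch), the sketch below keeps the flag private and exposes it through a package-private getter that a test in the same package can call. The class name StateStoreHealth and the markUnhealthy() method are hypothetical. Guava's @VisibleForTesting annotation is left out so the sketch compiles without extra dependencies; a comment marks the intent instead.

```java
// Hypothetical stand-in class; the names here are illustrative, not from the patch.
class StateStoreHealth {
    // Private: neither production code nor tests touch the field directly.
    private volatile boolean isHealthy = true;

    // Visible for testing: package-private so a test in the same package can read the flag.
    boolean isHealthy() {
        return isHealthy;
    }

    // Hypothetical mutator used when the store reports a failure.
    void markUnhealthy() {
        isHealthy = false;
    }
}
```

A test can then assert on isHealthy() without the field itself needing relaxed visibility.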

Changed the assert to be more readable.

> NodeManager should go unhealthy when state store throws DBException 
> --------------------------------------------------------------------
>
>                 Key: YARN-7585
>                 URL: https://issues.apache.org/jira/browse/YARN-7585
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>         Attachments: YARN-7585.001.patch, YARN-7585.002.patch, YARN-7585.003.patch
>
>
> If work-preserving recovery is enabled, the NM will not start up if the state store does
> not initialise. However, if the state store becomes unavailable after that for any reason, the
> NM will not go unhealthy.
> Since the state store is not available, new containers cannot be started any more and
> the NM should become unhealthy:
> {code}
> AMLauncher: Error launching appattempt_1508806289867_268617_000001. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
> at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
> ...
> Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
> at o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
> at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
> at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
> {code}
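
The behaviour requested above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual NodeManager code: SketchDbException stands in for org.iq80.leveldb.DBException (an unchecked exception), and SketchDb and SketchStateStore are hypothetical names. The idea is that a write failure in the underlying store flips a health flag before the error is rethrown as an IOException, so a health check can later report the node as unhealthy.

```java
import java.io.IOException;

// Stand-in for org.iq80.leveldb.DBException so the sketch compiles without leveldb.
class SketchDbException extends RuntimeException {
    SketchDbException(String message) {
        super(message);
    }
}

// Hypothetical minimal key-value interface; not the real state store API.
interface SketchDb {
    void put(String key, String value);
}

// Hypothetical wrapper that records store failures in a health flag.
class SketchStateStore {
    private final SketchDb db;
    private volatile boolean healthy = true;

    SketchStateStore(SketchDb db) {
        this.db = db;
    }

    void storeApplication(String appId, String data) throws IOException {
        try {
            db.put(appId, data);
        } catch (SketchDbException e) {
            // Remember the failure so a health check can report the node as unhealthy.
            healthy = false;
            throw new IOException(e);
        }
    }

    // Package-private accessor, readable from tests in the same package.
    boolean isHealthy() {
        return healthy;
    }
}
```

In this sketch the flag is never reset; whether and how a node should recover once the store is writable again is left open.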




