hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wilfred Spiegelenburg (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-7585) NodeManager should go unhealthy when state store throws DBException
Date Sun, 10 Dec 2017 23:48:03 GMT

     [ https://issues.apache.org/jira/browse/YARN-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Wilfred Spiegelenburg updated YARN-7585:
----------------------------------------
    Attachment: YARN-7585.001.patch

Patch to maek service unhealthy when the DB throws errors after it has restored and the NM
is up and running. Failures during restore will mark the NM as failed during startup.

> NodeManager should go unhealthy when state store throws DBException 
> --------------------------------------------------------------------
>
>                 Key: YARN-7585
>                 URL: https://issues.apache.org/jira/browse/YARN-7585
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>         Attachments: YARN-7585.001.patch
>
>
> If work preserving recover is enabled the NM will not start up if the state store does
not initialise. However if the state store becomes unavailable after that for any reason the
NM will not go unhealthy. 
> Since the state store is not available new containers can not be started any more and
the NM should become unhealthy:
> {code}
> AMLauncher: Error launching appattempt_1508806289867_268617_000001. Got exception: org.apache.hadoop.yarn.exceptions.YarnException:
java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log:
Read-only file system
> at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
> ...
> Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log:
Read-only file system
> at o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
> at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
> at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message