hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6054) TimelineServer fails to start when some LevelDb state files are missing.
Date Mon, 09 Jan 2017 18:23:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812459#comment-15812459 ]

Naganarasimha G R commented on YARN-6054:
-----------------------------------------

Thanks for the patch, [~raviprakashu].
bq. Also, as pointed out by Jason, (e.g. in the case of NM) graceful degradation would be
a very hard thing to achieve. More likely, the state is corrupt and will cause undefined behavior.
Agreed, but maybe we can provide some kind of tool, and a set of steps, that can be used to
recover from this, as we too faced it once. I agree, though, that it is not within this JIRA's scope.
The changes look good enough. I will wait for the Jenkins report and, if there are no further
comments, commit it tomorrow.

> TimelineServer fails to start when some LevelDb state files are missing.
> ------------------------------------------------------------------------
>
>                 Key: YARN-6054
>                 URL: https://issues.apache.org/jira/browse/YARN-6054
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: YARN-6054.01.patch, YARN-6054.02.patch, YARN-6054.03.patch
>
>
> We encountered an issue recently where the TimelineServer failed to start because some state files went missing.
> {code}
> 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
> org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
> 2016-11-21 20:46:43,135 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer
> org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172)
>         at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182)
> Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst
>         at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>         at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>         at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>         at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         ... 5 more
> 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1
> {code}
> Ideally we shouldn't have any missing state files. However I'd posit that the TimelineServer should have graceful degradation instead of failing to start at all.
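
For illustration only, one shape that "degrade instead of failing to start" could take is to attempt a repair and retry the open once before giving up. The sketch below is not taken from the attached patches and does not use the real leveldbjni API: `StoreFactory`, `FlakyFactory`, and `openWithRepair` are hypothetical names, and the string handle stands in for a real database object. Whether a repair is acceptable for a given store is a separate question, since a repair may discard data.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class RepairOnOpenSketch {

    /** Hypothetical stand-in for a LevelDB-style factory; NOT the real leveldbjni API. */
    interface StoreFactory {
        String open(String path) throws IOException;
        void repair(String path) throws IOException;
    }

    /**
     * Tries to open the store. If the first open fails, attempts one repair
     * and one more open. If that also fails, the caller gets an unchecked
     * exception that carries both failures.
     */
    static String openWithRepair(StoreFactory factory, String path) {
        try {
            return factory.open(path);
        } catch (IOException first) {
            try {
                factory.repair(path);
                return factory.open(path);
            } catch (IOException second) {
                second.addSuppressed(first);
                throw new UncheckedIOException(
                    "open failed even after repair: " + path, second);
            }
        }
    }

    /** Fake factory for the demo: every open fails until repair has been called. */
    static class FlakyFactory implements StoreFactory {
        boolean repaired = false;
        int openAttempts = 0;

        @Override
        public String open(String path) throws IOException {
            openAttempts++;
            if (!repaired) {
                throw new IOException("Corruption: missing files under " + path);
            }
            return "handle:" + path;
        }

        @Override
        public void repair(String path) {
            repaired = true;
        }
    }

    public static void main(String[] args) {
        FlakyFactory factory = new FlakyFactory();
        // The path is an arbitrary example value; nothing is read from disk.
        System.out.println(openWithRepair(factory, "/tmp/example.ldb"));
    }
}
```

A service built this way would still fail to start if the repair itself fails, which keeps the "state is corrupt, behavior undefined" concern raised in the comment above visible to the operator.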



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


