hadoop-yarn-dev mailing list archives

From Ravi Prakash <ravihad...@gmail.com>
Subject Re: LevelDB corruption in YARN Application TimelineServer
Date Tue, 07 Mar 2017 19:50:44 GMT
Hi Abhishek!

You might also want to pull in
https://issues.apache.org/jira/browse/YARN-6054 .

HTH
Ravi

On Mon, Mar 6, 2017 at 8:39 AM, Jason Lowe <jlowe@yahoo-inc.com.invalid>
wrote:

> Verify that nothing outside of Hadoop/YARN is periodically removing
> "old" files (e.g., tmpwatch).  Users have reported similar cases in the
> past that were tracked down to an invalid setup: a periodic cleanup tool,
> like tmpwatch, was removing files and corrupting the state.
> Jason
>
>
>     On Thursday, March 2, 2017 5:59 PM, Abhishek Das <
> abhishek.besu@gmail.com> wrote:
>
>
>  Hi,
>
> I am running a Hadoop 2.6.0 cluster on EC2 instances, with an r3.2xlarge
> instance as the master node. The YARN Application TimelineServer running on
> the master node is throwing an exception because of LevelDB corruption. This
> issue seems to happen when the cluster has been up for a long time
> (more than 7 days). The stack trace is given below.
>
> ERROR org.apache.hadoop.yarn.server.timeline.TimelineDataManager: Skip the timeline entity: { id: <task_id>, type: TEZ_TASK_ID }
> java.lang.RuntimeException: org.fusesource.leveldbjni.internal.NativeDB$DBException: *IO error:
> /media/ephemeral0/hadoop-root/yarn/timeline/leveldb-timeline-store.ldb/330951.sst:
> No such file or directory*
>         at org.fusesource.leveldbjni.internal.JniDBIterator.seek(JniDBIterator.java:68)
>         at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.getEntity(LeveldbTimelineStore.java:444)
>         at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:257)
>         at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.postEntities(TimelineWebServices.java:259)
>         at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>         at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>         at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>         at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>         at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>         at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>         at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>         at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:96)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
>         at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269)
>         at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1242)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>
> There are a lot of .sst files in the LevelDB directory:
> *sudo ls -lrt /media/ephemeral0/hadoop-root/yarn/timeline/leveldb-timeline-store.ldb/ | wc -l*
> *3848*
>
> After this error, the ResourceManager and the Tez ApplicationMaster are not
> able to post entities to the YARN ATS, so I am not able to see the history
> of the running jobs.
>
> Does anyone have any idea what the root cause of this LevelDB corruption is
> and how to get rid of this issue?
>
> Thanks in advance.
>
> Regards,
> Abhishek
>
>
>
>
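
One way to follow up on Jason's suggestion is to look at how old the files in the store directory are, since a cleanup tool such as tmpwatch picks its victims by age (by access time by default, as far as I know). The sketch below is only an illustration: the function name scan_store and the max_age_days threshold are hypothetical, and it does not touch LevelDB itself. It counts the .sst files in a directory and lists the files whose access time is older than the threshold, i.e. the ones an age-based cleaner would be likely to delete.

```python
import os
import time


def scan_store(path, max_age_days, now=None):
    """Count .sst files in `path` and list files not accessed recently.

    Hypothetical helper, not part of Hadoop or LevelDB. Returns a tuple
    (sst_count, stale_names), where stale_names is the sorted list of regular
    files whose access time is more than `max_age_days` days before `now`.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    sst_count = 0
    stale_names = []
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip subdirectories and other non-regular entries
            if entry.name.endswith(".sst"):
                sst_count += 1
            # st_atime is the last access time; an age-based cleaner that
            # keys on access time would treat these files as "old".
            if entry.stat().st_atime < cutoff:
                stale_names.append(entry.name)
    return sst_count, sorted(stale_names)
```

For the store in this thread, one could call scan_store("/media/ephemeral0/hadoop-root/yarn/timeline/leveldb-timeline-store.ldb", 7) as a user with read access to that directory. A non-empty second value only means some files look old enough to be removed by such a tool; whether a cleaner is actually configured for that path still has to be checked in the host's cron setup. Note also that filesystems mounted with noatime or relatime may not update access times faithfully.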
