hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5432) Lock already held by another process while LevelDB cache store creation for dag
Date Tue, 26 Jul 2016 18:35:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394287#comment-15394287 ]

Li Lu commented on YARN-5432:
-----------------------------

Thanks for reporting this issue [~karams]! 

The main cause of this issue is that, after the concurrency changes in YARN-4987, readers
can hold a cache item and keep it from being released. If another read request for the same
entity group id arrives during this period, the storage will try to create a new cache at the
same file location, which triggers the lock error in leveldb. This also explains why the
problem is severe when the cache size is small and reader contention is high: with a smaller
cache, evictions are more frequent, and higher reader contention raises the chance that
readers "hold" a cache storage.
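
The sequence described above can be sketched in a few lines of Java. This is only an
illustration, not the actual EntityGroupFSTimelineStore code: the class names
(CacheLockSketch, Store, CacheItem), the /tmp path, and the in-memory set standing in for
the leveldb LOCK files are all hypothetical. A cache of capacity 1 evicts group_1 while a
reader still holds it, so the next request for group_1 tries to open a second store on the
same path.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class CacheLockSketch {
    // Hypothetical stand-in for the on-disk leveldb LOCK files: one entry per open store.
    static final Set<String> heldLocks = new HashSet<>();

    /** Hypothetical store: opening it takes the lock for its path, closing releases it. */
    static final class Store {
        final String path;
        Store(String path) {
            if (!heldLocks.add(path)) {
                throw new IllegalStateException("lock " + path + "/LOCK: already held by process");
            }
            this.path = path;
        }
        void close() { heldLocks.remove(path); }
    }

    /** Cache item whose store is closed only once it is evicted AND no reader holds it. */
    static final class CacheItem {
        final Store store;
        int readers = 0;
        boolean evicted = false;
        CacheItem(Store store) { this.store = store; }
        void release() { readers--; if (evicted && readers == 0) store.close(); }
        void evict() { evicted = true; if (readers == 0) store.close(); }
    }

    static final int CAPACITY = 1; // small cache, so evictions are frequent

    // Access-ordered LRU map; the eldest entry is evicted once CAPACITY is exceeded.
    static final LinkedHashMap<String, CacheItem> cache =
        new LinkedHashMap<String, CacheItem>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CacheItem> eldest) {
                if (size() > CAPACITY) {
                    eldest.getValue().evict();
                    return true;
                }
                return false;
            }
        };

    /** Returns the cached item for a group id, opening a new store on a cache miss. */
    static CacheItem acquire(String groupId) {
        CacheItem item = cache.get(groupId);
        if (item == null) {
            // Hypothetical path layout, loosely modelled on the log message in the report.
            item = new CacheItem(new Store("/tmp/" + groupId + "-timeline-cache.ldb"));
            cache.put(groupId, item);
        }
        item.readers++;
        return item;
    }

    /** Replays the problematic interleaving and returns the resulting error message. */
    static String run() {
        heldLocks.clear();
        cache.clear();
        CacheItem a = acquire("group_1"); // reader A holds group_1
        CacheItem b = acquire("group_2"); // evicts group_1, but A keeps its store open
        b.release();
        try {
            acquire("group_1");           // cache miss: second store on the same path
            return "no conflict";
        } catch (IllegalStateException e) {
            return e.getMessage();
        } finally {
            a.release();                  // only now is the old group_1 store closed
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the conflict disappears if the third request arrives after reader A has
released its item, which matches the observation that the problem depends on reader
contention and eviction frequency.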

> Lock already held by another process while LevelDB cache store creation for dag
> -------------------------------------------------------------------------------
>
>                 Key: YARN-5432
>                 URL: https://issues.apache.org/jira/browse/YARN-5432
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 2.8.0, 2.7.3
>            Reporter: Karam Singh
>            Assignee: Li Lu
>
> Observed while running ATS stress tests with 15 concurrent ATS reads (Python threads issuing ws/v1/time/TEZ_DAG_ID, ws/v1/time/TEZ_VERTEX_ID?primaryFilter=TEZ_DAG_ID:<dag_id>, etc. calls).
> Note: The summary store for ATSv1.5 is RLD, but for each dag/application ATS also creates a leveldb cache when vertex/task/taskattempts information is queried from ATS.
>
> The following type of exception appears very frequently in the ATS logs:
> 2016-07-23 00:01:56,089 [1517798697@qtp-1198158701-850] INFO org.apache.hadoop.service.AbstractService:
Service LeveldbCache.timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832
failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO
error: lock /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK:
already held by process
> org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: lock /grid/4/yarn_ats/atsv15_rld/timelineEntityGroupId_1469090881194_4832_application_1469090881194_4832-timeline-cache.ldb/LOCK:
already held by process
>         at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
>         at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
>         at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
>         at org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore.serviceInit(LevelDBCacheTimelineStore.java:108)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.timeline.EntityCacheItem.refreshCache(EntityCacheItem.java:113)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getCachedStore(EntityGroupFSTimelineStore.java:1021)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresFromCacheIds(EntityGroupFSTimelineStore.java:936)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getTimelineStoresForRead(EntityGroupFSTimelineStore.java:989)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1041)
>         at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168)
>         at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138)
>         at org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:117)
>         at sun.reflect.GeneratedMethodAccessor82.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>         at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>         at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>         at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>         at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>         at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>         at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


