hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7147) ATS1.5 crash due to OOM
Date Fri, 01 Sep 2017 14:58:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150673#comment-16150673
] 

Rohith Sharma K S commented on YARN-7147:
-----------------------------------------

Yes, this can be avoided by configuring _entity-group-fs-store.cache-store-class_. But I wanted
to know about the performance impact on serving queries. If the query performance trade-off is
acceptable, then we can use LevelDB. I see there are two LevelDB implementations, i.e. leveldb
vs. rolling leveldb. Which one would be suitable for caching?

> ATS1.5 crash due to OOM
> -----------------------
>
>                 Key: YARN-7147
>                 URL: https://issues.apache.org/jira/browse/YARN-7147
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: Screen Shot - suspect-1.png, Screen Shot - suspect-2.png
>
>
> It is observed that in a production cluster, even though _app-cache-size_ is set to a minimal
value, i.e. less than 5, the ATS server goes down with OOM. The _entity-group-fs-store.cache-store-class_
is configured with MemoryTimelineStore, which is the default. The heap size configured for the
ATS daemon is 8GB.
> This is because ATS parses the entity log file per domain and caches it. If the domain
has a lot of entity information, the in-memory cache store loads all of it, which causes
the OOM. After a restart, it caches the same domain again and goes OOM again.
> There are two possible ways to handle it:
> # Threshold the number of entities loaded into the in-memory cache. This can still lead to
OOM if the data size is huge.
> # Threshold based on the data size in the store.
> We faced the first issue, where the number of entities is very large.
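
To illustrate the first option in the quoted description, here is a minimal Java sketch of an in-memory cache that caps the number of entities it holds. It is not the ATS MemoryTimelineStore code: the class name BoundedEntityCache and the maxEntities parameter are hypothetical, and evicting the least recently accessed entry is only one assumed way of enforcing the threshold.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not part of Hadoop: a map-backed cache that never
 * holds more than maxEntities entries. When the limit is exceeded, the
 * least recently accessed entry is dropped.
 */
final class BoundedEntityCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntities;

    BoundedEntityCache(int maxEntities) {
        // accessOrder = true keeps iteration order from least to most
        // recently accessed, so the "eldest" entry is the LRU one.
        super(16, 0.75f, true);
        if (maxEntities < 1) {
            throw new IllegalArgumentException("maxEntities must be >= 1");
        }
        this.maxEntities = maxEntities;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true
        // evicts the eldest entry and keeps the entity count bounded.
        return size() > maxEntities;
    }
}
```

A cache built with `new BoundedEntityCache<String, String>(2)` keeps at most two entries however many are put into it. As the description notes, a count-based limit like this does not bound memory when individual entities are large.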



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


