hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
Date Sat, 09 Jan 2016 01:33:39 GMT

     [ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Lu updated YARN-4265:
    Attachment: YARN-4265-trunk.004.patch

Thanks [~djp] for the review! I updated my patch according to your comments. Some quick comments:

bq. I am a bit confused by the logic here: if appLogs is not done yet, but its detail logs are
empty, do we need to scanForLogs? If not, we should at least document the reason.
Yes, we only update summary logs while the app is running. I've updated the comments to explain this.

bq. If we have two groupIds, 114859476_01_1 and 114859476_01_11, can the latter one's log file
name also match the former groupId? If so, we may consider matching the file name against the
cache id more exactly. The same applies to the code below: {{if (log.getFilename().contains(groupId.toString()))}}
Nice catch! What I'm trying to handle here is file names made up of an entity group id and a
sequence number. I've updated the related logic here.
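
To illustrate the prefix collision raised above, here is a minimal sketch of a stricter match. It is not the code from the patch: the class GroupIdLogMatcher, the method matches, and the assumed "_<digits>" sequence-number suffix are all hypothetical, chosen only to show the idea.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from YARN-4265-trunk.004.patch.
final class GroupIdLogMatcher {
    private GroupIdLogMatcher() {}

    /**
     * Returns true if fileName is exactly groupId, or groupId followed by an
     * assumed "_<digits>" sequence-number suffix. A plain contains() check
     * would also accept "114859476_01_11" for group "114859476_01_1".
     */
    static boolean matches(String fileName, String groupId) {
        // Pattern.quote keeps any regex metacharacters in the id literal.
        String regex = Pattern.quote(groupId) + "(_\\d+)?";
        // Pattern.matches requires the whole file name to match the regex.
        return Pattern.matches(regex, fileName);
    }
}
```

Under this assumed naming scheme, a group id that is itself another group id plus "_<digits>" would still be ambiguous, so a real implementation would need an unambiguous separator or an exact-length comparison.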

bq. For cleanLogs(Path dirpath), it seems that the result of the log cleanup depends
on the order in which files and directories are returned. Say an app dir includes file A and
dir B, where file A is fresh and all files in dir B are older than logRetainMillis. If file A
is returned first, cleanLogs() does nothing, but if dir B is returned first, cleanLogs() will
clean up dir B. Given that fs.listStatusIterator(dirpath) could return file A and dir B in
arbitrary order, is this nondeterministic behavior expected?
This cannot happen, because the first part of cleanLogs() only performs a DFS to decide
whether we need to remove this dir. If any file in the directory is new, we will not remove it.
The actual removal happens after the DFS completes.
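
The two-phase structure described above can be sketched as follows. This is an illustration only: it uses java.nio.file on a local file system instead of the Hadoop FileSystem API so that it is self-contained, and the names RetentionCleaner, hasFreshFile, and cleanIfStale are hypothetical, not the ones in the patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not the cleanLogs() implementation from the patch.
final class RetentionCleaner {
    private RetentionCleaner() {}

    /** Phase 1: depth-first scan. True if any file under dir was modified at or after cutoffMillis. */
    static boolean hasFreshFile(Path dir, long cutoffMillis) throws IOException {
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (Files.isDirectory(child)) {
                    if (hasFreshFile(child, cutoffMillis)) {
                        return true;
                    }
                } else if (Files.getLastModifiedTime(child).toMillis() >= cutoffMillis) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Phase 2: delete the whole tree, but only after the scan found nothing fresh. */
    static boolean cleanIfStale(Path appDir, long nowMillis, long retainMillis) throws IOException {
        if (hasFreshFile(appDir, nowMillis - retainMillis)) {
            return false; // one fresh file keeps the entire directory
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(appDir)) {
            // Reverse path order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }
}
```

Because the keep-or-remove decision is made for the directory as a whole before anything is deleted, the order in which entries are listed does not change the outcome.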

bq. Is it a common case for an AppLogs to have many summaryLogs (and detail logs)?
Right now we're not facing this kind of use case. We can certainly optimize this logic in
the future, though.

bq. Can we directly return appDirPath's modification time instead of going through all subdirectories?
I believe we cannot. We're trying to return the latest time at which any file within the
directory was changed, so that parseSummaryLogs can decide whether the app has been in the
UNKNOWN state for long enough.
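
As a rough illustration of why the walk is needed: a directory's own modification time typically changes when entries are added or removed, not when an existing file inside it is written to. The sketch below computes the latest file modification time in a tree. It uses java.nio.file rather than the Hadoop FileSystem API, and the names LatestModification and latestModifiedMillis are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical sketch; not the method used in the patch.
final class LatestModification {
    private LatestModification() {}

    /**
     * Returns the newest modification time (epoch millis) of any regular file
     * under root, or 0 if the tree contains no regular files.
     */
    static long latestModifiedMillis(Path root) throws IOException {
        long latest = 0L;
        try (Stream<Path> walk = Files.walk(root)) {
            Iterator<Path> it = walk.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                // Directory timestamps are skipped; only file changes count.
                if (Files.isRegularFile(p)) {
                    latest = Math.max(latest, Files.getLastModifiedTime(p).toMillis());
                }
            }
        }
        return latest;
    }
}
```

A caller could compare the returned value against the current time to decide whether the directory has been idle for long enough.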

> Provide new timeline plugin storage to support fine-grained entity caching
> --------------------------------------------------------------------------
>                 Key: YARN-4265
>                 URL: https://issues.apache.org/jira/browse/YARN-4265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, YARN-4265-trunk.003.patch,
YARN-4265-trunk.004.patch, YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
> To support the newly proposed APIs in YARN-4234, we need to create a new plugin timeline
store. The store may behave similarly to the EntityFileTimelineStore proposed in YARN-3942,
but cache data at cache id granularity instead of application id granularity. Let's have
this storage as a standalone one, instead of updating EntityFileTimelineStore, to keep the
existing store (EntityFileTimelineStore) stable.
