hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching
Date Fri, 08 Jan 2016 16:03:40 GMT

    [ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089413#comment-15089413 ]

Junping Du commented on YARN-4265:

Thanks [~gtCarrera9] for updating the patch. 
The trunk.003 patch is getting closer; a few more comments:
In yarn-default.xml,
+      How long the ATS v1.5 entity group file system storage systemwill keep an
+      application's data in the done directory.
"file system storage systemwill" should be "file system storage will"

In EntityCacheItem.java, inside of refreshCache():
+    if (needRefresh()) {
+      if (!appLogs.isDone()) {
+        appLogs.parseSummaryLogs();
+      } else if (appLogs.getDetailLogs().isEmpty()) {
+        appLogs.scanForLogs();
+      }
+      if (!appLogs.getDetailLogs().isEmpty()) {
I am a bit confused by the logic here: if appLogs is not done yet but its detail logs are
empty, do we need to scanForLogs? If not, we should at least document the reason.

+        if (appLogs.getDetailLogs().isEmpty()) {
+          LOG.debug("cache id {}'s detail log is empty! ", groupId);
+        }
This is inside the {{if (!appLogs.getDetailLogs().isEmpty())}} branch; is it possible for
detailLogs to become empty again here? If not, should we remove this check?

+          // Only refresh the log that matches the cache id
+          if (log.getFilename().contains(groupId.toString())) {
If we have two groupIds, 114859476_01_1 and 114859476_01_11, the latter's log file name
will also match the former groupId. If so, we may want to match the file name against the
cache id more exactly. The same applies to the code below: {{if (log.getFilename().contains(groupId.toString()))}}
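
To make the overlap concrete, here is a minimal sketch; the file name layout and the stricter `exactMatch` check are hypothetical (only the two groupIds come from the comment above):

```java
public class GroupIdMatch {
    // Naive check, as in the patch: plain substring containment.
    static boolean naiveMatch(String filename, String groupId) {
        return filename.contains(groupId);
    }

    // One possible stricter check: require the id to end at a non-alphanumeric
    // boundary, so "..._1" does not match a file for "..._11".
    static boolean exactMatch(String filename, String groupId) {
        int i = filename.indexOf(groupId);
        if (i < 0) {
            return false;
        }
        int end = i + groupId.length();
        return end == filename.length()
            || !Character.isLetterOrDigit(filename.charAt(end));
    }

    public static void main(String[] args) {
        String fileForId11 = "timeline_114859476_01_11.log"; // hypothetical name
        System.out.println(naiveMatch(fileForId11, "114859476_01_1")); // true: false positive
        System.out.println(exactMatch(fileForId11, "114859476_01_1")); // false
        System.out.println(exactMatch(fileForId11, "114859476_01_11")); // true
    }
}
```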

{{appLogs.getDetailLogs().removeAll(removeList);}} should be moved out of the for-each loop;
otherwise removeList is useless here.
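
The intended collect-then-remove pattern looks like this (a sketch with Strings standing in for LogInfo objects):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectThenRemove {
    // Collect-then-remove sketch: calling removeAll() inside the for-each
    // would defeat the purpose of batching into removeList, and removing
    // elements from the list being iterated can also trigger a
    // ConcurrentModificationException on the next iterator step.
    static List<String> purgeEmpty(List<String> detailLogs) {
        List<String> removeList = new ArrayList<>();
        for (String log : detailLogs) {
            if (log.isEmpty()) {
                removeList.add(log);   // only collect inside the loop
            }
        }
        detailLogs.removeAll(removeList); // single batch removal afterwards
        return detailLogs;
    }

    public static void main(String[] args) {
        List<String> logs = new ArrayList<>(Arrays.asList("a", "", "b", ""));
        System.out.println(purgeEmpty(logs)); // [a, b]
    }
}
```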

In EntityGroupFSTimelineStore.java,
+          + "%04d" + Path.SEPARATOR // app num / 10000000
I think it should be app num / 1,000,000 (one million), not 10 million.
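
As a quick sanity check on the divisor (the "%04d" format is from the patch; the app numbers are made up for illustration):

```java
public class AppDirBucket {
    // With a "%04d" bucket name, dividing by 1,000,000 groups one million
    // app numbers per bucket; dividing by 10,000,000 would leave most of
    // the 4-digit namespace unused.
    static String bucket(long appNum, long divisor) {
        return String.format("%04d", appNum / divisor);
    }

    public static void main(String[] args) {
        System.out.println(bucket(12345678L, 1_000_000L));  // 0012
        System.out.println(bucket(12345678L, 10_000_000L)); // 0001
    }
}
```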

+  private final Map<TimelineEntityGroupId, EntityCacheItem> cachedLogs
+      = Collections.synchronizedMap(
+      new LinkedHashMap<TimelineEntityGroupId, EntityCacheItem>(
+          appCacheMaxSize + 1, 0.75f, true) {
The cachedLogs initialization should be moved into serviceInit(), after appCacheMaxSize has
already been set up.
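
For context, the snippet above builds an access-ordered LinkedHashMap acting as an LRU cache. A minimal sketch of that pattern, showing why the capacity must be known at construction time (and hence why building it in serviceInit() makes sense):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheSketch {
    // Access-ordered LinkedHashMap evicting the eldest entry past maxSize.
    // Because maxSize is captured when the map is constructed, the map can
    // only be created after the configured cache size is known.
    static <K, V> Map<K, V> lruCache(final int maxSize) {
        return Collections.synchronizedMap(
            new LinkedHashMap<K, V>(maxSize + 1, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxSize;
                }
            });
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = lruCache(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");      // touch "a" so "b" becomes the eldest entry
        cache.put("c", 3);   // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```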

+  @SuppressWarnings("serial")
+  @Override
+  protected void serviceInit(Configuration conf) throws Exception {
{{SuppressWarnings("serial")}} seems to be unnecessary?

In loadPlugIns(),
+      } catch (Exception e) {
+        LOG.info("Error loading plugin " + name, e);
+      }
This should log at warn level instead of info.

+      LOG.debug("Load plugin class {}", cacheIdPlugin.getClass().getName());
Better to use info instead of debug, to follow the same practice as the other messages in serviceInit.

For cleanLogs(Path dirpath), it seems the result of the log cleanup depends on the order in
which files/directories are returned. Say an app dir contains file A and dir B, where file A
is fresh and all files in dir B are older than logRetainMillis. If file A is returned first,
cleanLogs() does nothing, but if dir B is returned first, cleanLogs() will clean up dir B.
Given that {{fs.listStatusIterator(dirpath)}} could return file A and dir B in arbitrary
order, is this nondeterministic behavior expected?
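
A minimal model of the concern (plain longs stand in for FileStatus timestamps; this is not the Hadoop API): deciding from the first entry seen makes the outcome order-dependent, whereas scanning all entries for the newest timestamp first does not.

```java
import java.util.Arrays;
import java.util.List;

public class OrderDependentClean {
    // Decides from the first entry only, as if listing order mattered.
    static boolean shouldCleanOrderDependent(List<Long> mtimes, long cutoff) {
        return !mtimes.isEmpty() && mtimes.get(0) < cutoff;
    }

    // Looks at every entry before deciding: order no longer matters.
    static boolean shouldCleanOrderIndependent(List<Long> mtimes, long cutoff) {
        long newest = Long.MIN_VALUE;
        for (long t : mtimes) {
            newest = Math.max(newest, t);
        }
        return newest < cutoff;
    }

    public static void main(String[] args) {
        long cutoff = 100L;
        List<Long> freshFirst = Arrays.asList(150L, 50L);
        List<Long> staleFirst = Arrays.asList(50L, 150L);
        // order-dependent: disagrees with itself on the same data
        System.out.println(shouldCleanOrderDependent(freshFirst, cutoff)); // false
        System.out.println(shouldCleanOrderDependent(staleFirst, cutoff)); // true
        // order-independent: same answer either way
        System.out.println(shouldCleanOrderIndependent(freshFirst, cutoff)); // false
        System.out.println(shouldCleanOrderIndependent(staleFirst, cutoff)); // false
    }
}
```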

    // scans for new logs and returns the modification timestamp of the
    // most recently modified log
    long scanForLogs() throws IOException {
Can we directly return appDirPath's modification time instead of going through all
subdirectories? Besides domain, summary, and entity logs, could anything else exist under
this directory?
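
One possible caveat: on most filesystems a directory's mtime changes only when entries are created or removed, not when an existing file inside it is appended to. The sketch below just simulates that state with explicit timestamps (the file name is hypothetical), to show why the directory timestamp alone could miss a newer log write:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class DirMtimeSketch {
    // Simulates "a file was updated after the directory was last touched",
    // the state an append-only log writer can leave behind.
    static boolean fileNewerThanDir() throws Exception {
        Path dir = Files.createTempDirectory("applogs");
        Path log = Files.createFile(dir.resolve("summarylog-1")); // hypothetical name
        Files.setLastModifiedTime(dir, FileTime.fromMillis(1_000L));
        Files.setLastModifiedTime(log, FileTime.fromMillis(2_000L));
        boolean newer = Files.getLastModifiedTime(log).toMillis()
            > Files.getLastModifiedTime(dir).toMillis();
        Files.delete(log);
        Files.delete(dir);
        return newer;
    }

    public static void main(String[] args) throws Exception {
        // true: trusting the directory mtime would miss the newer write
        System.out.println(fileNewerThanDir());
    }
}
```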

      for (LogInfo log : summaryLogs) {
        if (log.getFilename().equals(filename)
            && log.getAttemptDirName().equals(attemptDirName)) {
Is it a common case for an AppLogs to have many summaryLogs (and detail logs)? I guess not,
but if my guess is wrong, we may want to use a HashMap instead of a list, which costs too
much time to search.
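
If the lists could grow large, keying by attempt dir plus filename would turn the linear scan into a constant-time lookup. A sketch (field and method names here are illustrative, not the patch's):

```java
import java.util.HashMap;
import java.util.Map;

public class LogLookupSketch {
    // Map keyed by "attemptDir/filename" replaces the O(n) list scan
    // over (getFilename(), getAttemptDirName()) pairs with an O(1) get.
    private final Map<String, String> logsByKey = new HashMap<>();

    private static String key(String attemptDirName, String filename) {
        return attemptDirName + "/" + filename;
    }

    void addLog(String attemptDirName, String filename, String logInfo) {
        logsByKey.put(key(attemptDirName, filename), logInfo);
    }

    String findLog(String attemptDirName, String filename) {
        return logsByKey.get(key(attemptDirName, filename));
    }

    public static void main(String[] args) {
        LogLookupSketch logs = new LogLookupSketch();
        logs.addLog("appattempt_01", "summarylog-1", "info-1");
        System.out.println(logs.findLog("appattempt_01", "summarylog-1")); // info-1
        System.out.println(logs.findLog("appattempt_02", "summarylog-1")); // null
    }
}
```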

In LogInfo.parsePath(),
    try {
      try {
        parser = jsonFactory.createJsonParser(in);
        parser.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
      } catch (IOException e) {
        // if app hasn't completed then there may be errors due to the
        // incomplete file which are treated as EOF until app completes
        if (appCompleted) {
          throw e;
        } else {
          LOG.debug("Exception in parse path: {}", e.getMessage());
      return doParse(tdm, parser, objMapper, ugi, appCompleted);
Even if we get rid of throwing the exception for the app-incomplete case, the doParse() call
below will throw an NPE because parser is still null. Is this expected?
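
A pure-Java reduction of that control flow (a stand-in for the Jackson factory call; the null guard at the end is one possible fix, not something from the patch):

```java
import java.io.IOException;

public class NullAfterCatch {
    // Stand-in for jsonFactory.createJsonParser() failing on a
    // half-written log file.
    static Object createParser(boolean truncated) throws IOException {
        if (truncated) {
            throw new IOException("unexpected end of input");
        }
        return new Object();
    }

    static String parsePath(boolean appCompleted, boolean truncated) throws IOException {
        Object parser = null;
        try {
            parser = createParser(truncated);
        } catch (IOException e) {
            if (appCompleted) {
                throw e;
            }
            // swallowed for incomplete apps -- parser remains null here
        }
        if (parser == null) {
            return "retry-next-scan"; // guard avoiding the NPE doParse() would hit
        }
        return "parsed";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parsePath(false, true));  // retry-next-scan
        System.out.println(parsePath(false, false)); // parsed
    }
}
```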

In PluginStoreTestUtils.java,
  static TimelineEntities generateTestEntities() {
        TimelineEntities entities = new TimelineEntities();
Indentation issue here.

> Provide new timeline plugin storage to support fine-grained entity caching
> --------------------------------------------------------------------------
>                 Key: YARN-4265
>                 URL: https://issues.apache.org/jira/browse/YARN-4265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-4265-trunk.001.patch, YARN-4265-trunk.002.patch, YARN-4265-trunk.003.patch,
YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
> To support the newly proposed APIs in YARN-4234, we need to create a new plugin timeline
store. The store may have similar behavior as the EntityFileTimelineStore proposed in YARN-3942,
but cache data in cache id granularity, instead of application id granularity. Let's have
this storage as a standalone one, instead of updating EntityFileTimelineStore, to keep the
existing store (EntityFileTimelineStore) stable. 
