hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7952) Find a way to persist the log aggregation status
Date Mon, 05 Mar 2018 20:25:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386700#comment-16386700
] 

Wangda Tan commented on YARN-7952:
----------------------------------

Thanks [~xgong], 

It might be better to update default value in a separate JIRA since it is not directly related
to this one.

1) nmLogAggregationStatusTracker should be part of NodeManager.java instead of CM.
2) NMLogAggregationStatusTracker:
2.1 It looks like pullLock should be a readLock and updateLocker should be a writeLock since
pull doesn't change internal data, correct?

2.2 It's better to rename pullLocker/updateLocker to read/writeLock for readability.

2.3 For overall workflow of this class, IMO it should be:
a. When log aggregation starts, application will be added to {{NMLogAggregationStatusTracker}}
with timestamp.
b. When RM acknowledges application finished, application will be removed from the {{NMLogAggregationStatusTracker}}
c. If configured timeout reaches, application will be removed from the {{NMLogAggregationStatusTracker}}.

I can see c. is handled by {{rollLogAggregationStatus}} and a. is handled by {{updateLogAggregationStatus}},
but not sure about b. When I look at logic below, I'm not sure if it works:

{code}
          if (currentTime - updateTime > rollingInterval) {
            LOG.warn("Ignore the log aggregation status update request "
                + "for the application:" + appId + ". The log aggregation status"
                + " update time is " + updateTime + " while the request process "
                + "time is " + currentTime + ".");
            return;
          }
{code}

Since the currentTime always almost equal to updateTime. The if statement should be false,
correct? And I think if application is null in NM side, we should not add it to the track,
correct?

Instead of this, should we explicitly notify {{NMLogAggregationStatusTracker}} in {{ApplicationImpl.AppLogsAggregatedTransition}}?
(It is true that {{ApplicationImpl.AppLogsAggregatedTransition}} means RM acks the log is
finished?)

2.4 I'm not sure if we should ever change the {{lastModifiedTime}}. IIUC, the RM will timeout
log aggregation status after configured timeout. What's the purpose of updating the {{lastModifiedTime}}?
Will this cause OOM issue potentially? 

2.5 updateLogAggregationStatus: 
- the {{long updateTime}} parameter is not necessary, can always use System.currentTime instead.

3) {{LogAggregationTrakcer}}
- {{LogAggregationTrakcer}} => AppLogAggregationStatusForRMRecovery (it's not tracker since
it is a passive status, and it's better to emphasize purpose of this class)
- {{getLastModifiedTime}} should be {{logAggregationStartedTime}} if you agree with 2.4

4) Comment of YarnConfiguration.LOG_AGGREGATION_STATUS_TIME_OUT_MS should be updated, it will
be consumed by NM as well.

Will include more another detailed scan in the next review.

> Find a way to persist the log aggregation status
> ------------------------------------------------
>
>                 Key: YARN-7952
>                 URL: https://issues.apache.org/jira/browse/YARN-7952
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>            Priority: Major
>         Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, YARN-7952.2.patch
>
>
> In MAPREDUCE-6415, we have created a CLI to har the aggregated logs, and In YARN-4946:
RM should write out Aggregated Log Completion file flag next to logs, we have a discussion
on how we can get the log aggregation status: make a client call to RM or get it directly
from the Distributed file system(HDFS).
> No matter which approach we would like to choose, we need to figure out a way to persist
the log aggregation status first. This ticket is used to track the working progress for this
purpose.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message