hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5340) App Name/User/RPC Port/AM Host info is missing from ATS web service or YARN CLI's app info
Date Fri, 08 Jul 2016 21:15:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368489#comment-15368489
] 

Li Lu commented on YARN-5340:
-----------------------------

Thanks for reporting this issue [~ssathish@hortonworks.com]! This is a very interesting
discovery. I did some debugging and found that the direct cause of the missing
fields is an authentication failure: the original user failed to authenticate when fetching the
app report. Looking at the message returned by ATS, I can see something like this:
{code}
{"events":[{"timestamp":1467931672057,"eventtype":"YARN_APPLICATION_FINISHED","eventinfo":{"YARN_APPLICATION_LATEST_APP_ATTEMPT":"appattempt_1467931619679_0001_000001","YARN_APPLICATION_FINAL_STATUS":"SUCCEEDED","YARN_APPLICATION_DIAGNOSTICS_INFO":"","YARN_APPLICATION_STATE":"FINISHED"}},{"timestamp":1467931652492,"eventtype":"YARN_APPLICATION_STATE_UPDATED","eventinfo":{"YARN_APPLICATION_STATE":"RUNNING"}},{"timestamp":1467931641896,"eventtype":"YARN_APPLICATION_ACLS_UPDATED","eventinfo":{}}],"entitytype":"YARN_APPLICATION","entity":"application_1467931619679_0001","starttime":1467931641896,"domain":"DEFAULT","otherinfo":{"YARN_APPLICATION_MEM_METRIC":290014,"YARN_APPLICATION_CPU_METRIC":74,"YARN_APPLICATION_VIEW_ACLS":"hrt_5 viewtestgroup"},"primaryfilters":{},"relatedentities":{}}
{code}

Note that the application creation information is missing from the returned entity.
I found that in LevelDB there are two <entityType, timestamp, entityId> tuples
created for application application_1467931619679_0001, with two different timestamps. The
application creation message is associated with a different timestamp.

Checking the rolling LevelDB code, I can see that neither call site of RollingLevelDBTimelineStore#getAndSetStartTime
is properly synchronized, although its comment says that it "Should only be called
when a lock has been obtained on the entity." So if two events for the same application
arrive at the timeline server concurrently, something like this may happen:
1. put1 checks the existing timestamp for the application and finds no result.
2. put2 checks the existing timestamp for the application and finds no result.
3. put1 sets the application entity's timestamp to its own timestamp.
4. put2 overwrites the application entity's timestamp with its own timestamp.

After this sequence, put1 will write its data to a key (<entityType, timestamp, entityId>)
that has a stale timestamp. That data will never be read back, since the timestamp has been
overwritten by put2.
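
The interleaving above can be replayed deterministically. The following is only a minimal sketch under stated assumptions, not the RollingLevelDBTimelineStore code: the class StartTimeRaceSketch, its two in-memory maps, the "/"-joined key layout, the payload strings, and the timestamps 1000 and 2000 are all hypothetical stand-ins for the LevelDB-backed structures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the race; not the actual timeline store code.
public class StartTimeRaceSketch {
    // entityId -> start time recorded for the entity (stand-in for the start time index)
    static final Map<String, Long> startTimes = new HashMap<>();
    // "<entityType>/<timestamp>/<entityId>" -> payload (stand-in for the entity table)
    static final Map<String, String> entries = new HashMap<>();

    static String key(String entityType, long timestamp, String entityId) {
        return entityType + "/" + timestamp + "/" + entityId;
    }

    // Replays steps 1-4 in the problematic order and returns what a reader sees.
    static String replay() {
        startTimes.clear();
        entries.clear();
        String type = "YARN_APPLICATION";
        String id = "application_1467931619679_0001";
        long t1 = 1000L; // hypothetical timestamp carried by put1 (creation event)
        long t2 = 2000L; // hypothetical timestamp carried by put2 (another event)

        Long seenByPut1 = startTimes.get(id);           // step 1: no result
        Long seenByPut2 = startTimes.get(id);           // step 2: no result
        if (seenByPut1 == null) startTimes.put(id, t1); // step 3: put1 sets its timestamp
        if (seenByPut2 == null) startTimes.put(id, t2); // step 4: put2 overwrites it

        // Each put writes its data under the timestamp it believes is current.
        entries.put(key(type, t1, id), "created");
        entries.put(key(type, t2, id), "acls-updated");

        // A reader resolves the entity through the recorded start time only.
        long current = startTimes.get(id);
        return entries.get(key(type, current, id));
    }

    public static void main(String[] args) {
        System.out.println("reader sees: " + replay());
        System.out.println("entries stored: " + entries.size());
    }
}
```

In this model both entries exist in the store, but the reader only ever reaches the one written under put2's timestamp, which matches the missing creation information in the ATS response.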

The original LeveldbTimelineStore does not have this problem, because it always grabs a lock
when it performs getAndSetStartTime.

As for a fix, making getAndSetStartTime synchronized will probably solve the problem.
I'm wondering whether making checkStartTimeInDb synchronized would also do the trick (since
it's the only place in the process with read-then-update semantics).
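
To illustrate the idea, here is a minimal sketch of a synchronized check-then-set. It is an assumption-laden stand-in, not a patch: the class SynchronizedStartTimeSketch, the simplified two-argument signature, the in-memory map, and the timestamps 1000 and 2000 are hypothetical, and the real methods read from and write to LevelDB.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a store whose read-then-update step is serialized.
public class SynchronizedStartTimeSketch {
    private final Map<String, Long> startTimes = new HashMap<>();

    // The lookup and the conditional write run under one monitor, so a second
    // caller always observes the start time recorded by the first caller.
    public synchronized long getAndSetStartTime(String entityId, long candidate) {
        Long existing = startTimes.get(entityId);
        if (existing != null) {
            return existing;
        }
        startTimes.put(entityId, candidate);
        return candidate;
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedStartTimeSketch store = new SynchronizedStartTimeSketch();
        String id = "application_1467931619679_0001";
        long[] results = new long[2];
        Thread put1 = new Thread(() -> results[0] = store.getAndSetStartTime(id, 1000L));
        Thread put2 = new Thread(() -> results[1] = store.getAndSetStartTime(id, 2000L));
        put1.start();
        put2.start();
        put1.join();
        put2.join();
        // Whichever thread wins, both puts agree on a single start time.
        System.out.println(results[0] == results[1]);
    }
}
```

A method-level monitor serializes puts for all entities, so it trades some throughput for simplicity. A per-entity lock, which is what the quoted method comment seems to expect, would be finer grained.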

[~jeagles] I know you're an expert on the rolling LevelDB source code, so if you have any free
bandwidth, I would truly appreciate your suggestions here. Thanks!

> App Name/User/RPC Port/AM Host info is missing from ATS web service or YARN CLI's app info
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-5340
>                 URL: https://issues.apache.org/jira/browse/YARN-5340
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Sumana Sathish
>            Assignee: Li Lu
>            Priority: Critical
>
> App Name/User/RPC Port/AM Host info is missing from ATS web service or YARN CLI's app info
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn --config /tmp/hadoopConf application -status application_1467931619679_0001
> Application Report :
> Application-Id : application_1467931619679_0001
> Application-Name : null
> Application-Type : null
> User : null
> Queue : null
> Application Priority : null
> Start-Time : 0
> Finish-Time : 1467931672057
> Progress : 100%
> State : FINISHED
> Final-State : SUCCEEDED
> Tracking-URL : N/A
> RPC Port : -1
> AM Host : N/A
> Aggregate Resource Allocation : 290014 MB-seconds, 74 vcore-seconds
> Log Aggregation Status : N/A
> {code}


