hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted
Date Mon, 29 Feb 2016 18:24:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172304#comment-15172304 ]

Sangjin Lee commented on YARN-4700:

I may have misread the comments in haste last Friday. If the comments meant that we would
use the event timestamps instead of the current time and calculate the top-of-the-day timestamps
from them, then I concur. If they meant that we would use the actual event timestamps *as
is* for the row key, I'm not as sure.

My main concern is that it might make some of the queries we want to run against this table
in the future harder to write or slower to execute. For example, we could do a query like
"return all flow activities in the last 7 days". With top-of-the-day timestamps, it would
be a simple partial row key match. With variable timestamps, it would become more of a
range query. Are my concerns overblown?
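
To illustrate the query concern, here is a minimal sketch, not the actual ATS v2 reader code. It assumes timestamps are epoch milliseconds and that "top of the day" means midnight GMT; the class and method names ({{LastDaysSketch}}, {{dayTimestampsForLastDays}}) are hypothetical. With day-aligned timestamps, "the last 7 days" maps to exactly 7 known values that can each be matched as part of a row key prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class LastDaysSketch {
    private static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    // Assumption: "top of the day" is midnight GMT of the day containing ts.
    public static long topOfTheDay(long ts) {
        return ts - Math.floorMod(ts, MILLIS_PER_DAY);
    }

    // Returns the day-aligned timestamps for today and the (days - 1)
    // preceding days, newest first. Each value could serve as the
    // timestamp component of a partial row key match.
    public static List<Long> dayTimestampsForLastDays(long nowMillis, int days) {
        List<Long> result = new ArrayList<>();
        long today = topOfTheDay(nowMillis);
        for (int i = 0; i < days; i++) {
            result.add(today - i * MILLIS_PER_DAY);
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamp of this message: Mon, 29 Feb 2016 18:24:18 GMT
        for (long day : dayTimestampsForLastDays(1456770258000L, 7)) {
            System.out.println(day);
        }
    }
}
```

With raw event timestamps in the key, the set of possible values in the same window is not known in advance, so the same query would need a start/stop range instead.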

If the solution we're discussing is the former, then I think it's quite straightforward. We
need a small change in {{FlowActivityRowKey.getRowKey()}}, which should apply {{TimelineStorageUtils.getTopOfTheDayTimestamp()}}
to the provided timestamp.
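
As a rough sketch of the idea (not the actual patch), the snippet below shows how an event timestamp could be normalized before it goes into the row key. The {{topOfTheDay}} helper is a hypothetical stand-in for {{TimelineStorageUtils.getTopOfTheDayTimestamp()}}, and it assumes epoch-millisecond timestamps truncated to midnight GMT.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the Hadoop implementation.
public class TopOfDaySketch {
    private static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    // Stand-in for TimelineStorageUtils.getTopOfTheDayTimestamp():
    // drops the time-of-day part so all events on one GMT day collapse
    // to the same value.
    public static long topOfTheDay(long eventTimestampMillis) {
        return eventTimestampMillis
            - Math.floorMod(eventTimestampMillis, MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        long morning = 1456707600000L; // Mon, 29 Feb 2016 01:00:00 GMT
        long evening = 1456770258000L; // Mon, 29 Feb 2016 18:24:18 GMT
        // Both events land on the same day-aligned value, so they would
        // share the timestamp component of the row key.
        System.out.println(topOfTheDay(morning) == topOfTheDay(evening));
    }
}
```

Because the result depends only on the event timestamp and not on the current time, replaying the same event after an RM restart would yield the same value.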

> ATS storage has one extra record each time the RM got restarted
> ---------------------------------------------------------------
>                 Key: YARN-4700
>                 URL: https://issues.apache.org/jira/browse/YARN-4700
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Li Lu
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
> When testing the new web UI for ATS v2, I noticed that we're creating one extra record
> for each finished application (but still held in the RM state store) each time the RM got
> restarted. It's quite possible that we add the cluster start timestamp into the default cluster
> id, thus each time we're creating a new record for one application (cluster id is a part of
> the row key). We need to fix this behavior, probably by having a better default cluster id.
