hadoop-yarn-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
Date Thu, 17 Sep 2015 18:37:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803384#comment-14803384 ]

Sangjin Lee commented on YARN-4074:
-----------------------------------

{quote}
In TimelineEntityReader#readMetrics it seems safe to assume that if we have more than one
value that this is a TimelineMetric.Type.TIME_SERIES.
Conversely it doesn't have to be true though right? I guess we'll just assume that for timelines
we'd never have just one value? I can't quite oversee the impact of incorrectly assuming TimelineMetric.Type.SINGLE_VALUE
if only one value has been written to HBase yet.
{quote}

That's right. We discussed this some time ago, and we think it would be safer if the metric type
(single value vs. time series) were persisted. But there are other dimensions of metrics
we may need to store (e.g. long vs. float, whether to aggregate, etc.). There is also the question
of what to do if users write inconsistent data. So at that time we went with the simple approach
that is currently in place (the code you see in {{TimelineEntityReader}} was refactored out of
{{HBaseTimelineReaderImpl}}, so it is not new code).

We should come to a conclusion on how to store/encode various dimensions of metrics, but not
as part of this JIRA.
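
To make the concern concrete, here is a minimal sketch of the inference rule discussed above: more than one stored value is treated as a time series, otherwise as a single value. The class, enum, and method names are hypothetical and are not the actual {{TimelineEntityReader}} code; the sketch only assumes the rule as described in this thread.

```java
import java.util.Map;
import java.util.TreeMap;

public class MetricTypeInference {
    // Hypothetical stand-in for TimelineMetric.Type.
    enum Type { SINGLE_VALUE, TIME_SERIES }

    // Infers the type from the number of stored values only,
    // because the type itself is assumed not to be persisted.
    static Type inferType(Map<Long, Number> valuesByTimestamp) {
        return valuesByTimestamp.size() > 1 ? Type.TIME_SERIES : Type.SINGLE_VALUE;
    }

    public static void main(String[] args) {
        Map<Long, Number> series = new TreeMap<>();

        // A time series for which only the first point has been written so far.
        series.put(1442351767756L, 10L);
        System.out.println(inferType(series));

        // After a second point is written, the same metric is classified differently.
        series.put(1442351768756L, 12L);
        System.out.println(inferType(series));
    }
}
```

Under this rule a time series with a single written point cannot be told apart from a single-value metric, which is why persisting the type would be safer.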

{quote}
Wrt. ApplicationRowKey: at some point (perhaps not this jira) we should consider making the
app_id a compound object that is stored with a ? separator. The prefix (in most cases in yarn
right now would be "application_") would be separate and the RM start time and the final numeric
part would be stored as a numerical value with a separate Bytes.to... conversion.

Otherwise we'll end up getting incorrect order for rowkeys when the application id wraps to
10K and each power of ten after that. For example, lexically application_1442351767756_10000
< application_1442351767756_9999

If we just access the application by specific key this doesn't matter, but if we do a row-scan
and count on ordering to set an appropriate stop on the scan, we'll break things.
This happens on all rowkeys with the app_id in it.
{quote}

That's a good point. We need to fix this, or queries will return incorrectly ordered or incorrect
results. This affects every place where we rely on the order of app ids as strings. I'll file a separate
JIRA to address this issue.
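
The ordering problem can be reproduced with a small sketch. It compares the two app ids from the example above as strings, then as a fixed-width numeric encoding. The encoding shown (8-byte cluster timestamp followed by a 4-byte sequence number, both big-endian) is only an assumption for illustration and not a decided row key format; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

public class AppIdOrdering {
    // Hypothetical encoding: big-endian long (cluster timestamp) followed by
    // big-endian int (sequence number). The "application" prefix is dropped.
    static byte[] encode(String appId) {
        String[] parts = appId.split("_");
        long clusterTimestamp = Long.parseLong(parts[1]);
        int sequence = Integer.parseInt(parts[2]);
        return ByteBuffer.allocate(12).putLong(clusterTimestamp).putInt(sequence).array();
    }

    // Unsigned lexicographic byte comparison, the way byte-array row keys are ordered.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        String later = "application_1442351767756_10000";
        String earlier = "application_1442351767756_9999";

        // As strings, the later app sorts before the earlier one.
        System.out.println("string order wrong:  " + (later.compareTo(earlier) < 0));

        // With the numeric encoding, the later app sorts after the earlier one.
        System.out.println("encoded order right: "
            + (compareBytes(encode(later), encode(earlier)) > 0));
    }
}
```

Big-endian fixed-width encoding preserves numeric order under byte comparison only for non-negative values, which holds for timestamps and sequence numbers.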

> [timeline reader] implement support for querying for flows and flow runs
> ------------------------------------------------------------------------
>
>                 Key: YARN-4074
>                 URL: https://issues.apache.org/jira/browse/YARN-4074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-4074-YARN-2928.007.patch, YARN-4074-YARN-2928.POC.001.patch,
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch, YARN-4074-YARN-2928.POC.004.patch,
> YARN-4074-YARN-2928.POC.005.patch, YARN-4074-YARN-2928.POC.006.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as implementation
> of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
