Date: Thu, 17 Sep 2015 18:37:05 +0000 (UTC)
From: "Sangjin Lee (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803384#comment-14803384 ]

Sangjin Lee commented on YARN-4074:
-----------------------------------

{quote}
In TimelineEntityReader#readMetrics it seems safe to assume that if we have more than one value that this is a TimelineMetric.Type.TIME_SERIES. Conversely it doesn't have to be true though right? I guess we'll just assume that for timelines we'd never have just one value?
I can't quite oversee the impact of incorrectly assuming TimelineMetric.Type.SINGLE_VALUE if only one value has been written to HBase yet.
{quote}

That's right. We discussed this some time ago, and we think it'd be safer if the metric type (single value vs. time series) were stored/persisted. But there are other dimensions of metrics we may need to store (e.g. long vs. float, whether to aggregate, etc.). Also, there is a question of what if users wrote inconsistent data. So, at that time we went with a simple decision that's currently there (the code you see in {{TimelineEntityReader}} is refactored out of {{HBaseTimelineReaderImpl}} so it's not new code). We should come to a conclusion on how to store/encode various dimensions of metrics, but not as part of this JIRA.

{quote}
Wrt. ApplicationRowKey: at some point (perhaps not this jira) we should consider making the app_id a compound object that is stored with a ? separator. The prefix (in most cases in yarn right now would be "application_") would be separate and the RM start time and the final numeric part would be stored as a numerical value with a separate Bytes.to... conversion. Otherwise we'll end up getting incorrect order for rowkeys when the application id wraps to 10K and each power of ten after that.

For example, lexically
application_1442351767756_10000 < application_1442351767756_9999

If we just access the application by specific key this doesn't matter, but if we do a row-scan and count on ordering to set an appropriate stop on the scan, we'll break things. This happens on all rowkeys with the app_id in it.
{quote}

That's a good point. We need to fix this, or we'll have incorrect orders/results happening with queries. This impacts anywhere we rely on the app id order (as string). I'll file a separate JIRA to address this issue.
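
To illustrate the ordering issue, here is a minimal sketch. It is not code from the patch: the class {{AppIdRowKeySketch}}, the methods {{encode}} and {{compareBytes}}, and the layout (8-byte cluster timestamp followed by a 4-byte sequence number, both big-endian) are all hypothetical, and {{java.nio.ByteBuffer}} stands in for whatever byte conversion the real row key would use. It only shows that comparing the ids as strings and comparing a numeric encoding byte by byte give opposite answers for the two ids quoted above.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not part of the timeline service code.
class AppIdRowKeySketch {

  // Assumed layout: "application_<clusterTimestamp>_<sequence>" becomes
  // 8 big-endian bytes for the timestamp plus 4 big-endian bytes for the sequence.
  static byte[] encode(String appId) {
    String[] parts = appId.split("_");
    if (parts.length != 3) {
      throw new IllegalArgumentException("unexpected app id: " + appId);
    }
    long clusterTimestamp = Long.parseLong(parts[1]);
    int sequence = Integer.parseInt(parts[2]);
    return ByteBuffer.allocate(Long.BYTES + Integer.BYTES)
        .putLong(clusterTimestamp)
        .putInt(sequence)
        .array();
  }

  // Unsigned lexicographic comparison of two byte arrays, i.e. the kind of
  // byte-wise ordering a sorted key-value store applies to row keys.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    String older = "application_1442351767756_9999";
    String newer = "application_1442351767756_10000";

    // As strings, the newer id sorts before the older one.
    System.out.println("string order puts newer first: "
        + (newer.compareTo(older) < 0));

    // With the numeric encoding, the older id sorts first.
    System.out.println("encoded order puts older first: "
        + (compareBytes(encode(older), encode(newer)) < 0));
  }
}
```

With a string-typed key, a scan whose stop row is derived from the app id order would cut off at the wrong place once the sequence number gains a digit. A fixed-width numeric encoding avoids that, at the cost of keys that are no longer human-readable.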

> [timeline reader] implement support for querying for flows and flow runs
> ------------------------------------------------------------------------
>
>                 Key: YARN-4074
>                 URL: https://issues.apache.org/jira/browse/YARN-4074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-4074-YARN-2928.007.patch, YARN-4074-YARN-2928.POC.001.patch, YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch, YARN-4074-YARN-2928.POC.004.patch, YARN-4074-YARN-2928.POC.005.patch, YARN-4074-YARN-2928.POC.006.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as implementation of the API.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)