Date: Mon, 14 Sep 2015 07:11:46 +0000 (UTC)
From: "Vrushali C (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-3901) Populate flow run data in the flow_run & flow activity tables

     [ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vrushali C updated YARN-3901:
-----------------------------
    Attachment: YARN-3901-YARN-2928.7.patch

Attaching patch v7 that addresses Sangjin's review suggestions as well as those discussed with Joep offline. I have also fixed the findbugs warnings.

> Populate flow run data in the flow_run & flow activity tables
> -------------------------------------------------------------
>
>                 Key: YARN-3901
>                 URL: https://issues.apache.org/jira/browse/YARN-3901
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>         Attachments: YARN-3901-YARN-2928.1.patch, YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, YARN-3901-YARN-2928.6.patch, YARN-3901-YARN-2928.7.patch
>
>
> As per the schema proposed in YARN-3815 in https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing jira to track creation and population of data in the flow run table.
> Some points that are being considered:
> - Stores per flow run information aggregated across applications, flow version
> RM’s collector writes to on app creation and app completion
> - Per App collector writes to it for metric updates at a slower frequency than the metric updates to application table
> primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and decremented on app completion.
> - For min_start_time the RM writer will simply write a value with the tag for the applicationId. A coprocessor will return the min value of all written values.
> - Upon flush and compactions, the min value between all the cells of this column will be written to the cell without any tag (empty tag) and all the other cells will be discarded.
> - Ditto for the max_end_time, but then the max will be kept.
> - Tags are represented as #type:value. The type can be not set (0), or can indicate running (1) or complete (2). In those cases (for metrics) only complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are completed (indicated by tag type 2) can the values be collapsed.
> - The application ids that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don’t want to re-aggregate for those upon replay
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
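
The read and compaction semantics listed in the quoted issue description can be sketched in a few lines. This is only an in-memory illustration in plain Python, not the HBase coprocessor from the patch: the Cell class, the function names, and the sample values are all hypothetical, and the sketch assumes each application contributes at most one cell per column. The separate column of completed application ids is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Tag types as listed in the issue: not set (0), running (1), complete (2).
TAG_NONE, TAG_RUNNING, TAG_COMPLETE = 0, 1, 2


@dataclass(frozen=True)
class Cell:
    """One written value of a flow run column (hypothetical model, not an HBase Cell)."""
    value: int
    tag_type: int = TAG_NONE
    app_id: Optional[str] = None  # None models the empty tag


def read_min(cells: List[Cell]) -> int:
    """Read path for min_start_time: the minimum of all written values."""
    return min(c.value for c in cells)


def compact_min(cells: List[Cell]) -> List[Cell]:
    """Flush/compaction for min_start_time: keep a single untagged cell
    holding the minimum and discard every other cell."""
    return [Cell(read_min(cells))]


def read_sum(cells: List[Cell]) -> int:
    """Read path for a metric (m!) column: values are summed upon read."""
    return sum(c.value for c in cells)


def compact_sum(cells: List[Cell]) -> List[Cell]:
    """Compaction for a metric column: fold untagged and completed-app cells
    into one untagged cell; cells of running apps are kept unchanged."""
    running = [c for c in cells if c.tag_type == TAG_RUNNING]
    collapsed = sum(c.value for c in cells if c.tag_type != TAG_RUNNING)
    return [Cell(collapsed)] + running


# Made-up sample data for two applications in one flow run.
starts = [Cell(1500, TAG_RUNNING, "app_1"), Cell(1200, TAG_RUNNING, "app_2")]
metric = [Cell(10, TAG_COMPLETE, "app_1"), Cell(7, TAG_RUNNING, "app_2"), Cell(5)]
compacted = compact_sum(metric)
```

In this model the value seen by a reader is the same before and after compaction; only the number of stored cells changes. max_end_time would follow the same pattern as min_start_time with max in place of min.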