Date: Thu, 23 Apr 2015 04:44:39 +0000 (UTC)
From: "Vrushali C (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vrushali C updated YARN-3411:
-----------------------------
    Attachment: YARN-3411.poc.2.txt

Attaching a patch that includes:
- an HBaseTimelineWriterImpl class
- a test class for the same
- an EntityTableDetails class for storing entity-table-specific constants and other functions
- a TimelineWriterUtils class with utility functions that are useful while reading from and writing to HBase tables

The write function in the HBaseTimelineWriterImpl class writes out the entire contents of a TimelineEntity
object, including its info, config, metrics (timeseries), isRelatedTo and relatesTo fields.

The metrics timeseries is written such that, for each data point, the HBase cell timestamp is set to the metric timestamp, the cell column qualifier is the metric name, and the cell value is the metric value. I also propose changing the TimelineMetric values to be "long" instead of "Object" (although this patch does not make that change).

For the metrics column family, we should set a TTL of X days and MIN_VERSIONS = 1 (HBase TTLs are given in seconds; the schema below uses TTL => '2592000', i.e. 30 days). That way, HBase will retain the timeseries for X days, and the latest value of each metric will always be retained.

The test class spins up a MiniCluster via HBaseTestingUtility's startMiniCluster. It creates one entity object with info, config, metrics (timeseries), isRelatedTo and relatesTo entities and writes it to the backend by invoking the write API in the HBaseTimelineWriterImpl class. The test then scans the entity table, reads back the entity details, and verifies the value of each field, including the timeseries. I am also attaching an Eclipse console log from a run of the unit test.

The schema creation would be along these lines:
{code}
create 'ats.entity',
  {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
  {NAME => 'm', VERSIONS => 2147483647, MIN_VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'},
  {NAME => 'c', COMPRESSION => 'LZO', BLOCKCACHE => false, BLOOMFILTER => 'ROWCOL'}
{code}

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Vrushali C
> Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call.
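The metric cell layout and the TTL + MIN_VERSIONS = 1 retention described in the comment above can be illustrated with a small, dependency-free Java sketch. It maps one metric's timeseries to one cell per data point (qualifier = metric name, cell timestamp = metric timestamp, value = metric value) and then models which versions survive. `MetricCellSketch`, `Cell`, `toCells`, `retained` and the metric name are hypothetical, not part of the YARN-3411 patch; the sketch assumes the proposed "long" metric values, and `retained` is a simplified model of HBase's documented TTL/MIN_VERSIONS behavior, not HBase code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not from the YARN-3411 patch or the HBase API. */
public class MetricCellSketch {

  /** One HBase-style cell: qualifier = metric name, timestamp = metric timestamp. */
  static final class Cell {
    final String qualifier;
    final long timestamp;
    final long value; // assumes the proposed "long" metric values

    Cell(String qualifier, long timestamp, long value) {
      this.qualifier = qualifier;
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  /** Maps one metric's timeseries (timestamp -> value) to one cell per data point. */
  static List<Cell> toCells(String metricName, Map<Long, Long> timeseries) {
    List<Cell> cells = new ArrayList<>();
    for (Map.Entry<Long, Long> point : timeseries.entrySet()) {
      cells.add(new Cell(metricName, point.getKey(), point.getValue()));
    }
    return cells;
  }

  /**
   * Simplified model of which versions of one column are kept: a version survives
   * if it is within the TTL, or if it is among the newest minVersions versions.
   * The VERSIONS upper bound (Integer.MAX_VALUE in the schema) is ignored.
   */
  static List<Cell> retained(List<Cell> versions, long nowMillis, long ttlMillis, int minVersions) {
    List<Cell> newestFirst = new ArrayList<>(versions);
    newestFirst.sort(Comparator.comparingLong((Cell c) -> c.timestamp).reversed());
    List<Cell> kept = new ArrayList<>();
    for (int i = 0; i < newestFirst.size(); i++) {
      Cell c = newestFirst.get(i);
      boolean withinTtl = c.timestamp >= nowMillis - ttlMillis;
      if (withinTtl || i < minVersions) {
        kept.add(c);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    long day = 24L * 60 * 60 * 1000;
    long ttl = 2592000L * 1000; // TTL => '2592000' is in seconds (30 days)
    long now = 100 * day;

    // One recent data point and one older than 30 days: only the recent one is in the TTL.
    Map<Long, Long> mixed = new TreeMap<>();
    mixed.put(now - 35 * day, 20L);
    mixed.put(now - 5 * day, 30L);
    System.out.println(retained(toCells("HYPOTHETICAL_METRIC", mixed), now, ttl, 1).size()); // 1

    // Every data point has expired: MIN_VERSIONS = 1 still keeps the latest value.
    Map<Long, Long> allOld = new TreeMap<>();
    allOld.put(now - 40 * day, 10L);
    allOld.put(now - 35 * day, 20L);
    List<Cell> oldCells = toCells("HYPOTHETICAL_METRIC", allOld);
    System.out.println(retained(oldCells, now, ttl, 1).get(0).value); // 20
    System.out.println(retained(oldCells, now, ttl, 0).size()); // 0
  }
}
```

The last two calls show the point of MIN_VERSIONS = 1: without it, a metric that stops reporting for longer than the TTL would disappear entirely, whereas with it the most recent value stays readable.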