hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
Date Thu, 23 Apr 2015 04:44:39 GMT

     [ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vrushali C updated YARN-3411:
-----------------------------
    Attachment: YARN-3411.poc.2.txt

Attaching a patch that includes:
- an HBaseTimelineWriterImpl class
- a test class for the same
- an EntityTableDetails class for storing entity-table-specific constants and related functions
- a TimelineWriterUtils class with utility functions for reading from and writing to HBase tables
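As a rough illustration of the kind of helpers a TimelineWriterUtils class could hold, here is a sketch of joining and splitting row key components around a separator byte. This is purely hypothetical: the method names, the separator choice, and the API shape are assumptions, not the patch's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of TimelineWriterUtils-style helpers; the real patch's
// API may differ. Joins row key components with a single separator byte so
// keys sort by component and can be split back apart on the read path.
public class RowKeyUtils {
  private static final byte SEPARATOR = (byte) '!'; // assumed separator

  public static byte[] joinComponents(String... components) {
    int size = components.length - 1; // one separator between components
    byte[][] parts = new byte[components.length][];
    for (int i = 0; i < components.length; i++) {
      parts[i] = components[i].getBytes(StandardCharsets.UTF_8);
      size += parts[i].length;
    }
    byte[] key = new byte[size];
    int pos = 0;
    for (int i = 0; i < parts.length; i++) {
      System.arraycopy(parts[i], 0, key, pos, parts[i].length);
      pos += parts[i].length;
      if (i < parts.length - 1) {
        key[pos++] = SEPARATOR;
      }
    }
    return key;
  }

  public static List<String> splitComponents(byte[] key) {
    List<String> out = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= key.length; i++) {
      if (i == key.length || key[i] == SEPARATOR) {
        out.add(new String(key, start, i - start, StandardCharsets.UTF_8));
        start = i + 1;
      }
    }
    return out;
  }
}
```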

The write function in the HBaseTimelineWriterImpl class writes out the entire contents of a TimelineEntity
object, including its info, config, metrics (timeseries), isRelatedTo, and relatesTo fields.


Each metric timeseries is written such that the HBase cell timestamp is the metric
timestamp, the cell column qualifier is the metric name, and the cell value is the metric
value. I also propose changing TimelineMetric values to be "long" instead of "Object"
(although this patch does not make that change).
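To make that cell layout concrete, here is an HBase-free sketch using plain Java maps that mirror the qualifier → (timestamp → value) shape a scan of the metrics family would return. The class and method names are illustrative only, not the patch's API.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// HBase-free model of the proposed metrics layout: column qualifier is the
// metric name, cell timestamp is the metric timestamp, cell value is the
// metric value.
public class MetricCells {
  // Timestamps sorted descending so firstEntry() is the latest version,
  // matching HBase's newest-first version ordering.
  private final NavigableMap<String, NavigableMap<Long, Long>> cells =
      new TreeMap<>();

  public void write(String metricName, long timestamp, long value) {
    cells.computeIfAbsent(metricName,
        k -> new TreeMap<Long, Long>(java.util.Collections.reverseOrder()))
        .put(timestamp, value);
  }

  // Latest value, as a read with VERSIONS => 1 would see it.
  public long latest(String metricName) {
    return cells.get(metricName).firstEntry().getValue();
  }

  // Full timeseries, as a read requesting all versions would see it.
  public NavigableMap<Long, Long> timeseries(String metricName) {
    return cells.get(metricName);
  }
}
```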

For the metrics column family, we should set a TTL of X days and MIN_VERSIONS = 1. That way,
HBase retains the timeseries for X days, and the latest value of each metric is always
retained even after the TTL expires.
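The TTL + MIN_VERSIONS interaction can be sketched with a small, non-HBase model of the pruning rule: cells older than the TTL are dropped, except that the newest MIN_VERSIONS cells are always kept, so with MIN_VERSIONS = 1 the latest metric value survives even after every cell has aged out. This is an illustration of the retention semantics, not HBase's actual compaction code.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative model of TTL + MIN_VERSIONS retention: expired cells are
// dropped, but the newest minVersions cells are kept regardless of age.
public class TtlPruner {
  public static NavigableMap<Long, Long> prune(
      NavigableMap<Long, Long> cells,   // timestamp -> value, newest first
      long nowMillis, long ttlMillis, int minVersions) {
    NavigableMap<Long, Long> kept = new TreeMap<>(Collections.reverseOrder());
    int version = 0;
    for (Map.Entry<Long, Long> e : cells.entrySet()) { // newest to oldest
      boolean live = nowMillis - e.getKey() <= ttlMillis;
      if (live || version < minVersions) {
        kept.put(e.getKey(), e.getValue());
      }
      version++;
    }
    return kept;
  }
}
```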

The test class spins up a mini cluster via HBaseTestingUtility's startMiniCluster. It creates
one entity object with info, config, metrics (timeseries), isRelatedTo, and relatesTo entries
and writes it to the backend by invoking the write API in the HBaseTimelineWriterImpl class. The
test then scans the entity table, reads back the entity details, and verifies the values of each
field, including the timeseries.

Also attaching an eclipse console log that ran the unit test. 

The schema creation would be along the lines of this:
{code}
create 'ats.entity',
  {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
  {NAME => 'm', VERSIONS => 2147483647, MIN_VERSIONS => 1, COMPRESSION => 'LZO',
   BLOCKCACHE => false, TTL => '2592000'},
  {NAME => 'c', COMPRESSION => 'LZO', BLOCKCACHE => false, BLOOMFILTER => 'ROWCOL'}
{code}

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.2.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
