hadoop-yarn-issues mailing list archives

From "Li Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
Date Mon, 13 Apr 2015 23:07:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493235#comment-14493235 ]

Li Lu commented on YARN-3134:

Hi [~vrushalic] and [~zjshen]! Thanks for the comments! 

About [~vrushalic]'s questions: this POC patch does not add metrics info yet, but that's
my next step. I'm storing configs in the entity table, under a separate column family, CONFIG_COLUMN_FAMILY.
Each config item C(k, v) for a PK is stored in row PK, at column CONFIG_COLUMN_FAMILY.k,
with value v. 
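
The layout described above can be sketched roughly as follows. This is only an illustration, not code from the POC patch: the table name ENTITY, the key column pk, the short family name c, and both helper functions are hypothetical, and the UPSERT assumes each config key is declared inline as a Phoenix dynamic VARCHAR column. Keys containing double quotes are not handled.

```python
def config_cells(pk, configs, family="CONFIG_COLUMN_FAMILY"):
    """Return one (row, column, value) cell per config item C(k, v)."""
    return [(pk, f"{family}.{k}", v) for k, v in sorted(configs.items())]


def config_upsert(table, pk_column, pk, configs, family="c"):
    """Build a parameterized Phoenix-style UPSERT with dynamic columns.

    Assumption: every config key becomes a dynamic VARCHAR column that is
    declared inline in the column list. The real patch may build its SQL
    differently.
    """
    keys = sorted(configs)
    columns = [pk_column] + [f'{family}."{k}" VARCHAR' for k in keys]
    placeholders = ", ".join("?" for _ in columns)
    sql = f"UPSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    params = [pk] + [configs[k] for k in keys]
    return sql, params


if __name__ == "__main__":
    for cell in config_cells("app_1", {"mapreduce.job.name": "wc"}):
        print(cell)
    print(config_upsert("ENTITY", "pk", "app_1", {"mapreduce.job.name": "wc"}))
```

The statement and parameter list would then be handed to a JDBC PreparedStatement (or an equivalent client); that part is left out here because it needs a running Phoenix/HBase instance.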

bq. One entity may need multiple sql sentences to complete one entity write. Do we need to
use transaction?
That's a very good question, and I'm not sure of the answer right now. Currently we write
one entity (with a PK) in two writes, one with only static columns (C_s) and the other with only
dynamic columns (C_d). HBase guarantees row-level atomicity for each write,
so I assume the result after the two calls will be (PK, C_s), (PK, C_d), or (PK, C_s, C_d).
The last one is the best case, of course. 
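
To make the possible states concrete, here is a small in-memory model (not HBase or Phoenix code) of two independent writes that are each atomic at the row level. The store is a plain dict and apply_write and possible_outcomes are hypothetical helpers. The model also lists the case where neither write lands, which leaves no row at all.

```python
from itertools import product


def apply_write(store, pk, columns):
    """Model one row-level-atomic write: all of its columns land together."""
    store.setdefault(pk, {}).update(columns)


def possible_outcomes(pk, static_cols, dynamic_cols):
    """Enumerate row states when each of the two writes succeeds or fails."""
    outcomes = []
    for static_ok, dynamic_ok in product([True, False], repeat=2):
        store = {}
        if static_ok:
            apply_write(store, pk, static_cols)
        if dynamic_ok:
            apply_write(store, pk, dynamic_cols)
        outcomes.append(((static_ok, dynamic_ok), store))
    return outcomes


if __name__ == "__main__":
    for flags, store in possible_outcomes("pk1", {"type": "APP"}, {"conf.k": "v"}):
        print(flags, store)
```

Without a transaction spanning both writes, a reader has to tolerate rows that carry only C_s or only C_d.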

bq. In this case, is it better to write the entity one-by-one (including config, info), assuming
the records are in sequence by PK?
I think you're right. Will look into this improvement. 

About deployment: for end users we can either give them a predefined version of phoenix+hbase
for simpler deployment, or we can let users specify the classpath for the Phoenix JDBC
driver and choose their own version of phoenix+hbase. The latter will unavoidably
make deployment harder, but gives more freedom. For now, I think our short-term
focus is to wrap miniclusters to allow UTs to pass in our branch (to be prepared for a branch

About posting metrics: I was wondering whether we could let users send just the delta
to storage, and use some information in the timeline entity to infer whether the entity
itself is already in the entity table. If that's possible, then we can have a shortcut (not
touching the entity table) for faster metrics updates, which may generate the majority of our
storage traffic. 
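
One way to picture the shortcut is the in-memory sketch below. Everything in it is an assumption for illustration: the TimelineStore class, the two dict-backed tables, and the already_stored hint, which stands in for whatever information in the timeline entity would let the writer infer that the entity row exists.

```python
class TimelineStore:
    """Toy model of an entity table plus a separate metrics table."""

    def __init__(self):
        self.entity_table = {}
        self.metrics_table = {}
        self.entity_writes = 0  # counts how often the entity table is touched

    def put_entity(self, pk, entity):
        self.entity_table[pk] = entity
        self.entity_writes += 1

    def post_metrics(self, pk, delta, already_stored):
        """Append a metrics delta, writing the entity row only if needed.

        already_stored is a hypothetical hint derived from the entity.
        """
        if not already_stored:
            self.put_entity(pk, {"id": pk})
        series = self.metrics_table.setdefault(pk, {})
        for name, points in delta.items():
            series.setdefault(name, []).extend(points)


if __name__ == "__main__":
    store = TimelineStore()
    store.post_metrics("app_1", {"cpu": [(1, 0.5)]}, already_stored=False)
    store.post_metrics("app_1", {"cpu": [(2, 0.7)]}, already_stored=True)
    print(store.entity_writes, store.metrics_table)
```

In this model only the first post touches the entity table; later deltas go straight to the metrics table.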

> [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
> ---------------------------------------------------------------------------------------
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134DataSchema.pdf
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a client-embedded
JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query,
compiles it into a series of HBase scans, and orchestrates the running of those scans to produce
regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such
that snapshot queries over prior versions will automatically use the correct schema. Direct
use of the HBase API, along with coprocessors and custom filters, results in performance on
the order of milliseconds for small queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and makes it easy to build
indexes and compose complex queries.

This message was sent by Atlassian JIRA
