hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
Date Wed, 06 May 2015 16:55:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530885#comment-14530885
] 

Junping Du commented on YARN-3134:
----------------------------------

bq. I have not made the change on the Phoenix connection string since, according to our previous
discussion, we're planning to address this after we've decided which implementation to pursue
in the future. 
It should be fine to keep hard coded JDBC connection address here. At least, we should update
the name and make it private instead of default visibility (package level) as no place outside
of CLASS need to use it for now. 

Some other comments against latest patch:

{code}
+  /** Default Phoenix JDBC driver name */
+  static final String DRIVER_CLASS_NAME = "org.apache.phoenix.jdbc.PhoenixDriver";
+  /** Default Phoenix timeline entity table name */
+  static final String ENTITY_TABLE_NAME = "TIMELINE_ENTITY";
...
{code}
Are we going to share these constant variables with HBaseTimelineWriterImpl? If not, we should
mark them as private. In fact, just quickly check patch in YARN-3411, I would expect we could
reuse some of these constants but that not something we should worry about it now.

{code}
+  private static final String[] PHOENIX_STORAGE_PK_LIST
+      = {"cluster", "user", "flow", "version", "run", "appid", "type", "entityid"};
+  private static final String[] TIMELINE_EVENT_EXTRA_PK_LIST =
+      {"timestamp", "eventid"};
+  private static final String[] TIMELINE_METRIC_EXTRA_PK_LIST =
+      {"metricid"};
{code}
For key name, we should keep consistent with other places, i.e. replacing with cluster_id,
user_id, flow_name, flow_version, etc. or we will have significant headache when maintain
it in future. In addition, for key naming convention, it sounds like we don't follow any name
conversions for now, e.g. lower/upper CamelCase or something else. From examples of Phonix
project(http://phoenix.apache.org/views.html or http://phoenix.apache.org/multi-tenancy.html),
they hire the style with having an underscore as a word separator. I would suggest to conform
the same style here.

{code}
+    try {
+      conn.commit();
+    } catch (SQLException se) {
+      LOG.error("Failed to close the phoenix connection! "
+          + se.getLocalizedMessage());
{code}
Message is not correct as exception is not happen in closing connection, but commit.

Last but not the least, looks like we have many helper methods (appendColumnsSQL, setStringsForColumnFamily,
setStringsForPrimaryKey, etc.) for constructing the SQL sentences. However, these helper methods
are without any comments and lacking of unit tests to verify correctness in all cases.

> [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>              Labels: BB2015-05-TBR
>         Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch,
YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch,
YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch,
YARN-3134-YARN-2928.004.patch, YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a client-embedded
JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query,
compiles it into a series of HBase scans, and orchestrates the running of those scans to produce
regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such
that snapshot queries over prior versions will automatically use the correct schema. Direct
use of the HBase API, along with coprocessors and custom filters, results in performance on
the order of milliseconds for small queries, or seconds for tens of millions of rows.
> {code}
> It may simply our implementation read/write data from/to HBase, and can easily build
index and compose complex query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message