hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
Date Thu, 16 Apr 2015 22:55:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498924#comment-14498924
] 

Junping Du commented on YARN-3134:
----------------------------------

Thanks [~gtCarrera9] for delivering a patch here! 
Just start to look at the patch. Some initiative comments so far:

{code}
+      String sql = "CREATE TABLE IF NOT EXISTS " + ENTITY_TABLE_NAME
+          + "(cluster VARCHAR NOT NULL, user VARCHAR NOT NULL, "
+          + "flow VARCHAR NOT NULL, run UNSIGNED_LONG NOT NULL, "
+          + "appid VARCHAR NOT NULL, type VARCHAR NOT NULL, "
+          + "entityid VARCHAR NOT NULL, "
+          + "creationtime UNSIGNED_LONG, modifiedtime UNSIGNED_LONG, "
...
+      stmt.executeUpdate(sql);
+      stmt.close();
+      conn.commit();
{code}
Putting raw SQL sentences in this way sounds a little headache to me as this means difficult
to debug/maintain in future. Given we could have more tables in pipeline, we may want to refactor
this in some way to be more maintainable? 
BTW, I don't think HBase support any atomic operation across multiple tables. Here we create
3 tables but only one commit which means if 2nd table created failed, 1st table should still
be created and commit successfully and won't be rollback. These partial success after commit
doesn't sounds a good practice to me.
Additional problem is we didn't close connection here but we need to.

{code}
+  private class TimelineEntityCtxt {
{code}
TimelineEntityCtxt => TimelineEntityContext, better not omit full name (except very obviously,
like conf for configuration) in method name. It looks like exactly the same with TimelineCollectorContext.java.
Can we reuse that class instead of creating a new duplicated one?

{code}
+  private <K, V> int setStringsForCf(
{code}
What Cf means? Just like I mentioned above, don't omit the character of a word in a method
which break code's readability.

{code}
+  private int setStringsForPk(PreparedStatement ps, String clusterId, String userId,
{code}
setStringsForPk => setStringsForPrimaryKeys

{code}
+  ResultSet executeQuery(String sql) {
+    ResultSet rs = null;
+    try {
+      Statement stmt = conn.createStatement();
+      rs = stmt.executeQuery(sql);
+    } catch (SQLException se) {
+      LOG.error("SQL exception! " + se.getLocalizedMessage());
+    }
+    return rs;
+  }
{code}
Does getLocalizedMessage contains enough info (at least the SQL sentences executed)? If not,
I would prefer we add raw SQL sentences in error message when Exception get throw. 

{code}
+    // Execute and close
+    psConfigInfo.execute();
+    psConfigInfo.close();
{code}
Many places like here that we are forgetting to put closable resources to finally block. We
should close it even exception get throw. 

More comments could comes later.

> [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch,
YARN-3134DataSchema.pdf
>
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a client-embedded
JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query,
compiles it into a series of HBase scans, and orchestrates the running of those scans to produce
regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such
that snapshot queries over prior versions will automatically use the correct schema. Direct
use of the HBase API, along with coprocessors and custom filters, results in performance on
the order of milliseconds for small queries, or seconds for tens of millions of rows.
> {code}
> It may simply our implementation read/write data from/to HBase, and can easily build
index and compose complex query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message