hadoop-yarn-issues mailing list archives

From "Zhijie Shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
Date Tue, 05 May 2015 20:01:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529167#comment-14529167 ]

Zhijie Shen commented on YARN-3134:

Li, thanks for updating the patch. Here're some comments about it.

1. How do we choose the size and the expiry time?
113	    connectionCache = CacheBuilder.newBuilder().maximumSize(16)
114	        .expireAfterAccess(10, TimeUnit.SECONDS).removalListener(
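
The bound that maximumSize(16) imposes can be illustrated with an access-ordered LinkedHashMap, a minimal stdlib sketch of Guava's LRU eviction (the class and constant names here are illustrative, and the expireAfterAccess(10, SECONDS) time bound is not modeled):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConnectionCacheSketch {
  static final int MAX_CONNECTIONS = 16; // value from the patch

  static <K, V> Map<K, V> newLruCache(int maxSize) {
    // accessOrder=true makes get() refresh an entry's recency,
    // similar to Guava's size-bounded cache
    return new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
      }
    };
  }

  public static void main(String[] args) {
    Map<Integer, String> cache = newLruCache(MAX_CONNECTIONS);
    for (int i = 0; i < 17; i++) {
      cache.put(i, "conn-" + i);
    }
    // key 0 was least recently used, so it has been evicted
    System.out.println(cache.containsKey(0)); // false
    System.out.println(cache.size());         // 16
  }
}
```

If 16 and 10 seconds are just guesses, it may be worth making them configurable rather than hard-coded.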

2. If we use try-with-resources, do we still need to close stmt explicitly? Shall we close them in a finally block?
235	    try (Statement stmt = conn.createStatement()) {
272	      stmt.close();
273	      conn.commit();
274	      conn.close();
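
To make the first point concrete, here is a small self-contained sketch (names are illustrative, not from the patch) showing that try-with-resources invokes close() automatically when the block exits, so an explicit stmt.close() inside the block is redundant:

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesSketch {
  static final List<String> events = new ArrayList<>();

  // Stand-in for java.sql.Statement, which is also AutoCloseable
  static class FakeStatement implements AutoCloseable {
    void execute() { events.add("execute"); }
    @Override public void close() { events.add("close"); }
  }

  public static void main(String[] args) {
    try (FakeStatement stmt = new FakeStatement()) {
      stmt.execute();
    } // close() runs here automatically, even if execute() throws
    System.out.println(events); // [execute, close]
  }
}
```

The connection-level conn.commit() and conn.close() are a separate question: those would still need handling, e.g. by including the connection in the same try-with-resources header.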

3. Seems to be a trivial method wrapper
292	  private <K> StringBuilder appendVarcharColumnsSQL(
293	      StringBuilder colNames, ColumnFamilyInfo<K> cfInfo) {
294	    return appendColumnsSQL(colNames, cfInfo, " VARCHAR");
295	  }

4. Why should the name and version be combined and put in the same cell, rather than stored separately?

345	    ps.setString(idx++,
346	        context.getFlowName() + STORAGE_SEPARATOR + context.getFlowVersion());
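
One cost of packing both fields into one cell is the round trip: the value has to be split on read, and the separator must never occur inside either field. A hedged sketch (the actual value of STORAGE_SEPARATOR is defined in the patch and not shown here, so a placeholder is used):

```java
public class FlowKeySketch {
  // Placeholder; the patch defines the real STORAGE_SEPARATOR value.
  static final String STORAGE_SEPARATOR = "%";

  static String combine(String flowName, String flowVersion) {
    return flowName + STORAGE_SEPARATOR + flowVersion;
  }

  static String[] split(String cellValue) {
    // Breaks if the separator can appear inside flowName or flowVersion
    return cellValue.split(STORAGE_SEPARATOR, 2);
  }

  public static void main(String[] args) {
    String cell = combine("myFlow", "v1");
    String[] parts = split(cell);
    System.out.println(parts[0] + " / " + parts[1]); // myFlow / v1
  }
}
```

Storing them in two columns would avoid both the split and the separator-collision risk, at the price of one extra column.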

5. Seems not to be necessary.
356	    if (entity.getConfigs() == null
357	        && entity.getInfo() == null
358	        && entity.getIsRelatedToEntities() == null
359	        && entity.getRelatesToEntities() == null) {
360	      return;
361	    }

6. Should info be varbinary?

7. Should config be varchar?
366	      appendColumnsSQL(sqlColumns, new ColumnFamilyInfo<>(
367	          CONFIG_COLUMN_FAMILY, entity.getConfigs().keySet()), " VARBINARY");

8. Does Phoenix support NUMERIC/DECIMAL? Not sure if we should store the numbers in these VARBINARY columns:
268	          + "singledata VARBINARY "

9. In storeMetrics, assuming we only deal with the single-value case now, I think it's better
to check whether the metric is single-valued first. Another question here is whether we want to
ignore the associated timestamp of the single value, or whether we should add one more column
to store the timestamp of this value.
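
The suggested check could look roughly like the following. This is a hypothetical stand-in for the real TimelineMetric class, only meant to sketch the shape of the guard and what keeping the timestamp would require:

```java
import java.util.Map;
import java.util.TreeMap;

public class StoreMetricsSketch {
  // Hypothetical minimal metric: timestamp -> value
  static class Metric {
    final TreeMap<Long, Number> values = new TreeMap<>();
  }

  // Returns "timestamp=value" for a single-value metric, or null for a
  // time series (not handled yet, matching the current patch's scope).
  static String storeSingleValue(Metric metric) {
    if (metric.values.size() != 1) {
      return null; // time series: skip for now
    }
    Map.Entry<Long, Number> e = metric.values.firstEntry();
    // Persisting e.getKey() as well would need one extra column
    // in the schema; dropping it loses the timestamp.
    return e.getKey() + "=" + e.getValue();
  }

  public static void main(String[] args) {
    Metric m = new Metric();
    m.values.put(1430856060000L, 42);
    System.out.println(storeSingleValue(m)); // 1430856060000=42
  }
}
```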

10. Regarding the number of connections and threads: is it better to have the same number of
connection threads as the number of app collectors, and to route the requests of one app to the
same thread? This is because I remember we mentioned somewhere that we want isolation between
apps; otherwise, an app with more timeline data will occupy more of the writing capacity to the
backend. /cc [~sjlee0]
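
The per-app routing idea could be sketched as hashing the app id onto a fixed pool of single-threaded executors (all names here are assumptions, not from the patch), so every write for a given app lands on the same thread and one chatty app cannot consume the whole pool:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerAppWriterSketch {
  final ExecutorService[] writers;

  PerAppWriterSketch(int numThreads) {
    writers = new ExecutorService[numThreads];
    for (int i = 0; i < numThreads; i++) {
      writers[i] = Executors.newSingleThreadExecutor();
    }
  }

  int indexFor(String appId) {
    // floorMod keeps the index non-negative for any hashCode
    return Math.floorMod(appId.hashCode(), writers.length);
  }

  void submitWrite(String appId, Runnable write) {
    writers[indexFor(appId)].execute(write);
  }

  void shutdown() {
    for (ExecutorService w : writers) {
      w.shutdown();
    }
  }

  public static void main(String[] args) {
    PerAppWriterSketch router = new PerAppWriterSketch(4);
    // The same app id always maps to the same executor index.
    System.out.println(
        router.indexFor("application_1430000000000_0001")
            == router.indexFor("application_1430000000000_0001")); // true
    router.shutdown();
  }
}
```

Whether the pool should be sized to the number of app collectors, as asked above, is a separate capacity question.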

11. In TestTimelineWriterImpl, can we cover the case where the entity has a non-string info value?

12. In TestPhoenixTimelineWriterImpl, can we verify that each cell is storing the right data?

> [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
> ---------------------------------------------------------------------------------------
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch,
YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch,
YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch,
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a client-embedded
JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query,
compiles it into a series of HBase scans, and orchestrates the running of those scans to produce
regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such
that snapshot queries over prior versions will automatically use the correct schema. Direct
use of the HBase API, along with coprocessors and custom filters, results in performance on
the order of milliseconds for small queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation of reading/writing data from/to HBase, and we can easily
build indexes and compose complex queries.

This message was sent by Atlassian JIRA
