hadoop-yarn-issues mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation
Date Fri, 05 Jun 2015 18:40:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574991#comment-14574991 ]

James Taylor commented on YARN-2928:
------------------------------------

Nice writeup, [~vrushalic]. For your benchmarks, if you're pre-splitting for the HBase direct
write path but not for the Phoenix write path, you're not really comparing apples-to-apples.
There are a number of ways you can install your KeyPrefixRegionSplitPolicy in Phoenix. The
easiest is probably to create the HBase table the same way (through code or using the HBase
shell) with the KeyPrefixRegionSplitPolicy specified at create time. Then, in Phoenix you
can issue a CREATE TABLE statement against the existing HBase table and it'll just map to
it. Then you'll have your split policy for your benchmark in both write paths.

An alternative to dynamic columns is to define views over your Phoenix table (http://phoenix.apache.org/views.html).
In each view, you could specify the set of columns it contains. Then you can use the regular
JDBC metadata APIs to get the set of columns that define your view: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getColumns%28java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String%29
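
As a rough sketch of that metadata lookup, the helper below uses only the standard java.sql
API linked above. The class and method names (ViewColumns, columnNames) are made up for
illustration, and the schema and view names are whatever you defined in Phoenix.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ViewColumns {
    /**
     * Returns the column names reported for one table or view.
     * Hypothetical helper: "schema" and "view" are names you chose in Phoenix.
     */
    static List<String> columnNames(DatabaseMetaData md, String schema, String view)
            throws SQLException {
        List<String> names = new ArrayList<>();
        // catalog = null: do not narrow by catalog; "%" matches every column name.
        try (ResultSet rs = md.getColumns(null, schema, view, "%")) {
            while (rs.next()) {
                // COLUMN_NAME is one of the result columns defined by the JDBC spec.
                names.add(rs.getString("COLUMN_NAME"));
            }
        }
        return names;
    }
}
```

The DatabaseMetaData instance would come from getMetaData() on an open Phoenix JDBC
connection. Keep in mind that the schema and table arguments are patterns that must match
the names as they are stored, so the case of the identifiers matters.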

Another interesting angle with views (not sure whether this is relevant for your use case):
they can be multi-tenant, where the definition of a "tenant" is up to you
(maybe it would map to a User?). In this case, each tenant can define their own derived view
and add columns specific to their usage. You can even create secondary indexes over a view.
This is the way Phoenix surfaces NoSQL in the SQL world. More here: http://phoenix.apache.org/multi-tenancy.html

There is room for improvement in the Phoenix write path, though. I've filed PHOENIX-2028 and
plan to work on that shortly.

If you do end up going with a direct HBase write path, I'd encourage you to use the Phoenix
serialization format (through PDataType and derived classes) to ensure you can do ad hoc querying
on the data. The most important aspect is how your row key is written and the separators you
use if you're storing multiple values in the row key.
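
To make the row key point concrete, here is a simplified, stdlib-only sketch of the kind of
layout involved. It assumes a hypothetical key of (cluster VARCHAR, user VARCHAR, timestamp
BIGINT), a single zero byte as the separator after each variable-length value, and an
8-byte big-endian long with the sign bit flipped so that byte order matches numeric order.
It is an illustration only and does not cover nulls, descending sort order, or other types;
real code should go through the Phoenix PDataType classes rather than re-implementing the
encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeySketch {
    // Assumption: one zero byte terminates each variable-length key part.
    static final byte SEPARATOR = 0;

    // Variable-length string as plain UTF-8 bytes.
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Fixed-width long: big-endian with the sign bit flipped, so an unsigned
    // byte-wise comparison sorts negative values before positive ones.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Hypothetical composite key: cluster, user, timestamp.
    static byte[] rowKey(String cluster, String user, long timestamp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] c = encodeString(cluster);
        byte[] u = encodeString(user);
        byte[] t = encodeLong(timestamp);
        out.write(c, 0, c.length);
        out.write(SEPARATOR);          // separator after a variable-length part
        out.write(u, 0, u.length);
        out.write(SEPARATOR);
        out.write(t, 0, t.length);     // fixed width, so no trailing separator
        return out.toByteArray();
    }
}
```

The design point is that every writer has to agree on these details. If the direct HBase
path used a different separator or number encoding than Phoenix expects, queries through
Phoenix would misread or mis-sort the keys.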

> YARN Timeline Service: Next generation
> --------------------------------------
>
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline
Service Next Gen - Planning - ppt.pptx, TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and YARN-321.
Although it is a great feature, we have recognized several critical issues and features that
need to be addressed.
> This JIRA proposes the design and implementation changes to address those. This is phase
1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
