hadoop-yarn-issues mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2928) YARN Timeline Service: Next generation
Date Sat, 06 Jun 2015 01:38:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575490#comment-14575490 ]

James Taylor commented on YARN-2928:
------------------------------------

Happy to help, [~gtCarrera9]. Thanks for the information.

bq. If I understand this correctly, in this case, Phoenix will inherit pre-split settings
from HBase? Will this alter the existing HBase table, including its schema and/or data inside?
In general, if one runs CREATE TABLE IF NOT EXISTS or simply CREATE TABLE commands over a
pre-split existing HBase table, will Phoenix simply accept the existing table as-is?
If you create a table in Phoenix and the table already exists in HBase, Phoenix will accept
the existing table as-is, adding any metadata it needs (i.e. its coprocessors). If the table
has existing data, then Phoenix will add an empty KeyValue to each row in the first column
family referenced in the CREATE TABLE statement (or the default column family if no column
families are referenced). Phoenix needs this empty KeyValue for a variety of reasons. The
onus is on the user to ensure that the types declared in the CREATE TABLE statement match
the way the existing data was actually serialized.
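As a rough sketch (the table, column family, and column names below are made up for
illustration), mapping a pre-existing, pre-split HBase table might look like this:

{code:sql}
-- Hypothetical example: an existing HBase table "metrics" with column family "m".
-- Phoenix leaves the table and its splits as-is and installs its coprocessors;
-- the declared types must match how the existing data was serialized.
CREATE TABLE IF NOT EXISTS "metrics" (
    "id"        VARCHAR PRIMARY KEY,   -- maps to the HBase row key
    "m"."host"  VARCHAR,               -- column "host" in family "m"
    "m"."value" UNSIGNED_LONG          -- e.g. a value written with Bytes.toBytes(long)
);
{code}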

For your configuration/metric key-value pairs, how are they named? Do you know the possible
set of keys in advance, or are they only known more or less on the fly? One way you could
model this with views is to dynamically add a column to the view when you need it. Adding a
column to a view is a very lightweight operation, corresponding to a few Puts to the
SYSTEM.CATALOG table. Then you'd have a way of looping through all metrics for a given view
using the metadata APIs. Think of a view as a set of explicitly named dynamic columns. You'd
still need to generate the SQL statement, though.
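To illustrate (again with hypothetical names, building on the "metrics" sketch above), the
view-based approach might look like:

{code:sql}
-- Hypothetical sketch: a per-application view over the raw table, with a new
-- metric column added on the fly. The ALTER VIEW is just a few Puts to SYSTEM.CATALOG.
CREATE VIEW "app1_metrics" AS SELECT * FROM "metrics" WHERE "id" = 'app1';
ALTER VIEW "app1_metrics" ADD "m"."gc_time" UNSIGNED_LONG;
{code}

The columns of the view can then be enumerated through the standard JDBC metadata APIs
(e.g. DatabaseMetaData.getColumns).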

bq. One potential solution is to use HBase coprocessors to aggregate application data from
the HBase storage, and then store them in a Phoenix aggregation table.
I'm not following. Are you thinking of having a secondary table that's a rollup aggregation
of the rawer data? Is that required, or is it more of a convenience for the user? If the raw
data is Phoenix-queryable, then I think you have a lot of options. Can you point me to some
more info on your design?
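Just to sketch one such option (purely hypothetical names, and assuming the raw data is
mapped as in the earlier "metrics" example), a rollup could be driven by plain SQL rather
than a coprocessor:

{code:sql}
-- Hypothetical sketch: a pre-created rollup table populated with UPSERT SELECT.
CREATE TABLE IF NOT EXISTS "metrics_rollup" (
    "host"  VARCHAR PRIMARY KEY,
    "total" BIGINT
);
UPSERT INTO "metrics_rollup"("host", "total")
SELECT "host", SUM("value") FROM "metrics" GROUP BY "host";
{code}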

The stable APIs for Phoenix are the ones we expose through our public APIs: JDBC and our various
integration modules (e.g. MapReduce, Pig, etc.). I'd say that the serialization format produced
by PDataType is stable (it needs to be for us to meet our backward-compatibility guarantees) and
the PDataType APIs are more stable than others. Also, we're looking to integrate with Apache
Calcite, so we may have some other APIs that could be hooked into down the road as well.


> YARN Timeline Service: Next generation
> --------------------------------------
>
>                 Key: YARN-2928
>                 URL: https://issues.apache.org/jira/browse/YARN-2928
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>         Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf, Timeline
Service Next Gen - Planning - ppt.pptx, TimelineServiceStoragePerformanceTestSummaryYARN-2928.pdf
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and YARN-321.
Although it is a great feature, we have recognized several critical issues and features that
need to be addressed.
> This JIRA proposes the design and implementation changes to address those. This is phase
1 of this effort.



