hadoop-yarn-issues mailing list archives

From "Joep Rottinghuis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
Date Thu, 21 May 2015 16:19:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554595#comment-14554595 ]

Joep Rottinghuis commented on YARN-3411:
----------------------------------------

I think it is reasonable for two implementations to differ in their backing schema, as long
as both can write the data and read it back using the same key information. Phoenix may need
to add some things to the row key in order to work properly, and likewise the raw HBase
implementation may need some additional secondary lookups, etc. Measuring the cost of those
differences is part of the performance comparison.
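To illustrate how two backends can differ physically while agreeing on the logical key, here is a minimal Java sketch (Java 16+ for records). FlowRunKey, RowKeyCodec, PlainCodec and BucketedCodec are hypothetical names, not part of YARN, HBase or Phoenix, and the leading bucket byte merely stands in for "something added to the row key"; it is not a claim about what Phoenix actually adds.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FlowRunKeys {
    /** Hypothetical logical key that every storage implementation must write and read back. */
    public record FlowRunKey(String cluster, String user, String flowName, long runId) {}

    /** Hypothetical contract: each backend chooses its own physical row-key layout. */
    public interface RowKeyCodec {
        byte[] encode(FlowRunKey key);
        FlowRunKey decode(byte[] rowKey);
    }

    /** Plain layout: cluster!user!flowName!runId (assumes no component contains '!'). */
    public static final class PlainCodec implements RowKeyCodec {
        public byte[] encode(FlowRunKey k) {
            String s = k.cluster() + "!" + k.user() + "!" + k.flowName() + "!" + k.runId();
            return s.getBytes(StandardCharsets.UTF_8);
        }

        public FlowRunKey decode(byte[] rowKey) {
            String[] parts = new String(rowKey, StandardCharsets.UTF_8).split("!", 4);
            return new FlowRunKey(parts[0], parts[1], parts[2], Long.parseLong(parts[3]));
        }
    }

    /** Same logical key, with one extra leading bucket byte prepended to the plain layout. */
    public static final class BucketedCodec implements RowKeyCodec {
        private final PlainCodec plain = new PlainCodec();
        private final int buckets;

        public BucketedCodec(int buckets) {
            if (buckets < 1 || buckets > 128) {
                throw new IllegalArgumentException("buckets must be in [1, 128]");
            }
            this.buckets = buckets;
        }

        public byte[] encode(FlowRunKey k) {
            byte[] body = plain.encode(k);
            byte[] out = new byte[body.length + 1];
            out[0] = (byte) Math.floorMod(Arrays.hashCode(body), buckets); // extra key component
            System.arraycopy(body, 0, out, 1, body.length);
            return out;
        }

        public FlowRunKey decode(byte[] rowKey) {
            // Strip the bucket byte; the rest is the plain layout.
            return plain.decode(Arrays.copyOfRange(rowKey, 1, rowKey.length));
        }
    }

    public static void main(String[] args) {
        FlowRunKey key = new FlowRunKey("cluster1", "user1", "ComputeUniqueUsers", 1432224000000L);
        for (RowKeyCodec codec : new RowKeyCodec[] {new PlainCodec(), new BucketedCodec(16)}) {
            boolean ok = codec.decode(codec.encode(key)).equals(key);
            System.out.println(codec.getClass().getSimpleName() + " round trip ok: " + ok);
        }
    }
}
```

Both codecs accept and return the same FlowRunKey, so callers are insulated from the physical layout; only the bytes stored in HBase differ.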

[~djp] with respect to adding the flow version to the row key, I think the problem is that
the caller then has to know the version in order to query the data back. I don't think that
is a natural requirement. If I know that I ran the "ComputeUniqueUsers" flow on the cluster,
then I have the user, cluster, and flow name, but I shouldn't need to know the version just to
query the last few runs, right? If you do have the version (for example, for reducer estimation,
where you want the last runs of the same flow back), then it should be possible to query by
flow _and_ by version, but I don't think it should be mandatory.
Therefore I don't think the flow version must per se be part of the row key in all implementations.
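A minimal sketch of a key design in which the version is optional at query time, assuming a row key of cluster!user!flowName!invertedRunId with the flow version stored as a column value rather than in the key. A TreeMap stands in for a sorted HBase table, run ids are assumed to be non-negative (e.g. epoch milliseconds), and FlowRunScan, FlowRun, write and lastRuns are hypothetical names, not the actual ATSv2 schema or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class FlowRunScan {
    /** A stored flow run; the flow version is a column value, not a row-key component. */
    public record FlowRun(long runId, String flowVersion) {}

    // A TreeMap stands in for an HBase table: rows are kept sorted by row key.
    private final TreeMap<String, FlowRun> table = new TreeMap<>();

    // Assumes cluster, user and flowName never contain the '!' separator.
    private static String prefix(String cluster, String user, String flowName) {
        return cluster + "!" + user + "!" + flowName + "!";
    }

    // Inverted, zero-padded run id so newer runs sort first; assumes runId >= 0.
    private static String rowKey(String cluster, String user, String flowName, long runId) {
        return prefix(cluster, user, flowName) + String.format("%019d", Long.MAX_VALUE - runId);
    }

    public void write(String cluster, String user, String flowName, long runId, String version) {
        table.put(rowKey(cluster, user, flowName, runId), new FlowRun(runId, version));
    }

    /** Newest runs of a flow, optionally restricted to one flow version. */
    public List<FlowRun> lastRuns(String cluster, String user, String flowName,
                                  int limit, Optional<String> version) {
        String p = prefix(cluster, user, flowName);
        List<FlowRun> result = new ArrayList<>();
        // Prefix scan: all keys sharing the prefix are contiguous in sorted order.
        for (Map.Entry<String, FlowRun> e : table.tailMap(p, true).entrySet()) {
            if (result.size() >= limit || !e.getKey().startsWith(p)) {
                break; // collected enough rows, or left the prefix range
            }
            FlowRun run = e.getValue();
            if (version.isEmpty() || version.get().equals(run.flowVersion())) {
                result.add(run);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FlowRunScan store = new FlowRunScan();
        store.write("cluster1", "user1", "ComputeUniqueUsers", 100L, "v1");
        store.write("cluster1", "user1", "ComputeUniqueUsers", 200L, "v2");
        store.write("cluster1", "user1", "ComputeUniqueUsers", 300L, "v1");
        // No version needed to get the latest runs...
        System.out.println(store.lastRuns("cluster1", "user1", "ComputeUniqueUsers", 2, Optional.empty()));
        // ...but filtering by version is still possible.
        System.out.println(store.lastRuns("cluster1", "user1", "ComputeUniqueUsers", 10, Optional.of("v1")));
    }
}
```

The trade-off is visible in lastRuns: a version-filtered query reads and discards non-matching rows within the flow's prefix, so it is somewhat slower than if the version were a key component, while the common "last few runs" query needs no version at all.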

I think we'll find that with certain schema choices some things will be more performant while
others will be somewhat slower. It will be a matter of finding the schema choices that give
good enough write performance to handle scale and good read performance for the most common
use cases, while keeping other queries reasonably fast.

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411-YARN-2928.001.patch,
> YARN-3411-YARN-2928.002.patch, YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, YARN-3411-YARN-2928.007.patch,
> YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt,
> YARN-3411.poc.7.txt, YARN-3411.poc.txt
>
>
> There is work in progress to implement the storage based on a Phoenix schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native HBase schema
> for the write path. Such a schema does not exclude using Phoenix, especially for reads and
> offline queries.
> Once we have basic implementations of both options, we could evaluate them in terms of
> performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
