hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3699) Decide if flow version should be part of row key or column
Date Mon, 01 Jun 2015 16:59:17 GMT

    [ https://issues.apache.org/jira/browse/YARN-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567573#comment-14567573 ]

Junping Du commented on YARN-3699:

Hi [~jrottinghuis] and [~vrushalic], thanks for your comments, and sorry for the late reply
on this; I was traveling last week.
I fully agree with Joep's comments above that there is no right or wrong schema, only the one
that best fits the priority scenarios:
- if we more often need to query flow_runs under specific flow(s), then making flow version a column
will make this query more efficient.
- if we equally (or more) often need to query flow_runs under specific flow version(s), then our decision
here could be different.
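
To make the two scenarios concrete, here is a minimal sketch that models a sorted key-value table in plain Python and compares both layouts. Everything in it is an assumption made for illustration: the `!` separator, the `cluster!user!flow!...` key order, the sample runs, and all function names are hypothetical and are not taken from the actual HBase or Phoenix schema discussed in this JIRA.

```python
from bisect import bisect_left

SEP = "!"  # hypothetical key separator, chosen only for this sketch

# (flow, flow_version, run_id): made-up sample runs
RUNS = [
    ("wordcount", "v1", 1),
    ("wordcount", "v2", 2),
    ("wordcount", "v1", 3),
    ("sort", "v1", 4),
]


def build_tables(runs, cluster="c1", user="alice"):
    """Build the same data under two hypothetical row key layouts."""
    version_as_column = {}  # key: cluster!user!flow!run, version kept in the row
    version_in_key = {}     # key: cluster!user!flow!version!run
    for flow, version, run_id in runs:
        run = f"{run_id:06d}"  # zero-pad so string order matches numeric order
        version_as_column[SEP.join([cluster, user, flow, run])] = {
            "flow_version": version,
            "run_id": run_id,
        }
        version_in_key[SEP.join([cluster, user, flow, version, run])] = {
            "run_id": run_id,
        }
    return version_as_column, version_in_key


def prefix_scan(table, prefix):
    """Return (key, row) pairs whose key starts with prefix, in key order."""
    keys = sorted(table)
    start = bisect_left(keys, prefix)
    rows = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        rows.append((key, table[key]))
    return rows


def runs_of_flow(table, flow, cluster="c1", user="alice"):
    """Run ids of one flow, in the order the scan returns them."""
    scanned = prefix_scan(table, SEP.join([cluster, user, flow]) + SEP)
    return [row["run_id"] for _, row in scanned]


def runs_of_version_column_layout(table, flow, version, cluster="c1", user="alice"):
    """Version is a column: scan the whole flow, then filter on the column."""
    scanned = prefix_scan(table, SEP.join([cluster, user, flow]) + SEP)
    matches = [row["run_id"] for _, row in scanned if row["flow_version"] == version]
    return matches, len(scanned)


def runs_of_version_key_layout(table, flow, version, cluster="c1", user="alice"):
    """Version is in the key: the prefix scan touches only that version's rows."""
    scanned = prefix_scan(table, SEP.join([cluster, user, flow, version]) + SEP)
    return [row["run_id"] for _, row in scanned], len(scanned)


if __name__ == "__main__":
    col, key = build_tables(RUNS)
    print(runs_of_flow(col, "wordcount"))
    print(runs_of_flow(key, "wordcount"))
    print(runs_of_version_column_layout(col, "wordcount", "v1"))
    print(runs_of_version_key_layout(key, "wordcount", "v1"))
```

With this sample data, the column layout returns a flow's runs in run order (1, 2, 3), while the key layout groups them by version first (1, 3, 2). For a single version, the key layout scans only the two matching rows, whereas the column layout scans all three rows of the flow and filters. Which cost matters more depends on which query is the priority, which is the point of the two bullets above.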
To me, the tricky/interesting part here is that the boundary between different flows and flow versions
could be vague in practice: how big or small must a change to a flow be before it starts a new flow
rather than a new flow version? Why would we have several active flow versions instead of only one
active flow version (and more flows)? These trade-offs in application concepts also affect
our trade-offs in schema design, which is a pretty common thing that I have also seen in other apps.
I would like to trust your priorities here given your experience with hRaven, which has already
been running well in production for years. So I agree the Phoenix schema should be adjusted slightly
to get closer to the HBase one.
Maybe we should have a new JIRA for this (Phoenix schema) change? We can either keep this
JIRA open for discussion or resolve it as Later, so that in future, if others from the community bring
other solid scenarios from practice, we can continue the discussion here and try to make a better
trade-off or innovation. Thoughts?

> Decide if  flow version should be part of row key or column
> -----------------------------------------------------------
>                 Key: YARN-3699
>                 URL: https://issues.apache.org/jira/browse/YARN-3699
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Vrushali C
> Based on discussions in YARN-3411 with [~djp], filing jira for continuing discussion
> on putting the flow version in rowkey or column.
> Either phoenix/hbase approach will update the jira with the conclusions.

This message was sent by Atlassian JIRA
