crunch-dev mailing list archives

From "Stefan De Smit (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-598) scaleFactor for JoinStrategy
Date Thu, 24 Mar 2016 19:01:25 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210769#comment-15210769 ]

Stefan De Smit commented on CRUNCH-598:
---------------------------------------

I would only fix ShardedJoinStrategy, since with DefaultJoinStrategy it is already possible to pass your own JoinFn.
This scaleFactor is fairly use-case specific anyway. I think the default factor of 1 is inaccurate for most joins,
but since a workaround exists, I don't think an extra argument adds much value.
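As a rough illustration of the DefaultJoinStrategy workaround (extending the inner-join function and overriding its scaleFactor), here is a minimal sketch. To keep it compilable without Crunch on the classpath, `SimpleJoinFn` and `SimpleInnerJoinFn` are hypothetical stand-ins for Crunch's DoFn/JoinFn/InnerJoinFn hierarchy, not the real classes; the default of 1 follows the comment above, and 5.0f is an arbitrary example value.

```java
// Hypothetical stand-ins for Crunch's DoFn/JoinFn/InnerJoinFn; not the real API.
public class JoinScaleFactorSketch {

    /** Stand-in base class: reports a default scaleFactor of 1, as described above. */
    static abstract class SimpleJoinFn {
        /** Estimated ratio of output size to input size, consumed by a planner. */
        public float scaleFactor() {
            return 1.0f;
        }
    }

    /** Stand-in for a default inner-join function that keeps the default factor. */
    static class SimpleInnerJoinFn extends SimpleJoinFn {
    }

    public static void main(String[] args) {
        SimpleJoinFn defaultFn = new SimpleInnerJoinFn();

        // The workaround: extend the inner-join function and report a larger
        // factor, because each key is expected to produce many joined rows.
        SimpleJoinFn tunedFn = new SimpleInnerJoinFn() {
            @Override
            public float scaleFactor() {
                return 5.0f; // arbitrary example value
            }
        };

        System.out.println("default scaleFactor = " + defaultFn.scaleFactor());
        System.out.println("tuned scaleFactor   = " + tunedFn.scaleFactor());
    }
}
```

With the real library, the same override would go on the JoinFn handed to DefaultJoinStrategy; the point is only that the factor is a per-function override, which the sharded join does not currently expose.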
The sharded join has no such workaround and needs it more.
I would just add a method "getAverageNumShards()" to the ShardingStrategy interface, and call
it to set the scaleFactor of the left-side shard function, as that is the correct scaleFactor for that function.
In ConstantShardingStrategy, this method can simply return numShards.
For other strategies, the scaleFactor would be some kind of average number of shards, which the user has
to define.
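A minimal sketch of the proposal, under stated assumptions: the interface below is a simplified, hypothetical model (its per-key `getNumShards(K)` and the float return type of `getAverageNumShards()` are assumptions), `LeftShardFn` stands in for the left-side shard function, and `PerKeyShardingStrategy` is an invented example of "another strategy" where the user supplies the average.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the proposal; these are not Crunch's classes.
public class ShardingScaleFactorSketch {

    /** Sharding strategy with the proposed getAverageNumShards() method. */
    interface ShardingStrategy<K> {
        /** Number of shards to use for a given key. */
        int getNumShards(K key);

        /** Proposed: average fan-out of the left side, used as its scaleFactor. */
        float getAverageNumShards();
    }

    /** Every key gets the same number of shards, so the average is just numShards. */
    static class ConstantShardingStrategy<K> implements ShardingStrategy<K> {
        private final int numShards;

        ConstantShardingStrategy(int numShards) {
            this.numShards = numShards;
        }

        @Override
        public int getNumShards(K key) {
            return numShards;
        }

        @Override
        public float getAverageNumShards() {
            return numShards;
        }
    }

    /** Hypothetical per-key strategy: the user must define the average explicitly. */
    static class PerKeyShardingStrategy<K> implements ShardingStrategy<K> {
        private final Map<K, Integer> shardsPerKey;
        private final float averageNumShards;

        PerKeyShardingStrategy(Map<K, Integer> shardsPerKey, float averageNumShards) {
            this.shardsPerKey = shardsPerKey;
            this.averageNumShards = averageNumShards;
        }

        @Override
        public int getNumShards(K key) {
            return shardsPerKey.getOrDefault(key, 1); // keys not listed get one shard
        }

        @Override
        public float getAverageNumShards() {
            return averageNumShards;
        }
    }

    /** Stand-in for the left-side shard function: its scaleFactor comes from the strategy. */
    static class LeftShardFn<K> {
        private final ShardingStrategy<K> strategy;

        LeftShardFn(ShardingStrategy<K> strategy) {
            this.strategy = strategy;
        }

        public float scaleFactor() {
            return strategy.getAverageNumShards();
        }
    }

    public static void main(String[] args) {
        LeftShardFn<String> constant =
            new LeftShardFn<String>(new ConstantShardingStrategy<String>(10));
        System.out.println("constant: " + constant.scaleFactor()); // 10.0

        Map<String, Integer> hotKeys = new HashMap<>();
        hotKeys.put("popular", 50);
        LeftShardFn<String> perKey =
            new LeftShardFn<String>(new PerKeyShardingStrategy<String>(hotKeys, 2.5f));
        System.out.println("per-key:  " + perKey.scaleFactor()); // 2.5
    }
}
```

The design choice is that the shard function never computes the factor itself; it just asks the strategy, so a constant strategy gets the exact value for free and any other strategy has to state its own average.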


> scaleFactor for JoinStrategy
> ----------------------------
>
>                 Key: CRUNCH-598
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-598
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Stefan De Smit
>            Priority: Minor
>
> The scaleFactor method has a big influence on the planner.
> For joins, there currently isn't a clean way to set it, even though it is often required,
> as a join can multiply the output size considerably.
> For the DefaultJoinStrategy, it's possible to pass a custom JoinFn with a proper scaleFactor,
> or just extend the default InnerJoinFn and override its scaleFactor.
> For the ShardedJoinStrategy, this isn't possible, even though it is often needed more (as a sharded
> join is especially handy for one-to-very-many joins).
> For the default ConstantShardingStrategy, it might make sense to also use numShards
> as the scaleFactor for the left side, since that is effectively what happens: every left entry is
> emitted numShards times.
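To see why numShards is the natural left-side factor, this toy in-memory sketch (not Crunch code) emits every left entry once per shard and measures output size over input size. The size-estimate step assumes, as a simplification, that a planner estimates output size as input size times scaleFactor; the 1,000,000-byte input is a hypothetical number, and the right side of the join is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy, in-memory model of the left side of a sharded join; not Crunch code.
public class LeftFanOutSketch {

    /** A left entry tagged with the shard it was copied to. */
    static final class ShardedEntry {
        final String key;
        final int shard;
        final String value;

        ShardedEntry(String key, int shard, String value) {
            this.key = key;
            this.shard = shard;
            this.value = value;
        }
    }

    /** Emits every left (key, value) pair once per shard, i.e. numShards times. */
    static List<ShardedEntry> shardLeft(List<String[]> left, int numShards) {
        List<ShardedEntry> out = new ArrayList<>();
        for (String[] kv : left) {
            for (int shard = 0; shard < numShards; shard++) {
                out.add(new ShardedEntry(kv[0], shard, kv[1]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int numShards = 4;
        List<String[]> left = Arrays.asList(
            new String[] {"a", "1"},
            new String[] {"b", "2"},
            new String[] {"c", "3"});

        List<ShardedEntry> sharded = shardLeft(left, numShards);
        float observedFactor = (float) sharded.size() / left.size();
        System.out.println("observed factor = " + observedFactor); // 4.0

        // Simplified planner-style estimate: input size times scaleFactor.
        long leftBytes = 1_000_000L; // hypothetical input size
        long estimatedBytes = (long) (leftBytes * observedFactor);
        System.out.println("estimated left output = " + estimatedBytes + " bytes"); // 4000000
    }
}
```

With the default factor of 1, the same estimate would be 1,000,000 bytes, underestimating the left side by a factor of numShards.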



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
