crunch-dev mailing list archives

From "Stefan De Smit (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-598) scaleFactor for JoinStrategy
Date Mon, 28 Mar 2016 18:40:25 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214650#comment-15214650 ]

Stefan De Smit commented on CRUNCH-598:
---------------------------------------

Any option that lets me increase the number of reducers for the join will work for me
(but not a fixed number of reducers, as my inputs vary in size).
I understand you don't want to extend the interface for compatibility reasons.
Have you considered allowing a custom JoinFn to be passed to the ShardedJoinStrategy?
That would be in line with DefaultJoinStrategy.
Of course, sharded join does not allow right joins, but that could be documented.
I'm not sure whether this could be done in a way that makes it easy to switch between the default
and sharded joins.
If not, adding a constructor argument to pass the scaleFactor seems the best way to me, as this could
also be done for the DefaultJoinStrategy, to avoid having to subclass the default JoinFn.
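
To make the constructor-argument idea concrete, here is a minimal, dependency-free sketch. It is only a model of the proposal: SketchJoinFn, ScaledJoinFn and SketchJoinStrategy are hypothetical stand-ins invented for illustration, not Crunch classes, and the default value of 1.0f is an assumption.

```java
/** Hypothetical stand-in for a join function; only the planner-facing size hint is modelled. */
interface SketchJoinFn {
    float scaleFactor();
}

/** Wraps any join function and overrides just its scale factor, so no subclassing is needed. */
final class ScaledJoinFn implements SketchJoinFn {
    private final SketchJoinFn delegate; // a real implementation would forward the join work here
    private final float scaleFactor;

    ScaledJoinFn(SketchJoinFn delegate, float scaleFactor) {
        if (scaleFactor <= 0f) {
            throw new IllegalArgumentException("scaleFactor must be positive");
        }
        this.delegate = delegate;
        this.scaleFactor = scaleFactor;
    }

    @Override
    public float scaleFactor() {
        return scaleFactor;
    }
}

/** Hypothetical strategy that optionally takes the scale factor as a constructor argument. */
final class SketchJoinStrategy {
    private final SketchJoinFn joinFn;

    SketchJoinStrategy(SketchJoinFn joinFn) {
        this.joinFn = joinFn;
    }

    SketchJoinStrategy(SketchJoinFn joinFn, float scaleFactor) {
        this(new ScaledJoinFn(joinFn, scaleFactor));
    }

    /** The value a planner would read when estimating the join's output size. */
    float plannerScaleFactor() {
        return joinFn.scaleFactor();
    }

    public static void main(String[] args) {
        SketchJoinFn inner = () -> 1.0f; // assumed default for illustration
        System.out.println(new SketchJoinStrategy(inner).plannerScaleFactor());
        System.out.println(new SketchJoinStrategy(inner, 25f).plannerScaleFactor());
    }
}
```

Because the override lives in a wrapper, the same two constructors could be offered on both a default and a sharded strategy, which would keep switching between them cheap for callers.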

> scaleFactor for JoinStrategy
> ----------------------------
>
>                 Key: CRUNCH-598
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-598
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Stefan De Smit
>            Priority: Minor
>
> The scaleFactor method has a big influence on the planner.
> For joins, there currently isn't a clean way to set it, although it is often required,
> as a join can have a large multiplication factor.
> For the DefaultJoinStrategy, it's possible to pass a custom JoinFn with a proper scaleFactor,
> or just extend the default InnerJoinFn with a scaleFactor.
> For the ShardedJoinStrategy, this isn't possible, although it is often needed even more (as ShardedJoin
> is especially handy for one-to-very-many joins).
> For the default ConstantShardingStrategy, it might make sense to also use numShards
> as the scaleFactor for the left side, as that's roughly what happens: every left entry is emitted numShards
> times.
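
The arithmetic behind the numShards suggestion can be sketched as a toy model. It assumes a planner that derives the reducer count from an estimated size (input bytes times scale factor). The class, the method names and the bytes-per-reducer constant are all hypothetical and are not taken from Crunch.

```java
/** Toy model of size-based reducer planning; none of these names are Crunch APIs. */
final class ShardedJoinSizeSketch {
    // Assumed planner setting: target bytes handled per reducer (illustrative value).
    static final long BYTES_PER_REDUCER = 1_000_000_000L;

    /** Estimated output size of a step: input size times the function's scale factor. */
    static long estimatedSize(long inputBytes, float scaleFactor) {
        return (long) (inputBytes * (double) scaleFactor);
    }

    /** Reducer count derived from the estimated size; always at least one. */
    static int reducersFor(long estimatedBytes) {
        return 1 + (int) (estimatedBytes / BYTES_PER_REDUCER);
    }

    public static void main(String[] args) {
        long leftBytes = 2_000_000_000L;
        int numShards = 10;

        // Without a scale factor, the planner sees only the raw left-side size.
        System.out.println(reducersFor(estimatedSize(leftBytes, 1.0f)));

        // Using numShards as the scale factor reflects that every left entry is emitted numShards times.
        System.out.println(reducersFor(estimatedSize(leftBytes, numShards)));
    }
}
```

With these illustrative numbers the estimate goes from 3 reducers to 21, which is the kind of input-dependent scaling asked for above, as opposed to a fixed reducer count.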



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
