crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-598) scaleFactor for JoinStrategy
Date Mon, 28 Mar 2016 16:59:25 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214465#comment-15214465 ]

Josh Wills commented on CRUNCH-598:
-----------------------------------

So I'm trying to figure out the best way to solve this, and I'd like to avoid modifying the
ShardingStrategy interface, since changing interfaces has more downstream impact (existing clients
won't compile against the new version, etc.). There's a lot we can do in the constructor of
ShardedJoinStrategy, including allowing a scaleFactor argument in place of (or in addition to) the
numReducers argument we provide, or even letting someone pass in their own custom JoinStrategy
instead of the DefaultJoinStrategy that we use under the covers now. Would either of those options
work for your use case?
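
To make the two constructor options concrete, here is a rough, self-contained sketch. It does not use the real Crunch classes: ShardedJoinSketch, JoinStrategySketch, DefaultJoinSketch and ShardedJoinStrategySketch are hypothetical stand-ins invented for illustration, and the parameter order is an assumption, not a proposed signature.

```java
// Hypothetical stand-in types for illustration only; these are NOT the Crunch classes.
final class ShardedJoinSketch {

    /** Stand-in for a join strategy that can report a scale factor to the planner. */
    interface JoinStrategySketch {
        float scaleFactor();
    }

    /** Stand-in for the default delegate, which reports a neutral scale factor. */
    static final class DefaultJoinSketch implements JoinStrategySketch {
        @Override
        public float scaleFactor() {
            return 1.0f;
        }
    }

    /** Stand-in for a sharded join strategy offering both constructor options. */
    static final class ShardedJoinStrategySketch {
        private final int numShards;
        private final JoinStrategySketch delegate;

        /** Option 1: the caller passes an explicit scale factor. */
        ShardedJoinStrategySketch(int numShards, final float scaleFactor) {
            this(numShards, new JoinStrategySketch() {
                @Override
                public float scaleFactor() {
                    return scaleFactor;
                }
            });
        }

        /** Option 2: the caller passes a custom delegate in place of the default one. */
        ShardedJoinStrategySketch(int numShards, JoinStrategySketch delegate) {
            this.numShards = numShards;
            this.delegate = delegate;
        }

        int numShards() {
            return numShards;
        }

        /** The scale factor always comes from the delegate, whichever constructor was used. */
        float scaleFactor() {
            return delegate.scaleFactor();
        }
    }
}
```

Either way the sharding interface itself stays untouched; only the constructor surface grows.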

> scaleFactor for JoinStrategy
> ----------------------------
>
>                 Key: CRUNCH-598
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-598
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Stefan De Smit
>            Priority: Minor
>
> The scaleFactor method has a big influence on the planner.
> For joins, there currently isn't a clean way to set it, even though it is often required,
> since a join can have a large multiplication factor.
> For the DefaultJoinStrategy, it's possible to add a custom JoinFn with a proper scaleFactor,
> or just extend the default InnerJoinFn with a scaleFactor.
> For the ShardedJoinStrategy, this isn't possible, even though it is often needed more there
> (as ShardedJoin is especially handy for one-to-very-many joins).
> For the default ConstantShardingStrategy, it might make sense to also use numShards
> as the scaleFactor for the left side, as that's roughly what happens: every left entry is
> emitted numShards times.
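
The point about the left side can be illustrated with a small stand-alone sketch. It assumes only what the description says, namely that every left entry is emitted numShards times; ConstantShardingSketch, replicateLeft and leftScaleFactor are hypothetical names and not part of Crunch, and the "key#shard" encoding is just for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; this is NOT Crunch's ConstantShardingStrategy.
final class ConstantShardingSketch {

    /** Emits every left-side key once per shard, tagged with the shard index. */
    static List<String> replicateLeft(List<String> leftKeys, int numShards) {
        List<String> out = new ArrayList<>();
        for (String key : leftKeys) {
            for (int shard = 0; shard < numShards; shard++) {
                out.add(key + "#" + shard);
            }
        }
        return out;
    }

    /** Under this assumption, the left side's output/input ratio is exactly numShards. */
    static float leftScaleFactor(int numShards) {
        return (float) numShards;
    }
}
```

With two left entries and three shards, the left side produces six records, which is the growth a planner-facing scale factor of numShards would describe.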



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
