crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-596) Right and full outer join for Bloom filter strategy
Date Fri, 18 Mar 2016 12:08:33 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201375#comment-15201375 ]

Gabriel Reid commented on CRUNCH-596:
-------------------------------------

First off, sorry I missed your long-standing pull request for this -- I saw it pass by a while
back and didn't get to it.

This looks really good -- a great solution for something that I was pretty much convinced
wasn't possible.

Very good point in the javadoc about dealing with the deep copying that will occur
when the two filters are used to split the right side up into two PCollections. However, I think
that we can get around that by putting the "right" PCollection through a dummy parallelDo
call with a DoFn whose {{disableDeepCopy()}} returns true. This will stop the deep copying
that would otherwise happen before the values are passed to the two filter functions, and
then there wouldn't be any need to set DISABLE_DEEP_COPY globally any more to get decent performance.
Would you be able to update the patch with that little change? Other than that, this looks
like it's good to go.
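
To make the idea concrete, here is a small self-contained toy model of the fan-out behaviour described above. It is not Crunch code and not the patch itself: the {{Fn}} interface, {{identityNoCopy()}} and {{splitAndCountCopies()}} are hypothetical stand-ins invented for illustration, and the "one copy per consumer" rule is a simplifying assumption rather than a statement of how the planner actually works. It only shows why an identity function that opts out of deep copying removes the copies made before two read-only filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Toy model only: none of these types are Crunch classes. */
class DeepCopySketch {

  /** Hypothetical stand-in for a DoFn that may opt out of defensive copies. */
  interface Fn<T> {
    T process(T input);

    default boolean disableDeepCopy() {
      return false;
    }
  }

  /** Identity function that opts out of deep copying, like the proposed dummy DoFn. */
  static <T> Fn<T> identityNoCopy() {
    return new Fn<T>() {
      @Override
      public T process(T input) {
        return input;
      }

      @Override
      public boolean disableDeepCopy() {
        return true;
      }
    };
  }

  /**
   * Passes each output of fn to two read-only filters. Unless fn opts out,
   * each consumer receives its own copy (assumption of this toy model).
   * Returns the number of copies made.
   */
  static <T> int splitAndCountCopies(List<T> input, Fn<T> fn, UnaryOperator<T> copier,
                                     Predicate<T> first, Predicate<T> second,
                                     List<T> firstOut, List<T> secondOut) {
    int copies = 0;
    for (T item : input) {
      T out = fn.process(item);
      T forFirst = out;
      T forSecond = out;
      if (!fn.disableDeepCopy()) {
        forFirst = copier.apply(out);
        forSecond = copier.apply(out);
        copies += 2;
      }
      if (first.test(forFirst)) {
        firstOut.add(forFirst);
      }
      if (second.test(forSecond)) {
        secondOut.add(forSecond);
      }
    }
    return copies;
  }

  public static void main(String[] args) {
    List<String> right = Arrays.asList("a", "bb", "ccc");
    UnaryOperator<String> copier = s -> new String(s);
    Predicate<String> inFilter = s -> s.length() == 1;
    Predicate<String> notInFilter = s -> s.length() > 1;

    Fn<String> plain = s -> s; // keeps the default: deep copy enabled
    int withCopies = splitAndCountCopies(right, plain, copier, inFilter, notInFilter,
        new ArrayList<>(), new ArrayList<>());
    int withoutCopies = splitAndCountCopies(right, DeepCopySketch.<String>identityNoCopy(),
        copier, inFilter, notInFilter, new ArrayList<>(), new ArrayList<>());

    System.out.println(withCopies);    // prints 6
    System.out.println(withoutCopies); // prints 0
  }
}
```

Skipping the copies is only safe here because both filters read the values without modifying them, which is the same condition that applies to the two filter functions in the join strategy.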



> Right and full outer join for Bloom filter strategy
> ---------------------------------------------------
>
>                 Key: CRUNCH-596
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-596
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.13.0
>            Reporter: Piotr Chromiec
>            Assignee: Josh Wills
>            Priority: Minor
>              Labels: features, github-import, newbie
>             Fix For: 0.14.0
>
>
> It seems that the current Bloom filter join strategy lacks support for right and full outer
> joins. At RTBHOUSE we recently found this useful and implemented it for an internal project.
> The code for this feature, with javadoc and tests, is available on GitHub: [PullRequest#9|https://github.com/apache/crunch/pull/9]
> I'm a newbie here, so forgive me if this issue is somehow incomplete or buggy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
