crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-215) Add BloomFilterJoinStrategy
Date Mon, 10 Jun 2013 04:06:21 GMT


Josh Wills commented on CRUNCH-215:

+1, with a couple of comments for follow-up items.

1) I'm curious as to why the call was required before the bloom stuff was run;
I want to investigate that after this gets checked in so I can see what's going on there.
2) Not for this patch, but it might make sense to have a general purpose way of serializing/deserializing
any PType (independent of type family) to bytes, or to a ByteBuffer. Don't change it for this
patch, but I think it's something we should consider.
> Add BloomFilterJoinStrategy
> ---------------------------
>                 Key: CRUNCH-215
>                 URL:
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: CRUNCH-215.patch
> Bloom filters can be very effective for pre-filtering one side of a join when one side
> of the join has a small subset of the keys of the other side (i.e. there are many keys on
> one side that will not be joined).
> The Bloom filter can be built up based on the keys of one side of the join (the side
> with fewer keys), and then can be applied as a filter to the other side of the join before
> it is sent through the shuffle and reduce phases.
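
For readers unfamiliar with the technique described in the issue, here is a minimal, self-contained sketch of Bloom-filter pre-filtering in plain Java. It is not the code from CRUNCH-215.patch and does not use the Crunch or Hadoop APIs; the names BloomJoinSketch, SimpleBloomFilter and preFilter, as well as the bit-array size and hash scheme, are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative only: hypothetical names, not the CRUNCH-215 implementation.
class BloomJoinSketch {

    /** A minimal Bloom filter over String keys (assumed sizing and hashing). */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Derive the i-th bit position from two hashes (double hashing).
        private int index(String key, int i) {
            int h1 = key.hashCode();
            int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(String key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(key, i));
            }
        }

        // May return true for a key that was never added (false positive),
        // but never returns false for a key that was added.
        boolean mightContain(String key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Drops keys from the larger side that cannot match the smaller side. */
    static List<String> preFilter(List<String> largeSideKeys, SimpleBloomFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String key : largeSideKeys) {
            if (filter.mightContain(key)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build the filter from the side with fewer keys.
        List<String> smallSide = Arrays.asList("user-3", "user-7", "user-42");
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 12, 3);
        for (String key : smallSide) {
            filter.add(key);
        }

        // Apply it to the other side before the (simulated) shuffle.
        List<String> largeSide = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            largeSide.add("user-" + i);
        }
        List<String> kept = preFilter(largeSide, filter);
        System.out.println("keys before filter: " + largeSide.size());
        System.out.println("keys after filter:  " + kept.size());
    }
}
```

Because a Bloom filter can report false positives but never false negatives, the pre-filtered side may still contain a few keys that do not join; the real join afterwards discards those, so the result is unchanged and only the volume of shuffled data shrinks.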
