crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-215) Add BloomFilterJoinStrategy
Date Sun, 09 Jun 2013 22:00:20 GMT
Gabriel Reid created CRUNCH-215:
-----------------------------------

             Summary: Add BloomFilterJoinStrategy
                 Key: CRUNCH-215
                 URL: https://issues.apache.org/jira/browse/CRUNCH-215
             Project: Crunch
          Issue Type: New Feature
            Reporter: Gabriel Reid
            Assignee: Gabriel Reid


Bloom filters can be very effective for pre-filtering one side of a join when the keys on
one side are a small subset of the keys on the other side (i.e., many keys on the larger side
have no match and will not be joined).

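The Bloom filter can be built from the keys of one side of the join (the side with fewer
keys) and then applied as a filter to the other side before that side is sent through the
shuffle and reduce phases. Because a Bloom filter can return false positives but never false
negatives, this pre-filtering never drops a record that would have been joined; it only
reduces the amount of data that has to be shuffled.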
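The build-then-filter flow described above can be sketched in plain Java. This is a
hypothetical, in-memory illustration only: it does not use the Crunch or Hadoop APIs, and the
names `BloomPrefilterJoin`, `buildFilter`, `prefilter` and `innerJoin` are invented for the
example. The filter derives its k bit positions by double hashing (`String.hashCode` plus a
32-bit FNV-1a hash), and `innerJoin` merely stands in for the shuffle and reduce phases.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory sketch of a Bloom-filter pre-filtered join (not the Crunch API). */
public class BloomPrefilterJoin {

    /** Minimal Bloom filter over String keys; k bit positions via double hashing (h1 + i * h2). */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        /** Second hash: 32-bit FNV-1a over the UTF-8 bytes, forced odd so it is never zero. */
        private static int fnv1a(String key) {
            int h = 0x811C9DC5;
            for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
                h ^= (b & 0xFF);
                h *= 0x01000193;
            }
            return h | 1;
        }

        void add(String key) {
            int h1 = key.hashCode(), h2 = fnv1a(key);
            for (int i = 0; i < numHashes; i++) {
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        /** false means "definitely absent"; true means "possibly present". */
        boolean mightContain(String key) {
            int h1 = key.hashCode(), h2 = fnv1a(key);
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Step 1: build the filter from the keys of the side with fewer keys. */
    static BloomFilter buildFilter(Set<String> smallSideKeys, int bitsPerKey, int numHashes) {
        BloomFilter filter = new BloomFilter(Math.max(64, smallSideKeys.size() * bitsPerKey), numHashes);
        for (String key : smallSideKeys) {
            filter.add(key);
        }
        return filter;
    }

    /** Step 2: drop records whose key is definitely not on the other side, before the "shuffle". */
    static List<Map.Entry<String, String>> prefilter(List<Map.Entry<String, String>> largeSide, BloomFilter filter) {
        List<Map.Entry<String, String>> kept = new ArrayList<>();
        for (Map.Entry<String, String> record : largeSide) {
            if (filter.mightContain(record.getKey())) {
                kept.add(record);
            }
        }
        return kept;
    }

    /** Stand-in for the shuffle and reduce phases: an inner join on the key. */
    static List<String> innerJoin(Map<String, String> smallSide, List<Map.Entry<String, String>> largeSide) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> record : largeSide) {
            String match = smallSide.get(record.getKey());
            if (match != null) {
                joined.add(record.getKey() + ":" + match + "," + record.getValue());
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> small = Map.of("a", "1", "b", "2");
        List<Map.Entry<String, String>> large = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            large.add(Map.entry("k" + i, "x" + i));
        }
        large.add(Map.entry("a", "A"));
        large.add(Map.entry("b", "B"));

        BloomFilter filter = buildFilter(small.keySet(), 10, 7);
        List<Map.Entry<String, String>> kept = prefilter(large, filter);
        System.out.println("records before filter: " + large.size() + ", after: " + kept.size());
        System.out.println(innerJoin(small, kept)); // prints [a:1,A, b:2,B]
    }
}
```

Because the filter has no false negatives, the joined result is identical with or without
`prefilter`; a false positive only means that a non-matching record is shuffled and then
discarded by the join. The choice of 10 bits per key with 7 hash functions is a common pairing
that gives a false-positive rate of roughly 1%.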

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
