apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2366) Apply BloomFilter to Bucket
Date Thu, 01 Jun 2017 21:07:04 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033697#comment-16033697 ]

ASF GitHub Bot commented on APEXMALHAR-2366:

GitHub user PramodSSImmaneni opened a pull request:


    APEXMALHAR-2366 #resolve #comment Apply BloomFilter to Bucket, use internal BloomFilter

    @bhupeshchawda please see, this is to finish up the work started in https://github.com/apache/apex-malhar/pull/521

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/PramodSSImmaneni/apex-malhar APEXMALHAR-2366

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #631
commit 31c15fe839569610edb9c89af1d3371401114956
Author: brightchen <bright@datatorrent.com>
Date:   2016-12-05T19:34:48Z

    APEXMALHAR-2366 #resolve #comment Apply BloomFilter to Bucket, use internal BloomFilter

commit 58fec176277eca6c6b872552d411ce3b852fc70d
Author: Pramod Immaneni <pramod@datatorrent.com>
Date:   2017-05-25T21:39:23Z

    Merge branch 'APEXMALHAR-2366' of github.com:brightchen/apex-malhar into APEXMALHAR-2366

commit 7ec135b45d2eb99497944bd6608d3b29e89ade9b
Author: Pramod Immaneni <pramod@datatorrent.com>
Date:   2017-06-01T21:02:26Z

    Added license references


> Apply BloomFilter to Bucket
> ---------------------------
>                 Key: APEXMALHAR-2366
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: bright chen
>            Assignee: bright chen
>   Original Estimate: 192h
>  Remaining Estimate: 192h
> The bucket get() first checks the cache and, if the entry is not in the cache, then checks
the stored files. Checking the files is a fairly heavy operation because it requires file seeks.
> The chance of having to check the files is very high if the key range is large.
> I suggest applying a BloomFilter to each bucket to reduce the chance of reading from files.
> If the buckets are managed by ManagedStateImpl, each bucket can hold a very large number of entries,
and the BloomFilter may stop being useful after a while (as it fills up, its false-positive rate rises). But if the buckets are managed by ManagedTimeUnifiedStateImpl,
each bucket holds a bounded number of entries, and a BloomFilter would be very useful.
> For implementation:
> Guava already has a BloomFilter, and its interface is simple and fits our
case. However, Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink, while Guava 14 uses PrimitiveSink).
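A minimal sketch of the idea in the issue, for illustration only: a small Bloom filter with no Guava dependency (so the Sink vs. PrimitiveSink difference does not arise) guarding a bucket get() so that the file read is skipped when the key was never written. SimpleBloomFilter, BucketSketch, flush() and fileReads are hypothetical names, not Malhar classes or the PR's internal BloomFilter; a HashMap stands in for the bucket's stored files, and String keys stand in for Malhar's key type. The sizing formulas and the double-hashing probe scheme are the textbook ones and are assumptions, not necessarily what the PR uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical self-contained Bloom filter (no Guava), so the Guava 11 Sink vs.
// Guava 14 PrimitiveSink incompatibility does not arise.
class SimpleBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  SimpleBloomFilter(int expectedEntries, double falsePositiveRate) {
    double ln2 = Math.log(2);
    // Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions.
    numBits = Math.max(1, (int) Math.ceil(-expectedEntries * Math.log(falsePositiveRate) / (ln2 * ln2)));
    numHashes = Math.max(1, (int) Math.round((double) numBits / expectedEntries * ln2));
    bits = new BitSet(numBits);
  }

  // 64-bit FNV-1a hash of the key bytes.
  private static long fnv1a64(byte[] data) {
    long hash = 0xcbf29ce484222325L;
    for (byte b : data) {
      hash ^= (b & 0xff);
      hash *= 0x100000001b3L;
    }
    return hash;
  }

  // Double hashing: probe i uses h1 + i * h2, made non-negative, mod numBits.
  private int probe(int h1, int h2, int i) {
    int combined = h1 + i * h2;
    if (combined < 0) {
      combined = ~combined;
    }
    return combined % numBits;
  }

  void put(byte[] key) {
    long h = fnv1a64(key);
    for (int i = 1; i <= numHashes; i++) {
      bits.set(probe((int) h, (int) (h >>> 32), i));
    }
  }

  // false means "definitely absent"; true means "possibly present".
  boolean mightContain(byte[] key) {
    long h = fnv1a64(key);
    for (int i = 1; i <= numHashes; i++) {
      if (!bits.get(probe((int) h, (int) (h >>> 32), i))) {
        return false;
      }
    }
    return true;
  }
}

// Hypothetical bucket: an in-memory cache in front of "stored files"
// (a HashMap stands in for on-disk data that would need a file seek).
class BucketSketch {
  private final Map<String, String> cache = new HashMap<>();
  private final Map<String, String> files = new HashMap<>();
  private final SimpleBloomFilter bloom = new SimpleBloomFilter(1000, 0.01);
  int fileReads = 0;

  void put(String key, String value) {
    cache.put(key, value);
    bloom.put(key.getBytes(StandardCharsets.UTF_8));
  }

  // Moves cached entries to the simulated files, as a flush to disk would.
  void flush() {
    files.putAll(cache);
    cache.clear();
  }

  String get(String key) {
    String value = cache.get(key);
    if (value != null) {
      return value;
    }
    // Skip the expensive file read when the filter says the key was never written.
    if (!bloom.mightContain(key.getBytes(StandardCharsets.UTF_8))) {
      return null;
    }
    fileReads++;
    return files.get(key);
  }
}
```

A negative answer from mightContain is always correct, so skipping the file read is safe; a positive answer may be a false positive, which only costs the read the bucket would have done anyway. The sketch also shows the ManagedStateImpl concern: the filter is sized for an expected entry count, and once a bucket holds far more entries than that, most bits are set and nearly every lookup falls through to the files.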

This message was sent by Atlassian JIRA
