Date: Wed, 7 Jun 2017 09:40:18 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: dev@apex.incubator.apache.org
Subject: [jira] [Commented] (APEXMALHAR-2366) Apply BloomFilter to Bucket

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040597#comment-16040597 ]

ASF GitHub Bot commented on APEXMALHAR-2366:
--------------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/apex-malhar/pull/631

> Apply BloomFilter to Bucket
> ---------------------------
>
>                 Key: APEXMALHAR-2366
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2366
>             Project: Apache Apex Malhar
>          Issue Type: Improvement
>            Reporter: bright chen
>            Assignee: bright chen
>   Original Estimate: 192h
>  Remaining Estimate: 192h
>
> The bucket get() checks the cache first and, if the entry is not in the cache, then checks the stored files. Checking the files is a fairly heavy operation because of the file seeks.
> The chance of having to check the files is very high if the key range is large.
> I suggest applying a BloomFilter to each bucket to reduce the chance of reading from files.
> If the buckets are managed by ManagedStateImpl, the number of entries in a bucket can become very large, and the BloomFilter may stop being useful after a while.
> But if the buckets are managed by ManagedTimeUnifiedStateImpl, each bucket keeps a certain number of entries, and the BloomFilter would be very useful.
> For implementation:
> Guava already has a BloomFilter, and its interface is simple and fits our case. But Guava 11 is not compatible with Guava 14 (Guava 11 uses Sink while Guava 14 uses PrimitiveSink).
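
For readers of the archive, the lookup path described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: SimpleBloomFilter and BucketSketch are hypothetical names, the "files" are simulated with an in-memory map, and a small hand-rolled filter over java.util.BitSet stands in for Guava's BloomFilter so that the sketch has no dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hand-rolled Bloom filter (hypothetical, not Guava's): k bits per key via double hashing. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    void put(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** false: the key was never put. true: the key may have been put. */
    boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    private int index(byte[] key, int i) {
        int h1 = fnv1a(key);
        int h2 = Arrays.hashCode(key) | 1; // odd step: distinct probes when numBits is a power of two
        return Math.floorMod(h1 + i * h2, numBits);
    }

    private static int fnv1a(byte[] key) {
        int h = 0x811c9dc5;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}

/** Hypothetical bucket: cache first, then the Bloom filter, then the (simulated) files. */
final class BucketSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> files = new HashMap<>(); // stands in for on-disk files
    private final SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
    int fileReads = 0; // counts the expensive lookups

    void put(String key, String value) {
        cache.put(key, value);
        filter.put(bytes(key));
    }

    /** Simulates spilling the cached entries to files. */
    void flushToFiles() {
        files.putAll(cache);
        cache.clear();
    }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        if (!filter.mightContain(bytes(key))) {
            return null; // never written to this bucket, so skip the file seek
        }
        fileReads++;
        return files.get(key);
    }

    private static byte[] bytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```

In this sketch a get() for a key that was never written returns without touching the files, which is the saving the issue is after. A filter of fixed size fills up as entries accumulate, which matches the concern raised about ManagedStateImpl buckets that grow without bound.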