Date: Mon, 4 Mar 2013 23:57:12 +0000 (UTC)
From: "Jay Kreps (JIRA)"
To: dev@kafka.apache.org
Subject: [jira] [Resolved] (KAFKA-741) Improve log cleaning dedupe buffer efficiency

     [ https://issues.apache.org/jira/browse/KAFKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps resolved KAFKA-741.
-----------------------------
    Resolution: Duplicate

This issue is fixed in the patch for KAFKA-739. It removes the duplication using a probing scheme and counts updates correctly.

> Improve log cleaning dedupe buffer efficiency
> ---------------------------------------------
>
>                 Key: KAFKA-741
>                 URL: https://issues.apache.org/jira/browse/KAFKA-741
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>             Fix For: 0.8.1
>
>
> Two good suggestions:
> 1. Use a probing scheme to increase density without increasing the collision rate
> 2. Only count unique updates to the offset map (i.e. if the key is all zero, don't count it) when computing the load. Dynamically choose the end offset based on when the map is full.
> Would be good to investigate these things.
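
For readers of the archive, the two suggestions in the quoted issue can be illustrated with a rough sketch. This is not the KAFKA-739 patch and makes no claim about how Kafka implements its offset map. The names ProbingOffsetMap and build_map, the 16-byte BLAKE2b digest, the slot layout, linear probing as the probing scheme, and the 0.75 load threshold are all assumptions chosen for illustration. The map stores a key digest plus an offset per slot, probes linearly on collision, increments its entry count only when a previously empty (all-zero) slot is claimed, and lets the caller stop at whatever offset the map fills up.

```python
import hashlib
import struct

HASH_SIZE = 16               # bytes of key digest stored per slot (assumed)
SLOT_SIZE = HASH_SIZE + 8    # digest followed by a signed 64-bit offset
EMPTY = bytes(HASH_SIZE)     # an all-zero digest marks an unused slot


class ProbingOffsetMap:
    """Hypothetical fixed-memory map from key digest to latest offset."""

    def __init__(self, slots):
        self.slots = slots
        self.buf = bytearray(slots * SLOT_SIZE)
        self.entries = 0     # unique keys only; overwrites are not counted

    def _digest(self, key):
        # A real all-zero digest would be mistaken for an empty slot;
        # this sketch ignores that (astronomically unlikely) case.
        return hashlib.blake2b(key, digest_size=HASH_SIZE).digest()

    def _find(self, digest):
        """Byte position of the slot holding digest, or of the first empty
        slot on its probe path; None if every slot is taken by other keys."""
        start = int.from_bytes(digest[:4], "big") % self.slots
        for i in range(self.slots):
            pos = ((start + i) % self.slots) * SLOT_SIZE
            stored = bytes(self.buf[pos:pos + HASH_SIZE])
            if stored == digest or stored == EMPTY:
                return pos
        return None

    def put(self, key, offset):
        digest = self._digest(key)
        pos = self._find(digest)
        if pos is None:
            return False                      # table is full
        if bytes(self.buf[pos:pos + HASH_SIZE]) == EMPTY:
            self.entries += 1                 # count unique insertions only
            self.buf[pos:pos + HASH_SIZE] = digest
        struct.pack_into(">q", self.buf, pos + HASH_SIZE, offset)
        return True

    def get(self, key):
        digest = self._digest(key)
        pos = self._find(digest)
        if pos is None or bytes(self.buf[pos:pos + HASH_SIZE]) != digest:
            return None
        return struct.unpack_from(">q", self.buf, pos + HASH_SIZE)[0]

    def load(self):
        return self.entries / self.slots


def build_map(records, slots, max_load=0.75):
    """Fill a map from (offset, key) records, stopping once the load reaches
    max_load. Returns the map and the end offset (exclusive) that it covers."""
    offset_map = ProbingOffsetMap(slots)
    end_offset = None
    for offset, key in records:
        if offset_map.load() >= max_load:
            break
        offset_map.put(key, offset)
        end_offset = offset + 1
    return offset_map, end_offset
```

Because repeated keys do not raise the load, a log with many updates to few keys lets the map cover a longer range of offsets before it fills, which is the point of choosing the end offset dynamically.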