Message-ID: <990182374.1231704779582.JavaMail.jira@brutus>
Date: Sun, 11 Jan 2009 12:12:59 -0800 (PST)
From: "Joydeep Sen Sarma (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] Created: (HIVE-224) implement lfu based flushing policy for map side aggregates

implement lfu based flushing policy for map side aggregates
-----------------------------------------------------------

                 Key: HIVE-224
                 URL: https://issues.apache.org/jira/browse/HIVE-224
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Joydeep Sen Sarma

Currently we flush
some random set of rows when the map-side hash table approaches its memory limit. We have discussed a strategy of flushing the hash table entries that have been seen the fewest times (effectively an LFU flushing strategy). This will be very effective at reducing the amount of data sent from the map step to the reduce step, as well as reducing the chance of skew.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
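
For illustration only, here is a minimal Python sketch of the LFU flushing idea described in the issue. It is not Hive's implementation. The class `LfuAggregationTable` and its parameters `max_entries`, `flush_fraction`, and `emit` are hypothetical names, the aggregate is assumed to be a simple per-key sum, and an entry count stands in for the real memory limit.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Tuple


class LfuAggregationTable:
    """Map-side partial aggregation (per-key sum) with LFU flushing.

    Hypothetical sketch: names and parameters are assumptions, not Hive APIs.
    """

    def __init__(self, max_entries: int, flush_fraction: float,
                 emit: Callable[[Hashable, int], None]) -> None:
        self.max_entries = max_entries        # stand-in for the memory limit
        self.flush_fraction = flush_fraction  # share of entries evicted per flush
        self.emit = emit                      # sends a partial aggregate downstream
        self.sums: Dict[Hashable, int] = {}   # key -> partial sum
        self.hits: Dict[Hashable, int] = {}   # key -> number of rows seen

    def add(self, key: Hashable, value: int) -> None:
        # Only a new key can grow the table, so check the limit before inserting.
        if key not in self.sums and len(self.sums) >= self.max_entries:
            self._flush_least_frequent()
        self.sums[key] = self.sums.get(key, 0) + value
        self.hits[key] = self.hits.get(key, 0) + 1

    def _flush_least_frequent(self) -> None:
        # Evict the entries seen the fewest times; frequently seen keys stay
        # in memory and keep absorbing rows.
        n = max(1, int(len(self.sums) * self.flush_fraction))
        victims = heapq.nsmallest(n, self.hits, key=self.hits.get)
        for key in victims:
            self.emit(key, self.sums.pop(key))
            del self.hits[key]

    def close(self) -> None:
        # Emit whatever is still held at the end of the map task.
        for key, total in self.sums.items():
            self.emit(key, total)
        self.sums.clear()
        self.hits.clear()


# Usage: one frequent key interleaved with keys that each appear once.
emitted: List[Tuple[Hashable, int]] = []
table = LfuAggregationTable(max_entries=3, flush_fraction=0.5,
                            emit=lambda k, v: emitted.append((k, v)))
for row_key in ["hot", "a", "hot", "b", "hot", "c", "hot", "d", "hot"]:
    table.add(row_key, 1)
table.close()
```

In this run the frequent key is never evicted, so it reaches the reduce step as a single partial sum instead of several fragments. A random flush could have evicted it part-way through and emitted it more than once.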