Date: Sat, 2 Feb 2013 01:11:11 +0000 (UTC)
From: "Nick Dimiduk (JIRA)"
To: dev@hbase.apache.org
Reply-To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-7743) KeyValueSortReducer and PutSortReducers buffer entire value-groups in memory

Nick Dimiduk created HBASE-7743:
-----------------------------------

             Summary: KeyValueSortReducer and PutSortReducers buffer entire value-groups in memory
                 Key: HBASE-7743
                 URL: https://issues.apache.org/jira/browse/HBASE-7743
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce
            Reporter: Nick Dimiduk

The mapreduce package provides two Reducer implementations, KeyValueSortReducer and PutSortReducer, which are used by Import, ImportTsv, and WALPlayer in conjunction with HFileOutputFormat. Both implementations use a TreeSet to sort the values matching a key, so each reducer buffers the entire value group in memory and will run out of memory (OOM) when rows are large. A better solution would be to implement a secondary sort of the values.
That way, Hadoop sorts the records, spilling to disk when necessary.
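
The contrast between the two approaches can be sketched in plain Java, without any Hadoop or HBase dependencies. This is only an illustration under stated assumptions: the `Cell` class, `COMPOSITE_ORDER`, `bufferedReduce`, and `streamingReduce` are hypothetical names, not HBase APIs, and the in-memory `List.sort` call merely stands in for the shuffle sort that Hadoop would perform. In a real job, a secondary sort is usually arranged with a composite key plus `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`; that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Consumer;

public class SecondarySortSketch {

    /** Hypothetical stand-in for a KeyValue: a row key plus a column name. */
    public static final class Cell {
        final String row;
        final String column;

        public Cell(String row, String column) {
            this.row = row;
            this.column = column;
        }

        @Override
        public String toString() {
            return row + "/" + column;
        }
    }

    /** Composite ordering: row first, then column within the row. */
    public static final Comparator<Cell> COMPOSITE_ORDER =
            Comparator.comparing((Cell c) -> c.row).thenComparing(c -> c.column);

    /** Buffering approach: every value of the group is held in a TreeSet before any is emitted. */
    public static void bufferedReduce(Iterator<Cell> values, Consumer<Cell> sink) {
        TreeSet<Cell> sorted = new TreeSet<>(COMPOSITE_ORDER);
        while (values.hasNext()) {
            sorted.add(values.next()); // memory grows with the size of the row
        }
        for (Cell c : sorted) {
            sink.accept(c);
        }
    }

    /**
     * Secondary-sort approach: the input is assumed to arrive already ordered,
     * so the reducer keeps only the previous cell, to verify that assumption.
     */
    public static void streamingReduce(Iterator<Cell> sortedValues, Consumer<Cell> sink) {
        Cell previous = null;
        while (sortedValues.hasNext()) {
            Cell current = sortedValues.next();
            if (previous != null && COMPOSITE_ORDER.compare(previous, current) > 0) {
                throw new IllegalStateException("input not sorted: " + previous + " > " + current);
            }
            sink.accept(current); // emitted immediately, nothing accumulates
            previous = current;
        }
    }

    public static void main(String[] args) {
        List<Cell> group = new ArrayList<>(Arrays.asList(
                new Cell("row1", "c"), new Cell("row1", "a"), new Cell("row1", "b")));

        List<String> buffered = new ArrayList<>();
        bufferedReduce(group.iterator(), c -> buffered.add(c.toString()));

        // Stand-in for the framework's shuffle sort, which can spill to disk.
        group.sort(COMPOSITE_ORDER);
        List<String> streamed = new ArrayList<>();
        streamingReduce(group.iterator(), c -> streamed.add(c.toString()));

        System.out.println(buffered.equals(streamed)); // prints "true"
    }
}
```

Both paths emit the cells in the same order; the difference is that the streaming version never holds more than two cells at a time, which is the property the proposal is after. One behavioral difference to keep in mind: a TreeSet silently drops elements that compare as equal, whereas the streaming sketch passes them through.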