Return-Path: X-Original-To: apmail-hadoop-common-dev-archive@www.apache.org Delivered-To: apmail-hadoop-common-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 549147667 for ; Tue, 29 Nov 2011 23:45:20 +0000 (UTC) Received: (qmail 31966 invoked by uid 500); 29 Nov 2011 23:45:19 -0000 Delivered-To: apmail-hadoop-common-dev-archive@hadoop.apache.org Received: (qmail 31851 invoked by uid 500); 29 Nov 2011 23:45:18 -0000 Mailing-List: contact common-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: common-dev@hadoop.apache.org Delivered-To: mailing list common-dev@hadoop.apache.org Received: (qmail 31843 invoked by uid 99); 29 Nov 2011 23:45:18 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Nov 2011 23:45:18 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=5.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of Mingxi.Wu@turn.com designates 216.75.227.229 as permitted sender) Received: from [216.75.227.229] (HELO turn-mail02.turn.corp) (216.75.227.229) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 Nov 2011 23:45:11 +0000 Received: from turn-mail02.turn.corp ([::1]) by turn-mail02.turn.corp ([::1]) with mapi id 14.01.0289.001; Tue, 29 Nov 2011 15:44:49 -0800 From: Mingxi Wu To: "common-dev@hadoop.apache.org" Subject: Hadoop - non disk based sorting? Thread-Topic: Hadoop - non disk based sorting? Thread-Index: AQHMrvDhz/mC1UMQi0akmJ2CzRzIfg== Date: Tue, 29 Nov 2011 23:44:49 +0000 Message-ID: <8473D1F51DC7684E8D6514A8FAE5C7ACF0FC55@turn-mail02.turn.corp> References: <32876785.post@talk.nabble.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [216.75.227.229] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Virus-Checked: Checked by ClamAV on apache.org Hi, I have a question regarding the shuffle phase of reducer.=20 It appears when there are large map output (in my case, 5 billion records),= I will have out of memory Error like below.=20 Error: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.map= red.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java= :1592) at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.= getMapOutput(ReduceTask.java:1452) at org.apache.hadoop.mapred.ReduceTask$R= educeCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301) at org.apache.= hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1= 233) However, I thought the shuffling phase is using disk-based sort, which is n= ot constraint by memory.=20 So, why will user run into this outofmemory error? After I increased my num= ber of reducers from 100 to 200, the problem went away.=20 Any input regarding this memory issue would be appreciated! Thanks, Mingxi