Subject: Re: How to troubleshoot OutOfMemoryError
From: 周梦想 (Andy) <ablozhou@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 24 Dec 2012 11:30:10 +0800

I encountered the OOM problem because I had not set the ulimit on open files.
It had nothing to do with memory; memory was sufficient.

Best Regards,
Andy

2012/12/22 Manoj Babu <manoj444@gmail.com> wrote:

David,

I faced the same issue due to too much logging filling up the task tracker's
log folder.

Cheers!
Manoj.

On Sat, Dec 22, 2012 at 9:10 PM, Stephen Fritz <stephenf@cloudera.com> wrote:

Troubleshooting OOMs in map/reduce tasks can be tricky; see page 118 of
Hadoop Operations for a couple of settings that can affect the frequency of
OOMs and aren't necessarily intuitive.

To answer your question about getting the heap dump: you should be able to
add "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/some/path" to your
mapred.child.java.opts, then look for the heap dump in that path the next
time you see the OOM.
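A minimal sketch of wiring those flags into a Hadoop 1.x job via the old
mapred API; the -Xmx value, the dump directory, and the class name here are
placeholders, not values from this thread:

import org.apache.hadoop.mapred.JobConf;

public class HeapDumpOpts {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Ask every child task JVM to write a heap dump when it throws
        // OutOfMemoryError. The -Xmx value and the dump directory are
        // placeholders -- adjust them for your cluster.
        conf.set("mapred.child.java.opts",
                 "-Xmx1024m -XX:+HeapDumpOnOutOfMemoryError"
                 + " -XX:HeapDumpPath=/tmp/heapdumps");
        // ...configure mapper, reducer, input/output paths, then submit
        // the job with JobClient.runJob(conf).
    }
}

The same flags can also be set cluster-wide as the default value of
mapred.child.java.opts in mapred-site.xml.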
On Fri, Dec 21, 2012 at 11:33 PM, David Parks <davidparks21@yahoo.com> wrote:

I'm pretty consistently seeing a few reduce tasks fail with OutOfMemoryError
(below). It doesn't kill the job, but it slows it down.

In my current case the reducer is pretty darn simple; the algorithm basically
does:

1. Do you have 2 values for this key?
2. If so, build a JSON string and emit a NullWritable and Text value.

The string buffer I use to build the JSON is re-used, and I can't see
anywhere in my code that would be taking more than ~50 KB of memory at any
point in time.

But I want to verify: is there a way to get the heap dump and all after this
error? I'm running Hadoop v1.0.3 on AWS MapReduce.

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1711)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1571)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1412)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1344)

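For reference, a purely hypothetical sketch of a reducer with the shape David
describes, written against the old mapred API that matches the stack trace
above; the class name, value types, and JSON layout are invented for
illustration:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical reconstruction: emit one JSON record per key that arrived
// with exactly two values, reusing a single StringBuilder the way the
// "re-used string buffer" in the mail suggests.
public class PairToJsonReducer extends MapReduceBase
        implements Reducer<Text, Text, NullWritable, Text> {

    private final StringBuilder json = new StringBuilder();
    private final Text out = new Text();

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<NullWritable, Text> output,
                       Reporter reporter) throws IOException {
        String first = values.hasNext() ? values.next().toString() : null;
        String second = values.hasNext() ? values.next().toString() : null;
        if (first == null || second == null || values.hasNext()) {
            return; // not exactly two values for this key: emit nothing
        }
        json.setLength(0); // reuse the buffer instead of allocating per key
        json.append("{\"key\":\"").append(key.toString())
            .append("\",\"first\":\"").append(first)
            .append("\",\"second\":\"").append(second)
            .append("\"}");
        out.set(json.toString());
        output.collect(NullWritable.get(), out);
    }
}

Worth noting: the stack trace above is in the shuffle copy phase
(MapOutputCopier.shuffleInMemory), not in user reduce code, which fits the
shuffle-related settings Stephen refers to rather than anything in a reducer
this small.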