From: Eduard Skaley
Date: Mon, 05 Nov 2012 10:09:56 +0100
To: user@hadoop.apache.org
CC: Nitin Pawar
Subject: Re: Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError Java Heap Space
Message-ID: <50978264.5070104@gmail.com>

By the way, it happens on YARN, not on MRv1.

> Each container gets 1 GB at the moment.

>> Can you try increasing the memory per reducer?
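
A rough sketch of why a 1 GB container can be tight here, assuming the stock shuffle settings. To my understanding, the reducer buffers fetched map outputs in heap up to `mapreduce.reduce.shuffle.input.buffer.percent` of the maximum heap (0.70 in mapred-default.xml, as far as I know), and a single map output is kept in memory only if it is below `mapreduce.reduce.shuffle.memory.limit.percent` (0.25) of that buffer. The class and method names below are made up for illustration, and the 1 GB heap is a hypothetical upper bound, since the actual -Xmx of the reducer is not given in this thread.

```java
// Illustrative arithmetic only; this is not Hadoop code.
public class ShuffleBudget {

    /** Bytes the shuffle may hold in heap: heap size times the input buffer fraction. */
    static long inMemoryLimit(long heapBytes, double inputBufferPercent) {
        return (long) (heapBytes * inputBufferPercent);
    }

    /** Largest single map output that would still be fetched into memory. */
    static long maxSingleShuffle(long memoryLimitBytes, double memoryLimitPercent) {
        return (long) (memoryLimitBytes * memoryLimitPercent);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long heap = 1024 * mb;                       // hypothetical: whole 1 GB container as heap
        long limit = inMemoryLimit(heap, 0.70);      // assumed default buffer fraction
        long single = maxSingleShuffle(limit, 0.25); // assumed default per-output fraction

        System.out.printf("in-memory shuffle budget: %d MB%n", limit / mb);
        System.out.printf("largest in-memory map output: %d MB%n", single / mb);
        System.out.printf("heap left for everything else: %d MB%n", (heap - limit) / mb);
    }
}
```

If the budget looks too large for the heap, the two obvious knobs are a bigger reducer container and heap (`mapreduce.reduce.memory.mb` together with `mapreduce.reduce.java.opts`) or a smaller `mapreduce.reduce.shuffle.input.buffer.percent`. Which of these helps in this case is not confirmed in this thread.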


On Wed, Oct 31, 2012 at 9:15 PM, Eduard Skaley <e.v.skaley@gmail.com> wrote:
Hello,

I'm getting this error during job execution:

16:20:26 INFO  [main]                     Job -  map 100% reduce 46%
16:20:27 INFO  [main]                     Job -  map 100% reduce 51%
16:20:29 INFO  [main]                     Job -  map 100% reduce 62%
16:20:30 INFO  [main]                     Job -  map 100% reduce 64%
16:20:32 INFO  [main]                     Job - Task Id : attempt_1351680008718_0018_r_000006_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:58)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:45)
    at org.apache.hadoop.mapreduce.task.reduce.MapOutput.<init>(MapOutput.java:97)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManager.unconditionalReserve(MergeManager.java:286)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManager.reserve(MergeManager.java:276)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:384)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:319)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:179)

16:20:33 INFO  [main]                     Job -  map 100% reduce 65%
16:20:36 INFO  [main]                     Job -  map 100% reduce 67%
16:20:39 INFO  [main]                     Job -  map 100% reduce 69%
16:20:41 INFO  [main]                     Job -  map 100% reduce 70%
16:20:43 INFO  [main]                     Job -  map 100% reduce 71%

I have no clue what could be causing this. I googled the issue and checked several possible solutions, but none of them fits.

I saw this JIRA entry, which could fit: https://issues.apache.org/jira/browse/MAPREDUCE-4655

Here somebody recommends increasing the value of the property dfs.datanode.max.xcievers / dfs.datanode.max.receiver.threads to 4096, but that is already the value on our cluster.
http://yaseminavcular.blogspot.de/2011/04/common-hadoop-hdfs-exceptions-with.html

The issue with too-small input files doesn't fit, I think, because the map phase reads 137 files of 130 MB each. The block size is 128 MB.
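
To make the file-size argument concrete, here is a minimal sketch of the split arithmetic, assuming the files are splittable, the split size equals the 128 MB block size, and the 10% slop that FileInputFormat applies (as far as I know) before cutting another split. The class and method names are hypothetical.

```java
// Illustrative re-statement of the split-count rule; not the Hadoop source.
public class SplitCount {
    // Assumed slop factor: a remainder up to 10% over the split size stays in one split.
    static final double SPLIT_SLOP = 1.1;

    /** Number of input splits produced for one file of the given size. */
    static int splitsForFile(long fileBytes, long splitBytes) {
        int splits = 0;
        long remaining = fileBytes;
        // Cut full-size splits while the remainder is clearly larger than one split.
        while ((double) remaining / splitBytes > SPLIT_SLOP) {
            splits++;
            remaining -= splitBytes;
        }
        // Whatever is left becomes the final split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        int perFile = splitsForFile(130 * mb, 128 * mb);
        System.out.println("splits per 130 MB file: " + perFile);
        System.out.println("splits for 137 files: " + 137 * perFile);
    }
}
```

Under these assumptions each 130 MB file stays a single split, so the job would run 137 map tasks rather than a large number of tiny ones.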

The cluster uses version 2.0.0-cdh4.1.1, 581959ba23e4af85afd8db98b7687662fe9c5f20.

Thx

--
Nitin Pawar


