Date: Sat, 12 May 2012 01:17:48 +0000 (UTC)
From: "Catalin Alexandru Zamfir (JIRA)"
To: common-issues@hadoop.apache.org
Message-ID: <1165114224.57164.1336785468988.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <271932178.56127.1336768485004.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HADOOP-8396) DataStreamer, OutOfMemoryError, unable to create new native thread

[ https://issues.apache.org/jira/browse/HADOOP-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273782#comment-13273782 ]

Catalin Alexandru Zamfir commented on HADOOP-8396:
--------------------------------------------------

It ran for a while and died at 3881 threads. There's definitely a problem here in how Hadoop handles native threads.

> DataStreamer, OutOfMemoryError, unable to create new native thread
> ------------------------------------------------------------------
>
>              Key: HADOOP-8396
>              URL: https://issues.apache.org/jira/browse/HADOOP-8396
>          Project: Hadoop Common
>       Issue Type: Bug
>       Components: io
> Affects Versions: 1.0.2
>      Environment: Ubuntu 64bit, 4GB of RAM, Core Duo processors, commodity hardware.
>         Reporter: Catalin Alexandru Zamfir
>         Priority: Blocker
>           Labels: DataStreamer, I/O, OutOfMemoryError, ResponseProcessor, hadoop, leak, memory, rpc
>
> We're trying to write a few billion records via Avro.
> That's when we got this error, which is unrelated to our code:
>
> 10725984 [Main] INFO net.gameloft.RnD.Hadoop.App - ## At: 2:58:43.290 # Written: 521000000 records
> Exception in thread "DataStreamer for file /Streams/Cubed/Stuff/objGame/aRandomGame/objType/aRandomType/2012/05/11/20/29/Shard.avro block blk_3254486396346586049_75838" java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:657)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:612)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at $Proxy8.getProtocolVersion(Unknown Source)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>     at org.apache.hadoop.hdfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:160)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3117)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2586)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2790)
> 10746169 [Main] INFO net.gameloft.RnD.Hadoop.App - ## At: 2:59:03.474 # Written: 522000000 records
> Exception in thread "ResponseProcessor for block blk_4201760269657070412_73948" java.lang.OutOfMemoryError
>     at sun.misc.Unsafe.allocateMemory(Native Method)
>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:117)
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
>     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:75)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:223)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>     at java.io.DataInputStream.readFully(DataInputStream.java:195)
>     at java.io.DataInputStream.readLong(DataInputStream.java:416)
>     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2964)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 32 bytes for intptr_t in /build/buildd/openjdk-6-6b23~pre11/build/openjdk/hotspot/src/share/vm/runtime/deoptimization.cpp
> [thread 1587264368 also had an error]
> [thread 1111309168 also had an error]
> [thread 1820371824 also had an error]
> [thread 1343454064 also had an error]
> [thread 1345444720 also had an error]
> # An error report file with more information is saved as:
> # [thread 1345444720 also had an error]
> [thread -1091290256 also had an error]
> [thread 678165360 also had an error]
> [thread 678497136 also had an error]
> [thread 675511152 also had an error]
> [thread 1385937776 also had an error]
> [thread 911969136 also had an error]
> [thread -1086207120 also had an error]
> [thread -1088251024 also had an error]
> [thread -1088914576 also had an error]
> [thread -1086870672 also had an error]
> [thread 441797488 also had an error]
> [thread 445778800 also had an error]
> [thread 440400752 also had an error]
> [thread 444119920 also had an error]
> [thread 1151298416 also had an error]
> [thread 443124592 also had an error]
> [thread 1152625520 also had an error]
> [thread 913628016 also had an error]
> [thread -1095345296 also had an error]
> [thread 1390799728 also had an error]
> [thread 443788144 also had an error]
> [thread 676506480 also had an error]
> [thread 1630595952 also had an error]
> pure virtual method called
> terminate called without an active exception
> pure virtual method called
> Aborted
>
> It seems to be a memory leak. We were opening 5 to 10 buffers to different paths, writing to them and closing them. We've tested that those buffers do not overrun, and they don't. But watching the application as it kept writing, we saw that over a period of 5 to 6 hours its memory use kept increasing, not by the 8MB buffer size we've configured, but by small increments. I'm reading the code and it seems there's a memory leak somewhere in the way Hadoop does buffer allocation. Even though we explicitly close buffers whenever the count of open buffers goes above 5 (meaning 5 * 8MB of buffers), this bug still happens.
>
> Can it be fixed? As you can see from the stack trace, the application writes a "fan-out" path of the kind shown there. We let it run to about 500M records, when this error blew up. It's a blocker: these writers need to be production-grade ready, and they're not, because sustained heavy writes appear to leak native memory during buffer allocation.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
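[Editorial note: since the failure mode above is native-thread exhaustion rather than heap exhaustion, a first diagnostic step is to watch the JVM's own thread counters while the writers run. A minimal, self-contained sketch using the standard JMX thread bean (no Hadoop dependency; this is an illustration, not code from the report):]

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Prints the JVM's live and peak thread counts. Sampling these
// periodically while the writers run would show whether DataStreamer/
// ResponseProcessor threads are accumulating toward the ~3881 figure
// reported above.
public class ThreadWatch {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("live threads: " + mx.getThreadCount());
        System.out.println("peak threads: " + mx.getPeakThreadCount());
    }
}
```

[On Linux, comparing the peak count against `ulimit -u` for the user running the JVM shows how close the process is to the per-user native thread limit.]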
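[Editorial note: the workaround the report describes, closing writers once more than five are open, can be sketched as a small LRU cache of output streams. This is a hypothetical illustration, not the reporter's code: `openStream` stands in for an HDFS `FileSystem.create` call, and `ByteArrayOutputStream` is used so the sketch runs without a cluster. The key point is that the evicted stream must be closed, not merely dropped, or its native helper threads stay alive.]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded cache of open writers: when opening a stream would exceed the
// cap, the least-recently-used stream is closed first, keeping the number
// of live per-stream native threads bounded.
public class BoundedWriterCache {
    private final int maxOpen;
    private final LinkedHashMap<String, OutputStream> open;

    public BoundedWriterCache(int maxOpen) {
        this.maxOpen = maxOpen;
        // accessOrder = true makes iteration order least-recently-used first.
        this.open = new LinkedHashMap<>(16, 0.75f, true);
    }

    public OutputStream get(String path) throws IOException {
        OutputStream out = open.get(path);
        if (out == null) {
            if (open.size() >= maxOpen) {
                // Evict and CLOSE the least-recently-used stream; merely
                // dropping the reference would leak its native threads.
                Map.Entry<String, OutputStream> eldest =
                        open.entrySet().iterator().next();
                eldest.getValue().close();
                open.remove(eldest.getKey());
            }
            out = openStream(path);
            open.put(path, out);
        }
        return out;
    }

    public void closeAll() throws IOException {
        for (OutputStream out : open.values()) out.close();
        open.clear();
    }

    public int openCount() { return open.size(); }

    // Hypothetical stand-in for FileSystem.create(path) on HDFS.
    protected OutputStream openStream(String path) {
        return new ByteArrayOutputStream();
    }

    public static void main(String[] args) throws IOException {
        BoundedWriterCache cache = new BoundedWriterCache(5);
        // Fan-out writes cycling over 8 paths never hold more than 5 open.
        for (int i = 0; i < 20; i++) {
            cache.get("/Streams/shard-" + (i % 8)).write(i);
        }
        System.out.println("open streams: " + cache.openCount());
        cache.closeAll();
    }
}
```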