hadoop-common-issues mailing list archives

From "Catalin Alexandru Zamfir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8396) DataStreamer, OutOfMemoryError, unable to create new native thread
Date Sat, 12 May 2012 07:17:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273873#comment-13273873 ]

Catalin Alexandru Zamfir commented on HADOOP-8396:
--------------------------------------------------

I monitored the process with htop. Over time it grew from "Tasks: 35, 147 thr, 1 running" to
"Tasks: 36, 3475 thr, 1 running". Once I killed it via "kill", the "thr" count dropped back to
the same 147. So there seems to be a direct relation: the application allocates a large number
of, I assume, native threads, but does not release them when it is done with them.
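
For anyone who wants to watch the same thing from inside the JVM instead of htop, here is a
minimal sketch using the standard ThreadMXBean API (this is not our application code; the class
name and sampling interval are arbitrary):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Minimal sketch: periodically print the JVM's live thread count, which mirrors
// the "thr" column reported by htop for the process.
public class ThreadCountMonitor {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (true) {
            System.out.println("live threads: " + threads.getThreadCount()
                    + " (peak: " + threads.getPeakThreadCount() + ")");
            Thread.sleep(10000); // sample every 10 seconds
        }
    }
}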

Also, I'm calling Runtime.getRuntime().gc() every time the count of open streams is bigger
than 5, and I explicitly flush and close these streams before removing them and before invoking
gc(). I'm not using any specific GC strategy, just the default one provided by openjdk-6; there
are no other GC parameters on the command line when the program is run.
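
For clarity, this is roughly the pattern I'm describing, as a simplified sketch (not the actual
application code; the class, map and method names are illustrative):

import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Sketch of the close-then-gc pattern described above: once more than 5 streams
// are open, flush and close all of them, drop the references, then request a GC
// from the default openjdk-6 collector (no extra GC flags on the command line).
public class StreamJanitor {
    private static final int MAX_OPEN_STREAMS = 5;

    private final Map<String, OutputStream> openStreams = new HashMap<String, OutputStream>();

    public void maybeCloseAll() throws IOException {
        if (openStreams.size() > MAX_OPEN_STREAMS) {
            for (OutputStream out : openStreams.values()) {
                out.flush();
                out.close();
            }
            openStreams.clear();       // remove the closed streams
            Runtime.getRuntime().gc(); // explicit GC request
        }
    }
}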

I'll be profiling the code today with jmap and will post the results here.

The output of cat /proc/sys/kernel/threads-max is: 48056
Still, the process only gets to about 3,000-5,000 "thr" (threads) in htop, reaches about 500M
written records (or a few million less) and dies.

Here's a dump of jmap -histo:live <pid>:

 num     #instances         #bytes  class name
----------------------------------------------
   1:       1303640       96984920  [B
   2:        976162       69580696  [C
   3:        648949       31149552  java.nio.HeapByteBuffer
   4:        647505       31080240  java.nio.HeapCharBuffer
   5:        533222       12797328  java.util.HashMap$Entry
   6:        481595       11558280  java.lang.String
   7:          8086        4556064  [I
   8:         29805        3901240  <constMethodKlass>
   9:        177060        2832960  java.lang.Long
  10:         29805        2388304  <methodKlass>
  11:         58863        2354520  sun.misc.FloatingDecimal
  12:             1        2097168  [Lorg.h2.util.CacheObject;
  13:         50674        2041760  <symbolKlass>


Using "watch jmap -histo:live" to see how it reacts, I get this on the CLI every few million records:
OpenJDK Server VM warning: GC locker is held; pre-dump GC was skipped

I also see the "[B" (byte array) entry alternating between 20MB and 90MB, with both values growing
constantly over time.
Also, after running the code for 15 minutes, "Eden Space" and "PS Old Generation" started growing
rapidly. "Eden Space" started with an acceptable 25MB, and "PS Old Generation" was also small
(10-25MB, I can't remember exactly).
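
The same pools can also be polled from inside the JVM; here is a minimal sketch with the standard
MemoryPoolMXBean API (again, not our code; names and interval are arbitrary), printing the pools
that jmap -heap reports, e.g. "PS Eden Space", "PS Old Gen", "PS Perm Gen":

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Minimal sketch: print used/committed megabytes for every memory pool once a
// minute, covering the Eden, Old and Perm generations shown in the dump below.
public class HeapPoolMonitor {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                long usedMb = pool.getUsage().getUsed() >> 20;
                long committedMb = pool.getUsage().getCommitted() >> 20;
                System.out.println(pool.getName() + ": used=" + usedMb
                        + "MB committed=" + committedMb + "MB");
            }
            Thread.sleep(60000); // sample once a minute
        }
    }
}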

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 792723456 (756.0MB)
   NewSize          = 1048576 (1.0MB)
   MaxNewSize       = 4294901760 (4095.9375MB)
   OldSize          = 4194304 (4.0MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 16777216 (16.0MB)
   MaxPermSize      = 134217728 (128.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 260177920 (248.125MB)
   used     = 104665056 (99.81637573242188MB)
   free     = 155512864 (148.30862426757812MB)
   40.22826225991814% used
From Space:
   capacity = 1441792 (1.375MB)
   used     = 1409864 (1.3445510864257812MB)
   free     = 31928 (0.03044891357421875MB)
   97.78553355823864% used
To Space:
   capacity = 1966080 (1.875MB)
   used     = 0 (0.0MB)
   free     = 1966080 (1.875MB)
   0.0% used
PS Old Generation
   capacity = 528482304 (504.0MB)
   used     = 31693784 (30.225547790527344MB)
   free     = 496788520 (473.77445220947266MB)
   5.9971324981205045% used
PS Perm Generation
   capacity = 16777216 (16.0MB)
   used     = 13510752 (12.884857177734375MB)
   free     = 3266464 (3.115142822265625MB)
   80.53035736083984% used

Hope this ton of information helps you.
                
> DataStreamer, OutOfMemoryError, unable to create new native thread
> ------------------------------------------------------------------
>
>                 Key: HADOOP-8396
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8396
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1.0.2
>         Environment: Ubuntu 64bit, 4GB of RAM, Core Duo processors, commodity hardware.
>            Reporter: Catalin Alexandru Zamfir
>            Priority: Blocker
>              Labels: DataStreamer, I/O, OutOfMemoryError, ResponseProcessor, hadoop, leak, memory, rpc
>
> We're trying to write a few billion records via "Avro". Then we got this error, which is
> unrelated to our code:
> 10725984 [Main] INFO net.gameloft.RnD.Hadoop.App - ## At: 2:58:43.290 # Written: 521000000 records
> Exception in thread "DataStreamer for file /Streams/Cubed/Stuff/objGame/aRandomGame/objType/aRandomType/2012/05/11/20/29/Shard.avro block blk_3254486396346586049_75838" java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:657)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:612)
>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy8.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>         at org.apache.hadoop.hdfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:160)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3117)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2586)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2790)
> 10746169 [Main] INFO net.gameloft.RnD.Hadoop.App - ## At: 2:59:03.474 # Written: 522000000 records
> Exception in thread "ResponseProcessor for block blk_4201760269657070412_73948" java.lang.OutOfMemoryError
>         at sun.misc.Unsafe.allocateMemory(Native Method)
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:117)
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
>         at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:75)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:223)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at java.io.DataInputStream.readLong(DataInputStream.java:416)
>         at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2964)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 32 bytes for intptr_t in /build/buildd/openjdk-6-6b23~pre11/build/openjdk/hotspot/src/share/vm/runtime/deoptimization.cpp
> [thread 1587264368 also had an error]
> [thread 1111309168 also had an error]
> [thread 1820371824 also had an error]
> [thread 1343454064 also had an error]
> [thread 1345444720 also had an error]
> # An error report file with more information is saved as:
> # [thread 1345444720 also had an error]
> [thread -1091290256 also had an error]
> [thread 678165360 also had an error]
> [thread 678497136 also had an error]
> [thread 675511152 also had an error]
> [thread 1385937776 also had an error]
> [thread 911969136 also had an error]
> [thread -1086207120 also had an error]
> [thread -1088251024 also had an error]
> [thread -1088914576 also had an error]
> [thread -1086870672 also had an error]
> [thread 441797488 also had an error][thread 445778800 also had an error]
> [thread 440400752 also had an error]
> [thread 444119920 also had an error][thread 1151298416 also had an error]
> [thread 443124592 also had an error]
> [thread 1152625520 also had an error]
> [thread 913628016 also had an error]
> [thread -1095345296 also had an error][thread 1390799728 also had an error]
> [thread 443788144 also had an error]
> [thread 676506480 also had an error]
> [thread 1630595952 also had an error]
> pure virtual method called
> terminate called without an active exception
> pure virtual method called
> Aborted
> It seems to be a memory leak. We were opening 5 - 10 buffers to different paths while writing,
> and closing them. We've verified that those buffers do not overrun, and they don't. But watching
> the application keep writing, we saw that over a period of 5 to 6 hours it kept growing in
> memory, not by the 8MB buffer size we've set, but by small amounts. Reading the code, it seems
> there's a memory leak somewhere in the way Hadoop does buffer allocation. Even though we
> explicitly close the buffers whenever the count of open buffers goes above 5 (meaning 5 * 8MB
> per buffer), this bug still happens.
> Can it be fixed? As you can see from the stack trace, it writes a "fan-out" path of the kind
> shown there. We let it run until about 500M records, when this error blew up. It's a blocker:
> these writers need to be production-ready, and they are not, because this native buffer
> allocation appears to leak memory when executing large amounts of writes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
