accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2495) OOM exception didn't bring down tserver
Date Wed, 19 Mar 2014 16:43:44 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940648#comment-13940648 ]

Keith Turner commented on ACCUMULO-2495:
----------------------------------------

It has always seemed to me that the JVM only reacts when its own heap allocation code throws
an OOME. To confirm this, I created a small Test program that throws an OOME and ran it with
the following command. The JVM did not exit, so I think this issue and ACCUMULO-1708 are the
same case: non-heap code threw an OOME and the JVM did nothing.

{noformat}
$ java -version
java version "1.6.0_30"
OpenJDK Runtime Environment (IcedTea6 1.13.1) (rhel-3.1.13.1.el6_5-x86_64)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)
$ java  -XX:OnOutOfMemoryError="kill -9 %p" -cp CP Test
Keeping the process alive
Exception in thread "main" java.lang.OutOfMemoryError
	at Test.main(Test.java:43)
{noformat}
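
For readers without the attachment, here is a minimal sketch of the kind of check described above. It is not the attached Test.java; the class name OomeSketch, the method runAndCapture, and the error message are made up for illustration. A worker thread throws an OutOfMemoryError from plain Java code, much as java.nio.Bits.reserveMemory does in the quoted stack traces, and the rest of the process keeps running. Consistent with the run above, an OnOutOfMemoryError command would not be expected to fire here, since the heap allocator never failed.

```java
import java.util.concurrent.atomic.AtomicReference;

public class OomeSketch {

    /**
     * Starts a worker that throws an OutOfMemoryError from ordinary Java code
     * (no heap exhaustion involved) and returns whatever killed the worker.
     */
    static Throwable runAndCapture() {
        final AtomicReference<Throwable> seen = new AtomicReference<Throwable>();

        Thread worker = new Thread(new Runnable() {
            public void run() {
                // Thrown explicitly by Java code, not by the JVM's allocator.
                throw new OutOfMemoryError("simulated non-heap OOME");
            }
        }, "worker");

        // Record the uncaught error instead of printing the default stack trace.
        worker.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread t, Throwable e) {
                seen.set(e);
            }
        });

        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Throwable t = runAndCapture();
        System.out.println("worker died with: " + t);
        // Reaching this line shows that only the worker thread died.
        System.out.println("JVM still running");
    }
}
```

Running this with the same -XX:OnOutOfMemoryError option as above is one way to compare the behavior against a genuine heap exhaustion.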

> OOM exception didn't bring down tserver
> ---------------------------------------
>
>                 Key: ACCUMULO-2495
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2495
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.5.1
>            Reporter: John Vines
>         Attachments: Test.java
>
>
> Got
> {code}Thread "acu-problem-reporter 2" died Direct buffer memory
> 	java.lang.OutOfMemoryError: Direct buffer memory
> 		at java.nio.Bits.reserveMemory(Bits.java:659)
> 		at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:113)
> 		at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
> 		at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:75)
> 		at sun.nio.ch.IOUtil.read(IOUtil.java:223)
> 		at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
> 		at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
> 		at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> 		at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> 		at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> 		at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> 		at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> 		at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> 		at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> 		at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> 		at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> 		at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> 		at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> 		at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:271)
> 		at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
> 		at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
> 		at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 		at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_update(TabletClientService.java:443)
> 		at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.update(TabletClientService.java:427)
> 		at org.apache.accumulo.core.client.impl.Writer.updateServer(Writer.java:69)
> 		at org.apache.accumulo.core.client.impl.Writer.update(Writer.java:97)
> 		at org.apache.accumulo.server.problems.ProblemReport.saveToMetadataTable(ProblemReport.java:134)
> 		at org.apache.accumulo.server.problems.ProblemReports$1.run(ProblemReports.java:92)
> 		at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> 		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 		at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 		at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 		at java.lang.Thread.run(Thread.java:701){code}
> while hammering a single node setup with table creates and deletes. First the master went
> down with an OOM after about an hour, which is strange since I gave it a gig and was only
> creating and dropping tables in chunks of 64. When I brought the master back up, I saw
> that stack trace in the monitor, but nothing in the tserver logs.
> Initial logging was
> {code}
> 2014-03-18 16:59:43,977 [impl.TabletServerBatchWriter] ERROR: Failed to send tablet server 127.0.0.1:9997 its batch : Direct buffer memory
> java.lang.OutOfMemoryError: Direct buffer memory
>         at java.nio.Bits.reserveMemory(Bits.java:659)
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:113)
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
>         at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:75)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:223)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>         at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:271)
>         at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
>         at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_update(TabletClientService.java:443)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.update(TabletClientService.java:427)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:870)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.access$1(TabletServerBatchWriter.java:845)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.send(TabletServerBatchWriter.java:803)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.run(TabletServerBatchWriter.java:767)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:701)
> {code}
> I'm using the default -XX:OnOutOfMemoryError=kill -9 %p, so I don't know why the process
> is still alive. This seems problematic, though.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
