cassandra-commits mailing list archives

From "Michael Shuler (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-8259) Add column family name when reporting OutOfMemory errors
Date Wed, 05 Nov 2014 20:22:33 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Shuler updated CASSANDRA-8259:
--------------------------------------
      Priority: Major  (was: Critical)
    Issue Type: Improvement  (was: Bug)

Marking this as an improvement. I asked the devs whether it is possible to log the keyspace/table
on OOM. In your specific trace it would not be possible, since at the point of OOM the server
was populating the Thrift response. Judging by the trace, the place to look in your code is
wherever you issue a "huge multiget slice" query.

Extra logged information is always nice for troubleshooting, and it would not surprise me if
DEBUG/TRACE logging would have shown you which table was being queried. Alternatively, enable
application query logging in your code.

I'm aware that either option would mean lots of logging writes and possibly a performance
hit.
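
For the application query logging option, a minimal sketch in Java follows. It logs the
column family, key count, and column limit before each multiget, so a crash can be matched
to the last query logged. QueryAudit, Query, describe, and logged are hypothetical names
used for illustration; they are not part of Cassandra, Thrift, or Astyanax, and the actual
client call is left to the caller.

```java
import java.util.Collection;
import java.util.logging.Logger;

/** Hypothetical helper for application-side query logging (an assumption, not a Cassandra API). */
final class QueryAudit {
    private static final Logger LOG = Logger.getLogger(QueryAudit.class.getName());

    private QueryAudit() {}

    /** Caller-supplied query; stands in for whatever client call the application makes. */
    interface Query<T> {
        T run();
    }

    /** Builds the one-line description that is logged before the query runs. */
    static String describe(String columnFamily, int keyCount, int columnLimit) {
        return "multiget_slice cf=" + columnFamily
                + " keys=" + keyCount
                + " columnLimit=" + columnLimit;
    }

    /** Logs the description first, then runs the query and returns its result. */
    static <T> T logged(String columnFamily, Collection<?> keys, int columnLimit, Query<T> query) {
        LOG.info(describe(columnFamily, keys.size(), columnLimit));
        return query.run();
    }
}
```

A call site would wrap its existing query in QueryAudit.logged(...), passing the column
family name it already knows. Logging before the call matters here: if the server dies
while building the response, the client-side log line is already written.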

> Add column family name when reporting OutOfMemory errors
> --------------------------------------------------------
>
>                 Key: CASSANDRA-8259
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8259
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jacek Furmankiewicz
>
> When we get a Thrift error like this which causes a server crash:
> {noformat}
> ERROR [Thrift:33] 2014-11-05 17:36:07,486 CassandraDaemon.java (line 196) Exception in thread Thread[Thrift:33,5,main]
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2271)
>         at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>         at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>         at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>         at org.apache.thrift.transport.TFramedTransport.write(TFramedTransport.java:146)
>         at org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:183)
>         at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:678)
>         at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:611)
>         at org.apache.cassandra.thrift.Column.write(Column.java:538)
>         at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:673)
>         at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:607)
>         at org.apache.cassandra.thrift.ColumnOrSuperColumn.write(ColumnOrSuperColumn.java:517)
>         at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.write(Cassandra.java:14559)
>         at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.write(Cassandra.java:14463)
>         at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.write(Cassandra.java:14393)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>  INFO [StorageServiceShutdownHook] 2014-11-05 17:36:07,488 ThriftServer.java (line 141) Stop listening to thrift clients
> {noformat}
> we have no clue as to which column family was being queried. That makes it extremely
> difficult to troubleshoot which query in a complex code base caused this error.
> We have multiple servers and they all throw a NoAvailableHostException in Astyanax at
> the same time, all in different parts of the code...so figuring out the root cause is an
> exercise in frustration that takes many hours.
> At least listing the column family in this message would save us COUNTLESS hours of troubleshooting.
> We're on 2.0.8, JDK 1.7, RHEL 6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
