incubator-cassandra-user mailing list archives

From David McNelis <dmcne...@gmail.com>
Subject System hints compaction stuck
Date Wed, 07 Aug 2013 13:14:27 GMT
Morning folks,

For the last couple of days, all of my nodes (17 in total, all running 1.2.8)
have been stuck at various percentages of completion while compacting
system.hints. I've tried restarting the nodes (including a full rolling
restart of the cluster) to no avail.

When I turn on DEBUG logging, I see this error constantly on all of the
nodes:

DEBUG 09:03:21,999 Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)


When I turn on TRACE logging, I see that shortly after this error there is
a message similar to:
TRACE 09:03:22,000 ClientState removed for socket addr /10.55.56.211:35431

The IP in this message is sometimes a client machine and sometimes another
Cassandra node with no processes other than C* running on it (which I think
rules out a particular client library doing something funny with Thrift).
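To check whether these errors cluster on particular peers, a small log scan could pair each "Thrift transport error" line with the next "ClientState removed for socket addr" line and tally the addresses. This is a minimal sketch, assuming the log format shown in the excerpts above and assuming the ClientState line that follows an error belongs to the same connection (that may not hold when many connections close concurrently). ThriftErrorPeers and tally are hypothetical names, not part of Cassandra.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Cassandra): correlates DEBUG transport
// errors with the TRACE "ClientState removed" line that follows them.
class ThriftErrorPeers {
    // Matches e.g. "ClientState removed for socket addr /10.55.56.211:35431"
    private static final Pattern CLIENT_REMOVED =
            Pattern.compile("ClientState removed for socket addr /([0-9.]+):\\d+");
    private static final String TRANSPORT_ERROR = "Thrift transport error occurred";

    // Returns remote IP -> number of transport errors attributed to it.
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        boolean pendingError = false; // saw an error, waiting for its ClientState line
        for (String line : lines) {
            if (line.contains(TRANSPORT_ERROR)) {
                pendingError = true;
                continue;
            }
            Matcher m = CLIENT_REMOVED.matcher(line);
            if (pendingError && m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
                pendingError = false;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ThriftErrorPeers <path-to-log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        tally(lines).forEach((ip, n) -> System.out.println(ip + "\t" + n));
    }
}
```

If the tally were dominated by one or two hosts, that would point at specific clients; an even spread across clients and nodes would fit the observation above that no single client library is to blame.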

While I wouldn't expect a Thrift issue to cause problems with compaction,
I'm out of other ideas at the moment.  Anyone have any thoughts they could
share?

Thanks,
David
