incubator-cassandra-user mailing list archives

From: Aaron Morton <>
Subject: Re: Cassandra lockup (0.6.5) on bulk write
Date: Wed, 06 Oct 2010 00:57:07 GMT
Some quick answers...

Cloudkick monitoring may be helpful.

rpc_timeout_in_ms is the timeout for communication between nodes in the cluster, not between
clients and the cluster.

AFAIK you should be setting timeouts on the Thrift socket. If you get a timeout then retry
on another node. Email the pelops peeps and ask them if there was a reason for not setting one.

Not sure about setting keep alive. 
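
Something along these lines (untested, just to illustrate; the node list, port and the
10 second value are made up, and this is not the Pelops API) would give you a client side
timeout plus a retry against another node when it fires:

import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

public class TimedConnection {
    // Try each node in turn; a hung or dead node now fails after timeoutMs
    // instead of blocking the client forever.
    public static TTransport open(String[] nodes, int port, int timeoutMs)
            throws TTransportException {
        TTransportException last = null;
        for (String node : nodes) {
            try {
                TSocket socket = new TSocket(node, port, timeoutMs); // 3rd arg = socket timeout
                TTransport transport = new TFramedTransport(socket); // framed, as in your stack trace
                transport.open();
                return transport;
            } catch (TTransportException e) {
                last = e; // move on to the next node in the list
            }
        }
        throw last != null ? last : new TTransportException("no nodes available");
    }
}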

Your tokens are different from the ones calculated by the python function in the Load Balancing
section here. You may want to take another look at them against what that function gives.
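
For reference, that python function just spreads the initial tokens evenly over the 2**127
RandomPartitioner range (roughly i * 2**127 / N). The same calculation in Java for a 5 node
cluster would be:

import java.math.BigInteger;

public class InitialTokens {
    public static void main(String[] args) {
        int nodes = 5; // cluster size in this thread
        BigInteger range = BigInteger.valueOf(2).pow(127); // RandomPartitioner token space
        for (int i = 0; i < nodes; i++) {
            // node i gets token i * 2^127 / nodes
            System.out.println(BigInteger.valueOf(i).multiply(range)
                    .divide(BigInteger.valueOf(nodes)));
        }
    }
}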

I'm not sure you are doing yourself any favors writing at CL ONE. It means your clients will
get an ack as soon as one node has written its data, and you're leaving the rest of the nodes
to sort themselves out, which *may not happen* if you are overloading the cluster. The overloaded
nodes could drop the messages or the messages could time out, which is fine because you've told
the cluster you're happy with low consistency.

So after the load you will want to run repair to make sure the data is correctly replicated. I
would suggest writing at CL.QUORUM or higher, and still running the repair afterwards, when
hopefully it will have little to do.
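
At the Thrift level the change is just the consistency level argument on the write call. A
rough sketch (untested; the keyspace name and mutation map are placeholders, check the
signature against your generated 0.6 client):

import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.Mutation;

public class QuorumWrite {
    // With RF=4, QUORUM waits for 3 replicas to ack the write
    // instead of returning after the first one as ONE does.
    static void write(Cassandra.Client client,
                      Map<String, Map<String, List<Mutation>>> mutationMap)
            throws Exception {
        client.batch_mutate("MyKeyspace", mutationMap, ConsistencyLevel.QUORUM);
    }
}

The stack trace in your first mail shows the write going out through
org.wyki.cassandra.pelops.Mutator$1.execute(...), so if Pelops lets you pass a consistency
level there it should be a small change on your side.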

Hope that helps

On 06 Oct 2010, at 12:08 PM, Jason Horman <> wrote:

Yes, you are right. For some reason we didn't notice it. The Cassandra
server itself was still up, so the on-machine monitoring for the
process didn't go off. nodetool shows it is unresponsive though. We
are still learning how to properly monitor. The machine that went down
ran out of disk space on Amazon EBS.

So I believe that our client was connected to that machine when it ran
into trouble. I am a little surprised that it wasn't disconnected; it
just hung forever. We are using Pelops, which doesn't seem to set a
thrift timeout. It also sets keep alive on the socket. Here is the
pelops connection code.

socket = new TSocket(nodeContext.node, port);

Server side the default rpc timeout is used.

Is RpcTimeoutInMillis supposed to have booted our client after 10s, or
is the server now just in a really bad state? Should I modify Pelops
to set a timeout on the TSocket? Is setKeepAlive recommended?
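
Something like this is what I have in mind for the Pelops line above (just a guess at the
change, not actual Pelops code; the 10000 ms value is arbitrary):

socket = new TSocket(nodeContext.node, port, 10000); // third arg sets a socket timeout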

We are writing at consistency level ONE, replication factor is 4.
There are 5 cassandra servers at the moment but in production we will
run with more. This is on Amazon EC2/EBS so IO performance isn't
great. I think that the cluster appears unbalanced b/c of the high
replication factor.

If you are interested here is the stack trace from the machine that
ran out of space.

ERROR [COMMIT-LOG-WRITER] 2010-10-05 16:05:22,393
(line 83) Uncaught exception in thread
java.lang.RuntimeException: java.lang.RuntimeException: No space left on device
at org.apache.cassandra.utils.WrappedRunnable.run(
Caused by: java.lang.RuntimeException: No space
left on device
at org.apache.cassandra.db.commitlog.BatchCommitLogExecutorService.processWithSyncBatch(
at org.apache.cassandra.db.commitlog.BatchCommitLogExecutorService.access$000(
at org.apache.cassandra.db.commitlog.BatchCommitLogExecutorService$1.runMayThrow(
... 1 more
Caused by: No space left on device
at Method)
at org.apache.cassandra.db.commitlog.CommitLogSegment.sync(
at org.apache.cassandra.db.commitlog.CommitLog.sync(
at org.apache.cassandra.db.commitlog.BatchCommitLogExecutorService.processWithSyncBatch(
... 4 more
ERROR [COMPACTION-POOL:1] 2010-10-05 16:05:40,366
(line 83) Uncaught exception in thread
java.util.concurrent.ExecutionException: No space
left on device
at java.util.concurrent.FutureTask$Sync.innerGet(
at java.util.concurrent.FutureTask.get(
at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(
at org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
at java.util.concurrent.ThreadPoolExecutor$
Caused by: No space left on device
at Method)
at org.apache.cassandra.db.CompactionManager.doCompaction(
at org.apache.cassandra.db.CompactionManager$
at org.apache.cassandra.db.CompactionManager$
at java.util.concurrent.FutureTask$Sync.innerRun(
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
... 2 more

On Tue, Oct 5, 2010 at 3:43 PM, Aaron Morton <> wrote:
> The cluster looks unbalanced (assuming the Random Partitioner), did you
> manually assign tokens to the nodes? The section on Token Select here has
> some tips.
> One of the nodes in the cluster is down. Is there anything in the log to
> explain why? You may have some other errors.
> Also want to check:
> - your client has a list of all of the nodes, so it could move to another
> if it was connected to the down node.
> - what's the RF and what consistency level are you writing at?
> - how long is the hang?
> - what's happening on the server while the client is hanging? e.g. is it idle
> or is the CPU going crazy, swapping, iostat
> - what timeout are you using with thrift?
> Aaron
> On 06 Oct 2010, at 07:28 AM, Jason Horman <> wrote:
> We are experiencing some random hangs while importing data into
> Cassandra 0.6.5. The client stack dump is below. We are using Java
> Pelops with Thrift r917130. The hang seems random, sometimes millions
> of records in, sometimes just a few thousand. It sort of smells like
> the JIRA
> Has anyone else experienced this? Any advice?
> Here is a dump from nodetool
> Address Status Load      Range                                     Ring
>                43.41 GB  25274261893111669883290654807978388961   |<--|
>                29.38 GB  34662916595519283353151730886201323030   |   ^
>                19.83 GB  45387569059876439228162547977665761954   v   |
>                23.59 GB  105389616365686887162471812716889564402  |   ^
>         Up     33.16 GB  148562884084359545011181864444489491335  |-->|
> Here is the stack
> "RMI TCP Connection(4)-10.246.55223" daemon prio=10
> tid=0x00002aaac0194000 nid=0x53b3 runnable [0x000000004b7dc000]
>    java.lang.Thread.State: RUNNABLE
> at Method)
> at
> at
> at
> at
> - locked <0x000000074d23e978> (a
> at
> at org.apache.thrift.transport.TTransport.readAll(
> at
> org.apache.thrift.transport.TFramedTransport.readFrame(
> at
> at org.apache.thrift.transport.TTransport.readAll(
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(
> at
> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(
> at
> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(
> at org.wyki.cassandra.pelops.Mutator$1.execute(

