incubator-cassandra-user mailing list archives

From Philippe <watche...@gmail.com>
Subject 12-node cluster mystery
Date Sun, 09 Oct 2011 00:13:44 GMT
Dear all,
I've just fired up our production cluster (12 nodes, RF=3) and I've run into
something I don't understand at all. Our test cluster was 3 nodes, RF=3.
The test cluster had AMD Opteron CPUs (6x2.33) w/ 32 GB RAM, while the
production cluster has Core i5 CPUs (4x2.66) w/ 16 GB RAM.

I'm running the same import process using Hector as I did in August on the
test cluster, but this time I get a lot of these warnings:
211725 [pool-3-thread-1] WARN me.prettyprint.cassandra.connection.HConnectionManager  - Exception:
me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
        at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:40)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
        at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:219)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
        at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:222)
        at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:219)
        at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
        at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
        at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:219)
        at com.sensorly.heatmap.rollups.cassandra.CassandraRollupWithCountersDao.executeMutator(CassandraRollupWithCountersDao.java:302)
        at com.sensorly.heatmap.rollups.cassandra.LoaderCallable.loadRollup(LoaderCallable.java:112)
        at com.sensorly.heatmap.rollups.cassandra.LoaderCallable.run(LoaderCallable.java:74)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: TimedOutException()
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19061)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)

I've tried lowering the number of concurrent threads to one and running the
import locally on one of the nodes, but neither makes any difference.

   - vmstat shows nothing going on on the servers
   - the logs don't indicate anything
   - network traffic is below 1Mbit/s (I guess that's just gossip)
   - iostat shows no activity
   - nearly all of the servers' memory is free
   - tpstats shows that some mutations were dropped on a node.

I'm stumped... what could I have missed?

Thanks
PS: @aaron, Richard & co: your suggestions to my previous questions are
being investigated; I'll report on my findings.
