incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: 12-node cluster mystery
Date Sun, 09 Oct 2011 23:19:25 GMT
It looks like you're trying to use batches as a performance
optimization. Don't do that; it makes your load bursty.
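
A minimal sketch of what smoothing the load might look like on the client
side, assuming the import can be cut into small groups of mutations that are
sent one after another instead of in one large batch. The class name
SteadyWriter, the chunk size, the pause, and the sender callback are all
hypothetical; the callback stands in for whatever actually issues the write
(for example, code that ends in the Mutator execute() call visible in the
stack trace below). Suitable values have to be found by measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: not part of Hector or Cassandra.
final class SteadyWriter {

    /** Splits items into consecutive chunks of at most maxChunkSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int maxChunkSize) {
        if (maxChunkSize < 1) {
            throw new IllegalArgumentException("maxChunkSize must be >= 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += maxChunkSize) {
            int end = Math.min(i + maxChunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(i, end)));
        }
        return chunks;
    }

    /**
     * Sends the mutations in small chunks, pausing between chunks so the
     * cluster sees a steady trickle rather than a few large bursts.
     * Returns the number of chunks handed to the sender.
     */
    static <T> int writeSteadily(List<T> mutations,
                                 int maxChunkSize,
                                 long pauseMillis,
                                 Consumer<List<T>> sender) throws InterruptedException {
        int sent = 0;
        for (List<T> part : chunk(mutations, maxChunkSize)) {
            sender.accept(part);   // assumption: performs the real write
            sent++;
            if (pauseMillis > 0) {
                Thread.sleep(pauseMillis);
            }
        }
        return sent;
    }
}
```

The sender is passed in as a callback so the pacing logic can be tried out
without a running cluster.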

On Sat, Oct 8, 2011 at 7:13 PM, Philippe <watcherfr@gmail.com> wrote:
> Dear all,
> I've just fired up our production cluster (12 nodes, RF=3) and I've run into
> something I don't understand at all. Our test cluster was 3 nodes, RF=3.
> The test cluster had AMD Opteron CPUs (6x2.33) w/ 32 GB RAM, while the
> production cluster has Core i5 CPUs (4x2.66) w/ 16 GB RAM.
>
> I'm running the same import process using Hector as I did in August on the
> test cluster, but this time, I get a lot of
> 211725 [pool-3-thread-1] WARN me.prettyprint.cassandra.connection.HConnectionManager  - Exception:
> me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
>         at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:40)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
>         at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>         at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:219)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
>         at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:222)
>         at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:219)
>         at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
>         at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
>         at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:219)
>         at com.sensorly.heatmap.rollups.cassandra.CassandraRollupWithCountersDao.executeMutator(CassandraRollupWithCountersDao.java:302)
>         at com.sensorly.heatmap.rollups.cassandra.LoaderCallable.loadRollup(LoaderCallable.java:112)
>         at com.sensorly.heatmap.rollups.cassandra.LoaderCallable.run(LoaderCallable.java:74)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: TimedOutException()
>         at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19061)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
>         at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
>
> I've tried lowering the number of concurrent threads to one, and running the
> import locally on one of the nodes, but neither makes any difference.
>
> vmstat shows nothing going on on the servers
> the logs don't indicate anything
> network traffic is below 1Mbit/s (I guess that's just gossip)
> iostat shows no activity
> nearly all of the servers' memory is free
> tpstats shows that some mutations were dropped on a node.
>
> I'm stumped... what could I have missed?
>
> Thanks
> PS: @aaron, Richard & co: your suggestions to my previous questions are
> being investigated, I'll report on my findings.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
