incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sheng Chen <chensheng2...@gmail.com>
Subject Re: Stress tests failed with secondary index
Date Fri, 08 Apr 2011 09:32:20 GMT
Thank you. I found the timeout log as attached.
It happens right after writing a memtable, which is the first flush
actually.
And I notice that there is an unusual peak of Par Survivor Space, though I'm
not sure whether it's a cause or a consequence.
No active or pending tasks are found in tpstats.

Anyway, it must be something related with secondary index.
This node can manage 40000 inserts/sec without it, while when the timeout
happens, it's only 600 inserts/sec with a single thread.


DEBUG [pool-1-thread-2] 2011-04-08 15:59:27,214 StorageProxy.java (line 268)
insert writing local RowMutation(keyspace='Keyspace1',
key='3031343733353737', modifications=[Standard1])
DEBUG [MutationStage:42] 2011-04-08 15:59:27,214 Table.java (line 346)
applying mutation of row 3031343733353737
DEBUG [MutationStage:42] 2011-04-08 15:59:27,215 Table.java (line 378)
mutating indexed column 4331 value
63383165373238643964346332663633366630363766383963633134383632636338
DEBUG [MutationStage:42] 2011-04-08 15:59:27,215 Table.java (line 395)
Pre-mutation index row is null
DEBUG [MutationStage:42] 2011-04-08 15:59:27,215 Table.java (line 493)
applying index row DecoratedKey(c81e728d9d4c2f636f067f89cc14862cc8,
63383165373238643964346332663633366630363766383963633134383632636338):ColumnFamily(<anonymous>
[3031343733353737:false:0@1302247753028033,])
 INFO [MutationStage:42] 2011-04-08 15:59:27,215 ColumnFamilyStore.java
(line 1048) Enqueuing flush of Memtable-Standard1@1078274831(375762390
bytes, 7367890 operations)
 INFO [MutationStage:42] 2011-04-08 15:59:27,216 ColumnFamilyStore.java
(line 1048) Enqueuing flush of Memtable-Standard1.4331@2098071231(33892294
bytes, 1473578 operations)
 INFO [FlushWriter:1] 2011-04-08 15:59:27,216 Memtable.java (line 157)
Writing Memtable-Standard1@1078274831(375762390 bytes, 7367890 operations)
DEBUG [WRITE-/127.0.0.1] 2011-04-08 15:59:30,418 OutboundTcpConnection.java
(line 159) attempting to connect to /127.0.0.1
DEBUG [ScheduledTasks:1] 2011-04-08 15:59:34,427 GCInspector.java (line 130)
GC for ParNew: 71 ms, 1236400968 reclaimed leaving 2760594280 used; max is
25287458816
*DEBUG [pool-1-thread-2] 2011-04-08 15:59:37,217 CassandraServer.java (line
419) ... timed out*
DEBUG [pool-1-thread-2] 2011-04-08 15:59:37,312 ClientState.java (line 91)
logged out: #<User allow_all groups=[]>
DEBUG [WRITE-/127.0.0.1] 2011-04-08 15:59:41,441 OutboundTcpConnection.java
(line 159) attempting to connect to /127.0.0.1
 INFO [FlushWriter:1] 2011-04-08 15:59:47,386 Memtable.java (line 164)
Completed flushing /data/cassandra/data/Keyspace1/Standard1-f-1-Data.db
(461229914 bytes)





2011/4/7 aaron morton <aaron@thelastpickle.com>

> Can you turn the logging up to DEBUG level and look for a message from
> CassandraServer that says "... timed out" ?
>
> Also check the thread pool stats "nodetool tpstats" to see if the node is
> keeping up.
>
> Aaron
>
> On 7 Apr 2011, at 13:43, Sheng Chen wrote:
>
> Thank you Aaron.
>
> It does not seem to be an overload problem.
>
> I have 16 cores and 48G ram on the single node, and I reduced the
> concurrent threads to be 1.
> Still, it just suddenly dies of a timeout, while the cpu, ram, disk load
> are below 10% and write latency is about 0.5ms for the past 10 minutes which
> is really fast.
>
> No logs of dropped messages are found.
>
>
>
>
>
> 2011/4/7 aaron morton <aaron@thelastpickle.com>
>
>> TimedOutException means that the less than CL number of nodes responded to
>> the coordinator before the rpc_timeout.
>>
>> So it was overloaded. Which makes sense when you say it only happens with
>> secondary indexes. Consider things like
>> - reducing the throughput
>> - reducing the number of clients
>> - ensuring the clients are connecting to all nodes in the cluster.
>>
>> You will probably find some logs about dropped messages on some nodes.
>> Aaron
>>
>> On 6 Apr 2011, at 20:39, Sheng Chen wrote:
>>
>> > I used py_stress module to insert 10m test data with a secondary index.
>> > I got the following exceptions.
>> >
>> > # python stress.py -d xxx -o insert -n 10000000 -c 5 -s 34 -C 5 -x keys
>> > total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
>> > 265322,26532,26541,0.00186140829433,10
>> > 630300,36497,36502,0.00129331431204,20
>> > 986781,35648,35640,0.0013310986218,30
>> > 1332190,34540,34534,0.00135942295893,40
>> > 1473578,14138,14138,0.00142941070007,50
>> > Process Inserter-38:
>> > Traceback (most recent call last):
>> >   File "/usr/lib64/python2.4/site-packages/multiprocessing/process.py",
>> line 237, in _bootstrap
>> >     self.run()
>> >   File "stress.py", line 242, in run
>> >     self.cclient.batch_mutate(cfmap, consistency)
>> >   File
>> "/root/apache-cassandra-0.7.4-src/interface/thrift/gen-py/cassandra/Cassandra.py",
>> line 784, in batch_mutate
>> > TimedOutException: TimedOutException(args=())
>> >     self.run()
>> >   File "stress.py", line 242, in run
>> >     self.recv_batch_mutate()
>> >   File
>> "/root/apache-cassandra-0.7.4-src/interface/thrift/gen-py/cassandra/Cassandra.py",
>> line 810, in recv_batch_mutate
>> >     raise result.te
>> >
>> >
>> > Tests without secondary index is ok at about 40k ops/sec.
>> >
>> > There is a `GC for ParNew` for about 200ms taking place every second.
>> Does it matter?
>> > The same gc for about 400ms happens every 2 seconds, which does not hurt
>> the inserts without secondary index.
>> >
>> > Thanks in advance for any advice.
>> >
>> > Sheng
>>
>>
>
>

Mime
View raw message