cassandra-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: ReplicateOnWrite issues
Date Tue, 12 Jul 2011 17:05:00 GMT
What do you mean by "until they are full"?

Right now I guess a quick black-box test for this problem is to insert only shorter rows and see whether the slowdown persists.
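For example (a rough, untested sketch -- "Counters" and the column names are just stand-ins for your schema, and the cli flags are from memory, so double-check them), you could spread the increments over many short rows instead of a few long ones:

    # write one increment per short row key instead of piling onto one long row
    for i in $(seq 1 1000); do
      echo "incr Counters['row_$i']['hits'];"
    done > /tmp/incrs.txt
    ./cassandra-cli -h localhost -k YourKeyspace -f /tmp/incrs.txt

If ReplicateOnWrite stays flat with the short rows, that points at the long-row read path.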

As you said, it could be that addReadCommandFromColumnFamily is spending a lot of time on the read; if that read is hitting disk, it's going to be slow. Can you try running with a much larger heap and just those few counter rows, so that everything stays in memory? If it's still slow, you may want to look at CASSANDRA-2843 (long row reading).
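To take disk out of the picture for that test, bumping the heap in conf/cassandra-env.sh is probably the easiest knob. A sketch only (the variable names below are the ones in the stock 0.8 cassandra-env.sh; the sizes are made up, so adjust to your box):

    # conf/cassandra-env.sh, before restarting the node
    MAX_HEAP_SIZE="8G"      # large enough that the hot counter rows stay resident
    HEAP_NEWSIZE="800M"

Then re-run the add load against just those few rows and see whether ReplicateOnWrite still piles up.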



On Tue, Jul 12, 2011 at 12:37 AM, David Hawthorne <dhawth@gmx.3crowd.com> wrote:
> It's definitely for counters, and some of the rows I'm inserting are long-ish, if 1.3MB is long.
>
> Maybe it would help if I said I was using counter super columns.  I'm also writing to only a handful of rows at a time, until they are full.  It looks like the counter super column code in addReadCommandFromColumnFamily, in the ReplicateOnWrite handling, might not be optimized for this kind of load.
>
> It looks like I may be able to turn off replicate_on_write for the CFs as long as I use CL.ONE to get around this in the short term, but I'd like to know what happens when replicate_on_write is true and when it is false so I can make a more informed choice.
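(If you do want to experiment with that, something along these lines from cassandra-cli should flip it per CF -- the syntax is from memory against 0.8 and "Counters" is a stand-in for your CF name, so double-check before running it:

    update column family Counters with replicate_on_write = false;

As far as I understand, counter CFs are really meant to run with replicate_on_write = true -- with it off only one replica sees each increment -- so I'd treat this as a short-term experiment rather than a fix.)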
>
> On Jul 12, 2011, at 12:20 AM, Yang wrote:
>
>> interesting,
>>
>> first just to make sure: since replicateOnWrite is for Counters, you
>> are using counters (you use the word "insert" instead of
>> "add/increment" ) right?
>>
>> if you are using counters, the leader is supposed to run replicateOnWrite; somehow all your adds are finding the same box as leader, which is probably worth investigating.
>>
>> finally, you could try debugging into those replicateOnWrite tasks and see what they are doing exactly; maybe there is something immediately wrong
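(To expand on that a bit: a thread dump is probably the quickest way to see where those tasks are spending their time. A rough sketch, assuming the stage threads carry "ReplicateOnWrite" in their names:

    jstack <cassandra-pid> | grep -A 15 ReplicateOnWrite

a few dumps taken a few seconds apart should show whether they are all stuck in the same read path.)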
>>
>>
>>
>> On Mon, Jul 11, 2011 at 11:57 PM, David Hawthorne <dhawth@gmx.3crowd.com> wrote:
>>> This is now my fourth attempt to get the message through.  Apologies if you see multiple copies.
>>>
>>> I've tried to give as much relevant data as I can think of, but please let me know if you need any other info.  I spent the day getting jmxtrans to talk to statsd with the cassandra JMX data, so I can provide graphs of just about anything you want.  I can also re-run the tests to replicate the problem if need be.  As always, thanks for taking a look!
>>>
>>> I have a 4-host test cluster running 0.8.1 that I'm writing counters into with hector at CL.ONE or CL.QUORUM (the test was run with each and both gave the same results).  The partitioner is RandomPartitioner, and RF=3.  nodetool ring shows that the actual data is well balanced:
>>>
>>> Address         DC          Rack        Status State   Load            Owns    Token
>>>                                                                                127605887595351923798765477786913079296
>>> 10.0.0.57       datacenter1 rack1       Up     Normal  1.59 GB         25.00%  0
>>> 10.0.0.56       datacenter1 rack1       Up     Normal  1.39 GB         25.00%  42535295865117307932921825928971026432
>>> 10.0.0.55       datacenter1 rack1       Up     Normal  1.24 GB         25.00%  85070591730234615865843651857942052864
>>> 10.0.0.54       datacenter1 rack1       Up     Normal  1.34 GB         25.00%  127605887595351923798765477786913079296
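Side note for anyone trying to reproduce this: the setup described above should correspond to roughly the following in cassandra-cli. This is a sketch from memory of the 0.8 syntax, with "CounterTest"/"Counts" as stand-in names, so verify it before use:

    create keyspace CounterTest
      with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
      and strategy_options = [{replication_factor:3}];
    use CounterTest;
    create column family Counts
      with column_type = 'Super'
      and default_validation_class = CounterColumnType
      and replicate_on_write = true;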
>>>
>>> I also tested against a 1-box test cluster with RF=1.
>>>
>>> Both show the same results: high performance for a while, and then the ReplicateOnWrite pending count backs way up (I've seen it as high as a million), the cassandra process becomes unresponsive, and the hector client starts throwing exceptions.  High performance in this case equates to about 5-20k inserts/sec on both the 4-box and the 1-box cluster (I'm inserting one long row at a time, so performance is bounded by the one box owning the row).  The cassandra process never seems to recover, even if left for 24 hours; it's still chewing through those ReplicateOnWrite pending tasks.
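(A cheap way to watch that backlog over time and correlate it with the insert rate is just a shell loop around the same nodetool call, e.g.:

    while true; do
      date
      ./nodetool -h localhost tpstats | grep ReplicateOnWrite
      sleep 10
    done

that makes it easy to see whether the pending count ever drains or only grows.)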
>>>
>>> GC doesn't seem to be a factor.  Logs show GCs completing in < 1s on all 4 boxes in the test cluster, as well as on the 1 standalone server.
>>>
>>> I'm also seeing highly erratic performance from the cluster in general: inserts/sec usually start at around 3k, but then maybe creep up towards 6k, then drop back down to below 1k for an extended period of time; maybe 1% of the time they'll spike up to 8k or even 15k, but that doesn't last more than 30 seconds.
>>>
>>> Here's the tpstats output from all 4 boxes:
>>>
>>> Slowest/most problematic/highest cpu usage one:
>>>
>>> This tpstats output is also representative of the one-box cluster about 20 minutes into my insert run.
>>>
>>> 4 $ ./nodetool -h localhost tpstats
>>> Pool Name                    Active   Pending      Completed
>>> ReadStage                         0         0              0
>>> RequestResponseStage              0         0       59095840
>>> MutationStage                     0         0       31182942
>>> ReadRepairStage                   0         0              0
>>> ReplicateOnWriteStage            32    928042       28246716
>>> GossipStage                       1         1         449464
>>> AntiEntropyStage                  0         0              0
>>> MigrationStage                    0         0              4
>>> MemtablePostFlusher               0         0            445
>>> StreamStage                       0         0              0
>>> FlushWriter                       0         0            445
>>> FILEUTILS-DELETE-POOL             0         0             38
>>> MiscStage                         0         0              0
>>> FlushSorter                       0         0              0
>>> InternalResponseStage             0         0              4
>>> HintedHandoff                     0         0              0
>>>
>>> Second highest cpu usage one:
>>>
>>> 3 $ ./nodetool -h localhost tpstats
>>> Pool Name                    Active   Pending      Completed
>>> ReadStage                         0         0              0
>>> RequestResponseStage              0         0       27413910
>>> MutationStage                     0         0       27523094
>>> ReadRepairStage                   0         0              0
>>> ReplicateOnWriteStage             0         0       11111971
>>> GossipStage                       0         0         335422
>>> AntiEntropyStage                  0         0              0
>>> MigrationStage                    0         0              4
>>> MemtablePostFlusher               0         0             66
>>> StreamStage                       0         0              0
>>> FlushWriter                       0         0             66
>>> FILEUTILS-DELETE-POOL             0         0             31
>>> MiscStage                         0         0              0
>>> FlushSorter                       0         0              0
>>> InternalResponseStage             0         0              4
>>> HintedHandoff                     0         0              0
>>>
>>> The other 2 that see minimal cpu usage:
>>>
>>> 2 $ ./nodetool -h localhost tpstats
>>> Pool Name                    Active   Pending      Completed
>>> ReadStage                         0         0              0
>>> RequestResponseStage              0         0        9213999
>>> MutationStage                     0         0       33233114
>>> ReadRepairStage                   0         0              0
>>> ReplicateOnWriteStage             0         0              0
>>> GossipStage                       0         0         336016
>>> AntiEntropyStage                  0         0              0
>>> MigrationStage                    0         0              4
>>> MemtablePostFlusher               0         0             77
>>> StreamStage                       0         0              0
>>> FlushWriter                       0         0             77
>>> FILEUTILS-DELETE-POOL             0         0             40
>>> MiscStage                         0         0              0
>>> FlushSorter                       0         0              0
>>> InternalResponseStage             0         0              4
>>> HintedHandoff                     0         0              0
>>>
>>>
>>> 1 $ ./nodetool -h localhost tpstats
>>> Pool Name                    Active   Pending      Completed
>>> ReadStage                         0         0              0
>>> RequestResponseStage              0         0       13796875
>>> MutationStage                     0         0       36116662
>>> ReadRepairStage                   0         0              0
>>> ReplicateOnWriteStage             0         0              0
>>> GossipStage                       0         0         336496
>>> AntiEntropyStage                  0         0              0
>>> MigrationStage                    0         0              4
>>> MemtablePostFlusher               0         0             91
>>> StreamStage                       0         0              0
>>> FlushWriter                       0         0             91
>>> FILEUTILS-DELETE-POOL             0         0             43
>>> MiscStage                         0         0              0
>>> FlushSorter                       0         0              0
>>> InternalResponseStage             0         0              4
>>> HintedHandoff                     0         0              0
>
>
