incubator-cassandra-user mailing list archives

From Alex Araujo <cassandra-us...@alex.otherinbox.com>
Subject Re: Ec2 Stress Results
Date Thu, 12 May 2011 00:25:50 GMT
On 5/9/11 9:49 PM, Jonathan Ellis wrote:
> On Mon, May 9, 2011 at 5:58 PM, Alex Araujo <cassandra-us...@alex.otherinbox.com> wrote:
>>> How many replicas are you writing?
>> Replication factor is 3.
> So you're actually spot on the predicted numbers: you're pushing
> 20k*3=60k "raw" rows/s across your 4 machines.
>
> You might get another 10% or so from increasing memtable thresholds,
> but bottom line is you're right around what we'd expect to see.
> Furthermore, CPU is the primary bottleneck which is what you want to
> see on a pure write workload.
>
That makes a lot more sense.  I upgraded the cluster to 4 m2.4xlarge 
instances (68GB of RAM/8 CPU cores) in preparation for application 
stress tests and the results were impressive @ 200 threads per client:

+--------------+--------------+--------------+----------+----------------+---------------+------------+----------------------+-------------------------+
| Server Nodes | Client Nodes | --keep-going | Columns  | Client Threads | Total Threads | Rep Factor | Test Rate (writes/s) | Cluster Rate (writes/s) |
+==============+==============+==============+==========+================+===============+============+======================+=========================+
|      4       |      3       |      N       | 10000000 |      200       |      600      |     3      |        44644         |         133931          |
+--------------+--------------+--------------+----------+----------------+---------------+------------+----------------------+-------------------------+
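If I read the columns right, the "Cluster Rate" here is just the test rate
multiplied by the replication factor, i.e. the raw rows/s Jonathan described
above. A quick sanity check in Python:

# Sanity check on the table above, assuming "Cluster Rate" means
# raw rows/s = test rate * replication factor (per the 20k*3=60k figure).
test_rate = 44644        # aggregate writes/s reported by the stress clients
rep_factor = 3

cluster_rate = test_rate * rep_factor
print(cluster_rate)      # 133932 -- matches the ~133931 in the table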

The issue I'm seeing with app stress tests is that the rate is 
comparable/acceptable at first (~100k writes/s) but degrades considerably 
(to ~48k writes/s) until a flush and restart.  CPU usage is correspondingly 
high at first (500-700%) and tapers down to 50-200%.  My data model is 
pretty standard (<This> is pseudo-type information):

Users<Column>
"UserId<32CharHash>" : {
     "email<String>": "a@b.com",
     "first_name<String>": "John",
     "last_name<String>": "Doe"
}

UserGroups<SuperColumn>
"GroupId<UUID>": {
     "UserId<32CharHash>": {
         "date_joined<DateTime>": "2011-05-10 13:14.789",
         "date_left<DateTime>": "2011-05-11 13:14.789",
         "active<short>": "0|1"
     }
}

UserGroupTimeline<Column>
"GroupId<UUID>": {
     "date_joined<TimeUUID>": "UserId<32CharHash>"
}

UserGroupStatus<Column>
"CompositeId('GroupId<UUID>:UserId<32CharHash>')": {
     "active<short>": "0|1"
}
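
In pycassa terms, the column families would be created roughly like this 
(keyspace name, strategy, and comparators below are illustrative, not my 
exact schema):

from pycassa.system_manager import (SystemManager, SIMPLE_STRATEGY,
                                    UTF8_TYPE, TIME_UUID_TYPE)

# Illustrative only -- keyspace name, strategy, and comparators are guesses.
sys = SystemManager('localhost:9160')
sys.create_keyspace('UserData', SIMPLE_STRATEGY, {'replication_factor': '2'})

# Standard CF: one row per user, keyed by the 32-char user id hash.
sys.create_column_family('UserData', 'Users', comparator_type=UTF8_TYPE)

# Super CF: one row per group, one supercolumn per member.
sys.create_column_family('UserData', 'UserGroups', super=True,
                         comparator_type=UTF8_TYPE,
                         subcomparator_type=UTF8_TYPE)

# Standard CF: one row per group, columns ordered by join time (TimeUUID).
sys.create_column_family('UserData', 'UserGroupTimeline',
                         comparator_type=TIME_UUID_TYPE)

# Standard CF: one row per (group, user) composite key.
sys.create_column_family('UserData', 'UserGroupStatus',
                         comparator_type=UTF8_TYPE)

sys.close()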

Every new User has a row in Users and a ColumnOrSuperColumn in the other 
3 CFs (total of 4 operations).  One notable difference is that the RAID0 
on this instance type (surprisingly) only contains two ephemeral volumes 
and appears a bit more saturated in iostat, although not enough to 
clearly stand out as the bottleneck.  Is the bottleneck in this scenario 
likely memtable flush and/or commitlog rotation settings?
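
To make the per-user write pattern concrete, the four operations look 
roughly like this in pycassa (connection details, keyspace name, and 
helper values are illustrative):

import time
import uuid

import pycassa
from pycassa.batch import Mutator

# Illustrative connection details; the real pool points at the cluster nodes.
pool = pycassa.ConnectionPool('UserData', ['node1:9160', 'node2:9160'])

users          = pycassa.ColumnFamily(pool, 'Users')
user_groups    = pycassa.ColumnFamily(pool, 'UserGroups')
group_timeline = pycassa.ColumnFamily(pool, 'UserGroupTimeline')
group_status   = pycassa.ColumnFamily(pool, 'UserGroupStatus')

def add_user_to_group(user_id, group_id, email, first, last):
    """One new user = four writes, batched into a single mutation."""
    now = time.time()
    b = Mutator(pool, write_consistency_level=pycassa.ConsistencyLevel.ONE)

    # 1. Row in Users.
    b.insert(users, user_id, {'email': email,
                              'first_name': first,
                              'last_name': last})
    # 2. Supercolumn in UserGroups.
    b.insert(user_groups, group_id,
             {user_id: {'date_joined': str(now), 'active': '1'}})
    # 3. TimeUUID column in UserGroupTimeline.
    b.insert(group_timeline, group_id, {uuid.uuid1(): user_id})
    # 4. Column in UserGroupStatus, keyed by the composite id.
    b.insert(group_status, '%s:%s' % (group_id, user_id), {'active': '1'})

    b.send()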

RF = 2; ConsistencyLevel = One; -Xmx = 6GB; concurrent_writes: 64; all 
other settings are the defaults.  Thanks, Alex.
