cassandra-user mailing list archives

From aaron morton <>
Subject Re: performance problems on new cluster
Date Thu, 11 Aug 2011 22:01:02 GMT
Is there a reason you are using the trunk and not one of the tagged releases? Official releases
are a lot more stable than the trunk. 

> 1) thrift timeouts & general degraded response times
For reads or writes? What sort of queries are you running? Check the local latency on each
node using nodetool cfstats and cfhistograms, plus a bit of iostat.
What does nodetool tpstats say? Is there a stage backing up?
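
A minimal sketch of collecting those per-node checks in one pass. The host, keyspace and column family arguments are placeholders that are not taken from this thread, and the iostat interval and count are arbitrary choices. Note that iostat reports on the machine it runs on, so this would need to be run on each node.

```python
import shutil
import subprocess


def diagnostic_commands(host, keyspace, column_family):
    """Build the checks mentioned above as argv lists (nothing is executed here)."""
    return [
        ["nodetool", "-h", host, "cfstats"],
        ["nodetool", "-h", host, "cfhistograms", keyspace, column_family],
        ["nodetool", "-h", host, "tpstats"],
        # iostat is local to the machine it runs on: extended stats, 5 s interval, 3 samples
        ["iostat", "-x", "5", "3"],
    ]


def run_diagnostics(host, keyspace, column_family):
    """Run each check that is available on PATH and print its output."""
    for argv in diagnostic_commands(host, keyspace, column_family):
        if shutil.which(argv[0]) is None:
            print("skipping (not on PATH):", " ".join(argv))
            continue
        print("$", " ".join(argv))
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout or result.stderr)
```

For example, run_diagnostics("localhost", "my_keyspace", "cf1") on each node in turn, where "my_keyspace" is a hypothetical keyspace name.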

If the local latency is OK, look at the cross-DC situation. What CL are you using? Are nodes
timing out waiting for nodes in other DCs?
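
To illustrate why the CL matters across DCs, here is a rough sketch of how many replicas must answer at a few consistency levels. It assumes "RF2" in the original post means two replicas in each of the four DCs, which the thread does not confirm; the DC names and the helper functions are illustrative only.

```python
def quorum(replicas):
    """A quorum is a majority of the replicas."""
    return replicas // 2 + 1


def replicas_needed(consistency_level, rf_per_dc, local_dc):
    """Number of replica responses required before a request succeeds."""
    total_rf = sum(rf_per_dc.values())
    if consistency_level == "ONE":
        return 1
    if consistency_level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])
    if consistency_level == "QUORUM":
        return quorum(total_rf)
    if consistency_level == "ALL":
        return total_rf
    raise ValueError("unsupported consistency level: " + consistency_level)


def must_leave_local_dc(consistency_level, rf_per_dc, local_dc):
    """True when the local DC alone cannot supply enough replicas."""
    needed = replicas_needed(consistency_level, rf_per_dc, local_dc)
    return needed > rf_per_dc[local_dc]


# Assumed layout: 2 replicas in each of 4 DCs (not confirmed in the thread).
ASSUMED_RF = {"dc1": 2, "dc2": 2, "dc3": 2, "dc4": 2}
```

Under that assumption, QUORUM needs 5 of the 8 replicas and so always waits on other DCs, while LOCAL_QUORUM needs only the 2 local replicas.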

> 2) *lots* of exception errors, such as:
Repair is trying to run on a response which is a digest response; this should not be happening.
Can you provide some more info on the type of query you are running?

> 3) ring imbalances during a repair (refer to the above nodetool ring output)
You may be seeing this
I think it's a mistake that it is marked as resolved.

> 4) regular failure detection when any node does something only moderately stressful,
> such as a repair, or is under light load, but the node itself thinks it is fine.
What version are you using ? 

I'd take a look at the exceptions first then move onto the performance issues. 


Aaron Morton
Freelance Cassandra Developer

On 12 Aug 2011, at 03:16, Anton Winter wrote:

> Hi,
> I have recently been migrating to a small 12 node Cassandra cluster spanning 4 DCs
> and have been encountering various issues that I suspect come down to performance tuning
> for my data set.  I've learnt a few lessons along the way but I'm at a bit of a roadblock
> now: I have been experiencing frequent OutOfMemory exceptions, various other exceptions and
> poor performance, and my ring appears to become imbalanced during repairs.  I've tried
> various different configurations but haven't been able to get to the bottom of my performance
> issues.  I'm assuming this has something to do with my data and some performance tuning metric
> that I'm merely overlooking.
> My ring was created as documented in the wiki & various other performance tuning
> guides, calculating the tokens at each DC and incrementing when in conflict.  It is as follows:
> Address   DC   Rack  Status  State   Load      Owns    Token
>                                                        113427455640312821154458202477256070487
> dc1host1  dc1  1a    Up      Normal  88.62 GB  33.33%  0
> dc2host1  dc2  1     Up      Normal  14.76 GB  0.00%   1
> dc3host1  dc3  1     Up      Normal  15.99 GB  0.00%   2
> dc4host1  dc4  1     Up      Normal  14.52 GB  0.00%   3
> dc1host2  dc1  1a    Up      Normal  18.02 GB  33.33%  56713727820156410577229101238628035242
> dc2host2  dc2  1     Up      Normal  16.5 GB   0.00%   56713727820156410577229101238628035243
> dc3host2  dc3  1     Up      Normal  16.37 GB  0.00%   56713727820156410577229101238628035244
> dc4host2  dc4  1     Up      Normal  13.34 GB  0.00%   56713727820156410577229101238628035245
> dc1host3  dc1  1a    Up      Normal  16.59 GB  33.33%  113427455640312821154458202477256070484
> dc2host3  dc2  1     Up      Normal  15.22 GB  0.00%   113427455640312821154458202477256070485
> dc3host3  dc3  1     Up      Normal  15.59 GB  0.00%   113427455640312821154458202477256070486
> dc4host3  dc4  1     Up      Normal  8.84 GB   0.00%   113427455640312821154458202477256070487
> The above ring was freshly created and fairly evenly distributed in load prior to a repair
> on dc1host1 (which is still running at the time of the above command), with the exception
> of dc4host3, where a previous bulk data load timed out.  dc4host3 was responding poorly, was
> failing according to other nodes and, judging from its heap usage, was rather close to OOM'ing
> before it was restarted.
> I'm also using NTS with RF2.
> The primary issues I'm experiencing are:
> Light load against nodes in dc1 was causing OutOfMemory exceptions across all Cassandra
> servers outside of dc1, which were all idle, and eventually, after several hours, on
> one of the dc1 nodes.  This issue was produced using svn trunk r1153002 and an in-house
> snitch which effectively combined PropertyFileSnitch with some components of Ec2Snitch.
> While trying to resolve these issues I have moved to a r1156490 snapshot and have switched
> across to just the PropertyFileSnitch, simply utilising the broadcast_address configuration
> option available in trunk, which seems to work quite well.
> Since moving to r1156490 we have stopped getting OOM's, but that may actually be because
> we have been unable to send traffic to the cluster to be able to produce one.
> The most current issues I have been experiencing are the following:
> 1) thrift timeouts & general degraded response times
> 2) *lots* of exception errors, such as:
> ERROR [ReadRepairStage:1076] 2011-08-11 13:33:41,266 (line 133) Fatal exception in thread Thread[ReadRepairStage:1076,5,main]
> java.lang.AssertionError
>        at org.apache.cassandra.service.RowRepairResolver.resolve(
>        at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(
>        at
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
>        at java.util.concurrent.ThreadPoolExecutor$
>        at
> 3) ring imbalances during a repair (refer to the above nodetool ring output)
> 4) regular failure detection when any node does something only moderately stressful,
> such as a repair, or is under light load, but the node itself thinks it is fine.
> My hosts all have 32 GB of memory and either 4 or 16 cores.  I've set heaps to half of
> physical memory (16 GB) and, for the purpose of cluster simplicity, set all younggen to 400 MB.
> JNA is in use, commitlogs and data have been split onto different filesystems and so on.
> My data set as described by a dev is essentially as follows:
> 3 column families (tables):
> cf1.  The RowKey is the user id.  This is the primary column family queried on and is always
> just looked up by RowKey.  It has 1 supercolumn called "seg".  The column names in this supercolumn
> are the segment_ids that the user belongs to and the value is just "1".  This should have
> about 150 million rows.  Each row will have an average of 2-3 columns in the "seg" supercolumn.
> The column values have TTLs set on them.
> cf2.  This is a CounterColumnFamily.  There's only a single "cnt" column which stores
> a counter of the number of cf1's having that segment.  This was only updated during the import
> and is not read at all.
> cf3.  This is a lookup between the RowKey which is an external ID and the RowKey to be
> used to find the user in the cf1 CF.
> Does anyone have any ideas or recommendations on where I should be focusing my efforts
> to get to the bottom of these issues?
> Thanks,
> Anton
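
The token layout described in the quoted message (tokens calculated at each DC, then incremented when in conflict) can be sketched roughly as follows. This assumes a RandomPartitioner token space of 2**127, which the thread does not state but which is consistent with the token values in the ring output. The function names and the (dc_index, node_index) keys are hypothetical. The ownership function treats "Owns" as each node's share of the raw token range, which is also an assumption about how nodetool ring computes that column.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def dc_tokens(nodes_per_dc, dc_count):
    """Evenly spaced tokens per DC, offset by the DC index to avoid collisions."""
    step = RING_SIZE // nodes_per_dc
    return {
        (dc, node): node * step + dc
        for dc in range(dc_count)
        for node in range(nodes_per_dc)
    }


def ownership(tokens):
    """Fraction of the raw token range ending at each node's token."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    shares = {}
    for i, (name, token) in enumerate(ordered):
        previous = ordered[i - 1][1]  # i == 0 wraps around to the last token
        shares[name] = ((token - previous) % RING_SIZE) / RING_SIZE
    return shares


tokens = dc_tokens(nodes_per_dc=3, dc_count=4)
shares = ownership(tokens)
for (dc, node), token in sorted(tokens.items(), key=lambda item: item[1]):
    print("dc%dhost%d  %6.2f%%  %d" % (dc + 1, node + 1, shares[(dc, node)] * 100, token))
```

With 3 nodes in each of 4 DCs this reproduces the tokens in the ring output above. It also suggests why only the dc1 nodes show 33.33%: each of the other nodes sits one token after its neighbour, so its slice of the raw range is negligible, even though NTS still places replicas in every DC.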
