cassandra-user mailing list archives

From Diane Griffith <>
Subject Re: horizontal query scaling issues follow on
Date Tue, 22 Jul 2014 02:23:13 GMT
So I appreciate all the help so far.  Upfront, it is possible the schema
and data query pattern could be contributing to the problem.  The schema
was born out of certain design requirements.  If it proves to be part of
what makes the scalability crumble, then I hope it will help shape the
design requirements.

Anyway, the premise of the question was my struggle with scalability
metrics falling apart going from 2 nodes to 4 nodes for the current schema
and query access pattern being modeled:
- 1 node: the consensus seemed to be that response times were acceptable
- 2 nodes: marked improvement in response times for the query scenario
being modeled, which was welcome news
- 4 nodes: performance decreased, and it was not clear why going from 2 to
4 nodes triggered the decrease

Two more items also contributed to the question:
- the example for HEAP_NEWSIZE, whose comments state it assumes a modern 8
core machine for pause times
- a wiki article I had found (and am trying to relocate) where a person set
up very small nodes for developers on that team and talked through all the
parameters that had to be changed from the defaults to get good throughput.
It sort of implied the defaults may be based on a certain sized vm.
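To make the HEAP_NEWSIZE point concrete, here is a sketch of the sizing
heuristic as I read the stock cassandra-env.sh; I am paraphrasing the script
from memory, so treat the formula as approximate rather than authoritative:

```python
# Sketch of the heap-sizing heuristic from a 2.0-era cassandra-env.sh
# (paraphrased from memory; the 8GB/4-core numbers are the small vm
# discussed in this thread, not a recommendation).

def heap_sizes(system_memory_mb, cpu_cores):
    # max heap: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
    max_heap_mb = max(min(system_memory_mb // 2, 1024),
                      min(system_memory_mb // 4, 8192))
    # young gen: min(1/4 of max heap, 100 MB per core) -- the "modern
    # 8 core machine" comment is about this per-core term
    heap_newsize_mb = min(max_heap_mb // 4, 100 * cpu_cores)
    return max_heap_mb, heap_newsize_mb

print(heap_sizes(8192, 4))    # the 8GB RAM / 4 CPU vm from this thread
print(heap_sizes(32768, 8))   # a bigger "modern 8 core" box
```

If I have the heuristic right, the small vm ends up with roughly a 2048M
heap and a 400M new gen, versus 8192M/800M on a 32GB, 8 core box, which
would be one concrete way the defaults lean toward larger hardware.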

That was the main driver for those questions.  I agree it does not seem
correct to boost the values, let alone so high, just to minimize impact in
some respects (i.e. to keep the reads from timing out and starting over
given the retry policy).
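For reference, these are the request timeouts a 2.0-era cassandra.yaml
exposes; I am quoting the stock defaults from memory, so the exact numbers
may be slightly off, and which four of them get raised is the choice in
question:

```yaml
# cassandra.yaml request timeouts (2.0-era defaults, from memory --
# treat the exact values as approximate)
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
```

Raising any of these to 240000 means a coordinator waits four minutes
before giving up on a request, which is a long time to hold a thread.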

So the question really was: are the defaults sized with the assumption of a
certain minimal vm size (i.e. the HEAP_NEWSIZE comment mentioned above)?
Does that explain where I am coming from better?

My question, despite being naive and ignoring other impacts, still stands:
is there a minimal vm size that is more of the sweet spot for cassandra and
the defaults?  I get the point that a column family schema, as it relates
to the desired queries, can and does impact that answer.  I guess what
bothered me was that it didn't impact the answer going from 1 node to 2
nodes but started showing up going from 2 nodes to 4 nodes.

I'm building whatever facts I can to show whether the schema and query
pattern scales or not.  If it does not, then I am trying to pull
information from the metrics output by nodetool, or log statements in the
cassandra log files, to support a case for changing the design requirements.
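For example, the two-column cfhistograms output discussed below (offset on
the left, request count on the right) can be folded into approximate
percentiles.  A minimal sketch with made-up buckets; note that I believe the
latency offsets in 2.0.x are in microseconds rather than milliseconds, which
is worth double-checking against the nodetool docs:

```python
# Turning nodetool cfhistograms two-column output (offset, count) into
# an approximate percentile.  The sample buckets are made up for
# illustration; offsets for latency histograms are, I believe,
# microseconds in 2.0.x.

def percentile(buckets, p):
    """buckets: list of (offset, count) pairs in ascending offset order;
    p: percentile in [0, 100].  Returns the offset of the bucket that
    contains the p-th percentile request."""
    total = sum(count for _, count in buckets)
    threshold = total * p / 100.0
    running = 0
    for offset, count in buckets:
        running += count
        if running >= threshold:
            return offset
    return buckets[-1][0]

# hypothetical read-latency buckets: (offset, number of requests)
read_latency = [(103, 1200), (124, 800), (149, 400), (179, 90), (215, 10)]
print(percentile(read_latency, 99))
```

This is a coarse estimate (the true value lies somewhere inside the bucket),
but it is enough to see where the tail of the read latency sits.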


On Mon, Jul 21, 2014 at 8:15 PM, Robert Coli <> wrote:

> On Sun, Jul 20, 2014 at 6:12 PM, Diane Griffith <>
> wrote:
>> I am running tests again across different numbers of client threads and
>> nodes, but this time I tweaked some of the timeouts configured for
>> the nodes in the cluster.  I was able to get better performance on the
>> nodes at 10 client threads by upping 4 timeout values in cassandra.yaml to
>> 240000:
> If you have to tune these timeout values, you have probably modeled data
> in such a way that each of your requests is "quite large" or "quite slow".
> This is usually, but not always, an indicator that you are Doing It Wrong.
> Massively multithreaded things don't generally like their threads to be
> long-lived, for what should hopefully be obvious reasons.
>> I did this because of my interpretation of the cfhistograms output on one
>> of the nodes.
> Could you be more specific?
>> So 3 questions that come to mind:
>>    1. Did I interpret the histogram information correctly in the
>>    cassandra 2.0.6 nodetool output?  That is, in the 2-column read
>>    latency output, the left column is the offset (time in milliseconds)
>>    and the right column is the number of requests that fell into that
>>    bucket range.
>>    2. Was it reasonable for me to boost those 4 timeouts and just those?
> Not really. In 5 years of operating Cassandra, I've never had a problem
> whose solution was to increase these timeouts from their default.
>>    3. What are reasonable timeout values for smaller vm sizes (i.e. 8GB
>>    RAM, 4 CPUs)?
> As above, I question the premise of this question.
> =Rob
