cassandra-user mailing list archives

From benjamin roth <brs...@gmail.com>
Subject Re: How can I scale my read rate?
Date Fri, 17 Mar 2017 18:03:57 GMT
Looks like you have a concurrency problem.
A 99th-percentile read time of 0.3 ms looks really decent. But even if
each node served its reads strictly one after another, 1 / 0.0003 s ≈
3,333 req/s per node, i.e. roughly 40,000 req/s across 12 nodes.
That is WITHOUT any concurrency. With 8 cores you can get much more out
of each node. Of course this is a very rough estimate, but it should
show that 15k/s is still quite low.
It seems that something is blocking, as can be seen in the max latency.
You should take a look at the GC-related JMX metrics (or nodetool
gcstats).
And maybe you should try to tweak the concurrency settings in
cassandra.yaml, such as concurrent_reads; the defaults are rather
conservative.
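
If it helps, the same GC numbers can also be pulled over JMX from a
small Java program; a rough sketch, assuming the default JMX port 7199
with authentication disabled (host/port are placeholders for your
nodes):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class GcCheck {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes JMX on port 7199 by default.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            Set<ObjectName> gcs = mbsc.queryNames(
                    new ObjectName("java.lang:type=GarbageCollector,*"), null);
            for (ObjectName name : gcs) {
                GarbageCollectorMXBean gc = ManagementFactory.newPlatformMXBeanProxy(
                        mbsc, name.toString(), GarbageCollectorMXBean.class);
                // Rapidly growing counts/times here point at GC pressure.
                System.out.printf("%s: collections=%d totalTimeMs=%d%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }
}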

2017-03-17 17:26 GMT+01:00 S G <sg.online.email@gmail.com>:

> Thank you all for the nice pointers.
> I am using 22 machines, each running 60 threads, as reading clients.
> Each client has 8 cores and 23 GB RAM.
>
> On the Cassandra side, I bumped up the machines from 2-core, 5.5 GB
> nodes to 8-core, 24 GB nodes and the performance improved from 15,000
> to 35,000 reads/second.
>
> Now it is stuck again at this throughput.
>
> I have tried using the G1 garbage collector and played with
> increasing/decreasing the number of threads and machines on the clients
> reading Cassandra.
> No effect.
>
> The strangest part is this:
> It seems adding more nodes to the Cassandra cluster has no effect!
> I had 12 machines a while ago and when I bumped the cluster size to 16
> nodes (33% bigger), there was zero improvement in the read performance.
> Is the DataStax client smart enough to always route a get-by-primary-key
> query to the right node?
> Or do the Cassandra nodes redirect queries to the right node after they
> find the right token range?
> If it's the latter, then it seems a bigger cluster's advantages are
> being offset by an increasing number of redirects between the Cassandra
> nodes.
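>
> (For context, a token-aware read with the DataStax Java driver 3.x
> looks roughly like the sketch below; when the statement carries the
> partition key, e.g. via a prepared statement, the driver can pick a
> replica directly. The contact point, keyspace and table names here are
> placeholders, not the real setup.)
>
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.PreparedStatement;
> import com.datastax.driver.core.Row;
> import com.datastax.driver.core.Session;
> import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
> import com.datastax.driver.core.policies.TokenAwarePolicy;
>
> public class TokenAwareRead {
>     public static void main(String[] args) {
>         try (Cluster cluster = Cluster.builder()
>                 .addContactPoint("10.0.0.1")             // placeholder contact point
>                 .withLoadBalancingPolicy(new TokenAwarePolicy(
>                         DCAwareRoundRobinPolicy.builder().build()))
>                 .build();
>              Session session = cluster.connect("my_keyspace")) {  // placeholder keyspace
>             // Prepared statements let the driver compute the routing key
>             // from the bound partition-key value.
>             PreparedStatement ps = session.prepare(
>                     "SELECT * FROM my_table WHERE id = ?");       // placeholder table
>             Row row = session.execute(ps.bind("some-key")).one(); // placeholder key
>             System.out.println(row);
>         }
>     }
> }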
>
> I tried to look at Cassandra JMX metrics but there are way too many of
> them.
> What metrics should I look into?
>
>
> The tablehistograms output for the table being queried looks awesome.
>
> [image: tablehistograms output]
> How do I diagnose this system for a faulty setting?
> What are the usual suspects in such cases?
>
> Thanks
> SG
>
>
>
>
> On Thu, Mar 16, 2017 at 8:11 AM, Patrick McFadin <pmcfadin@gmail.com>
> wrote:
>
>> My general rule of thumb for the ratio of stress clients to cluster
>> nodes under load is around 2:1. So with a 12-node cluster, 24 stress
>> clients. As Eric stated, monitoring your stress clients is critical to
>> making sure you have adequate load being produced.
>>
>> Patrick
>>
>> On Wed, Mar 15, 2017 at 3:50 PM, Carlos Rolo <rolo@pythian.com> wrote:
>>
>>> Plus it is one of the most documented questions available...
>>>
>>> Regards,
>>>
>>> Carlos Juzarte Rolo
>>> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>>>
>>> Pythian - Love your data
>>>
>>> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
>>> linkedin.com/in/carlosjuzarterolo
>>> Mobile: +351 918 918 100
>>> www.pythian.com
>>>
>>> On Wed, Mar 15, 2017 at 10:29 PM, James Carman <
>>> james@carmanconsulting.com> wrote:
>>>
>>>> Come on, man.  That's not called for.  Someone is coming here to the
>>>> community for help.  Let's be courteous, please.
>>>>
>>>> On Wed, Mar 15, 2017 at 5:12 PM daemeon reiydelle <daemeonr@gmail.com>
>>>> wrote:
>>>>
>>>>> With that brain-damaged memory setting, yes.
>>>>>
>>>>>
>>>>>
>>>>> Daemeon C.M. Reiydelle
>>>>> USA (+1) 415.501.0198
>>>>> London (+44) (0) 20 8144 9872
>>>>>
>>>>> On Tue, Mar 14, 2017 at 3:45 PM, S G <sg.online.email@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to scale my read throughput.
>>>>>
>>>>> I have a 12 node Cassandra cluster, each node has:
>>>>> RAM: 5.5gb
>>>>> Disk: 64gb
>>>>> C* version: 3.3
>>>>> Java: 1.8.0_51
>>>>>
>>>>> The cluster has 2 tables - each with 2 million rows.
>>>>> Partition keys for all these rows are unique.
>>>>>
>>>>> I am stress-reading it from a 15-node client cluster.
>>>>> Each read-client has 40 threads, so total 600 read-threads from 15
>>>>> machines.
>>>>> Each read query is a get-by-primary-key, where keys are read randomly
>>>>> from a file containing all primary keys.
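>>>>>
>>>>> (A rough sketch of this kind of read loop with the DataStax Java
>>>>> driver 3.x; the key file, contact point, keyspace and table names are
>>>>> placeholders, not the real setup.)
>>>>>
>>>>> import java.nio.file.Files;
>>>>> import java.nio.file.Paths;
>>>>> import java.util.List;
>>>>> import java.util.concurrent.ExecutorService;
>>>>> import java.util.concurrent.Executors;
>>>>> import java.util.concurrent.ThreadLocalRandom;
>>>>> import java.util.concurrent.TimeUnit;
>>>>> import com.datastax.driver.core.Cluster;
>>>>> import com.datastax.driver.core.PreparedStatement;
>>>>> import com.datastax.driver.core.Session;
>>>>>
>>>>> public class ReadStress {
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         final int threads = 40;            // reader threads per client
>>>>>         final int readsPerThread = 10_000; // reads issued by each thread
>>>>>         List<String> keys = Files.readAllLines(Paths.get("keys.txt")); // placeholder key file
>>>>>         try (Cluster cluster = Cluster.builder()
>>>>>                 .addContactPoint("10.0.0.1")                           // placeholder contact point
>>>>>                 .build();
>>>>>              Session session = cluster.connect("my_keyspace")) {       // placeholder keyspace
>>>>>             PreparedStatement ps = session.prepare(
>>>>>                     "SELECT * FROM my_table WHERE id = ?");            // placeholder table
>>>>>             ExecutorService pool = Executors.newFixedThreadPool(threads);
>>>>>             long start = System.nanoTime();
>>>>>             for (int t = 0; t < threads; t++) {
>>>>>                 pool.submit(() -> {
>>>>>                     for (int i = 0; i < readsPerThread; i++) {
>>>>>                         // Pick a random primary key and read it back.
>>>>>                         String key = keys.get(
>>>>>                                 ThreadLocalRandom.current().nextInt(keys.size()));
>>>>>                         session.execute(ps.bind(key));
>>>>>                     }
>>>>>                 });
>>>>>             }
>>>>>             pool.shutdown();
>>>>>             pool.awaitTermination(1, TimeUnit.HOURS);
>>>>>             long secs = Math.max(1,
>>>>>                     TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start));
>>>>>             System.out.println("reads/s ≈ " + (long) threads * readsPerThread / secs);
>>>>>         }
>>>>>     }
>>>>> }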
>>>>>
>>>>>
>>>>> I am able to get only 15,000 reads/second from the entire system.
>>>>> Is this the best read performance I can expect?
>>>>> Are there any benchmarks I can compare against?
>>>>>
>>>>>
>>>>> I tried setting bloom_filter_fp_chance from 0.01 to 0.0001 and caching
>>>>> to {'keys': 'ALL', 'rows_per_partition': 'ALL'} but it had no effect.
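>>>>>
>>>>> (For reference, those table options are applied with an ALTER TABLE
>>>>> statement; a sketch via the Java driver is below, with placeholder
>>>>> keyspace/table names. Note that rows_per_partition caching only takes
>>>>> effect if the row cache is enabled via row_cache_size_in_mb in
>>>>> cassandra.yaml, which defaults to 0.)
>>>>>
>>>>> import com.datastax.driver.core.Cluster;
>>>>> import com.datastax.driver.core.Session;
>>>>>
>>>>> public class TuneTable {
>>>>>     public static void main(String[] args) {
>>>>>         try (Cluster cluster = Cluster.builder()
>>>>>                 .addContactPoint("10.0.0.1").build();  // placeholder contact point
>>>>>              Session session = cluster.connect()) {
>>>>>             // Bloom filter changes only affect SSTables written after the
>>>>>             // change (e.g. after compaction or upgradesstables).
>>>>>             session.execute("ALTER TABLE my_keyspace.my_table"
>>>>>                     + " WITH bloom_filter_fp_chance = 0.0001"
>>>>>                     + " AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}");
>>>>>         }
>>>>>     }
>>>>> }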
>>>>>
>>>>> I have also captured a few JMX metric graphs as follows:
>>>>> [image: Cassandra-Bulk-Read-Metrics.png]
>>>>> Garbage collection metrics did not look very interesting, hence I have
>>>>> not included them.
>>>>>
>>>>> Any ideas what more I can do to debug the system and increase its read
>>>>> throughput?
>>>>>
>>>>> Thanks
>>>>> SG
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Mar 11, 2017 at 2:33 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>>>>>
>>>>> 5.5G of RAM isn't very much for a JVM-based database. Having fewer
>>>>> instances with more RAM will likely give you better performance.
>>>>>
>>>>> Also, this is probably a better question for the user list than the
>>>>> dev list.
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>
>>>>> > On Mar 11, 2017, at 1:20 PM, S G <sg.online.email@gmail.com> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > I have a 9 node cassandra cluster, each node has:
>>>>> > RAM: 5.5gb
>>>>> > Disk: 64gb
>>>>> > C* version: 3.3
>>>>> > Java: 1.8.0_51
>>>>> >
>>>>> > The cluster stores about 2 million rows and partition keys for all
>>>>> > these rows are unique.
>>>>> >
>>>>> > I am stress-reading it from a 12-node client cluster.
>>>>> > Each read-client has 50 threads, so total 600 read-threads from 12
>>>>> > machines.
>>>>> > Each read query is get-by-primary-key where keys are read randomly
>>>>> > from a file having all primary-keys.
>>>>> >
>>>>> >
>>>>> > I am able to get only 15,000 reads/second from the entire system.
>>>>> > Is this the best read performance I can expect?
>>>>> > Are there any benchmarks I can compare against?
>>>>> >
>>>>> >
>>>>> > I tried setting bloom_filter_fp_chance from 0.01 to 0.0001 and
>>>>> > caching to {'keys': 'ALL', 'rows_per_partition': 'ALL'} but it had
>>>>> > no effect.
>>>>> >
>>>>> > Thx,
>>>>> > SG
