incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Batch get queries
Date Sun, 21 Apr 2013 20:34:22 GMT
> This is very acceptable but wanted to get everyone's take as I have seen messages about
> this "starving" the request pool.
The issue with sending large multi gets or batch mutations is that it can reduce overall request
throughput. Every row in your 10K multi get becomes RF number of tasks that are placed into the
read thread pools. If these pools are full servicing one request (which is more likely with
smaller clusters), they are not servicing requests from other clients.
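
To put rough numbers on that fan-out, here is a minimal sketch that takes the "rows times RF" description above at face value. The function name and the RF of 3 are assumptions for illustration only; the original post does not state the replication factor.

```python
def read_tasks(rows_per_request: int, replication_factor: int) -> int:
    """Approximate number of read tasks queued for one multi get,
    assuming each requested row becomes one task per replica."""
    return rows_per_request * replication_factor


# replication_factor=3 is an assumed value, not taken from the thread.
for rows in (100, 1_000, 10_000):
    print(rows, "rows ->", read_tasks(rows, replication_factor=3), "read tasks")
```

Under those assumptions a single 10K request queues on the order of 30,000 read tasks across the cluster, which is why it can crowd out other clients.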

Additionally, large requests are more likely to upset the delicate flower that is JVM GC.

10K feels like a lot to me. I would run a test to measure the overall throughput for a single
thread at 100, 200, 400, 800, etc. rows per request. At some point the gains in overall
throughput for that one client will drop off.
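
A sweep like that could be sketched roughly as follows. This is only a harness outline: `fake_fetch` is a hypothetical stand-in for the real multi get (for example the `IN (...)` query issued through your client), and the key names and batch sizes are illustrative. Swap in the real call to get meaningful numbers.

```python
import time
from typing import Callable, Dict, List, Sequence


def chunked(keys: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split keys into consecutive batches of at most `size` items."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def rows_per_second(fetch: Callable[[Sequence[str]], Dict[str, int]],
                    keys: Sequence[str],
                    batch_size: int) -> float:
    """Fetch all keys from a single thread, one batch at a time,
    and return the observed rows per second."""
    start = time.perf_counter()
    fetched = 0
    for batch in chunked(keys, batch_size):
        fetched += len(fetch(batch))
    elapsed = time.perf_counter() - start
    return fetched / elapsed if elapsed > 0 else float("inf")


def fake_fetch(batch: Sequence[str]) -> Dict[str, int]:
    """Hypothetical placeholder for the real multi get; returns one row per key."""
    return {key: 0 for key in batch}


if __name__ == "__main__":
    keys = [f"cookie-{i}" for i in range(20_000)]
    for size in (100, 200, 400, 800, 1600):
        print(size, "rows/request ->", round(rows_per_second(fake_fetch, keys, size)), "rows/s")
```

Plot rows per second against batch size and pick the point where the curve flattens; it is also worth watching the latency seen by other clients while the test runs.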

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/04/2013, at 5:05 AM, Keith Wright <kwright@nanigans.com> wrote:

> Hi all,
> 
> I am using C* 1.2.4 and using CQL3 with Astyanax to consume a large amount of user based
> data (around 50-100K / sec).  Requests come in based on user cookies which I then need to
> link to a user (as users can change their cookies).  This is done using a link table:
> 
> CREATE TABLE cookie_user_lookup (
>     cookie TEXT PRIMARY KEY,
>     user_id BIGINT,
>     creation_time TIMESTAMP
> ) WITH compression = {'crc_check_chance': 0.1, 'sstable_compression': 'LZ4Compressor'}
>   AND compaction = {'class': 'LeveledCompactionStrategy'}
>   AND gc_grace_seconds = 86400;
> 
> As I said, I am handling a large number of these per second and wanted to get your take
> on how best to do the lookup.  I find that there are 3 ways:
>   • Serially fetch 1 by 1.  The latency is very low at 0.1 ms but multiplying that by
>     thousands per second becomes substantial.  This is too slow.
>   • Serially fetch 1 by 1 but on separate threads.  This would require a very large
>     number of concurrent connections (unless I change to DataStax's binary protocol) as
>     well as threads.  Seems heavy.
>   • Batch fetch.  This is what I'm doing now, where I build a very large
>     SELECT * FROM cookie_user_lookup WHERE cookie IN (a,b,c,.. etc).  I am actually doing
>     around 10K of these at a time and getting a response time in my cluster of around
>     100 ms.  This is very acceptable but wanted to get everyone's take as I have seen
>     messages about this "starving" the request pool.  Note that I'm running in HSHA and
>     am rarely seeing any reads waiting.
> I appreciate your input!

