incubator-cassandra-user mailing list archives

From Keith Wright <kwri...@nanigans.com>
Subject Batch get queries
Date Fri, 19 Apr 2013 17:05:33 GMT
Hi all,

   I am using C* 1.2.4 with CQL3 and Astyanax to consume a large amount of user-based
data (around 50-100K / sec).  Requests come in keyed by user cookie, which I then need to
link to a user (as users can change their cookies).  This is done using a link table:

CREATE TABLE cookie_user_lookup (
    cookie TEXT PRIMARY KEY,
    user_id BIGINT,
    creation_time TIMESTAMP
) WITH compression = {'crc_check_chance':0.1,'sstable_compression':'LZ4Compressor'}
  AND compaction = {'class':'LeveledCompactionStrategy'}
  AND gc_grace_seconds = 86400;

As I said, I am handling a large number of these lookups per second and wanted to get your
take on how best to do them.  I see three options:

 *   Serially fetch 1 by 1.  The latency is very low at 0.1 ms per lookup, but multiplied by
thousands of lookups per second it becomes substantial.  This is too slow.
 *   Fetch 1 by 1 but on separate threads.  This would require a very large number of
concurrent connections (unless I change to DataStax's binary protocol) as well as threads.
Seems heavy.
 *   Batch fetch.  This is what I'm doing now: I build a very large select * from cookie_user_lookup
where cookie in (a,b,c,.. Etc).  I am actually doing around 10K of these at a time and getting
a response time in my cluster of around 100 ms.  This is very acceptable, but I wanted to get
everyone's take as I have seen messages about this "starving" the request pool.  Note that
I'm running with the HSHA RPC server and am rarely seeing any reads waiting.
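
For illustration, here is a minimal sketch of a middle ground between the serial and batch options: splitting the cookies into smaller groups and building one parameterized IN query per group. It is written in Python for brevity rather than against Astyanax, and it only builds the query text. The helper names (chunked, build_lookup_query) and the chunk size of 100 are hypothetical choices for the example, not something measured or recommended here, and executing the queries is left to whatever client is in use.

```python
from typing import List, Sequence


def chunked(keys: Sequence[str], size: int) -> List[List[str]]:
    """Split keys into consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(keys[i:i + size]) for i in range(0, len(keys), size)]


def build_lookup_query(n: int) -> str:
    """Build a SELECT with n bind markers for the cookie IN (...) clause."""
    if n <= 0:
        raise ValueError("n must be positive")
    placeholders = ", ".join("?" for _ in range(n))
    return (
        "SELECT cookie, user_id FROM cookie_user_lookup "
        f"WHERE cookie IN ({placeholders})"
    )


if __name__ == "__main__":
    # Hypothetical input: 250 cookies, looked up 100 at a time (assumed size).
    cookies = [f"cookie-{i}" for i in range(250)]
    for group in chunked(cookies, 100):
        query = build_lookup_query(len(group))
        # A real client would bind `group` to the markers and execute here.
        print(len(group), query[:60] + "...")
```

Smaller groups mean more round trips but less work per request on the node that coordinates each query, so the right size would have to be found by measuring against the cluster.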

I appreciate your input!
