incubator-cassandra-user mailing list archives

From Dan Gould <>
Subject Re: CQL 'IN' predicate
Date Wed, 06 Nov 2013 23:19:48 GMT
Thanks Nate,

I assume 10k is the return limit.  I don't think I'll ever get close to 
10k matches for the IN query.  That said, you're right: to be safe I'll 
increase the limit to match the number of items in the IN clause.

I didn't know CQL supported stored procedures, but I'll take a look.  I 
suppose my question was asking about parsing overhead, however.  If one 
big query doesn't cause problems--which I assume it wouldn't since there 
can be multiple threads parsing and I assume C* is smart about memory 
when accumulating results--I'd much rather do that.


On 11/6/13 3:05 PM, Nate McCall wrote:
> Unless you explicitly set a page size (I'm pretty sure the query is 
> converted to a paging query automatically under the hood), you will get 
> capped at the default of 10k, which might get a little weird 
> semantically. That said, you should experiment with explicit page 
> sizes and see where it gets you (I've not tried this yet with an IN 
> clause; I'd be really curious to hear how it works).
> Another thing to consider is that it's a pretty big statement to parse 
> every time. You might want to go the (much) smaller batch route so 
> these can be stored procedures? (Another thing I haven't tried with an 
> IN clause, though I don't see why it would not work.)
> On Wed, Nov 6, 2013 at 4:08 PM, Dan Gould wrote:
>     I was wondering if anyone had a sense of performance/best practices
>     around the 'IN' predicate.
>     I have a list of up to ~30k keys that I want to look up in a
>     table (typically queries will have <500, but I worry about the
>     long tail).  Most of them will not exist in the table, but, say,
>     about 10-20% will.
>     Would it be best to do:
>     1) SELECT fields FROM table WHERE id IN (uuid1, uuid2, ...,
>     uuid30000);
>     2) Split into smaller batches--
>     for group_of_100 in all_30000:
>        // ** Issue in parallel or block after each one??
>        SELECT fields FROM table WHERE id in (group_of_100 uuids);
>     3) Something else?
>     My guess is that (1) is fine and that the only worry is too much
>     data returned (which won't be a problem in this case), but I
>     wanted to check that it's not a C* anti-pattern before.
>     [Conversely, is a batch insert with up to 30k items ok?]
>     Thanks,
>     Dan
> -- 
> -----------------
> Nate McCall
> Austin, TX
> @zznate
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
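
For reference, option (2) from the original question (split the keys into smaller groups and issue the IN queries in parallel) could be sketched roughly as below. This is only an illustration under stated assumptions, not a tested recommendation: `run_query` is a hypothetical caller-supplied function standing in for whatever driver call actually executes the statement, the table and column names are placeholders, and the group size of 100 and worker count of 8 are arbitrary starting points to experiment with. The LIMIT is set to the number of keys in each group, as discussed above, which assumes at most one row per id.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(keys, size):
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        group = list(islice(it, size))
        if not group:
            return
        yield group


def build_in_query(n_keys, table="my_table", fields="id, value"):
    """Build a SELECT with n_keys bind markers and a matching LIMIT.

    The table and field names are placeholders, not from the thread.
    LIMIT n_keys assumes each id matches at most one row.
    """
    markers = ", ".join("?" for _ in range(n_keys))
    return f"SELECT {fields} FROM {table} WHERE id IN ({markers}) LIMIT {n_keys}"


def lookup_all(keys, run_query, group_size=100, max_workers=8):
    """Run one IN query per group concurrently and merge the rows.

    `run_query(cql, params)` is a hypothetical caller-supplied function
    that executes the statement and returns an iterable of rows.
    """
    groups = list(chunked(keys, group_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves group order; results are consumed before
        # the pool shuts down.
        results = pool.map(lambda g: run_query(build_in_query(len(g)), g), groups)
        return [row for rows in results for row in rows]
```

Because every full group produces the same statement text, the statement for a given group size only needs to be parsed once if the driver in use supports preparing it; only the final, shorter group differs.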
