incubator-cassandra-user mailing list archives

From Dan Gould <...@chill.com>
Subject CQL 'IN' predicate
Date Wed, 06 Nov 2013 22:08:58 GMT
I was wondering if anyone had a sense of performance/best practices
around the 'IN' predicate.

I have a list of up to ~30k keys that I want to look up in a
table (typical queries will have <500 keys, but I worry about the long tail).  Most
of the keys will not exist in the table; roughly 10-20% will.

Would it be best to do:

1) SELECT fields FROM table WHERE id IN (uuid1, uuid2, ..., uuid30000);

2) Split into smaller batches:
for group_of_100 in all_30000:
    // ** Issue these in parallel, or block after each one?
    SELECT fields FROM table WHERE id IN (group_of_100 uuids);

3) Something else?
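
To make option (2) concrete, here is a minimal Python sketch of the batching step only. The names chunked, build_in_query, batched_queries, my_table, and the value column are hypothetical, and the %s placeholder style is an assumption that depends on the driver in use. Nothing is sent to a cluster here; each (query, params) pair would be handed to whatever execute call the driver provides, either in parallel or one at a time.

```python
from itertools import islice
from uuid import uuid4


def chunked(keys, size):
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        group = list(islice(it, size))
        if not group:
            return
        yield group


def build_in_query(table, fields, n):
    """Build a parameterized SELECT with n placeholders in the IN clause.

    The %s placeholder style is an assumption; adjust it to the driver.
    """
    placeholders = ", ".join(["%s"] * n)
    return f"SELECT {', '.join(fields)} FROM {table} WHERE id IN ({placeholders})"


def batched_queries(keys, table="my_table", fields=("id", "value"), size=100):
    """Yield one (query, params) pair per group of keys.

    The table and field names are hypothetical stand-ins.
    """
    for group in chunked(keys, size):
        yield build_in_query(table, fields, len(group)), group


if __name__ == "__main__":
    all_keys = [uuid4() for _ in range(250)]
    for query, params in batched_queries(all_keys):
        # A real client would pass (query, params) to its driver here.
        print(len(params), "keys in this batch")
```

Keeping the group size as a parameter makes it easy to try 100 against other sizes without touching the rest of the code.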

My guess is that (1) is fine and that the only worry is too much data being returned (which won't
be a problem in this case), but I wanted to check that it's not a C* anti-pattern before going ahead.

[Conversely, is a batch insert with up to 30k items ok?]

Thanks,
Dan

