If one big query doesn't cause problems
Every row you read becomes roughly RF tasks in the cluster. If you ask for 100 rows in one query (with RF 3), it will generate 300 tasks, which are processed by the read thread pool; that pool has a default of 32 threads. If you ask for a lot of rows and the number of nodes is low, there is a chance the client will starve other clients, as they wait for all of its tasks to be completed. So I tend to prefer asking for fewer rows.
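A rough way to see the arithmetic in the paragraph above is the small Python model that follows. It is a sketch under stated assumptions, not a description of how Cassandra actually schedules reads. It takes the "roughly RF tasks per row" figure at face value, assumes the tasks spread evenly across nodes, and treats each node's 32-thread read pool as working through its share in fixed waves. The function name `estimate_read_load` and its parameters are hypothetical.

```python
import math


def estimate_read_load(rows: int, rf: int = 3, nodes: int = 3,
                       read_threads: int = 32) -> dict:
    """Back-of-the-envelope model of read fan-out (illustration only).

    Assumptions: every row costs roughly `rf` read tasks, the tasks are
    spread evenly over `nodes`, and each node runs `read_threads` tasks
    at a time (32 being the default read pool size).
    """
    total_tasks = rows * rf
    # Even spread across nodes, rounded up for the busiest node.
    tasks_per_node = math.ceil(total_tasks / nodes)
    # Number of "waves" a node needs if each thread handles one task at a time.
    waves_per_node = math.ceil(tasks_per_node / read_threads)
    return {
        "total_tasks": total_tasks,
        "tasks_per_node": tasks_per_node,
        "waves_per_node": waves_per_node,
    }


if __name__ == "__main__":
    print(estimate_read_load(100))
    print(estimate_read_load(10_000, nodes=6))
```

For 100 rows on a three-node cluster with RF 3, this gives 300 tasks, 100 per node, and 4 waves per node. Every request queued behind those waves has to wait, which is the starvation effect described above.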
Co-Founder & Principal Consultant
Apache Cassandra Consulting
I assume 10k is the return limit. I don't think I'll ever get close
to 10k matches for the IN query. That said, you're right: to be safe
I'll increase the limit to match the number of items in the IN clause.
I didn't know CQL supported stored procedures, but I'll take a
look. I suppose my question was really about parsing overhead,
however. If one big query doesn't cause problems--which I assume it
wouldn't since there can be multiple threads parsing and I assume C*
is smart about memory when accumulating results--I'd much rather do
On 11/6/13 3:05 PM, Nate McCall wrote:
Unless you explicitly set a page size (I'm pretty
sure the query is converted to a paging query automatically
under the hood), you will get capped at the default of 10k, which
might get a little weird semantically. That said, you should
experiment with explicit page sizes and see where it gets you
(I've not tried this yet with an IN clause; I would be really
curious to hear how it worked).
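The difference between a silent cap and explicit paging can be illustrated with a toy in-memory model. This is only a sketch: it does not talk to Cassandra, and `capped_fetch`, `paged_fetch` and `DEFAULT_LIMIT` are hypothetical names. The only figure taken from the message above is the 10k default. A real client sets the page size through its driver; the DataStax Python driver, for instance, exposes a `fetch_size` on statements.

```python
from typing import Iterator, List, Sequence

DEFAULT_LIMIT = 10_000  # the implicit cap mentioned above


def capped_fetch(rows: Sequence[int], limit: int = DEFAULT_LIMIT) -> List[int]:
    """Toy model: one request whose result is silently truncated at `limit`."""
    return list(rows[:limit])


def paged_fetch(rows: Sequence[int], page_size: int) -> Iterator[List[int]]:
    """Toy model: the same result delivered in pages of `page_size` rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(rows), page_size):
        yield list(rows[start:start + page_size])


if __name__ == "__main__":
    data = range(25_000)
    print(len(capped_fetch(data)))                        # rows seen with the cap
    print([len(p) for p in paged_fetch(data, 5_000)])     # rows per page
```

With 25,000 matching rows, the capped request returns only the first 10,000 and gives no sign that anything is missing. The paged version returns all 25,000 over five round trips. That silent truncation is the "weird semantically" part.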
Another thing to consider is that it's a pretty big
statement to parse every time. You might want to go the (much)
smaller batch route so these can be stored procedures?
(another thing I haven't tried with an IN clause; I don't see why
it would not work, though).
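CQL has no stored procedures in the SQL sense. The mechanism that avoids re-parsing the same statement is a prepared statement, which the server parses once and the client then executes with new bound values, and that is presumably what the suggestion above has in mind. The sketch that follows splits one large IN query into fixed-size chunks. Every full chunk then produces identical statement text that could be prepared once. It also sets LIMIT to the number of keys in each chunk, which assumes one row per key, as in the earlier message about matching the limit to the IN list. The table and column names (`items`, `id`) and the function `chunked_in_queries` are hypothetical, and the code only builds CQL strings; it does not execute anything.

```python
from typing import Iterator, List, Sequence, Tuple


def chunked_in_queries(keys: Sequence[str], chunk_size: int,
                       table: str = "items", key_col: str = "id"
                       ) -> Iterator[Tuple[str, List[str]]]:
    """Split one large IN query into several smaller ones.

    `table` and `key_col` are hypothetical names. Every full chunk yields
    the same statement text (so it could be prepared once and re-bound);
    only a short final chunk needs a second statement. LIMIT equals the
    chunk length, assuming one row per key, so no default cap truncates it.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(keys), chunk_size):
        chunk = list(keys[start:start + chunk_size])
        # One bind marker per key in this chunk.
        placeholders = ", ".join("?" for _ in chunk)
        cql = (f"SELECT * FROM {table} WHERE {key_col} IN ({placeholders}) "
               f"LIMIT {len(chunk)}")
        yield cql, chunk


if __name__ == "__main__":
    keys = [f"k{i}" for i in range(250)]
    for cql, bound in chunked_in_queries(keys, 100):
        print(len(bound), cql[-12:])
```

With 250 keys and a chunk size of 100, this yields three queries. The first two share one statement text and the third has 50 bind markers, so at most two distinct statements would need to be parsed.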