cassandra-user mailing list archives

From: Tyler Hobbs
Subject: Re: IN versus multiple asynchronous queries
Date: Tue, 07 Oct 2014 19:02:51 GMT
Also note that with an IN clause, if there is a failure fetching one of the
partitions, the entire request will fail and will need to be retried.  If
you use concurrent async queries, you'll only need to retry one small
request.
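
A minimal sketch of that retry behaviour, in Python. It is an illustration under stated assumptions, not driver code: `fetch_all` and `flaky_fetch` are hypothetical names, and `flaky_fetch` only simulates a single-partition read that times out once. In a real client, `fetch_one` would wrap one single-partition SELECT issued through whatever driver you use.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(keys, fetch_one, max_attempts=3, max_workers=16):
    """Fetch every key concurrently; retry only the keys that failed.

    fetch_one is a hypothetical callable standing in for one
    single-partition read (for example, SELECT ... WHERE id = ?).
    """
    results = {}
    pending = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(max_attempts):
            if not pending:
                break
            # One small request per partition key.
            futures = {pool.submit(fetch_one, key): key for key in pending}
            failed = []
            for future in as_completed(futures):
                key = futures[future]
                try:
                    results[key] = future.result()
                except Exception:
                    # Only this key is retried; the successful reads are kept.
                    failed.append(key)
            pending = failed
    if pending:
        raise RuntimeError(f"keys still failing after {max_attempts} attempts: {pending}")
    return results


# Simulated read (an assumption for the demo): key 7 times out on its first attempt.
calls = {}


def flaky_fetch(doc_id):
    calls[doc_id] = calls.get(doc_id, 0) + 1
    if doc_id == 7 and calls[doc_id] == 1:
        raise TimeoutError("simulated replica timeout")
    return f"doc-{doc_id}"


docs = fetch_all(range(50), flaky_fetch)
```

Here only key 7 is fetched a second time. With a single IN query over the same 50 keys, the same timeout would fail the whole request and all 50 partitions would have to be read again.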

On Mon, Oct 6, 2014 at 1:14 PM, DuyHai Doan <> wrote:

> "Definitely better to not make the coordinator hold on to that memory
> while it waits for other requests to come back" --> You've got it. When
> loading big documents, you risk starving the heap quickly, triggering long
> GC cycles on the coordinator, etc.
> On Mon, Oct 6, 2014 at 6:22 PM, Robert Wille <> wrote:
>>  As far as latency is concerned, it seems like it wouldn't matter very
>> much if the coordinator has to wait for all the responses to come back, or
>> the client waits for all the responses to come back. I’ve got the same
>> latency either way.
>>  I would assume that 50 coordinations are more expensive than one
>> coordination that does 50 times the work, but that’s probably insignificant
>> when compared to the actual fetching of the data from the SSTables.
>>  I do see the point about putting stress on coordinator memory. In
>> general, the documents will be very small, but there will occasionally be
>> some rather large ones, potentially several megabytes in size. Definitely
>> better to not make the coordinator hold on to that memory while it waits
>> for other requests to come back.
>>  Robert
>>  On Oct 4, 2014, at 8:34 AM, DuyHai Doan <> wrote:
>>  Definitely 50 concurrent queries, possibly in async mode.
>>  If you're using the IN clause with 50 values, the coordinator will
>> block, waiting for 50 partitions to be fetched from different nodes (worst
>> case = 50 nodes) before responding to the client. In addition to the very high
>> latency, you'll put stress on the coordinator's memory.
>> On Sat, Oct 4, 2014 at 3:09 PM, Robert Wille <> wrote:
>>> I have a table of small documents (less than 1K) that are often accessed
>>> together as a group. The group size is always less than 50. Which produces
>>> less load on the server, one query using an IN clause to get all 50 back
>>> together, or 50 concurrent queries? Which one is faster?
>>> Thanks
>>> Robert

Tyler Hobbs
DataStax <>
