cassandra-user mailing list archives

From "Laing, Michael" <>
Subject Re: Large number of row keys in query kills cluster
Date Thu, 12 Jun 2014 15:05:40 GMT
Just an FYI: my benchmarking of the new Python driver, which uses the
asynchronous CQL native transport, indicates that you can largely overcome
client-to-node latency effects if you employ a suitable level of
concurrency and non-blocking techniques.

Of course response size and other factors come into play, but having a
hundred or so queries simultaneously in the pipeline from each worker
subprocess is a big help.
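The pipelining idea above can be sketched generically. This is not the actual driver code: the `fake_query` coroutine is a hypothetical stand-in for a non-blocking CQL request, and the concurrency limit of 100 mirrors the "hundred or so queries in the pipeline" figure from the message:

```python
import asyncio

async def fake_query(key):
    # Hypothetical stand-in for an asynchronous CQL request; real code
    # would call the driver's non-blocking execute. Latency is simulated.
    await asyncio.sleep(0.001)
    return key * 2

async def run_pipelined(keys, concurrency=100):
    # Keep at most `concurrency` requests in flight at once, so the
    # client-to-node round-trip latency overlaps across queries.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(key):
        async with sem:
            return await fake_query(key)

    # gather() preserves input order in its results.
    return await asyncio.gather(*(bounded(k) for k in keys))

results = asyncio.run(run_pipelined(range(500)))
```

With one request at a time, 500 queries pay 500 round trips; with ~100 in flight, the total wall time is dominated by only a handful of round-trip "waves".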

On Thu, Jun 12, 2014 at 10:46 AM, Jeremy Jongsma <>
wrote:

> Good to know, thanks Peter. I am worried about client-to-node latency if I
> have to do 20,000 individual queries, but that makes it clearer that at
> least batching in smaller sizes is a good idea.
> On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford <>
> wrote:
>> On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma <>
>> wrote:
>>> The big problem seems to have been requesting a large number of row keys
>>> combined with a large number of named columns in a query. 20K rows with 20K
>>> columns destroyed my cluster. Splitting it into slices of 100 sequential
>>> queries fixed the performance issue.
>>> When updating 20K rows at a time, I saw a different issue -
>>> BrokenPipeException from all nodes. Splitting into slices of 1000 fixed
>>> that issue.
>>> Is there any documentation on this? Obviously these limits will vary by
>>> cluster capacity, but for new users it would be great to know that you can
>>> run into problems with large queries, and how they present themselves when
>>> you hit them. The errors I saw are pretty opaque, and took me a couple days
>>> to track down.
>> The first thing that comes to mind is the Multiget section on the
>> Datastax anti-patterns page:
>> -psanford
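
The slicing workaround described in the quoted thread can be sketched as follows. The slice size of 100 and the 20,000 row keys come from the discussion above; the helper name `chunked` is hypothetical:

```python
def chunked(seq, size):
    # Split a sequence of row keys into fixed-size slices so each
    # individual query stays small enough not to overwhelm the cluster.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

row_keys = list(range(20000))
slices = list(chunked(row_keys, 100))
# Issue one query per slice (200 queries of 100 keys each) instead of
# one multiget over all 20K keys.
```

The same helper applies to the update case in the thread, where slices of 1000 avoided the BrokenPipeException.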
