incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Batch mutation streaming
Date Mon, 10 Dec 2012 00:47:38 GMT
> (and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).
Yerp. 
And every row mutation in your batch becomes a task in the MutationStage thread pool. If one replica gets 500 row mutations from one client request, it will take a while for the (default) 32 threads (the concurrent_writes setting) to chew through them. While this is going on, other client requests will be effectively blocked.


Depending on the number of clients, I would start with say 50 rows per mutation and keep an eye on the *request* latency.
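
For illustration, a minimal sketch of that chunking pattern using pycassa (a common Thrift client of this era); the keyspace, column family, and event source below are hypothetical placeholders:

    import time
    import pycassa

    # Hypothetical keyspace and column family names.
    pool = pycassa.ConnectionPool('EventKeyspace', server_list=['localhost:9160'])
    events = pycassa.ColumnFamily(pool, 'Events')

    def read_events():
        # Stand-in for the long-lived network input stream.
        for i in xrange(1000):
            yield 'event:%d' % i, {'payload': 'data-%d' % i,
                                   'received_at': str(time.time())}

    # queue_size=50 makes the mutator issue a batch_mutate automatically
    # every 50 inserted rows, instead of accumulating one huge Thrift
    # message for the entire stream.
    batch = events.batch(queue_size=50)

    for row_key, columns in read_events():
        batch.insert(row_key, columns)

    # Flush the final partial chunk. Rows already flushed are ordinary
    # durable writes; a client crash loses at most the unsent remainder.
    batch.send()

That also bears on the crash question in the quoted thread: there is no server-side notion of an open batch to roll back or clean up, so a dead client simply never sends its last chunk.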

Hope that helps. 


-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/12/2012, at 7:18 AM, Ben Hood <0x6e6562@gmail.com> wrote:

> Thanks for the clarification, Andrey. If that is the case, I had better ensure that I don't put the entire contents of a very long input stream into a single batch, since that is presumably going to cause a very large message to accumulate on the client side (and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).
> 
> Cheers,
> 
> 
> Ben
> 
> On Dec 7, 2012, at 17:24, Andrey Ilinykh <ailinykh@gmail.com> wrote:
> 
>> Cassandra uses Thrift messages to pass data to and from the server. A batch is just a convenient way to build such a message; nothing happens until you send it. That is probably what you call "closing the batch" [see the sketch after the quoted thread].
>> 
>> Thank you,
>>   Andrey
>> 
>> 
>> On Fri, Dec 7, 2012 at 5:34 AM, Ben Hood <0x6e6562@gmail.com> wrote:
>> Hi,
>> 
>> I'd like my app to stream a large number of events into Cassandra that originate from the same network input stream. If I create one batch mutation, can I just keep appending events to the Cassandra batch until I'm done, or are there some practical considerations about doing this (e.g. too much stuff buffering up on the client or server side, visibility of the data within the batch that hasn't been closed by the client yet)? Barring any discussion about atomicity, if I were able to stream a largish source into Cassandra, what would happen if the client crashed and didn't close the batch? Or is this kind of thing just a normal occurrence that Cassandra has to be aware of anyway?
>> 
>> Cheers,
>> 
>> Ben
>> 
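
Andrey's point above maps directly onto the Thrift API: batch_mutate takes one nested map of row key → column family → list of mutations, and nothing crosses the wire until that single call serializes the whole map. A minimal sketch of that shape, using the Thrift-generated bindings bundled with pycassa (row keys, column names, and values are illustrative; the final call is left commented out because it needs a live connection):

    import time

    # Thrift-generated types, as bundled with the pycassa client.
    from pycassa.cassandra.ttypes import Column, ColumnOrSuperColumn, Mutation

    def make_mutation(name, value, timestamp):
        # Wrap a single column write in the Mutation envelope that
        # batch_mutate expects.
        col = Column(name=name, value=value, timestamp=timestamp)
        return Mutation(column_or_supercolumn=ColumnOrSuperColumn(column=col))

    ts = int(time.time() * 1e6)  # microsecond timestamps, the usual convention

    # Until batch_mutate is called, the "batch" is nothing more than this
    # in-memory dict: {row_key: {column_family: [mutations, ...]}}.
    mutation_map = {
        'event:1': {'Events': [make_mutation('payload', 'data-1', ts)]},
        'event:2': {'Events': [make_mutation('payload', 'data-2', ts)]},
    }

    # With a live connection (client being a thrift Cassandra.Client), one
    # call serializes and sends the entire map as a single Thrift message:
    #   client.batch_mutate(mutation_map, ConsistencyLevel.QUORUM)

This is also why every row in the map becomes its own task on the replica, as described above: the server unpacks the map and applies each row mutation independently.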

