incubator-cassandra-user mailing list archives

From "John R. Frank" <...@mit.edu>
Subject Re: pycassa failures in large batch cycling
Date Fri, 17 May 2013 21:31:48 GMT
> IMHO you are going to have more success breaking up your workload to 
> work with the current settings.  The buffers created by thrift are going 
> to eat up the server-side memory.  They grow dynamically but persist for 
> the life of the connection. 

Amen to that.  We're already refactoring our workload to minimize record sizes.

Smaller fields mean more of them, so batched inserts become even more 
valuable compared to many individual unbatched inserts.

IMO there is still a serious bug: even with smaller individual records, it 
is trivially easy to put too many small records into a single batch_mutate. 
Right now, clients like pycassa (and I imagine others) are forced into an 
infinite retry loop under the hood, because the thrift exception is 
indistinguishable from the server crashing, so the application layer has 
no recourse.

I'd love to see a workaround that still has the benefit of grouping 
together many inserts.


John