incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: pycassa failures in large batch cycling
Date Tue, 14 May 2013 04:48:14 GMT
> After several cycles, pycassa starts getting connection failures.
Do you have the error stack?
Are they TimedOutExceptions, socket timeouts, or something else?
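If you want to capture those details, something like the following (a minimal sketch assuming pycassa 1.x; the keyspace and column family names are placeholders) logs the exception class and full traceback so you can tell server-side timeouts apart from pool-level failures:

    import logging
    import traceback

    from pycassa.pool import ConnectionPool, AllServersUnavailable, MaximumRetryException
    from pycassa.columnfamily import ColumnFamily
    from pycassa.cassandra.ttypes import TimedOutException, UnavailableException

    log = logging.getLogger("cycler")

    pool = ConnectionPool("MyKeyspace", server_list=["localhost:9160"])
    cf = ColumnFamily(pool, "MyCF")

    def read_row(key):
        try:
            return cf.get(key)
        except (TimedOutException, UnavailableException) as exc:
            # server-side: the coordinator gave up waiting on replicas
            log.error("server-side failure %r:\n%s", exc, traceback.format_exc())
            raise
        except (AllServersUnavailable, MaximumRetryException) as exc:
            # client-side: pycassa exhausted its retries / marked servers dead
            log.error("pool-level failure %r:\n%s", exc, traceback.format_exc())
            raise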

> Would things be any different if we used multiple nodes and scaled the data and worker count to match? I mean, is there something inherent to Cassandra's operating model that makes it want to always have multiple nodes?
It's not expected. If I had to guess, I would say the ~100 MB rows are causing GC pressure (check the Cassandra log) and the timeouts follow from that.
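One quick way to check, assuming the packaged default log location (adjust the path for your install), is to pull out the GCInspector lines, which report long ParNew / ConcurrentMarkSweep pauses:

    # sketch: scan the Cassandra log for GC pause reports
    LOG_PATH = "/var/log/cassandra/system.log"  # assumed default; adjust as needed

    with open(LOG_PATH) as f:
        for line in f:
            if "GCInspector" in line:
                print(line.rstrip())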

As a workaround, what happens when you reduce the number of workers?

Consider smoothing out the row size by chunking large values across several rows; see the Astyanax client recipes for a design pattern.
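A rough pycassa analogue of that recipe (the names and the 1 MB chunk size below are illustrative, not part of any library API) would split each large value across several rows keyed by a blob id plus a chunk index:

    import uuid
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk keeps individual mutations small

    pool = ConnectionPool("MyKeyspace", server_list=["localhost:9160"])
    chunks = ColumnFamily(pool, "BlobChunks")

    def write_blob(blob):
        """Split one large blob across CHUNK_SIZE-sized rows; return the new blob id."""
        blob_id = str(uuid.uuid4())
        num_chunks = (len(blob) + CHUNK_SIZE - 1) // CHUNK_SIZE
        batch = chunks.batch()
        for i in range(num_chunks):
            batch.insert("%s:%06d" % (blob_id, i),
                         {"data": blob[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]})
        # record the chunk count under the bare blob id so readers know what to fetch
        batch.insert(blob_id, {"num_chunks": str(num_chunks)})
        batch.send()
        return blob_id

Readers would fetch the num_chunks row first and then multiget the chunk rows; the point is that no single mutation or read has to move ~100 MB at once.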

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/05/2013, at 1:09 PM, John R. Frank <jrf@mit.edu> wrote:

> C* users,
> 
> We have a process that loads a large batch of rows from Cassandra into many separate compute workers. The rows are one column wide and range in size from a couple KB to ~100 MB. After manipulating the data for a while, each compute worker writes the data back with *new* row keys computed by the workers (UUIDs).
> 
> After the full batch is written back to new rows, a cleanup worker deletes the old rows.
> 
> After several cycles, pycassa starts getting connection failures.
> 
> Should we use a pycassa listener to catch these failures and just recreate the ConnectionPool and keep going as if the connection had not dropped? Or is there a better approach?
> 
> These failures happen on just a simple single-node setup with a total data set less than half the size of the Java heap, e.g. 2GB of data (times two for the two copies during cycling) versus an 8GB heap. We tried reducing memtable_flush_queue_size to 2 so that it would flush the deletes faster, and also tried multithreaded_compaction=true, but pycassa still gets connection failures.
> 
> Is this expected behavior for shedding load? Or is this unexpected?
> 
> Would things be any different if we used multiple nodes and scaled the data and worker count to match? I mean, is there something inherent to Cassandra's operating model that makes it want to always have multiple nodes?
> 
> Thanks for pointers,
> John

