cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: replication with large rows
Date Tue, 04 May 2010 04:31:44 GMT
Replication in Cassandra is per-operation, not per-row: each write sends only the columns in that operation to the replicas, not the entire row.
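
As a rough illustration only (a toy model, not Cassandra's actual code; the `Node` class, the `write` helper, and the `index-row` key are all made up for this sketch), here is what per-operation replication means for a wide row on two nodes with a replication factor of 2:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica (hypothetical): rows maps row key -> {column name: value}."""
    rows: dict = field(default_factory=dict)
    columns_received: int = 0  # columns that arrived via replicated writes

    def apply(self, row_key, columns):
        # Merge only the columns carried by this operation into the local row.
        self.rows.setdefault(row_key, {}).update(columns)
        self.columns_received += len(columns)


def write(replicas, row_key, columns):
    """Send one write operation to every replica (hypothetical helper)."""
    for node in replicas:
        node.apply(row_key, columns)


# Two nodes, replication factor 2, each already holding a 200,000-column row.
replicas = [Node(), Node()]
for node in replicas:
    node.rows["index-row"] = {f"col{i}": b"" for i in range(200_000)}

# Adding one column to the wide row ships one column per replica.
write(replicas, "index-row", {"new-col": b""})
print([node.columns_received for node in replicas])  # [1, 1]
```

In this model the size of the existing row has no effect on how much data a write moves between nodes.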

On Mon, May 3, 2010 at 2:40 PM, Lee Parker <lee@socialagency.com> wrote:
> I have a CF on our cluster which has several rows with 200k+ columns of
> TimeUUID data.  I have noticed recently that this CF is reaching my memtable
> thresholds (128M or 1.5 million objects) far more frequently than I would expect
> (every 10 minutes or so).  This CF is used as an index of items in another
> CF.  So, all of the columns only have a single value, but there are lots of
> them.  In the other CF, the rows all have about 10-15 columns, but there are
> millions of rows.  I have reviewed our code several times and cannot see
> where we would be writing millions of columns to the index CF with this kind
> of frequency.  Could this be caused by the replication of data between
> nodes?  When one node has new data for a row, does it pass the entire row to
> the other nodes for replication or does it just pass the portion of the row
> that has changed? I have two nodes with a replication factor of 2.  In the
> end, this is causing both of my servers to constantly work on compacting the
> files for the index CF.
>
> Lee Parker



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
