incubator-cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: Large data files and no "edit in place"?
Date Tue, 30 Mar 2010 18:33:04 GMT
Cassandra does "minor" compactions with a minimum of 4 sstables in the
same "bucket," with buckets doubling in size as you compact.  So you
only ever rewrite all of your data during the weekly-ish major
compaction, which handles tombstone cleanup and anti-entropy.
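
To make the bucketing concrete, here's a minimal sketch of size-tiered
selection.  This is illustrative pseudocode, not Cassandra's actual
implementation; the 0.5x-1.5x similarity window, the names, and the
sample sizes are assumptions for illustration:

    # Illustrative sketch of size-tiered bucketing (not Cassandra's
    # actual code).  Sstables of similar size share a bucket; a bucket
    # with at least MIN_THRESHOLD members is eligible for a minor
    # compaction, which merges them into one larger sstable in a
    # higher tier.  Only a major compaction rewrites everything.

    MIN_THRESHOLD = 4  # assumed minimum sstables per minor compaction

    def bucket_by_size(sstable_sizes):
        # Group sizes into buckets whose members fall within
        # 0.5x-1.5x of the bucket's running average (assumed window).
        buckets = []  # list of (average_size, [member_sizes])
        for size in sorted(sstable_sizes):
            for i, (avg, members) in enumerate(buckets):
                if 0.5 * avg <= size <= 1.5 * avg:
                    members.append(size)
                    buckets[i] = (sum(members) / len(members), members)
                    break
            else:
                buckets.append((size, [size]))
        return [members for _, members in buckets]

    def compactable(buckets, min_threshold=MIN_THRESHOLD):
        # Buckets large enough to trigger a minor compaction.
        return [b for b in buckets if len(b) >= min_threshold]

    # Four freshly flushed ~64 MB sstables form one bucket and get
    # merged; the 250-256 MB pair and the lone 1 GB table wait in
    # their own tiers until enough similar-sized peers accumulate.
    sizes_mb = [64, 64, 66, 63, 256, 250, 1024]
    print(compactable(bucket_by_size(sizes_mb)))  # [[63, 64, 64, 66]]

The point is that each minor compaction touches only a handful of
similar-sized files, so the bytes rewritten per flush stay roughly
constant instead of growing with the total data set.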


On Tue, Mar 30, 2010 at 12:54 AM, Julian Simon <> wrote:
> Forgive me as I'm probably a little out of my depth in trying to
> assess this particular design choice within Cassandra, but...
>
> My understanding is that Cassandra never updates data "in place" on
> disk - instead it completely re-creates the data files during a
> "flush".  Stop me if I'm wrong already ;-)
>
> So imagine we have a large data set in our ColumnFamily and we're
> constantly adding data to it.
>
> Every [x] minutes or [y] bytes, the compaction process is triggered,
> and the entire data set is written to disk.
>
> So as our data set grows over time, the compaction process will result
> in an increasingly large IO operation to write all that data to disk
> each time.
>
> We could easily be talking about single data files in the
> many-gigabyte size range, no?  Or is there a file size limit that I'm
> not aware of?
>
> If not, is this an efficient approach to take for large data sets?
> Seems like we would become awfully IO bound, writing the entire thing
> from scratch each time.
>
> Do let me know if I've gotten it all wrong ;-)
>
> Cheers,
> Jules
