cassandra-user mailing list archives

From Kevin Burton <>
Subject Could table partitioning be implemented using a customer compaction strategy?
Date Fri, 15 Aug 2014 04:28:52 GMT
We use log structured tables to hold logs for analysis.

It's basically append only, and immutable.  Every inserted record has a
timestamp.

Having this in ONE big monolithic table can be problematic.

1.  compactions have to compact old data that might not even be used often.

2.  it might be nice to not have the old data touched on disk so that you
can just use it for map reduce.  Being able to fadvise away old data so
that it's not in cache can be valuable.

3.  the ability to drop large chunks of old data is also useful.  For
example, if you run out of disk space, you can just drop the oldest day's
worth of data without having to use tombstones.

MySQL has a somewhat decent partition engine:

It seems like this could be easily implemented using a custom compaction
strategy.

Essentially, you would take each SSTable and first group them into
partitions.  So if you were using day partitions, you could take all
SSTables for that day, and then use another, nested compaction strategy,
like leveled, on just those SSTables.
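
A rough sketch of the grouping step, in Python purely for illustration. `SSTableInfo` is a made-up stand-in for SSTable metadata, not a Cassandra class, and bucketing by the newest write timestamp in each file is an assumption:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SSTableInfo:
    """Hypothetical stand-in for an SSTable's metadata, not a Cassandra class."""
    name: str
    max_timestamp_micros: int  # newest write timestamp in the file (assumed field)


def day_bucket(table: SSTableInfo) -> str:
    """Map an SSTable to a UTC day partition key such as '2014-08-15'."""
    seconds = table.max_timestamp_micros // 1_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def group_by_day(tables):
    """Group SSTables into day partitions.

    Each group would then be handed to the nested strategy (leveled, for
    example) on its own, so files from different days never compact together.
    """
    groups = defaultdict(list)
    for table in tables:
        groups[day_bucket(table)].append(table)
    return dict(groups)
```

An SSTable whose rows straddle midnight lands in the day of its newest row here; a real strategy would have to decide how to treat those files.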

The older days would yield one SSTable per day, once all the individual
SSTables are compacted.  For a month, you would need a minimum of 30
SSTables.

You would need to implement some custom ways to prune the older partitions.
  And you'd also need some way to define the partitions.
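
To make the pruning idea concrete, here is a minimal sketch, assuming each partition is keyed by an ISO day string and that its on-disk size is known. The function name and the byte budget are hypothetical, not anything Cassandra provides:

```python
def partitions_to_drop(partition_sizes, max_total_bytes):
    """Return the day keys to drop, oldest first, until the total fits the budget.

    partition_sizes maps an ISO day key ('YYYY-MM-DD') to bytes on disk.
    ISO dates sort lexicographically in chronological order, so a plain
    sort visits the oldest day first.
    """
    total = sum(partition_sizes.values())
    dropped = []
    for day in sorted(partition_sizes):
        if total <= max_total_bytes:
            break
        total -= partition_sizes[day]
        dropped.append(day)
    return dropped
```

Dropping a partition would then mean deleting its SSTables outright, with no tombstones involved.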

… but maybe an initial prototype could just read from a configuration file,
or another system table which defines them…
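
For the configuration file variant, a prototype could look something like the sketch below. The `[partitioning]` section and its `unit` and `retain` keys are invented for illustration; nothing like this exists in Cassandra today:

```python
import configparser
from datetime import datetime, timezone

# Hypothetical config format; the section and key names are assumptions.
EXAMPLE_CONFIG = """
[partitioning]
unit = day
retain = 30
"""

# strftime patterns for the partition widths this sketch supports.
FORMATS = {"hour": "%Y-%m-%dT%H", "day": "%Y-%m-%d", "month": "%Y-%m"}


def load_partitioning(text):
    """Parse the partition unit and retention count from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["partitioning"]
    unit = section.get("unit", fallback="day")
    if unit not in FORMATS:
        raise ValueError(f"unsupported partition unit: {unit}")
    return unit, section.getint("retain", fallback=30)


def partition_key(epoch_seconds, unit):
    """Format a UTC timestamp as the key of the partition it falls into."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(FORMATS[unit])
```

With the example config, `load_partitioning(EXAMPLE_CONFIG)` yields the unit `day` and a retention of 30 partitions, which matches the one-SSTable-per-day picture for a month of data.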

(just thinking out loud)


