cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-223) time-based slicing does not work correctly w/ "historial" memtables
Date Mon, 22 Jun 2009 16:06:07 GMT


Jonathan Ellis commented on CASSANDRA-223:

I came to the same conclusion.

One partial answer to the files-to-read problem is to change compaction to guarantee O(log n)
sstable files instead of the current ad-hoc behavior, where n is the number of flushes; there
is then at most one sstable per "generation".  (Where "generation" is the number of compactions
an sstable has been through.)

For each CF, when you flush, you compact until there is nothing already at the same generation
to compact with.  For example,

flush 1: nothing to merge.  memtable becomes sstable-gen0
flush 2: there is already a sstable-gen0 so you merge.  now you have sstable-gen1
flush 3: no gen0, so you store there.  now you have sstable-gen0, sstable-gen1
flush 4: 0 and 1 exist, so you compact (with the new one) to sstable-gen2


Generation tracking can be done in the sstable filename.
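
The flush sequence above can be sketched in a few lines of Python.  This is only an
illustration of the proposal, not Cassandra code: the dict standing in for the on-disk
sstables, the set union standing in for a real compaction, and the names flush, filename
and generation_of are all assumptions made for the example.  The "sstable-genN" name format
is borrowed from the example above, not from any actual file naming scheme.

```python
import re


def flush(sstables, rows):
    """Flush a memtable, compacting until no sstable shares its generation.

    sstables maps generation -> set of rows (a stand-in for an sstable file).
    Returns the generation the flushed data ends up at.
    """
    generation = 0
    merged = set(rows)
    while generation in sstables:
        # An sstable already exists at this generation: compact with it
        # and promote the result to the next generation.
        merged |= sstables.pop(generation)
        generation += 1
    sstables[generation] = merged
    return generation


def filename(generation):
    """Hypothetical filename that carries the generation number."""
    return f"sstable-gen{generation}"


def generation_of(name):
    """Recover the generation from a hypothetical sstable filename."""
    match = re.fullmatch(r"sstable-gen(\d+)", name)
    if match is None:
        raise ValueError(f"not an sstable filename: {name!r}")
    return int(match.group(1))


# Replay the four flushes from the example.
sstables = {}
history = []
for i in range(1, 5):
    flush(sstables, {f"row{i}"})
    history.append(sorted(filename(g) for g in sstables))
```

Under these assumptions the scheme behaves like a binary counter: after k flushes the number
of sstables equals the number of 1 bits in k, so it never exceeds floor(log2(k)) + 1.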

> time-based slicing does not work correctly w/ "historial" memtables
> -------------------------------------------------------------------
>                 Key: CASSANDRA-223
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>         Attachments: 223.patch
> TimeFilter assumes that it is done as soon as it finds a column stamped earlier than
> what it is filtering on, but when you have a group of "historical" memtables whose columns
> were written in an arbitrary order this is not a safe assumption.
> It is not even a safe assumption when dealing with a single memtable + sstable pair,
> as the attached new test shows.
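
The failure mode described in the issue can be illustrated with a small Python sketch.
It is not the TimeFilter implementation or the test from 223.patch; the function names
and the (name, timestamp) tuples are assumptions made for the example.  It only shows why
stopping at the first too-old column is unsafe when columns are not ordered by timestamp.

```python
def slice_early_exit(columns, since):
    """Stop at the first column older than `since`.

    Only correct if `columns` is sorted by timestamp, newest first.
    """
    result = []
    for name, timestamp in columns:
        if timestamp < since:
            break  # assumes every remaining column is older too
        result.append(name)
    return result


def slice_full_scan(columns, since):
    """Check every column, making no assumption about ordering."""
    return [name for name, timestamp in columns if timestamp >= since]


# Columns merged from several sources, not in timestamp order.
columns = [("a", 50), ("b", 10), ("c", 40)]

early = slice_early_exit(columns, since=30)
full = slice_full_scan(columns, since=30)
```

Here the early-exit version stops at "b" and never sees "c", even though "c" is newer
than the cutoff; the full scan returns both "a" and "c".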

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
