cassandra-commits mailing list archives

From "Sylvain Lebresne (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2319) Promote row index
Date Wed, 07 Mar 2012 09:42:59 GMT


Sylvain Lebresne commented on CASSANDRA-2319:

I've put a version of this issue at
(against current trunk). Contrary to the previously attached patches, this doesn't change
the file format much. It quite literally does what the issue title says: it promotes the column
index from the data file to the index file. Note that the patch is split into 3 commits that
have some form of logical separation, but the code only compiles with all 3 commits applied.

So this removes the column index and bloom filter from the row header in the data file and
moves them into the index file along with the (key,position) pair. There are a number of choices/details
worth mentioning:
* Only wide rows have a column index and bloom filter. So one difference with the current
implementation is that skinny rows have no column bloom filter. I figure that it's probably
not worth the space in the index file in that latter case (but I'm fine discussing that point).
* The key cache now keeps the whole information from the index file for a given row. This
means that for wide rows, the column index and bloom filter are cached along with the position. This is
imo a good thing, but it does mean the size of a key cache entry is not constant anymore (the
estimation of the key cache memory size will have to be modified accordingly, but the current
patch doesn't do that).
* For wide rows, the index entry also ships with the row deletion times. This is necessary
since we won't seek to the beginning of the row anymore.
* In the column indexes, offsets are relative to the beginning of the row in the data file
rather than to the beginning of the index as is the case now.
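
To make the layout described above concrete, here is a minimal sketch of what such an index
entry could look like. All class and field names are hypothetical and are not taken from the
patch; column names are plain strings for simplicity and the bloom filter is left out.

```java
import java.util.List;

// Hypothetical sketch, not the patch's actual classes.
// One block of the column index of a wide row.
final class ColumnBlock {
    final String firstName; // first column name in the block
    final String lastName;  // last column name in the block
    final long offset;      // offset relative to the start of the row in the data file
    final long width;       // length of the block in bytes

    ColumnBlock(String firstName, String lastName, long offset, long width) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.offset = offset;
        this.width = width;
    }
}

// What the index file (and the key cache) would hold for one row key.
final class IndexEntrySketch {
    final long position;            // start of the row in the data file
    final List<ColumnBlock> blocks; // empty for skinny rows
    final long markedForDeleteAt;   // row deletion time, only meaningful for wide rows

    IndexEntrySketch(long position, List<ColumnBlock> blocks, long markedForDeleteAt) {
        this.position = position;
        this.blocks = blocks;
        this.markedForDeleteAt = markedForDeleteAt;
    }

    boolean isWide() {
        return !blocks.isEmpty();
    }

    // Returns the absolute data file position to seek to for the given column,
    // or -1 if the column index shows that no block can contain it.
    long absolutePositionFor(String columnName) {
        if (!isWide())
            return position; // no column index: the row has to be read from its start
        for (ColumnBlock block : blocks) {
            if (columnName.compareTo(block.firstName) >= 0
                    && columnName.compareTo(block.lastName) <= 0)
                return position + block.offset; // offsets are relative to the row start
        }
        return -1;
    }
}
```

Because the block offsets are relative to the row start, the reader can go from the index
entry straight to the right block without first visiting the row header.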

Some other implementation points:
* EchoedRow is removed. It would be possible to echo rows following this patch, but we would
need to echo the column index too, which felt complicated enough that it could be left to
a later ticket if we consider it worth it.
* I didn't find a way to implement this patch that wasn't overly complicated or inefficient without
using seek() instead of just file marks. So in particular MappedFileDataInput gets a seek()
method, even though that method throws an exception if we seek outside the segment (which should
never happen).
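
As an illustration of the seek() behaviour described in the last point, a bounds-checked seek
over a single mapped segment might look roughly like the sketch below. The class name, its
fields and the exception type are assumptions made for the example; this is not the actual
MappedFileDataInput code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a reader over one mapped segment of a file.
final class SegmentInputSketch {
    private final ByteBuffer buffer;  // the mapped segment
    private final long segmentOffset; // file position of the segment's first byte
    private int position;             // current position within the segment

    SegmentInputSketch(ByteBuffer buffer, long segmentOffset) {
        this.buffer = buffer;
        this.segmentOffset = segmentOffset;
        this.position = 0;
    }

    // Moves to an absolute file position; fails if it is outside this segment.
    void seek(long filePosition) {
        long relative = filePosition - segmentOffset;
        if (relative < 0 || relative > buffer.capacity())
            throw new IllegalArgumentException(
                "Seek position " + filePosition + " is outside the mapped segment");
        position = (int) relative;
    }

    long getFilePointer() {
        return segmentOffset + position;
    }

    // Reads one unsigned byte, or returns -1 at the end of the segment.
    int read() {
        if (position >= buffer.capacity())
            return -1;
        return buffer.get(position++) & 0xFF;
    }
}
```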

I did a short (and honestly not very scientific) benchmark of a time-series-like workload,
with a number of threads inserting time series columns in a bunch of rows and other threads
reading the tail of those rows (as expected, the performance degrades as more sstables are added
and improves with compaction). As soon as more than 1 sstable was present, the performance
with this patch was around 30-40% better than without the patch. I'll note that the test
was very short and with everything on localhost, so again the exact benefits may vary, but
the ability to discard sstables based on index information (saving a seek) seems to be a clear boost
in that case.
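
The kind of sstable elimination at play here can be sketched as follows. This is only an
illustration under simplifying assumptions: column names are modelled as long timestamps, each
sstable exposes the first and last column name of the row from its index entry, and all names
are hypothetical rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary of what the index file knows about one row in one sstable.
final class SSTableIndexInfo {
    final String sstableName;
    final long firstColumn; // smallest column name of the row in this sstable
    final long lastColumn;  // largest column name of the row in this sstable

    SSTableIndexInfo(String sstableName, long firstColumn, long lastColumn) {
        this.sstableName = sstableName;
        this.firstColumn = firstColumn;
        this.lastColumn = lastColumn;
    }
}

final class SSTablePruningSketch {
    // Returns the sstables whose column range overlaps [sliceStart, sliceEnd].
    // The others can be skipped without any seek into their data file.
    static List<String> candidates(List<SSTableIndexInfo> infos, long sliceStart, long sliceEnd) {
        List<String> result = new ArrayList<>();
        for (SSTableIndexInfo info : infos) {
            boolean overlaps = info.lastColumn >= sliceStart && info.firstColumn <= sliceEnd;
            if (overlaps)
                result.add(info.sstableName);
        }
        return result;
    }
}
```

With a time series row spread over several sstables, a read of the tail of the row would
only touch the sstables holding the most recent columns.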

I didn't see any noticeable difference (neither good nor bad) on a normal stress run, as should
be expected.

Note that this patch paves the way to removing the two-phase compaction of LazilyCompactedRow,
but that is left to a follow-up ticket.
> Promote row index
> -----------------
>                 Key: CASSANDRA-2319
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>              Labels: compression, index, timeseries
>         Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, version-g-lzf.txt,
> The row index contains entries for configurably sized blocks of a wide row. For a row
of appreciable size, the row index ends up directing the third seek (1. index, 2. row index,
3. content) to near the first column of a scan.
> Since the row index is always used for wide rows, and since it contains information that
tells us whether or not the 3rd seek is necessary (the column range or name we are trying
to slice may not exist in a given sstable), promoting the row index into the sstable index
would allow us to drop the maximum number of seeks for wide rows back to 2, and, more importantly,
would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data in wide
rows, where data is appended to the beginning or end of the row. Our existing compaction strategy
gets lucky and clusters the oldest data in the oldest sstables: for queries to recently appended
data, we would be able to eliminate wide rows using only the sstable index, rather than needing
to seek into the data file to determine that it isn't interesting. For narrow rows, this change
would have no effect, as they will not reach the threshold for indexing anyway.
> A first cut design for this change would look very similar to the file format design
proposed on #674: row keys clustered,
column names clustered, and offsets clustered and delta encoded.
