cassandra-commits mailing list archives

From "Sylvain Lebresne (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2319) Promote row index
Date Mon, 12 Mar 2012 13:46:42 GMT


Sylvain Lebresne commented on CASSANDRA-2319:

Pushed a new commit on the github branch to address the following remarks:

* ColumnIndexer
** Consider renaming ColumnIndexer -> ColumnIndex, and moving all creation logic into factory
functions... ColumnIndexer.Result is redundant
** Will creating a BloomFilter with estimated size 0 do the right thing here? Perhaps explicitly
ask for the empty bloom filter?

Agreed, that's much cleaner, done.
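For context on the empty-filter point: a Bloom filter that has had nothing added to it answers "not present" for every query, so a shared singleton can stand in for a per-row allocation sized from an estimate of 0. A minimal sketch of that idea (the Filter interface and names here are illustrative, not Cassandra's actual API):

```java
// Illustrative sketch only: a shared "empty" filter for the zero-element
// case, instead of building a real Bloom filter with estimated size 0.
interface Filter {
    boolean isPresent(byte[] key);
}

final class EmptyFilter implements Filter {
    // One shared instance; with nothing ever added, every lookup is a miss.
    static final EmptyFilter INSTANCE = new EmptyFilter();

    private EmptyFilter() {}

    @Override
    public boolean isPresent(byte[] key) {
        return false;
    }
}
```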

bq. Consider adding an EMPTY/LIVE singleton DeletionInfo

Not really related to this patch but good idea, done.
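The LIVE singleton idea can be sketched as follows. A DeletionInfo meaning "nothing is deleted" is immutable, so one shared instance can replace fresh allocations; the sentinel values mirror the usual "no tombstone" markers, but all names here are illustrative rather than the actual Cassandra classes:

```java
// Hedged sketch of an immutable DeletionInfo with a shared LIVE instance.
final class DeletionInfo {
    // "No row tombstone": minimal timestamp, maximal local deletion time.
    static final DeletionInfo LIVE = new DeletionInfo(Long.MIN_VALUE, Integer.MAX_VALUE);

    final long markedForDeleteAt;  // timestamp of the row tombstone, if any
    final int localDeletionTime;   // server time when the tombstone was written

    DeletionInfo(long markedForDeleteAt, int localDeletionTime) {
        this.markedForDeleteAt = markedForDeleteAt;
        this.localDeletionTime = localDeletionTime;
    }

    boolean isLive() {
        return markedForDeleteAt == Long.MIN_VALUE;
    }
}
```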

bq. the null checks in create() probably mean create() should be two overloaded methods

We could, but the truth is create() is only ever called from one place, where it takes all its arguments. In fact the null check should never be triggered in the current code, but I figured it doesn't cost much to code defensively in case someone wants to use null instead of an empty index.
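The trade-off being discussed (one create() with a defensive null check versus two overloads) looks roughly like this; all names are hypothetical, not the patch's actual code:

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch: create() tolerates a null index and substitutes the
// empty one, rather than forcing callers into two overloaded methods.
final class RowIndexEntry {
    final long position;
    final List<Long> columnsIndex;

    private RowIndexEntry(long position, List<Long> columnsIndex) {
        this.position = position;
        this.columnsIndex = columnsIndex;
    }

    static RowIndexEntry create(long position, List<Long> columnsIndex) {
        // Defensive: current callers never pass null, but treat null and an
        // empty index as the same thing so future callers can't break this.
        if (columnsIndex == null)
            columnsIndex = Collections.emptyList();
        return new RowIndexEntry(position, columnsIndex);
    }
}
```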

bq. the if (dis instanceof FileDataInput) skipFully logic should be static somewhere

Right, that was copy-pasted code from the old ColumnIndexer, but in fact FileUtils.skipBytesFully()
already does the right thing, so I just removed the whole {{if}} by using that.
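A helper like that has to loop, because DataInput.skipBytes is allowed to skip fewer bytes than requested. A simplified sketch of the behaviour (Cassandra's real implementation lives in FileUtils; this standalone version is just for illustration):

```java
import java.io.DataInput;
import java.io.EOFException;
import java.io.IOException;

// Sketch of a skipBytesFully-style helper: keep calling skipBytes until the
// full count is consumed, failing loudly if the stream ends early.
final class SkipUtil {
    static void skipBytesFully(DataInput in, int bytes) throws IOException {
        int skippedSoFar = 0;
        while (skippedSoFar < bytes) {
            int skipped = in.skipBytes(bytes - skippedSoFar);
            if (skipped == 0)
                throw new EOFException("EOF after " + skippedSoFar + " of " + bytes + " bytes");
            skippedSoFar += skipped;
        }
    }
}
```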

bq. This class is probably in the wrong package

I've put it in db because the old ColumnIndexer (from which RowIndexEntry inherits some of the code)
was there too. Overall the db package already has sstable-related stuff, so I don't care too
much and feel too lazy to change it as I'm writing this. But I may move it during commit
if I remember to do it.

* IndexedSliceReader
** {{file = originalInput == null ? sstable.getFileDataInput(positionToSeek) : originalInput;;}}
could probably be cleaner

I'm open to suggestions :)
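One possible shape for a cleanup would be to extract the "reuse the caller's input or open one" decision into a helper. This is a purely illustrative sketch with stub types standing in for FileDataInput and the sstable accessor, not a claim about what the patch does:

```java
// Stub types for illustration; the real FileDataInput and SSTableReader
// are Cassandra classes not reproduced here.
final class InputChooser {
    interface FileDataInput {}

    interface Opener {
        FileDataInput open(long position);
    }

    // Reuse the caller's input when present, otherwise open at the position.
    static FileDataInput choose(FileDataInput originalInput, Opener opener, long positionToSeek) {
        return originalInput != null ? originalInput : opener.open(positionToSeek);
    }
}
```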

bq. sstable.decodeKey(ByteBufferUtil.readWithShortLength(file)); -> ByteBufferUtil.skipShortLength
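The suggestion here is to skip over the short-length-prefixed key rather than decode it when the bytes aren't needed. A sketch of what such a skipShortLength helper does (the name mirrors ByteBufferUtil's, but this is a simplified standalone version, not Cassandra's code):

```java
import java.io.DataInput;
import java.io.IOException;

// Sketch: a value is stored as a 2-byte unsigned length prefix followed by
// that many bytes; when the value itself is irrelevant, skip both parts
// instead of materializing a ByteBuffer via readWithShortLength.
final class ShortLengthUtil {
    static void skipShortLength(DataInput in) throws IOException {
        int length = in.readUnsignedShort(); // 2-byte length prefix
        // A robust version would loop, since skipBytes may skip fewer bytes
        // than asked; kept simple here for illustration.
        in.skipBytes(length);
    }
}
```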


bq. When we decide not to store metadata in the file, the datafile is no longer enough to
recover the data in case of lost indexes, which is probably fine if streaming has headed in
a direction where we always send the index as well. Once we've passed that point of no-return,
we could also move metadata out for narrow rows and consider removing the key from the data
file row header, leaving the datafile containing nothing but columns (minor compression bonus)

The thing is, we don't yet send the index while streaming. Even with this patch, the data
file still contains enough information to recompute the index file. And truth is, I think that's
a nice property we should probably maintain unless we have a good reason not to (it is true that
with compression, the data file alone is not enough, as you need to know which compression is
used and the block size, but those are properties of the CF even if we lose the compression component).

> Promote row index
> -----------------
>                 Key: CASSANDRA-2319
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Sylvain Lebresne
>              Labels: index, timeseries
>             Fix For: 1.2
>         Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, version-g-lzf.txt,
> The row index contains entries for configurably sized blocks of a wide row. For a row
> of appreciable size, the row index ends up directing the third seek (1. index, 2. row index,
> 3. content) to nearby the first column of a scan.
> Since the row index is always used for wide rows, and since it contains information that
> tells us whether or not the 3rd seek is necessary (the column range or name we are trying
> to slice may not exist in a given sstable), promoting the row index into the sstable index
> would allow us to drop the maximum number of seeks for wide rows back to 2, and, more importantly,
> would allow sstables to be eliminated using only the index.
> An example usecase that benefits greatly from this change is time series data in wide
> rows, where data is appended to the beginning or end of the row. Our existing compaction strategy
> gets lucky and clusters the oldest data in the oldest sstables: for queries to recently appended
> data, we would be able to eliminate wide rows using only the sstable index, rather than needing
> to seek into the data file to determine that it isn't interesting. For narrow rows, this change
> would have no effect, as they will not reach the threshold for indexing anyway.
> A first cut design for this change would look very similar to the file format design
> proposed on #674: row keys clustered,
> column names clustered, and offsets clustered and delta encoded.
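The delta encoding mentioned at the end of the description can be sketched as follows: block offsets within a row increase monotonically, so storing the gaps between consecutive values instead of the absolute values keeps the stored numbers small. This is illustrative code for the general technique, not the proposed on-disk format:

```java
// Sketch of delta-encoding a monotonically increasing sequence of offsets.
final class DeltaOffsets {
    static long[] encode(long[] offsets) {
        long[] deltas = new long[offsets.length];
        long prev = 0;
        for (int i = 0; i < offsets.length; i++) {
            deltas[i] = offsets[i] - prev; // store the gap, not the absolute
            prev = offsets[i];
        }
        return deltas;
    }

    static long[] decode(long[] deltas) {
        long[] offsets = new long[deltas.length];
        long prev = 0;
        for (int i = 0; i < deltas.length; i++) {
            prev += deltas[i];             // running sum restores absolutes
            offsets[i] = prev;
        }
        return offsets;
    }
}
```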

This message is automatically generated by JIRA.

