cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-2319) Promote row index
Date Tue, 15 Mar 2011 19:52:29 GMT


Stu Hood commented on CASSANDRA-2319:

> I think this would also mean we can remove the back-seek from sstable writing.
> In which case I am a huge fan in principle.
I suspect that this is more challenging: the back seek also writes the row length. We'd need
to move to a blocked design like the one proposed on #674 to replace the row length with a
block length in the data file.

> This will remove as a consequence the row bloom filter
Not necessarily, but I think this ticket does highlight the fact that the column bloom filter
is ill-positioned: it can prevent the 3rd seek, but only for names queries, which (I suspect)
are less likely on wide rows. Nonetheless, I can imagine wanting to do a point query for a
secondary index to determine whether a particular row matches the index, so we should probably
consider promoting it as well.

> There will be new trade-offs: either you index 'often' or the index becomes potentially
> much bigger
The important thing to remember is that the distinction between columns and keys should be
very fuzzy: columns are a suffix on keys, and treating them otherwise leads to complications.
In this case, we shouldn't be holding every 128th "key" in memory, but instead every 128th-512th
_tuple_: that way wide rows are handled naturally.

This also normalizes our indexing: the size of your index depends on the _total_ number of
columns, instead of on the width and number of your rows.
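
As a rough illustration of the difference (a sketch only, not Cassandra code): the function
names, the (row key, column name) tuple representation, and the interval of 128 below are all
assumptions made for the example. Sampling by key treats a wide row as a single unit, while
sampling by tuple spreads index entries through it.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # hypothetical (row key, column name) tuple, in sorted order


def sample_by_key(entries: List[Entry], interval: int) -> List[str]:
    """Keep every `interval`-th distinct row key; row width is ignored."""
    sampled: List[str] = []
    distinct_seen = 0
    last_key = None
    for key, _column in entries:
        if key != last_key:
            if distinct_seen % interval == 0:
                sampled.append(key)
            distinct_seen += 1
            last_key = key
    return sampled


def sample_by_tuple(entries: List[Entry], interval: int) -> List[Entry]:
    """Keep every `interval`-th (key, column) tuple; wide rows get several samples."""
    return [entry for i, entry in enumerate(entries) if i % interval == 0]


# One wide row with 1000 columns, followed by ten single-column rows.
entries = [("a", f"c{i:04d}") for i in range(1000)]
entries += [(f"b{i:02d}", "c0000") for i in range(10)]

by_key = sample_by_key(entries, 128)      # a single sample for all 1010 columns
by_tuple = sample_by_tuple(entries, 128)  # 8 samples, all inside the wide row "a"
```

With tuple sampling the number of samples is ceil(total columns / interval), regardless of how
those columns are distributed across rows.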

> Promote row index
> -----------------
>                 Key: CASSANDRA-2319
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>              Labels: index, timeseries
>             Fix For: 0.8
> The row index contains entries for configurably sized blocks of a wide row. For a row
> of appreciable size, the row index ends up directing the third seek (1. index, 2. row index,
> 3. content) to a position near the first column of a scan.
> Since the row index is always used for wide rows, and since it contains information that
> tells us whether or not the 3rd seek is necessary (the column range or name we are trying
> to slice may not exist in a given sstable), promoting the row index into the sstable index
> would allow us to drop the maximum number of seeks for wide rows back to 2, and, more importantly,
> would allow sstables to be eliminated using only the index.
> An example use case that benefits greatly from this change is time series data in wide
> rows, where data is appended to the beginning or end of the row. Our existing compaction strategy
> gets lucky and clusters the oldest data in the oldest sstables: for queries to recently appended
> data, we would be able to eliminate wide rows using only the sstable index, rather than needing
> to seek into the data file to determine that it isn't interesting. For narrow rows, this change
> would have no effect, as they will not reach the threshold for indexing anyway.
> A first cut design for this change would look very similar to the file format design
> proposed on #674: row keys clustered, column names clustered, and offsets clustered and
> delta encoded.
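
The sstable elimination described in the ticket could look roughly like the sketch below. It
is not the actual sstable index format: `PromotedIndex`, `may_contain`, `sstables_to_read`,
the integer column names, and the sample data are all hypothetical. It only assumes that the
promoted index exposes, per row, the first and last column name of each indexed block.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

ColumnRange = Tuple[int, int]  # (first column, last column) of one indexed block


@dataclass
class PromotedIndex:
    """Hypothetical in-memory view of one sstable's index with row blocks promoted."""
    sstable: str
    rows: Dict[str, List[ColumnRange]]


def may_contain(index: PromotedIndex, row_key: str, start: int, end: int) -> bool:
    """True if any indexed block of the row overlaps the slice [start, end]."""
    blocks = index.rows.get(row_key, [])
    return any(first <= end and start <= last for first, last in blocks)


def sstables_to_read(indexes: List[PromotedIndex], row_key: str,
                     start: int, end: int) -> List[str]:
    """Eliminate sstables using only index data, with no seek into the data file."""
    return [ix.sstable for ix in indexes if may_contain(ix, row_key, start, end)]


# Time series in one wide row: older columns live in the older sstable.
old = PromotedIndex("sstable-1", {"sensor-7": [(0, 499), (500, 999)]})
new = PromotedIndex("sstable-2", {"sensor-7": [(1000, 1499)]})

recent = sstables_to_read([old, new], "sensor-7", 1400, 1500)
```

Here a query for recently appended columns only has to touch `sstable-2`; the older sstable is
ruled out from its index entry alone.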
