cassandra-commits mailing list archives

From "Jun Rao (JIRA)" <>
Subject [jira] Updated: (CASSANDRA-172) A improved and more general version of get_slice
Date Mon, 18 May 2009 17:38:45 GMT


Jun Rao updated CASSANDRA-172:

    Attachment: get_slice_from.patchv2

Attached patch v2.

1. Renamed <start, count> to <offset, limit>. Also set the default value of offset
to 0, since that is likely the common usage. In general, the offset parameter allows one to
efficiently implement scrolling to an arbitrary position in the returned results. This kind
of functionality is more efficient if provided on the server side.

3. On second thought, I don't think the descending ordering necessarily introduces extra disk
seeks. This is because each SSTable is always accessed one block at a time, independent of
the ordering. A block is always read sequentially and therefore there is likely just one
seek for the whole block. Across blocks, there are likely seeks, again independent of the
ordering, because accesses to all SSTables are interleaved. So I kept the ordering option,
since it makes this API more general.

4. Switched from CF iterator to Column iterator and reused the CF template for all columns.

5. ColumnSliceBlockReader still returns CF + columns. The CF deserialization overhead is amortized
over a block of columns and shouldn't be significant. The API can stay simple this way.
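
To illustrate the <offset, limit> semantics from item 1 together with the ordering option
from item 3, here is a minimal sketch. It is not code from the patch: the class SliceSketch,
the method slice, and the use of an in-memory NavigableMap as a stand-in for a row's sorted
columns are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical illustration only; not the get_slice implementation in the patch.
final class SliceSketch {

    // Returns up to `limit` column names, skipping the first `offset` names,
    // walking the sorted columns in ascending or descending order.
    static List<String> slice(NavigableMap<String, String> columns,
                              int offset, int limit, boolean ascending) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        // descendingMap() is a reversed view of the same map, not a copy.
        NavigableMap<String, String> view = ascending ? columns : columns.descendingMap();

        List<String> result = new ArrayList<>();
        int skipped = 0;
        for (String name : view.keySet()) {
            if (skipped < offset) {
                skipped++;          // still scrolling past the first `offset` columns
                continue;
            }
            if (result.size() >= limit) {
                break;              // stop early instead of scanning the whole row
            }
            result.add(name);
        }
        return result;
    }
}
```

With columns a through e, slice(columns, 1, 2, true) yields b and c, while the same call with
ascending set to false yields d and c. An offset of 0 matches the default described in item 1.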

Addressed the rest of the comments and rebased the code.

> A improved and more general version of get_slice
> ------------------------------------------------
>                 Key: CASSANDRA-172
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>             Fix For: 0.4
>         Attachments: get_slice_from.patchv1, get_slice_from.patchv2
> Today, get_slice has to scan through all columns in every memtable and sstable to get
> a slice of columns. This becomes inefficient when the number of columns in a row is large.
> We need a more efficient API.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
