cassandra-commits mailing list archives

From "Rick Branson (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-3581) Optimize RangeSlice operations for append-mostly use cases
Date Tue, 06 Dec 2011 23:48:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163978#comment-13163978 ]

Rick Branson edited comment on CASSANDRA-3581 at 12/6/11 11:47 PM:
-------------------------------------------------------------------

{quote}
I don't see any drawback to putting this [the minimum/maximum column names] in the metadata/statistics
component, which would keep backwards compatibility headaches down.
{quote}

This is just my naiveté showing through as I wasn't aware of that component. From a conceptual
perspective, any metadata storage for the SSTable would work, and since it's purely an optional
optimization, this makes sense.

{quote}
Right, and the problem with that is you can't know if the row has a tombstone without looking
up the row and reading its header, which is a large part of the overhead of reading the entire
row. So unless we also add a "sstable contains row tombstones" flag to our metadata we're
screwed.

Tracking that flag is not a problem per se, but it would narrow the usefulness of the optimization
significantly if it can only be applied if there have been no row deletes in the entire sstable.
{quote}

Nullifying the minimum & maximum column name fields has the effect of flagging the SSTable
as containing row tombstones.
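
To make the nullification idea concrete, here is a minimal sketch of how the bounds could be collected while an SSTable is written. Everything in it is hypothetical: `ColumnNameRangeTracker` and its methods are illustrative names, not Cassandra classes, and plain byte-string comparison stands in for the column family's comparator.

```python
from typing import Optional, Tuple


class ColumnNameRangeTracker:
    """Hypothetical write-side collector for one SSTable's column-name bounds."""

    def __init__(self) -> None:
        self.min_name: Optional[bytes] = None
        self.max_name: Optional[bytes] = None
        self.disabled = False  # becomes True once any row tombstone is seen

    def add_column(self, name: bytes) -> None:
        # Assumption: byte order stands in for the CF comparator's order.
        if self.disabled:
            return
        if self.min_name is None or name < self.min_name:
            self.min_name = name
        if self.max_name is None or name > self.max_name:
            self.max_name = name

    def add_row_tombstone(self) -> None:
        # Nullifying both fields doubles as the "contains row tombstones" flag.
        self.disabled = True
        self.min_name = None
        self.max_name = None

    def bounds(self) -> Optional[Tuple[bytes, bytes]]:
        # None means "no usable bounds": readers must not skip this SSTable.
        if self.disabled or self.min_name is None or self.max_name is None:
            return None
        return (self.min_name, self.max_name)
```

With this shape, no separate flag is stored: a reader that finds null bounds simply falls back to the existing read path for that SSTable.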
                
                  
> Optimize RangeSlice operations for append-mostly use cases
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-3581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3581
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Rick Branson
>            Assignee: Rick Branson
>            Priority: Minor
>             Fix For: 1.1
>
>
> Currently, to perform a slice or count with a SliceRange, all of the SSTables containing
the requested row must be interrogated to determine if they contain matching column names.
SliceRange operations on wide rows which have columns distributed across many SSTable files
can turn into a relatively expensive operation involving many disk seeks. On time-series use
cases such as the one highlighted below, most of these I/O operations end up just eliminating
most of the SSTables.
> This optimization would require two values to be added to the SSTable header: the minimum
and maximum column names (according to the CF comparator) across all rows (including tombstones)
within the SSTable. For SliceRange operations, SSTables containing rows with column names
entirely outside of the SliceRange would be completely eliminated without even a single disk
operation.
> Rationale: a very common use case for Cassandra is to use a column family to store time-series
data with a row for each metric and a column for each data point with the column name being
a TimeUUID. Data is typically read with a bounded time range using a SliceRange. For the described
use case, any given SSTable within this ColumnFamily will have a tightly bound range of minimum
and maximum column names across all rows, and there will be little overlap of these column
name ranges across different SSTable files. Append-mostly column families with serial column
names (as ordered by the comparator) on which SliceRange operations are used can benefit from
this optimization, and the cost to use cases that do not fall within this group ranges from
negligible to nonexistent.
> Caveat: even just one row tombstone would throw this off completely. From what I can
tell, there's no way to skip an SSTable that contains a row tombstone, and there is also no
current way to segregate tombstones. Stu had some interesting ideas in CASSANDRA-2498 about
segregating tombstones to separate SSTables, but that's for a later time. The light at the
end of the tunnel is that users who benefit from this optimization either do not perform
deletes or do them in large batches. These same users would also be able to use slice tombstones
instead of row tombstones to preserve the optimized behavior. A full row tombstone would nullify
the minimum/maximum values, indicating that the optimization can't be used.
> Question for the audience: should there be some kind of cap to the size of the min/max
column names kept in the header to keep the internal bearings greased and everyone honest?
Something like 256 bytes seems reasonable to me, and we just disable the optimization if the
column name size exceeds this limit. Is there a way we could, say, store only the most significant
32 bytes for each end of the name range? I can't think of any.
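
The read-side check and the size cap described above could look roughly like the sketch below. The names `bounds_for_metadata` and `can_skip_sstable` are made up for illustration, byte-string comparison stands in for the CF comparator, the slice is assumed to be non-reversed, and 256 bytes is only the figure floated in the question, not a settled value.

```python
from typing import Optional, Tuple

MAX_TRACKED_NAME_BYTES = 256  # cap floated in the issue; an assumption, not a decided value

Bounds = Optional[Tuple[bytes, bytes]]


def bounds_for_metadata(min_name: bytes, max_name: bytes,
                        cap: int = MAX_TRACKED_NAME_BYTES) -> Bounds:
    """Return the bounds to persist, or None to disable the optimization."""
    if len(min_name) > cap or len(max_name) > cap:
        return None  # oversized names: store nothing and never skip
    return (min_name, max_name)


def can_skip_sstable(bounds: Bounds, start: bytes, finish: bytes) -> bool:
    """True if no column in the SSTable can fall inside [start, finish].

    An empty start or finish is treated as an open end of the slice.
    """
    if bounds is None:
        return False  # nullified (row tombstone or oversized name): must read
    sstable_min, sstable_max = bounds
    if finish and finish < sstable_min:
        return True  # slice ends before the SSTable's first column name
    if start and start > sstable_max:
        return True  # slice starts after the SSTable's last column name
    return False
```

Because the check only needs the two persisted values, it can run against in-memory metadata before any disk access is attempted for that SSTable.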

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
