Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Wed, 7 Dec 2011 01:46:40 +0000 (UTC)
From: "Rick Branson (Commented) (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: 
 <413888285.47935.1323222400269.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <1488060733.47490.1323214359979.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-3581) Optimize RangeSlice operations
 for append-mostly use cases
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164051#comment-13164051 ] 

Rick Branson commented on CASSANDRA-3581:
-----------------------------------------

{quote}Okay, but my main concern isn't w/ how to implement that but that it probably tips the balance to "not worth bothering with the complexity and overhead of sorting sstables by min/max column name and doing the pruning dance" if we can apply it in such a small number of cases.{quote}

On the query side it just seems like a matter of adding a simple check against the QueryFilter to the loop in RowIteratorFactory.getIterator that builds the SSTableScanner list.
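
As a rough illustration of that check (a minimal sketch, not the actual patch): the class below assumes each SSTable exposes the proposed min/max column names, that a row tombstone leaves them null so the SSTable is never skipped, that the slice is in forward order, and that an empty start or finish means "unbounded". SlicePruningSketch, SSTableBounds, mayContain, prune, name and LONG_ORDER are hypothetical names, not Cassandra APIs, and LONG_ORDER merely stands in for the CF comparator.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: none of these names are Cassandra classes.
final class SlicePruningSketch {
    // Stand-in for the CF comparator: column names are 8-byte timestamps.
    static final Comparator<ByteBuffer> LONG_ORDER =
            Comparator.comparingLong(b -> b.getLong(b.position()));

    // Hypothetical per-SSTable header values; null bounds mean "cannot prune".
    static final class SSTableBounds {
        final String label;
        final ByteBuffer minColumn;
        final ByteBuffer maxColumn;

        SSTableBounds(String label, ByteBuffer minColumn, ByteBuffer maxColumn) {
            this.label = label;
            this.minColumn = minColumn;
            this.maxColumn = maxColumn;
        }
    }

    // Helper for the demo: encode a timestamp as an 8-byte column name.
    static ByteBuffer name(long ts) {
        return ByteBuffer.allocate(8).putLong(0, ts);
    }

    // True unless the SSTable's name range lies entirely outside [start, finish].
    static boolean mayContain(SSTableBounds t, ByteBuffer start, ByteBuffer finish,
                              Comparator<ByteBuffer> cmp) {
        if (t.minColumn == null || t.maxColumn == null) {
            return true; // e.g. a row tombstone nullified the bounds
        }
        boolean entirelyBefore = start.hasRemaining() && cmp.compare(t.maxColumn, start) < 0;
        boolean entirelyAfter = finish.hasRemaining() && cmp.compare(t.minColumn, finish) > 0;
        return !(entirelyBefore || entirelyAfter);
    }

    // Keep only the SSTables that still need a scanner for this slice.
    static List<SSTableBounds> prune(List<SSTableBounds> tables, ByteBuffer start,
                                     ByteBuffer finish, Comparator<ByteBuffer> cmp) {
        List<SSTableBounds> kept = new ArrayList<>();
        for (SSTableBounds t : tables) {
            if (mayContain(t, start, finish, cmp)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SSTableBounds> tables = Arrays.asList(
                new SSTableBounds("old", name(0), name(99)),
                new SSTableBounds("recent", name(100), name(199)),
                new SSTableBounds("tombstoned", null, null));
        for (SSTableBounds t : prune(tables, name(150), name(180), LONG_ORDER)) {
            System.out.println(t.label);
        }
    }
}
```

In this toy run the slice [150, 180] skips "old" without touching it, while "recent" and "tombstoned" are still scanned.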

If the concern is the row tombstone killer, perhaps we should solicit feedback from time-series users on how they perform deletes?
                
> Optimize RangeSlice operations for append-mostly use cases
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-3581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3581
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Rick Branson
>            Assignee: Rick Branson
>            Priority: Minor
>             Fix For: 1.1
>
>
> Currently, to perform a slice or count with a SliceRange, all of the SSTables containing the requested row must be interrogated to determine if they contain matching column names. SliceRange operations on wide rows whose columns are distributed across many SSTable files can become relatively expensive, involving many disk seeks. In time-series use cases such as the one highlighted below, most of these I/O operations serve only to rule out most of the SSTables.
> This optimization would require two values to be added to the SSTable header: the minimum and maximum column names (according to the CF comparator) across all rows (including tombstones) within the SSTable. For SliceRange operations, SSTables containing rows with column names entirely outside of the SliceRange would be completely eliminated without even a single disk operation.
> Rationale: a very common use case for Cassandra is to use a column family to store time-series data with a row for each metric and a column for each data point with the column name being a TimeUUID. Data is typically read with a bounded time range using a SliceRange. For the described use case, any given SSTable within this ColumnFamily will have a tightly bound range of minimum and maximum column names across all rows, and there will be little overlap of these column name ranges across different SSTable files. Append-mostly column families with serial column names (as ordered by the comparator) on which SliceRange operations are used can benefit from this optimization, and the cost to use cases that do not fall within this group ranges from negligible to nonexistent.
> Caveat: even just one row tombstone would throw this off completely. From what I can tell, there's no way to skip an SSTable that contains a row tombstone, and there is also no current way to segregate tombstones. Stu had some interesting ideas in CASSANDRA-2498 about segregating tombstones to separate SSTables, but that's for a later time. The light at the end of the tunnel is that users who benefit from this optimization either do not perform deletes or do them in large batches. These same users would also be able to use slice tombstones instead of row tombstones to preserve the optimized behavior. A full row tombstone would nullify the minimum/maximum values, indicating that the optimization can't be used.
> Question for the audience: should there be some kind of cap to the size of the min/max column names kept in the header to keep the internal bearings greased and everyone honest? Something like 256 bytes seems reasonable to me, and we just disable the optimization if the column name size exceeds this limit. Is there a way we could, say, store only the most significant 32 bytes for each end of the name range? I can't think of any.
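
On the truncation question, one possible approach is sketched below. It is only valid under the assumption that the comparator orders names as unsigned bytes, left to right, with a prefix sorting before any longer name; it would not hold for comparators such as TimeUUID ordering. Under that assumption a truncated minimum is still a safe lower bound, and a truncated maximum becomes a safe upper bound once it is incremented. TruncatedBoundsSketch, lowerBound and upperBound are hypothetical names, and the cap is a parameter rather than a fixed 32 bytes.

```java
import java.util.Arrays;

// Illustrative only: assumes unsigned bytewise ordering of column names.
final class TruncatedBoundsSketch {
    // A prefix never sorts after the full name, so truncation keeps a valid lower bound.
    static byte[] lowerBound(byte[] min, int cap) {
        return min.length <= cap ? min.clone() : Arrays.copyOf(min, cap);
    }

    // Truncate, then add one to the last byte that is not 0xFF and drop what follows.
    // Returns null when every kept byte is 0xFF, meaning no upper bound fits in the cap.
    static byte[] upperBound(byte[] max, int cap) {
        if (max.length <= cap) {
            return max.clone();
        }
        byte[] prefix = Arrays.copyOf(max, cap);
        for (int i = cap - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                prefix[i]++;
                return Arrays.copyOf(prefix, i + 1);
            }
        }
        return null; // caller should treat this as "optimization disabled"
    }

    public static void main(String[] args) {
        byte[] max = {1, 2, 3, 4};
        byte[] bound = upperBound(max, 2);
        // Arrays.compareUnsigned requires Java 9 or later.
        System.out.println(Arrays.compareUnsigned(max, bound) < 0);
    }
}
```

The stored bounds are looser than the true min/max, so an SSTable may occasionally be read when it could have been skipped, but under the stated ordering assumption it would never be skipped incorrectly.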
