Date: Fri, 18 Mar 2011 10:42:29 +0000 (UTC)
From: "Sylvain Lebresne (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] Commented: (CASSANDRA-2319) Promote row index

[
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008392#comment-13008392 ]

Sylvain Lebresne commented on CASSANDRA-2319:
---------------------------------------------

bq. Agreed... my point is simply that the number of columns-per-key and the number of keys are inversely proportional: if you have more columns-per-key, you have fewer keys, and vice-versa. The index will grow proportionally with the total number of columns, not with the number of keys.

I do not share your confidence that this is axiomatic. It is certainly not axiomatic to the data model. Anyway, that was just a remark, not a criticism of the approach.

bq. Yea... the key cache as it exists does not necessarily need to change, but at some point we'll want to update it to include the improvements from this ticket.

Maybe there is a misunderstanding here. I assumed that promoting the row index implied removing the row index (in favor of a richer sstable index). And even though a first iteration doesn't necessarily imply this removal, I'll still assume it, because I believe keeping the row index would be weird in the long run, even if we keep it in the short run. So if you don't have a row index, caching the row key position as the current key cache does will be counter-productive for any non-narrow row, since looking at the sstable index would get you closer to the column. That would make the key cache as it exists useful only for narrow rows (which makes it less useful, though not useless).

bq. It depends on the number of unique queries to the row, but I'm willing to bet that the number of unique queries to a row is relatively low.

Take time series (which I doubt can be called a niche use case). If the start of your slice query depends on the current time, almost all queries will be unique.
Or if you page through the time series and it has a reasonably high rate of inserts, the pages will always be changing, and so will your queries. Given how little time it took me to come up with those two examples (both of which I have personally used, by the way; it's not just my imagination running wild), I suspect there are a number of other similar cases. Will those be a minority of all the queries on wide rows? I don't know; probably for some people, but maybe not for others. People come up with new ways to use the Cassandra data model all the time, so let's not base our reasoning on unchecked assumptions about the kinds of queries people run.

Is that a big deal, considering that with the promoted row index we would be at 2 seeks for those cases, while we're already at 2 seeks on a key cache hit? Probably not (though for the pleasure of nitpicking, I'll add that the 2 seeks in the current key-cache-hit case are closer together on disk than the 2 seeks with the promoted index would be). Am I willing to set those cases aside because Stu is willing to bet they don't matter? Certainly not (which doesn't mean I don't love you).

Anyway, I'm in favor of trying this (honestly, more because I believe it could make checksumming and compression easier to add than for any other reason).

> Promote row index
> -----------------
>
>                 Key: CASSANDRA-2319
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>              Labels: index, timeseries
>             Fix For: 0.8
>
>
> The row index contains entries for configurably sized blocks of a wide row. For a row of appreciable size, the row index ends up directing the third seek (1. index, 2. row index, 3. content) to a position near the first column of a scan.
>
> Since the row index is always used for wide rows, and since it contains information that tells us whether or not the 3rd seek is necessary (the column range or name we are trying to slice may not exist in a given sstable), promoting the row index into the sstable index would allow us to drop the maximum number of seeks for wide rows back to 2, and, more importantly, would allow sstables to be eliminated using only the index.
>
> An example use case that benefits greatly from this change is time series data in wide rows, where data is appended to the beginning or end of the row. Our existing compaction strategy gets lucky and clusters the oldest data in the oldest sstables: for queries to recently appended data, we would be able to eliminate wide rows using only the sstable index, rather than needing to seek into the data file to determine that it isn't interesting. For narrow rows, this change would have no effect, as they will not reach the threshold for indexing anyway.
>
> A first-cut design for this change would look very similar to the file format design proposed on #674: http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, column names clustered, and offsets clustered and delta encoded.
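To make the proposal above more concrete, here is a minimal Java sketch of what a promoted index entry could carry. It is not Cassandra's code or on-disk format, and everything in it is hypothetical: the class and method names (`PromotedIndexSketch`, `Block`, `RowEntry`, `mayContain`, `seekOffset`, `deltaEncode`, `deltaDecode`), the choice of `long` column names compared numerically (as in a time-series row), and the example offsets. It shows the two benefits described in the ticket: a slice outside a row's column range is rejected from the index alone, and a slice inside it is sent by one binary search straight to a block offset in the data file, with no separate row-index seek. It also delta-encodes the clustered block offsets. Records require Java 16 or later.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a row index "promoted" into the sstable index.
public final class PromotedIndexSketch {

    // One index block of a wide row: its first/last column name and data-file offset.
    public record Block(long firstColumn, long lastColumn, long offset) {}

    // Index entry for one row key, as it might live in the sstable index.
    public record RowEntry(String key, List<Block> blocks) {

        // False means the slice [start, end] cannot match this row in this sstable,
        // so the data file never has to be touched.
        public boolean mayContain(long start, long end) {
            if (blocks.isEmpty()) return false;
            long min = blocks.get(0).firstColumn();
            long max = blocks.get(blocks.size() - 1).lastColumn();
            return start <= max && end >= min;
        }

        // Binary search for the first block whose lastColumn >= start; its offset is
        // where the single data-file seek would land. Returns -1 if no block qualifies.
        public long seekOffset(long start) {
            int lo = 0, hi = blocks.size() - 1, found = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (blocks.get(mid).lastColumn() >= start) {
                    found = mid;
                    hi = mid - 1;
                } else {
                    lo = mid + 1;
                }
            }
            return found < 0 ? -1 : blocks.get(found).offset();
        }
    }

    // Delta-encode increasing offsets: the first value as-is, then successive differences.
    public static long[] deltaEncode(long[] offsets) {
        long[] out = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            out[i] = (i == 0) ? offsets[0] : offsets[i] - offsets[i - 1];
        }
        return out;
    }

    // Reverse of deltaEncode: a running sum restores the absolute offsets.
    public static long[] deltaDecode(long[] deltas) {
        long[] out = new long[deltas.length];
        long running = 0;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];
            out[i] = running;
        }
        return out;
    }

    public static void main(String[] args) {
        // A time-series row whose column names are timestamps, split into three blocks.
        RowEntry row = new RowEntry("sensor-1", List.of(
                new Block(1000, 1999, 4096),
                new Block(2000, 2999, 69632),
                new Block(3000, 3999, 135168)));

        // Recent data this sstable never saw: eliminated using only the index.
        System.out.println(row.mayContain(5000, 6000)); // false
        // A slice starting at 2500: one seek, directly to the second block.
        System.out.println(row.seekOffset(2500)); // 69632

        long[] deltas = deltaEncode(new long[] {4096, 69632, 135168});
        System.out.println(Arrays.toString(deltas)); // [4096, 65536, 65536]
        System.out.println(Arrays.toString(deltaDecode(deltas))); // [4096, 69632, 135168]
    }
}
```

The deltas only save space once they are written with a variable-length integer encoding, so that small, similar differences take fewer bytes than absolute offsets. The sketch stops at computing them.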