Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 55C7189D8 for ; Mon, 22 Aug 2011 14:38:56 +0000 (UTC) Received: (qmail 73642 invoked by uid 500); 22 Aug 2011 14:38:56 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 73545 invoked by uid 500); 22 Aug 2011 14:38:54 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 73537 invoked by uid 99); 22 Aug 2011 14:38:54 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 22 Aug 2011 14:38:54 +0000 X-ASF-Spam-Status: No, hits=-2000.9 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 22 Aug 2011 14:38:52 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 12630AA424 for ; Mon, 22 Aug 2011 14:38:31 +0000 (UTC) Date: Mon, 22 Aug 2011 14:38:31 +0000 (UTC) From: "Ryan King (JIRA)" To: commits@cassandra.apache.org Message-ID: <1037718494.1256.1314023911072.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <747530390.15621.1299959099579.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (CASSANDRA-2319) Promote row index MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088736#comment-13088736 ] Ryan King commented on CASSANDRA-2319: -------------------------------------- I haven't followed that ticket closely, but I think the answer is yes. For wide row use cases this patch lets you eliminate SStables with only the info in the index (because we know what range(s) of columns for a row are in that file). > Promote row index > ----------------- > > Key: CASSANDRA-2319 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2319 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Stu Hood > Assignee: Stu Hood > Labels: compression, index, timeseries > Fix For: 1.0 > > Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, version-g-lzf.txt, version-g.txt > > > The row index contains entries for configurably sized blocks of a wide row. For a row of appreciable size, the row index ends up directing the third seek (1. index, 2. row index, 3. content) to nearby the first column of a scan. > Since the row index is always used for wide rows, and since it contains information that tells us whether or not the 3rd seek is necessary (the column range or name we are trying to slice may not exist in a given sstable), promoting the row index into the sstable index would allow us to drop the maximum number of seeks for wide rows back to 2, and, more importantly, would allow sstables to be eliminated using only the index. > An example usecase that benefits greatly from this change is time series data in wide rows, where data is appended to the beginning or end of the row. Our existing compaction strategy gets lucky and clusters the oldest data in the oldest sstables: for queries to recently appended data, we would be able to eliminate wide rows using only the sstable index, rather than needing to seek into the data file to determine that it isn't interesting. For narrow rows, this change would have no effect, as they will not reach the threshold for indexing anyway. > A first cut design for this change would look very similar to the file format design proposed on #674: http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, column names clustered, and offsets clustered and delta encoded. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira