From: "Stu Hood (JIRA)"
To: commits@cassandra.apache.org
Date: Wed, 19 May 2010 03:21:56 -0400 (EDT)
Subject: [jira] Updated: (CASSANDRA-1092) Add Slice API, and replace CF and SC for compaction reads

[ https://issues.apache.org/jira/browse/CASSANDRA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1092:
--------------------------------

    Attachment:     (was: 0003-Refactor-Scanner-interface-into-filtering-and-filter.patch)

> Add Slice API, and replace CF and SC for compaction reads
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-1092
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1092
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Critical
>             Fix For: 0.8
>
>         Attachments: 0001-Add-Slice-and-ColumnKey.patch
>
>
> Currently, we have two read paths for fetching Columns from disk: the io.sstable.SSTableScanner interface and the db.filter.SSTable*Iterator interfaces. The latter is intended for iterating over the IColumns contained in a single row, while the former iterates over entire rows at once (although SSTableScanner supports returning a db.filter implementation per row).
>
> While this separation has allowed for highly optimized pushdown filtering in the db.filter classes, the lack of abstraction makes it impossible to reason about changes to the file format, and leaves the code dependent on random access into the file. Additionally, separating 'row iteration' from 'IColumn iteration' ignores the fact that super columns contain an additional level of columns that could be iterated. Rather than introducing a third level of iterators for subcolumns, a unified interface for iterating over arbitrarily nested columns would clarify the code and open the door to many interesting possibilities (see CASSANDRA-998).
>
> This ticket implements an initial cut of the unified interface, which reuses the "Scanner" name. The org.apache.cassandra.Scanner interface is essentially an extended Iterator, and org.apache.cassandra.SeekableScanner extends it further with operations that reposition the iterator. By the end of CASSANDRA-998, SeekableScanner will have implementations for the Memtable and SSTables, allowing uniform iteration over all sources.
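To make the Scanner/SeekableScanner split concrete, here is a minimal Java sketch. The method names (comparator, seekToFirst, seekTo) and the list-backed ListScanner are hypothetical and do not reproduce the signatures in the attached patch; the sketch assumes each source is already sorted and holds no duplicate keys, as a non-intersecting list of Slices would.

```java
import java.io.Closeable;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for org.apache.cassandra.Scanner: an Iterator that also
// reports the order it iterates in and can release underlying resources.
interface Scanner<T> extends Iterator<T>, Closeable {
    Comparator<? super T> comparator();
}

// Hypothetical stand-in for org.apache.cassandra.SeekableScanner: adds
// operations that reposition the iterator.
interface SeekableScanner<T> extends Scanner<T> {
    // Reposition at the first element of the source.
    void seekToFirst();

    // Reposition so that next() returns the first element >= key;
    // returns false if no such element exists.
    boolean seekTo(T key);
}

// Illustrative implementation over an already-sorted list without duplicate keys.
final class ListScanner<T> implements SeekableScanner<T> {
    private final List<T> sorted;
    private final Comparator<? super T> cmp;
    private int pos = 0;

    ListScanner(List<T> sorted, Comparator<? super T> cmp) {
        this.sorted = sorted;
        this.cmp = cmp;
    }

    @Override public Comparator<? super T> comparator() { return cmp; }

    @Override public boolean hasNext() { return pos < sorted.size(); }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return sorted.get(pos++);
    }

    @Override public void seekToFirst() { pos = 0; }

    @Override public boolean seekTo(T key) {
        int idx = Collections.binarySearch(sorted, key, cmp);
        // For a missing key, binarySearch returns -(insertionPoint) - 1.
        pos = idx >= 0 ? idx : -idx - 1;
        return hasNext();
    }

    @Override public void close() {
        // Nothing to release for an in-memory list.
    }
}
```

Callers that only need forward iteration can depend on Scanner alone; code such as a merge that must skip ahead in one source would ask for a SeekableScanner.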
>
> The objects a Scanner iterates over are instances of org.apache.cassandra.Slice, which is immutable and contains parent deletion Metadata (markedForDeleteAt/localDeletionTime, as in a ColumnFamily or SuperColumn). Since only the highest markedForDeleteAt or localDeletionTime matters for nested columns, Slices simplify storage of this data by storing a single value for all parents. The Metadata in a Slice is bounded at each end by an org.apache.cassandra.db.ColumnKey, which is a compound key representing the full path to a column, or a parent boundary.
>
> The ColumnKeys in a Slice make it possible to delete ranges of column names. By convention (in this patch), the ColumnKeys in a Slice always share parents. In the future, if we wanted to support range deletes for rows or supercolumns, it would be trivial to remove that assumption.
>
> SSTables and Memtables can be abstracted into "sorted lists of Slices", each of which is internally non-intersecting. Client reads and compactions can use org.apache.cassandra.SliceMergingIterator to merge the Slices from multiple Scanners into a new Scanner which is globally non-intersecting. This process will be at the heart of any read from a ColumnFamilyStore by the end of CASSANDRA-998; this issue, however, only uses SliceMergingIterator at the core of compaction, by making CompactionIterator a subclass of SliceMergingIterator.
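The merge described above can be sketched as a boundary sweep. This is a simplified, non-streaming illustration under stated assumptions, not SliceMergingIterator itself: Column, Metadata, Slice and SliceMerger are hypothetical classes; plain string column names stand in for ColumnKey (relying on the convention that the keys in a Slice share parents); each Slice is assumed to cover the half-open range [begin, end); the newest timestamp wins between duplicate columns; and a column is treated as deleted when its timestamp is at or below the merged markedForDeleteAt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified model of the classes described above; not Cassandra's code.
final class Column {
    final String name, value;
    final long timestamp;
    Column(String name, String value, long timestamp) {
        this.name = name; this.value = value; this.timestamp = timestamp;
    }
}

// Parent deletion info shared by every column in a Slice.
final class Metadata {
    final long markedForDeleteAt;
    final int localDeletionTime;
    Metadata(long markedForDeleteAt, int localDeletionTime) {
        this.markedForDeleteAt = markedForDeleteAt; this.localDeletionTime = localDeletionTime;
    }
    // Only the highest values matter, so merging takes the component-wise max.
    Metadata merge(Metadata o) {
        return new Metadata(Math.max(markedForDeleteAt, o.markedForDeleteAt),
                            Math.max(localDeletionTime, o.localDeletionTime));
    }
}

// Immutable Slice over column names in [begin, end); a String stands in for ColumnKey.
final class Slice {
    final String begin, end;
    final Metadata meta;
    final List<Column> columns;
    Slice(String begin, String end, Metadata meta, List<Column> columns) {
        this.begin = begin; this.end = end; this.meta = meta;
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }
}

final class SliceMerger {
    // Merges per-source sorted, internally non-intersecting Slices into
    // globally non-intersecting Slices in key order.
    static List<Slice> merge(List<List<Slice>> sources) {
        // Every begin/end boundary from any source is a potential split point.
        TreeSet<String> points = new TreeSet<>();
        for (List<Slice> source : sources)
            for (Slice s : source) { points.add(s.begin); points.add(s.end); }
        List<Slice> out = new ArrayList<>();
        String lo = null;
        for (String hi : points) {
            if (lo != null) mergeInterval(sources, lo, hi, out);
            lo = hi;
        }
        return out;
    }

    // Combines every input Slice that covers the elementary interval [lo, hi).
    private static void mergeInterval(List<List<Slice>> sources, String lo, String hi, List<Slice> out) {
        Metadata meta = null;
        TreeMap<String, Column> byName = new TreeMap<>();
        for (List<Slice> source : sources) {
            for (Slice s : source) {
                if (s.begin.compareTo(lo) > 0 || s.end.compareTo(hi) < 0) continue; // s does not cover [lo, hi)
                meta = (meta == null) ? s.meta : meta.merge(s.meta);
                for (Column c : s.columns) {
                    if (c.name.compareTo(lo) < 0 || c.name.compareTo(hi) >= 0) continue; // other interval
                    Column seen = byName.get(c.name);
                    // Newest write wins; ties keep the first column seen.
                    if (seen == null || c.timestamp > seen.timestamp) byName.put(c.name, c);
                }
            }
        }
        if (meta == null) return; // no source covers this interval
        List<Column> live = new ArrayList<>();
        for (Column c : byName.values())
            if (c.timestamp > meta.markedForDeleteAt) live.add(c); // drop shadowed columns
        out.add(new Slice(lo, hi, meta, live));
    }
}
```

Because every input boundary becomes a split point, this sketch can emit more fragments than necessary; a production merge would more likely coalesce adjacent Slices with equal Metadata and consume its input Scanners incrementally rather than materializing them.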