Date: Tue, 15 Nov 2011 19:59:51 +0000 (UTC)
From: "Jonathan Ellis (Reopened) (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <1972269493.32250.1321387191851.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1775712870.66155.1303179425714.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Reopened] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reopened CASSANDRA-2503:
---------------------------------------

> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.1
>
>         Attachments: 2503-v2.txt, 2503.txt
>
>
> Once CASSANDRA-2498 is implemented, it will be possible to add an optimization that eagerly rewrites ("supersedes") data at read time. If a successful read had to hit more than a certain threshold of sstables, we can eagerly rewrite the data into a new sstable, and 2498 will allow only that file to be accessed. This basic approach would improve read performance considerably, but it would cause a lot of duplicate data to be written and would make compaction more necessary.
> To augment the basic idea: if we somehow marked data as superseded whenever we superseded it in a file, the next compaction that touched that file could remove the data. Since our file format is immutable, the values that a particular sstable superseded could be recorded in a component of that sstable. If we always supersede at the "block" level (as defined by CASSANDRA-674 or CASSANDRA-47), then the list of superseded blocks could be represented using a generation number and a bitmap of block numbers. Since 2498 would already allow sstables to be eliminated based on timestamps, this information would probably only be used at compaction time (by loading all superseding information in the system for the sstables that are being compacted).
> Initially described on [1608|https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12477095&commentId=12920353].
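
For illustration only, the bookkeeping described above (a read-time threshold, plus a generation number and a bitmap of block numbers per superseded sstable) might be sketched as follows. This is not code from the attached patches or from Cassandra itself: the class SupersedingSketch, the type SupersededBlocks, and the methods shouldSupersede, loadAll and liveBlocks are all hypothetical names, and the sketch assumes blocks are numbered from 0 within each sstable.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Cassandra.
final class SupersedingSketch {

    /** Blocks of one older sstable (identified by generation) that a newer sstable superseded. */
    static final class SupersededBlocks {
        final int generation;               // generation number of the superseded sstable
        final BitSet blocks = new BitSet(); // bit i set => block i has been superseded

        SupersededBlocks(int generation) { this.generation = generation; }

        void markSuperseded(int blockNumber) { blocks.set(blockNumber); }
    }

    /** Read-time decision: supersede only if the read hit more sstables than the threshold. */
    static boolean shouldSupersede(int sstablesHit, int threshold) {
        return sstablesHit > threshold;
    }

    /** Compaction-time step: merge superseding information from every component, keyed by generation. */
    static Map<Integer, BitSet> loadAll(List<SupersededBlocks> components) {
        Map<Integer, BitSet> merged = new HashMap<>();
        for (SupersededBlocks c : components) {
            merged.computeIfAbsent(c.generation, g -> new BitSet()).or(c.blocks);
        }
        return merged;
    }

    /** Block numbers in [0, blockCount) of the given sstable that compaction still has to copy. */
    static List<Integer> liveBlocks(int generation, int blockCount, Map<Integer, BitSet> merged) {
        BitSet dead = merged.getOrDefault(generation, new BitSet());
        List<Integer> live = new ArrayList<>();
        for (int b = dead.nextClearBit(0); b < blockCount; b = dead.nextClearBit(b + 1)) {
            live.add(b);
        }
        return live;
    }

    public static void main(String[] args) {
        SupersededBlocks fromNewer = new SupersededBlocks(12);
        fromNewer.markSuperseded(1);
        fromNewer.markSuperseded(4);
        Map<Integer, BitSet> merged = loadAll(List.of(fromNewer));
        System.out.println(liveBlocks(12, 6, merged));
    }
}
```

Keeping the bitmap in the superseding sstable, rather than in the superseded one, matches the constraint in the description that sstables are immutable once written; the merge in loadAll is one possible way to combine information from several superseding sstables before a compaction.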