Date: Fri, 27 Jan 2012 13:40:39 +0000 (UTC)
From: "Dominic Williams (Commented) (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <656123745.85636.1327671639820.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1836432411.46318.1326753098735.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-3748) Range ghosts don't disappear as expected and accumulate

    [
https://issues.apache.org/jira/browse/CASSANDRA-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194704#comment-13194704 ]

Dominic Williams commented on CASSANDRA-3748:
---------------------------------------------

Just an update - the range ghosts are still accumulating. It appears that these deleted rows are never being compacted away. This is potentially a very serious bug (our system may only still be running because of the caching layer).

> Range ghosts don't disappear as expected and accumulate
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3748
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>         Environment: Cassandra on Debian
>            Reporter: Dominic Williams
>              Labels: compaction, ghost-row, range, remove
>             Fix For: 1.0.8
>
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> I have a problem where range ghosts are accumulating and cannot be removed by reducing GCGraceSeconds and compacting.
> In our system, we have some cfs that represent "markets", where each row represents an item. Once an item is sold, it is removed from the market by passing its key to remove().
> The problem, which was hidden for some time by caching, appears on read. Every few seconds our system collates a random sample from each cf/market by choosing a random starting point:
>     String startKey = RNG.nextUUID();
> and then loading a page range of rows, specifying the key range as:
>     KeyRange keyRange = new KeyRange(pageSize);
>     keyRange.setStart_key(startKey);
>     keyRange.setEnd_key(maxKey);
> The returned rows are iterated over, and ghosts are ignored. If insufficient rows are obtained, the process is repeated using the key of the last row as the starting key (or wrapping around if necessary, etc.).
> When performance was lagging, we ran a test and found that constructing a random sample of 40 items (rows) involved iterating over hundreds of thousands of ghost rows.
> Our first attempt to deal with this was to halve our GCGraceSeconds and then perform major compactions. However, this had no effect on the number of ghost rows being returned. Furthermore, on examination it seems clear that the number of ghost rows created within the GCGraceSeconds window must be smaller than the number being returned. Thus this looks like a bug.
> We are using Cassandra 1.0.3 with Sylvain's patch from CASSANDRA-3510
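
For readers who want to see why ghost rows make this read pattern expensive, the sampling loop described in the report can be sketched roughly as below. This is a self-contained simulation, not the reporter's code and not the Cassandra Thrift API: the column family is modelled as an in-memory sorted map, a row with no columns stands in for a range ghost, and the names RangeGhostSampling, Sample, getRange, sample and demoStore are all hypothetical. It assumes the start key of a page is inclusive, so the last row of one page reappears as the first row of the next and is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

public class RangeGhostSampling {

    /** Outcome of one sampling pass: live keys kept and ghost rows skipped. */
    public static final class Sample {
        public final List<String> liveKeys = new ArrayList<>();
        public int ghostsSkipped = 0;
    }

    // Hypothetical stand-in for a range query: up to pageSize rows with
    // key >= startKey, in key order. An empty column list models a ghost.
    public static List<Map.Entry<String, List<String>>> getRange(
            NavigableMap<String, List<String>> cf, String startKey, int pageSize) {
        List<Map.Entry<String, List<String>>> page = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : cf.tailMap(startKey, true).entrySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(row);
        }
        return page;
    }

    // Pages forward from startKey, ignoring ghosts, until `wanted` live rows
    // are found or every row has been examined once. Wraps to the first key
    // when the end of the key range is reached.
    public static Sample sample(NavigableMap<String, List<String>> cf,
                                String startKey, int pageSize, int wanted) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be at least 2 to make progress");
        }
        Sample result = new Sample();
        String cursor = startKey;
        String lastSeen = null; // last key examined; repeated at the head of the next page
        int examined = 0;
        while (result.liveKeys.size() < wanted && examined < cf.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> row : getRange(cf, cursor, pageSize)) {
                if (row.getKey().equals(lastSeen)) {
                    continue; // inclusive start key: already handled on the previous page
                }
                progressed = true;
                examined++;
                lastSeen = row.getKey();
                if (row.getValue().isEmpty()) {
                    result.ghostsSkipped++; // ghost: key returned, but no columns
                } else {
                    result.liveKeys.add(row.getKey());
                }
                if (result.liveKeys.size() == wanted || examined == cf.size()) {
                    break;
                }
            }
            if (progressed) {
                cursor = lastSeen;      // continue from the last row seen
            } else {
                cursor = cf.firstKey(); // nothing new at the tail: wrap around
                lastSeen = null;
            }
        }
        return result;
    }

    // Small illustrative data set: 4 live rows and 6 ghosts.
    public static NavigableMap<String, List<String>> demoStore() {
        NavigableMap<String, List<String>> cf = new TreeMap<>();
        for (String key : Arrays.asList("a", "d", "f", "j")) {
            cf.put(key, Arrays.asList("price"));
        }
        for (String key : Arrays.asList("b", "c", "e", "g", "h", "i")) {
            cf.put(key, Collections.<String>emptyList());
        }
        return cf;
    }

    public static void main(String[] args) {
        // Random starting point, as in the report.
        String startKey = UUID.randomUUID().toString();
        Sample s = sample(demoStore(), startKey, 3, 3);
        System.out.println("live=" + s.liveKeys + " ghostsSkipped=" + s.ghostsSkipped);
    }
}
```

The point of the sketch is that the cost of a sample is driven by the number of rows examined, not the number kept: every ghost between the random start key and the next live row is still fetched and iterated over, which matches the report of hundreds of thousands of ghosts for a sample of 40 items.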