Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 99914B467 for ; Wed, 4 Jan 2012 11:04:45 +0000 (UTC) Received: (qmail 72973 invoked by uid 500); 4 Jan 2012 11:04:45 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 72463 invoked by uid 500); 4 Jan 2012 11:04:35 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 72421 invoked by uid 99); 4 Jan 2012 11:04:27 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 04 Jan 2012 11:04:27 +0000 X-ASF-Spam-Status: No, hits=-2001.6 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 04 Jan 2012 11:04:24 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 88E17136A74 for ; Wed, 4 Jan 2012 11:03:38 +0000 (UTC) Date: Wed, 4 Jan 2012 11:03:38 +0000 (UTC) From: "Sylvain Lebresne (Updated) (JIRA)" To: commits@cassandra.apache.org Message-ID: <179058466.4192.1325675018562.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <1510774082.45494.1323177105304.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Updated] (CASSANDRA-3579) AssertionError in hintedhandoff - 1.0.5 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/CASSANDRA-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-3579: ---------------------------------------- Attachment: 3579.patch I believe I know what is going on. This is bug with CF having gc_grace == 0 (which hints have). The problem lies in the following line of removeDeleted: {noformat} if ((c.isMarkedForDelete() && c.getLocalDeletionTime() <= gcBefore) || c.timestamp() <= cf.getMarkedForDeleteAt()) { iter.remove(); } {noformat} and more precisely the first condition. When that is executed, we have gcbefore <= now but we *can* have gcbefore == now if gc_grace == 0 (and since the resolution is the second, it's not even a very unlikely race). Now for expiring columns it is further possible that localExpirationTime == gcbefore == now. When that happens, c.isMarkedForDelete() will return false (thus the column will be kept) because this method is defined as: {noformat} public boolean isMarkedForDelete() { return (int) (System.currentTimeMillis() / 1000 ) > localExpirationTime; } {noformat} However, during the second pass, now has changed and it is possible that we have now > gcbefore. But since gcbefore == localExpirationTime, this means that isMarkedForDelete() will be true *and* getLocalDeletionTime() <= gcbefore, so the column will be considered tombstone and gcable. In other word, the current code does not respect the condition that at all time a column is considered gcable only if it is considered deleted. A rather simple fix consist in changing the condition in removeDeleted to be {noformat} if ((c.isMarkedForDelete() && c.getLocalDeletionTime() < gcBefore) || c.timestamp() <= cf.getMarkedForDeleteAt()) { iter.remove(); } {noformat} Note the strict lesser than operator for the first condition. It fixes it because since we know that we always have gcbefore <= now, localExpirationTime < gcBefore always imply that isMarkedForDelete() is true. Attached patch to make that fix (with hopefully useful comments). > AssertionError in hintedhandoff - 1.0.5 > --------------------------------------- > > Key: CASSANDRA-3579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-3579 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 1.0.5 > Environment: RHEL 6.1 64 bit, 32 GB RAM, 8 GB allocated to JVM, running XFS filesystem for commit/data directories > Reporter: Ramesh Natarajan > Assignee: Sylvain Lebresne > Fix For: 1.0.7 > > Attachments: 3579.patch > > > We are running a 8 node cassandra cluster running cassandra 1.0.5. > All our CF use leveled compaction. We ran a test where we did a lot > of inserts for 3 days. After that we started to run tests where some > of the reads could ask for information that was inserted a while back. > In this scenario we are seeing this assertion error in HintedHandoff. > ERROR [HintedHandoff:3] 2011-12-05 15:42:04,324 > AbstractCassandraDaemon.java (line 133) Fatal exception in thread > Thread[HintedHandoff:3,1,main] > java.lang.RuntimeException: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: > originally calculated column size of 470937164 but now it is 470294247 > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: > originally calculated column size of 470937164 but now it is 470294247 > at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:330) > at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:81) > at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:353) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > ... 3 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.AssertionError: originally calculated column size of > 470937164 but now it is 470294247 > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:326) > ... 6 more > Caused by: java.lang.AssertionError: originally calculated column size > of 470937164 but now it is 470294247 > at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:124) > at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160) > at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:158) > at org.apache.cassandra.db.compaction.CompactionManager$6.call(CompactionManager.java:275) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > ... 3 more > ERROR [HintedHandoff:3] 2011-12-05 15:42:04,333 > AbstractCassandraDaemon.java (line 133) Fatal exception in thread > Thread[HintedHandoff:3,1,main] > java.lang.RuntimeException: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: > originally calculated column size of 470937164 but now it is 470294247 > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: > originally calculated column size of 470937164 but now it is 470294247 > at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:330) > at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:81) > at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:353) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > ... 3 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.AssertionError: originally calculated column size of > 470937164 but now it is 470294247 > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:326) > ... 6 more > Caused by: java.lang.AssertionError: originally calculated column size > of 470937164 but now it is 470294247 > at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:124) > at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160) > at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:158) > at org.apache.cassandra.db.compaction.CompactionManager$6.call(CompactionManager.java:275) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > ... 3 more > ERROR [CompactionExecutor:9931] 2011-12-05 15:42:04,333 > AbstractCassandraDaemon.java (line 133) Fatal exception in thread > Thread[CompactionExecutor:9931,1,main] > java.lang.AssertionError: originally calculated column size of > 470937164 but now it is 470294247 > at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:124) > at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160) > at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:158) > at org.apache.cassandra.db.compaction.CompactionManager$6.call(CompactionManager.java:275) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira