Date: Tue, 13 May 2014 21:11:18 +0000 (UTC)
From: "Rick Branson (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-6285) 2.0 HSHA server introduces corrupt data

    [ https://issues.apache.org/jira/browse/CASSANDRA-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996899#comment-13996899 ]

Rick Branson commented on CASSANDRA-6285:
-----------------------------------------

I did some more digging around on our cluster that was running 2.0.6 when it saw the corruption: it took anywhere from a few hours to 48 hours for the first compaction with the out-of-order key exception to throw. These nodes are receiving thousands of writes per second, so it's not going to be trivially reproducible. We've been running one of the nodes with 2.0.8-tentative + enable_reallocate_buffers.txt and will report back once we've reached 72 hours and are comfortable rolling this out wide to our own clusters.

> 2.0 HSHA server introduces corrupt data
> ---------------------------------------
>
>                 Key: CASSANDRA-6285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6285
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 4 nodes, shortly updated from 1.2.11 to 2.0.2
>            Reporter: David Sauer
>            Assignee: Pavel Yaskevich
>            Priority: Critical
>             Fix For: 2.0.8
>
>         Attachments: 6285_testnotes1.txt, CASSANDRA-6285-disruptor-heap.patch, cassandra-attack-src.zip, compaction_test.py, disruptor-high-cpu.patch, disruptor-memory-corruption.patch, enable_reallocate_buffers.txt
>
>
> After altering everything to LCS, the table OpsCenter.rollups60 and one other non-OpsCenter table got stuck with everything hanging around in L0.
> The compaction started and ran until the logs showed this:
> ERROR [CompactionExecutor:111] 2013-11-01 19:14:53,865 CassandraDaemon.java (line 187) Exception in thread Thread[CompactionExecutor:111,1,RMI Runtime]
> java.lang.RuntimeException: Last written key DecoratedKey(1326283851463420237, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574426c6f6f6d46696c746572537061636555736564) >= current key DecoratedKey(954210699457429663, 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574546f74616c4469736b5370616365557365640b0f) writing into /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-tmp-jb-58656-Data.db
>         at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:141)
>         at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:164)
>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160)
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>         at org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> Moving back to STCS worked to keep the compactions running.
> Especially my own table is one I would like to move to LCS.
> After a major compaction with STCS, the move to LCS fails with the same exception.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
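
For readers tracing the error above: the top frame, SSTableWriter.beforeAppend, is the guard that insists rows reach the writer in strictly increasing token/key order, and in the quoted log the previously written token 1326283851463420237 is indeed larger than the incoming 954210699457429663. Once corrupted keys have made it into an SSTable, the next compaction that merges them trips this check. The sketch below is a minimal, simplified rendering of that kind of guard, not the actual Cassandra source; the class names DecoratedKeySketch and SSTableWriterSketch are invented for illustration, while the exception message mirrors the log line in the issue.

    import java.nio.ByteBuffer;

    // Hypothetical stand-in for a decorated key: a token plus the raw key bytes.
    final class DecoratedKeySketch implements Comparable<DecoratedKeySketch> {
        final long token;
        final ByteBuffer key;

        DecoratedKeySketch(long token, ByteBuffer key) {
            this.token = token;
            this.key = key;
        }

        @Override
        public int compareTo(DecoratedKeySketch other) {
            int cmp = Long.compare(token, other.token); // partitions order by token first
            return cmp != 0 ? cmp : key.compareTo(other.key); // key bytes break ties
        }

        @Override
        public String toString() {
            return "DecoratedKey(" + token + ", ...)";
        }
    }

    // Hypothetical writer: rows must arrive in strictly increasing key order,
    // because an SSTable's index assumes sorted partitions.
    final class SSTableWriterSketch {
        private DecoratedKeySketch lastWrittenKey;

        void beforeAppend(DecoratedKeySketch current) {
            if (lastWrittenKey != null && lastWrittenKey.compareTo(current) >= 0)
                throw new RuntimeException("Last written key " + lastWrittenKey
                        + " >= current key " + current);
            lastWrittenKey = current;
        }
    }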
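
The attachment names in the issue (disruptor-memory-corruption.patch, enable_reallocate_buffers.txt) and the patch Rick is testing point at the suspected mechanism: the 2.0 HSHA path reusing network frame buffers that deserialized requests may still reference, so key bytes can change underneath a queued write and only surface later as out-of-order keys at compaction time. The following is a self-contained illustration of that general hazard using plain java.nio buffers; it is an assumption-level sketch of the failure class, not Cassandra or Thrift code, and every name in it is hypothetical.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class BufferReuseHazard {
        public static void main(String[] args) {
            // A pooled "frame buffer" that the server reuses between requests.
            ByteBuffer frame = ByteBuffer.allocate(32);
            frame.put("row-key-A".getBytes(StandardCharsets.UTF_8)).flip();

            // The request handler keeps a zero-copy view of the partition key.
            ByteBuffer key = frame.slice(); // shares the same backing memory

            // The frame buffer is recycled for the next request before the
            // first mutation has been fully processed.
            frame.clear();
            frame.put("row-key-Z".getBytes(StandardCharsets.UTF_8)).flip();

            // The captured key now reads the new request's bytes: silent corruption.
            byte[] bytes = new byte[key.remaining()];
            key.duplicate().get(bytes);
            System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints "row-key-Z"
        }
    }

Allocating a fresh buffer per frame (the intent of the enable_reallocate_buffers change, as its name suggests) sidesteps this class of aliasing, at the cost of more allocation.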