Return-Path: Delivered-To: apmail-cassandra-dev-archive@www.apache.org Received: (qmail 83312 invoked from network); 9 Feb 2011 22:31:57 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 9 Feb 2011 22:31:57 -0000 Received: (qmail 65275 invoked by uid 500); 9 Feb 2011 22:31:57 -0000 Delivered-To: apmail-cassandra-dev-archive@cassandra.apache.org Received: (qmail 65218 invoked by uid 500); 9 Feb 2011 22:31:56 -0000 Mailing-List: contact dev-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list dev@cassandra.apache.org Received: (qmail 65210 invoked by uid 99); 9 Feb 2011 22:31:56 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Feb 2011 22:31:56 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of dan.hendry.junk@gmail.com designates 209.85.216.44 as permitted sender) Received: from [209.85.216.44] (HELO mail-qw0-f44.google.com) (209.85.216.44) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Feb 2011 22:31:45 +0000 Received: by qwi2 with SMTP id 2so571315qwi.31 for ; Wed, 09 Feb 2011 14:31:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:from:to:subject:date:message-id:mime-version :content-type:x-mailer:thread-index:content-language; bh=g8GwvN80I8Sb25ZKgudQoROutMm01ewoUjZATf6VCLY=; b=MCDrjV63BzUIR3IjA0/3BP+Ace4fUnhfZeajWknWR48PADL6QMWgzVjLWpvA2zcSXT y535vMqrCGecjgEGgN79r4tpzqSnJU96tSwreX8vRF6VBwQ3aiWcay3N9K/FCrMgxE7G K4ha1qq5EoPBdmlPm44hcrOJw8cToXD7qg9E4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:subject:date:message-id:mime-version:content-type:x-mailer :thread-index:content-language; b=A1aWQ1YQn+eA1h1fmMfK0BdfdmvdIfnVPIU/J8hOhaVRuoVMzcD1qmlstFiPE8cBev g0+oFnDlyj4R4zerDdPvz1txHuxjf1SqdvcqPXVQv/qxf4vfTZkfPCuBgC3DKz8Vn1Ic dRuy+lPT0owd61Kvcfiz3t3CR24eo/tTSIoNo= Received: by 10.224.2.81 with SMTP id 17mr16575199qai.366.1297290678288; Wed, 09 Feb 2011 14:31:18 -0800 (PST) Received: from DHTABLET (kik.client.acceleratorcentre.net [38.121.79.182]) by mx.google.com with ESMTPS id e29sm510337qck.15.2011.02.09.14.31.16 (version=TLSv1/SSLv3 cipher=RC4-MD5); Wed, 09 Feb 2011 14:31:17 -0800 (PST) From: "Dan Hendry" To: Subject: SEVERE Data Corruption Problems Date: Wed, 9 Feb 2011 17:31:07 -0500 Message-ID: <4d5315b5.1d4de50a.7aad.1c88@mx.google.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0025_01CBC87F.23BE7FC0" X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: AcvIoVo8dSfmmlxiQO+E4tT0DjocBQAB53FQ Content-Language: en-ca X-Virus-Checked: Checked by ClamAV on apache.org ------=_NextPart_000_0025_01CBC87F.23BE7FC0 Content-Type: multipart/alternative; boundary="----=_NextPart_001_0026_01CBC87F.23BE7FC0" ------=_NextPart_001_0026_01CBC87F.23BE7FC0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit I have been having SEVERE data corruption issues with SSTables in my cluster, for one CF it was happening almost daily (I have since shut down the service using that CF as it was too much work to manage the Cassandra errors). At this point, I can't see how it is anything but a Cassandra bug yet it's somewhat strange and very scary that I am the only one who seems to be having such serious issues. Most of my data is indexed in two ways so I have been able to write a validator which goes through and back fills missing data but it's kind of defeating the whole point of Cassandra. The only way I have found to deal with issues when they crop up to prevent nodes crashing from repeated failed compactions is delete the SSTable. My cluster is running a slightly modified 0.7.0 version which logs what files errors for so that I can stop the node and delete them. The problem: - Reads, compactions and hinted handoff fail with various exceptions (samples shown at the end of this email) which seem to indicate sstable corruption. - I have seen failed reads/compactions/hinted handoff on 4 out of 4 nodes (RF=2) for 3 different super column families and 1 standard column family (4 out of 11) and just now, the Hints system CF. (if it matters the ring has not changed since one CF which has been giving me trouble was created). I have check SMART disk info and run various diagnostics and there does not seem to be any hardware issues, plus what are the chances of all four nodes having the same hardware problems at the same time when for all other purposes, they appear fine? - I have added logging which outputs what sstable are causing exceptions to be thrown. The corrupt sstables have been both freshly flushed memtables and the output of compaction (ie, 4 sstables which all seem to be fine get compacted to 1 which is then corrupt). It seems that the majority of corrupt sstables are post-compacted (vs post-memtable flush). - The one CF which was giving me the most problems was heavily written to (1000-1500 writes/second continually across the cluster). For that cf, was having to deleting 4-6 sstables a day across the cluster (and the number was going up, even the number of problems for remaining CFs is going up). The other CFs which have had corrupt sstables are also quite heavily written to (generally a few hundred writes a second across the cluster). - Most of the time (5/6 attempts) when this problem occurs, sstable2json also fails. I have however, had one case where I was able to export the sstable to json, then re-import it at which point I was no longer seeing exceptions. - The cluster has been running for a little over 2 months now, problem seems to have sprung up in the last 3-4 weeks and seems to be steadily getting worse. Ultimately, I think I am hitting some subtle race condition somewhere. I have been starting to dig into the Cassandra code but I barely know where to start looking. I realize I have not provided nearly enough information to easily debug the problem but PLEASE keep your eyes open for possibly racy or buggy code which could cause these sorts of problems. I am willing to provided full Cassandra logs and a corrupt SSTable on an individual basis: please email me and let me know. Here is possibly relevant information and my theories on a possible root cause. Again, I know little about the Cassandra code base and have only moderate java experience so these theories may be way off base. - Strictly speaking, I probably don't have enough memory for my workload. I see stop the world gc occurring ~30/day/node, often causing Cassandra to hang for 30+ seconds (according to the gc logs). Could there be some java bug where a full gc in the middle of writing or flushing (compaction/memtable flush) or doing some other disk based activity causes some sort of data corruption? - Writes are usually done at ConsistencyLevel ONE with additional client side retry logic. Given that I often see consecutive nodes in the ring down, could there be some edge condition where dying at just the right time causes parts of mutations/messages to be lost? - All of the CFs which have been causing me problems have large rows which are compacted incrementally. Could there be some problem with the incremental compaction logic? - My cluster has a fairly heavy write load (again, the most problematic CF is getting 1500 (w/s)/(RF=2) = 750 writes/second/node). Furthermore, it is highly probable that there are timestamp collisions. Could there be some issue with timestamp logic (ie, using > instead of >= or some such) during flushes/compaction? - Once a node Cluster/system information: - 4 nodes with RF=2 - Nodes have 8 cores with 24 GB of RAM a piece. - 2 HDs, 1 for commit log/system, 1 for /var/lib/cassandra/data - OS is Ubuntu 10.04 (uname -r = 2.6.32-24-server) - Java: o java version "1.6.0_22" o Java(TM) SE Runtime Environment (build 1.6.0_22-b04) o Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode) - Slightly modified (file information in exceptions) version of 0.7.0 The following non-standard cassandra.yaml properties have been changed: - commitlog_sync_period_in_ms: 100 (with commitlog_sync: periodic) - disk_access_mode: mmap_index_only - concurrent_reads: 12 - concurrent_writes: 2 (was 32, but I dropped it to 2 to try and eliminate any mutation race conditions - did not seem to help) - sliced_buffer_size_in_kb: 128 - in_memory_compaction_limit_in_mb: 50 - rpc_timeout_in_ms: 15000 Schema for most problematic CF: name: DeviceEventsByDevice column_type: Standard memtable_throughput_in_mb: 150 memtable_operations_in_millions: 1.5 gc_grace_seconds: 172800 keys_cached: 1000000 rows_cached: 0 Dan Hendry (403) 660-2297 ------=_NextPart_001_0026_01CBC87F.23BE7FC0 Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

I have = been having SEVERE data corruption issues with SSTables in my cluster, = for one CF it was happening almost daily (I have since shut down the = service using that CF as it was too much work to manage the Cassandra = errors). At this point, I can’t see how it is anything but a = Cassandra bug yet it’s somewhat strange and very scary that I am = the only one who seems to be having such serious issues. Most of my data = is indexed in two ways so I have been able to write a validator which = goes through and back fills missing data but it’s kind of = defeating the whole point of Cassandra. The only way I have found to = deal with issues when they crop up to prevent nodes crashing from = repeated failed compactions is delete the SSTable. My cluster is running = a slightly modified 0.7.0 version which logs what files errors for so = that I can stop the node and delete them.

 

The problem: =

-          = Reads, compactions and hinted handoff fail with = various exceptions (samples shown at the end of this email) which seem = to indicate sstable corruption.

-          = I have seen failed reads/compactions/hinted = handoff on 4 out of 4 nodes (RF=3D2) for 3 different super column = families and 1 standard column family (4 out of 11) and just now, the = Hints system CF. (if it matters the ring has not changed since one CF = which has been giving me trouble was created). I have check SMART disk = info and run various diagnostics and there does not seem to be any = hardware issues, plus what are the chances of all four nodes having the = same hardware problems at the same time when for all other purposes, = they appear fine?

-          = I have added logging which outputs what sstable = are causing exceptions to be thrown. The corrupt sstables have been both = freshly flushed memtables and the output of compaction (ie, 4 sstables = which all seem to be fine get compacted to 1 which is then corrupt). It = seems that the majority of corrupt sstables are post-compacted (vs = post-memtable flush).

-          = The one CF which was giving me the most problems = was heavily written to (1000-1500 writes/second continually across the = cluster). For that cf, was having to deleting 4-6 sstables a day across = the cluster (and the number was going up, even the number of problems = for remaining CFs is going up). The other CFs which have had corrupt = sstables are also quite heavily written to (generally a few hundred = writes a second across the cluster).

-          = Most of the time (5/6 attempts) when this = problem occurs, sstable2json also fails. I have however, had one case = where I was able to export the sstable to json, then re-import it at = which point I was no longer seeing exceptions.

-          = The cluster has been running for a little over 2 = months now, problem seems to have sprung up in the last 3-4 weeks and = seems to be steadily getting worse.

 

Ultimately, = I think I am hitting some subtle race condition somewhere. I have been = starting to dig into the Cassandra code but I barely know where to start = looking. I realize I have not provided nearly enough information to = easily debug the problem but PLEASE keep your eyes open for possibly = racy or buggy code which could cause these sorts of problems. I am = willing to provided full Cassandra logs and a corrupt SSTable on an = individual basis: please email me and let me know.

 

Here is = possibly relevant information and my theories on a possible root cause. = Again, I know little about the Cassandra code base and have only = moderate java experience so these theories may be way off = base.

-          = Strictly speaking, I probably don’t have = enough memory for my workload. I see stop the world gc occurring = ~30/day/node, often causing Cassandra to hang for 30+ seconds (according = to the gc logs). Could there be some java bug where a full gc in the = middle of writing or flushing (compaction/memtable flush) or doing some = other disk based activity causes some sort of data = corruption?

-          = Writes are usually done at ConsistencyLevel ONE = with additional client side retry logic. Given that I often see = consecutive nodes in the ring down, could there be some edge condition = where dying at just the right time causes parts of mutations/messages to = be lost?

-          = All of the CFs which have been causing me = problems have large rows which are compacted incrementally. Could there = be some problem with the incremental compaction logic?

-          = My cluster has a fairly heavy write load (again, = the most problematic CF is getting 1500 (w/s)/(RF=3D2) =3D 750 = writes/second/node). Furthermore, it is highly probable that there are = timestamp collisions. Could there be some issue with timestamp logic = (ie, using > instead of >=3D or some such) during = flushes/compaction?

-          = Once a node

 

Cluster/system information:

-          = 4 nodes with RF=3D2

-          = Nodes have 8 cores with 24 GB of RAM a = piece.

-          = 2 HDs, 1 for commit log/system, 1 for = /var/lib/cassandra/data

-          = OS is Ubuntu 10.04 (uname –r =3D = 2.6.32-24-server)

-          = Java:

o   java version = "1.6.0_22"

o   Java(TM) SE = Runtime Environment (build 1.6.0_22-b04)

o   Java HotSpot(TM) = 64-Bit Server VM (build 17.1-b03, mixed mode)

-          = Slightly modified (file information in = exceptions) version of 0.7.0

 

The = following non-standard cassandra.yaml properties have been = changed:

-          = commitlog_sync_period_in_ms: 100 (with = commitlog_sync: periodic)

-          = disk_access_mode: = mmap_index_only

-          = concurrent_reads: 12

-          = concurrent_writes: 2 (was 32, but I dropped it = to 2 to try and eliminate any mutation race conditions – did not = seem to help)

-          = sliced_buffer_size_in_kb: 128

-          = in_memory_compaction_limit_in_mb: = 50

-          = rpc_timeout_in_ms: 15000

 

Schema for = most problematic CF:

name: DeviceEventsByDevice

column_type: = Standard

memtable_throughput_in_mb: = 150

memtable_operations_in_millions: = 1.5

gc_grace_seconds: 172800

keys_cached: = 1000000

rows_cached: 0

 

Dan = Hendry

(403) = 660-2297

 

------=_NextPart_001_0026_01CBC87F.23BE7FC0-- ------=_NextPart_000_0025_01CBC87F.23BE7FC0 Content-Type: text/plain; name="Crazy row key.txt" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="Crazy row key.txt" Occuring on node 4. Note that all row keys are long bytes so a value of = 00012e0290cc4d0000000357b71a25020003f4804d547bdf00049bba059e18800000000d0= 1616e64726f69647.... is pretty much impossible. Also look at the row = size: 8245925322843059048 bytes. Wow INFO [MutationStage:2] 2011-02-08 09:29:36,243 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-DeviceEventsByDevice@678659249(76918719 bytes, 1573295 = operations) INFO [FlushWriter:1] 2011-02-08 09:29:36,243 Memtable.java (line 155) = Writing Memtable-DeviceEventsByDevice@678659249(76918719 bytes, 1573295 = operations) INFO [FlushWriter:1] 2011-02-08 09:29:39,251 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-1620-Data.db = (81069730 bytes) INFO [CompactionExecutor:1] 2011-02-08 09:29:39,252 = CompactionManager.java (line 272) Compacting = [org.apache.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra= /data/kikmetrics/DeviceEventsByDevice-e-1617-Data.db'),org.apache.cassand= ra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra/data/kikmetrics/De= viceEventsByDevice-e-1618-Data.db'),org.apache.cassandra.io.sstable.SSTab= leReader(path=3D'/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-= e-1619-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path=3D'/v= ar/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-1620-Data.db')] INFO [COMMIT-LOG-WRITER] 2011-02-08 09:29:43,902 CommitLogSegment.java = (line 50) Creating new commitlog segment = /var/lib/cassandra/commitlog/CommitLog-1297178983902.log INFO [CompactionExecutor:1] 2011-02-08 09:29:57,928 = CompactionManager.java (line 354) Compacted to = /var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-tmp-e-1621-Data.d= b. 323,738,115 to 322,127,595 (~99% of original) bytes for 92,628 keys. = Time: 18,675ms. INFO [CompactionExecutor:1] 2011-02-08 09:29:57,940 = CompactionManager.java (line 272) Compacting = [org.apache.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra= /data/kikmetrics/DeviceEventsByDevice-e-1606-Data.db'),org.apache.cassand= ra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra/data/kikmetrics/De= viceEventsByDevice-e-1621-Data.db'),org.apache.cassandra.io.sstable.SSTab= leReader(path=3D'/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-= e-1461-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path=3D'/v= ar/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-1616-Data.db'),or= g.apache.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra/da= ta/kikmetrics/DeviceEventsByDevice-e-1560-Data.db'),org.apache.cassandra.= io.sstable.SSTableReader(path=3D'/var/lib/cassandra/data/kikmetrics/Devic= eEventsByDevice-e-927-Data.db'),org.apache.cassandra.io.sstable.SSTableRe= ader(path=3D'/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-13= 76-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path=3D'/var/l= ib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-928-Data.db'),org.apa= che.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra/data/ki= kmetrics/DeviceEventsByDevice-e-1302-Data.db')] INFO [MutationStage:2] 2011-02-08 09:30:27,996 ColumnFamilyStore.java = (line 648) switching in a fresh Memtable for DeviceEventsByDevice at = CommitLogContext(file=3D'/var/lib/cassandra/commitlog/CommitLog-129717898= 3902.log', position=3D83321927) INFO [MutationStage:2] 2011-02-08 09:30:27,997 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-DeviceEventsByDevice@1842496810(76808204 bytes, 1575390 = operations) INFO [FlushWriter:1] 2011-02-08 09:30:27,997 Memtable.java (line 155) = Writing Memtable-DeviceEventsByDevice@1842496810(76808204 bytes, 1575390 = operations) INFO [FlushWriter:1] 2011-02-08 09:30:35,080 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-1623-Data.db = (80639143 bytes) INFO [COMMIT-LOG-WRITER] 2011-02-08 09:30:55,346 CommitLogSegment.java = (line 50) Creating new commitlog segment = /var/lib/cassandra/commitlog/CommitLog-1297179055346.log INFO [ScheduledTasks:1] 2011-02-08 09:31:21,303 GCInspector.java (line = 133) GC for ParNew: 235 ms, 387116488 reclaimed leaving 4872286800 used; = max is 12747341824 INFO [MutationStage:2] 2011-02-08 09:31:28,688 ColumnFamilyStore.java = (line 648) switching in a fresh Memtable for DeviceEventsByDevice at = CommitLogContext(file=3D'/var/lib/cassandra/commitlog/CommitLog-129717905= 5346.log', position=3D49291199) INFO [MutationStage:2] 2011-02-08 09:31:28,689 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-DeviceEventsByDevice@1183809687(76847864 bytes, 1573549 = operations) INFO [FlushWriter:1] 2011-02-08 09:31:28,689 Memtable.java (line 155) = Writing Memtable-DeviceEventsByDevice@1183809687(76847864 bytes, 1573549 = operations) INFO [FlushWriter:1] 2011-02-08 09:31:33,055 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-1624-Data.db = (81056041 bytes) INFO [ScheduledTasks:1] 2011-02-08 09:31:59,403 Gossiper.java (line = 195) InetAddress /192.168.4.15 is now dead. INFO [HintedHandoff:1] 2011-02-08 09:32:06,925 = HintedHandOffManager.java (line 226) Could not complete hinted handoff = to /192.168.4.15 INFO [HintedHandoff:1] 2011-02-08 09:32:06,926 ColumnFamilyStore.java = (line 648) switching in a fresh Memtable for HintsColumnFamily at = CommitLogContext(file=3D'/var/lib/cassandra/commitlog/CommitLog-129717905= 5346.log', position=3D95961059) INFO [HintedHandoff:1] 2011-02-08 09:32:06,926 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-HintsColumnFamily@1906760351(18233195 bytes, 328123 operations) INFO [FlushWriter:1] 2011-02-08 09:32:06,927 Memtable.java (line 155) = Writing Memtable-HintsColumnFamily@1906760351(18233195 bytes, 328123 = operations) INFO [FlushWriter:1] 2011-02-08 09:32:07,897 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/system/HintsColumnFamily-e-668-Data.db (9745221 = bytes) INFO [GossipStage:1] 2011-02-08 09:32:33,538 Gossiper.java (line 569) = InetAddress /192.168.4.15 is now UP INFO [COMMIT-LOG-WRITER] 2011-02-08 09:32:55,174 CommitLogSegment.java = (line 50) Creating new commitlog segment = /var/lib/cassandra/commitlog/CommitLog-1297179175174.log INFO [MutationStage:1] 2011-02-08 09:33:37,953 ColumnFamilyStore.java = (line 648) switching in a fresh Memtable for DeviceEventsByDevice at = CommitLogContext(file=3D'/var/lib/cassandra/commitlog/CommitLog-129717917= 5174.log', position=3D47142146) INFO [MutationStage:1] 2011-02-08 09:33:37,954 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-DeviceEventsByDevice@952974802(76976245 bytes, 1576516 = operations) INFO [FlushWriter:1] 2011-02-08 09:33:37,954 Memtable.java (line 155) = Writing Memtable-DeviceEventsByDevice@952974802(76976245 bytes, 1576516 = operations) INFO [FlushWriter:1] 2011-02-08 09:33:41,329 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-1625-Data.db = (84280730 bytes) INFO [CompactionExecutor:1] 2011-02-08 09:33:56,600 = CompactionIterator.java (line 134) Compacting large row 00000000deadbeef = (233750447 bytes) incrementally INFO [CompactionExecutor:1] 2011-02-08 09:34:21,119 = CompactionIterator.java (line 134) Compacting large row = 00012e0290cc4d0000000357b71a25020003f4804d547bdf00049bba059e18800000000d0= 1616e64726f6964707573683200100000012e029929ab0000000357bb09b9020003f4804d= 547e0300049bba264ac3f80000000d01616e64726f6964707573683200100000012e02995= 46e0000000357bb1c75020003f4804d547e0e00049bba26f1cdb00000000d01616e64726f= 6964707573683200100000012e029a746a0000000357bba806020003f4804d547e5800049= bba2b56be100000000d01616e64726f6964707573683200100000012e029e44c700000003= 57bd72f7020003f4804d547f5200049bba3a3cad400000000d01616e64726f69647075736= 83200100000012e029e493e0000000128fba216020003f4804d547f5300049bba3a4e1a30= 0000000d01616e64726f6964707573683100100000012e029eaa030000000328fbd7d5020= 003f4804d547f6c00049bba3bc81bb80000000d01616e64726f6964707573683100100000= 012e029eaf170000000128fbda18020003f4804d547f6d00049bba3bdbf1d80000000d016= 16e64726f6964707573683100100000012e029ef0ad0000000328fbf780020003f4804d54= 7f7e00049bba3cdc23c80000000d01616e64726f6964707573683100100000012e029f1b3= 70000000328fc0af5020003f4804d547f8900049bba3d824ed80000000d01616e64726f69= 64707573683100100000012e029f9dbe0000000328fc467e020003f4804d547faa00049bb= a3f802e300000000d01616e64726f6964707573683100100000012e029fb2c10000000328= fc5070020003f4804d547faf00049bba3fd241e80000000d01616e64726f6964707573683= 100100000012e02a032ab0000000328fc8b20020003f4804d547fd000049bba41c5ebf800= 00000d01616e64726f6964707573683100100000012e02a0c3b90000000328fcd6a502000= 3f4804d547ff500049bba43fc8aa80000000d01616e64726f696470757368310010000001= 2e02a0d34f0000000128fcddba020003f4804d547ff900049bba44397c380000000d01616= e64726f6964707573683100100000012e02a154520000000328fd16de020003f4804d5480= 1a00049bba463164380000000d01616e64726f6964707573683100100000012e02a18da40= 000000328fd2f9b020003f4804d54802900049bba471148a00000000d01616e64726f6964= 707573683100100000012e02a20ef20000000328fd6814020003f4804d54804a00049bba4= 90a61500000000d01616e64726f6964707573683100100000012e02a5ba6e0000000328ff= 1c25020003f4804d54813b00049bba57604d500000000d01616e64726f696470757368310= 0100000012e02a666520000000328ff7294020003f4804d54816700049bba59ffb0500000= 000d01616e64726f6964707573683100100000012e02a85c9d0000000329005b85020003f= 4804d5481e700049bba61a9c5480000000d01616e64726f6964707573683100100000012e= 02aa16e40000000329012abd020003f4804d54825800049bba68696e880000000d01616e6= 4726f6964707573683100100000012e02ab9985000000032901e1a8020003f4804d5482bb= 00049bba6e4faf880000000d01616e64726f6964707573683100100000012e02ac217c000= 0000329021e9c020003f4804d5482de00049bba7062cc600000000d01616e64726f696470= 7573683100100000012e02ac92c800000001929d5920020003f4804d5482fb00049bba721= d78980000000d01616e64726f6964707573683300100000012e02ace27600000003929d7c= ba020003f4804d54831000049bba73549cf00000000d01616e64726f69647075736833001= 00000012e02ad85b800000003929dc727020003f4804d54833900049bba75d256c0000000= 0d01616e64726f6964707573683300100000012e02ae18a200000001929e1519020003f48= 04d54835f00049bba781038d00000000d01616e64726f6964707573683300100000012e02= aed33600000003929e68cc020003f4804d54838f00049bba7ae912c00000000d01616e647= 26f6964707573683300100000012e02af109c00000003929e849a020003f4804d54839e00= 049bba7bd8e1600000000d01616e64726f6964707573683300100000012e03b8488b00000= 0019317e781020003f4804d54c78400049bbe87db5ef80000000d01616e64726f69647075= 73683300100000012e03b85ecf000000039317f035020003f4804d54c78900049bbe88325= 8980000000d01616e64726f6964707573683300100000012e03c634f100000001931d5132= 020003f4804d54cb1400049bbebe3ecd680000000d01616e64726f6964707573683300100= 000012e03c6473800000003931d57df020003f4804d54cb1900049bbebe8632c00000000d= 01616e64726f6964707573683300100000012e03d46428000000019322b511020003f4804= d54ceb600049bbef5a73c400000000d01616e64726f6964707573683300100000012e03e2= 3521000000019327cee0020003f4804d54d23f00049bbf2b9f88e80000000d01616e64726= f6964707573683300100000012e03e25052000000039327d87a020003f4804d54d2460004= 9bbf2c09c0500000000d01616e64726f6964707573683300100000012e03f025e90000000= 1299162a9020003f4804d54d5d100049bbf62141a100000000d01616e64726f6964707573= 683100100000012e03f0467d0000000329916da6020003f4804d54d5d900049bbf629373a= 00000000d01616e64726f6964707573683100100000012e03fe1d9d00000001299644b502= 0003f4804d54d96400049bbf98a3ad480000000d01616e64726f696470757368310010000= 0012e03fe3e310000000329964f9a020003f4804d54d96d00049bbf9922ef680000000d01= 616e64726f6964707573683100100000012e040c13da00000001299af40a020003f4804d5= 4dcf700049bbfcf2d8b900000000d01616e64726f6964707573683100100000012e040c26= 8600000003299afa16020003f4804d54dcfc00049bbfcf767b700000000d01616e64726f6= 964707573683100100000012e041a437900000001299fa6db020003f4804d54e09900049b= c0069790a80000000d01616e64726f6964707573683100100000012e041f00ab0000000a9= 33cc163020003f4804d54e1d000049bc0191ae2480000000d01616e64726f696470757368= 3300100000012e04280f500000000129a42161020003f4804d54e42100049bc03c7bd0800= 000000d01616e64726f6964707573683100100000012e0428303e0000000329a42b290200= 03f4804d54e42a00049bc03cfc72300000000d01616e64726f69647075736831001000000= 12e043604700000000129a88584020003f4804d54e7b400049bc0730155800000000d0161= 6e64726f6964707573683100100000012e043617100000000329a88b50020003f4804d54e= 7b900049bc0734a16800000000d01616e64726f6964707573683100100000012e044435cf= 0000000129ace68d020003f4804d54eb5600049bc0aa7230980000000d01616e64726f696= 4707573683100100000012e0452024c0000000129b108c1020003f4804d54eede00049bc0= e058f8e00000000d01616e64726f6964707573683100100000012e045214080000000329b= 113cb020003f4804d54eee300049bc0e09e3f400000000d01616e64726f69647075736831= 00100000012e046030780000000129b532a7020003f4804d54f28000049bc117bd54c0000= 0000d01616e64726f6964707573683100100000012e046dffd10000000129b93632020003= f4804d54f60900049bc14daf48680000000d01616e64726f6964707573683100100000012= e046e11b50000000329b93b44020003f4804d54f60d00049bc14df52b080000000d01616e= 64726f6964707573683100100000012e047c2c980000000129bd46ba020003f4804d54f9a= a00049bc1850e31c00000000d01616e64726f6964707573683100100000012e0489fc8200= 000001935cef29020003f4804d54fd3300049bc1bb025bd00000000d01616e64726f69647= 07573683300100000012e048a1bc200000003935cf792020003f4804d54fd3b00049bc1bb= 7c6dd00000000d01616e64726f6964707573683300100000012e0497f0cf000000019360d= ee7020003f4804d5500c500049bc1f184a8980000000d01616e64726f6964707573683300= 100000012e049802ef000000039360e3b6020003f4804d5500ca00049bc1f1cb798000000= 00d01616e64726f6964707573683300100000012e04a62055000000019364e3db020003f4= 804d55046700049bc228ee4c080000000d01616e64726f6964707573683300100000012e0= 4b3eded000000019368bee2020003f4804d5507f000049bc25ed965c80000000d01616e64= 726f6964707573683300100000012e04b40049000000039368c3db020003f4804d5507f40= 0049bc25f211d280000000d01616e64726f6964707573683300100000012e04b8ea830000= 000a29ce5d25020003f4804d55093600049bc2725423400000000d01616e64726f6964707= 573683100100000012e04c21e8000000001936cbc6c020003f4804d550b9200049bc29647= 24000000000d01616e64726f6964707573683300100000012e04cfeb92000000019370a22= e020003f4804d550f1a00049bc2cc3032500000000d01616e64726f696470757368330010= 0000012e04cffdb3000000039370a6ed020003f4804d550f1f00049bc2cc7703380000000= d01616e64726f6964707573683300100000012e04de1b450000000193749db1020003f480= 4d5512bc00049bc3039a85880000000d01616e64726f6964707573683300100000012e04e= be6f1000000019378801e020003f4804d55164400049bc3397e1d680000000d01616e6472= 6f6964707573683300100000012e04ebf99a0000000393788521020003f4804d551649000= 49bc339c701900000000d01616e64726f6964707573683300100000012e04fa16a1000000= 01937c8eaf020003f4804d5519e600049bc370e864e80000000d01616e64726f696470757= 3683300100000012e0507e10a00000001938084bf020003f4804d551d6d00049bc3a6c70f= 100000000d01616e64726f6964707573683300100000012e0507f33e00000003938089bc0= 20003f4804d551d7200049bc3a70e2a300000000d01616e64726f69647075736833001000= 00012e051613bf0000000158ab0cd1020003f4804d55211000049bc3de3d22180000000d0= 1616e64726f6964707573683200100000012e0522b86d0000000158aec8ba020003f4804d= 55244c00049bc40fa069c80000000d01616e64726f6964707573683200100000012e0522c= f2a0000000358aed444020003f4804d55245200049bc40ff93c100000000d01616e64726f= 6964707573683200100000012e0530eb360000000158b32935020003f4804d5527ef00049= bc44716caf00000000d01616e64726f6964707573683200100000012e053eba6f00000001= 58b784a2020003f4804d552b7800049bc47d0841980000000d01616e64726f69647075736= 83200100000012e053ecc7a0000000358b789fe020003f4804d552b7d00049bc47d4ebc90= 0000000d01616e64 (8245925322843059048 bytes) incrementally ERROR [CompactionExecutor:1] 2011-02-08 09:34:45,770 = AbstractCassandraDaemon.java (line 91) Fatal exception in thread = Thread[CompactionExecutor:1,1,main] java.io.IOError: java.io.IOException: Corrupt (negative) value length = encountered at = org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdent= ityIterator.java:112) at = org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdent= ityIterator.java:34) at = org.apache.commons.collections.iterators.CollatingIterator.set(CollatingI= terator.java:284) at = org.apache.commons.collections.iterators.CollatingIterator.least(Collatin= gIterator.java:326) at = org.apache.commons.collections.iterators.CollatingIterator.next(Collating= Iterator.java:230) at = org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.= java:68) at = com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor.java:136) at = com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131) at = com.google.common.collect.Iterators$7.computeNext(Iterators.java:604) at = com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor.java:136) at = com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131) at = org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.jav= a:76) at = org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50) at = org.apache.cassandra.io.LazilyCompactedRow.(LazilyCompactedRow.java= :88) at = org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIter= ator.java:136) at = org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.= java:107) at = org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.= java:42) at = org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.= java:73) at = com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor.java:136) at = com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131) at = org.apache.commons.collections.iterators.FilterIterator.setNextObject(Fil= terIterator.java:183) at = org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIte= rator.java:94) at = org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.= java:323) at = org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:2= 16) at = java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .java:886) at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Corrupt (negative) value length = encountered at = org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:274= ) at = org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.jav= a:94) at = org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.jav= a:35) at = org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdent= ityIterator.java:108) ... 28 more ------=_NextPart_000_0025_01CBC87F.23BE7FC0 Content-Type: text/plain; name="Hinted Handoff Then Compaction Failure.txt" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="Hinted Handoff Then Compaction Failure.txt" Occuring on node 3 INFO [CompactionExecutor:1] 2011-02-08 09:38:39,918 = CompactionManager.java (line 272) Compacting = [org.apache.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra= /data/kikmetrics/DeviceEventsByDevice-e-2228-Data.db'),org.apache.cassand= ra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra/data/kikmetrics/De= viceEventsByDevice-e-2381-Data.db'),org.apache.cassandra.io.sstable.SSTab= leReader(path=3D'/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-= e-2512-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path=3D'/v= ar/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-2568-Data.db'),or= g.apache.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra/da= ta/kikmetrics/DeviceEventsByDevice-e-2573-Data.db')] ERROR [HintedHandoff:1] 2011-02-08 09:39:22,578 = AbstractCassandraDaemon.java (line 91) Fatal exception in thread = Thread[HintedHandoff:1,1,main] java.lang.RuntimeException: java.lang.RuntimeException: File = :/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-2568-Data.db at = org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at = java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .java:886) at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: File = :/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-2568-Data.db at = org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(Ind= exedSliceReader.java:131) at = org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(Ind= exedSliceReader.java:47) at = com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor.java:136) at = com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131) at = org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTab= leSliceIterator.java:108) at = org.apache.commons.collections.iterators.CollatingIterator.set(CollatingI= terator.java:283) at = org.apache.commons.collections.iterators.CollatingIterator.least(Collatin= gIterator.java:326) at = org.apache.commons.collections.iterators.CollatingIterator.next(Collating= Iterator.java:230) at = org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.= java:68) at = com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor.java:136) at = com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131) at = org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(Sli= ceQueryFilter.java:118) at = org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFi= lter.java:142) at = org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamily= Store.java:1230) at = org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilySto= re.java:1107) at = org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilySto= re.java:1077) at = org.apache.cassandra.db.HintedHandOffManager.sendMessage(HintedHandOffMan= ager.java:113) at = org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(Hinte= dHandOffManager.java:219) at = org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffMana= ger.java:81) at = org.apache.cassandra.db.HintedHandOffManager$1.runMayThrow(HintedHandOffM= anager.java:297) at = org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) ... 6 more Caused by: java.io.EOFException at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383) at = org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280= ) at = org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.jav= a:94) at = org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.jav= a:35) at = org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFet= cher.getNextBlock(IndexedSliceReader.java:187) at = org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(Ind= exedSliceReader.java:119) ... 26 more INFO [HintedHandoff:1] 2011-02-08 09:39:22,601 = HintedHandOffManager.java (line 192) Started hinted handoff for endpoint = /192.168.4.16 INFO [COMMIT-LOG-WRITER] 2011-02-08 09:43:46,225 CommitLogSegment.java = (line 50) Creating new commitlog segment = /var/lib/cassandra/commitlog/CommitLog-1297179826225.log INFO [MutationStage:1] 2011-02-08 09:43:52,882 ColumnFamilyStore.java = (line 648) switching in a fresh Memtable for DeviceEventsByDevice at = CommitLogContext(file=3D'/var/lib/cassandra/commitlog/CommitLog-129717982= 6225.log', position=3D6336478) INFO [MutationStage:1] 2011-02-08 09:43:52,885 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-DeviceEventsByDevice@1844846786(78348562 bytes, 1573332 = operations) INFO [FlushWriter:1] 2011-02-08 09:43:52,885 Memtable.java (line 155) = Writing Memtable-DeviceEventsByDevice@1844846786(78348562 bytes, 1573332 = operations) INFO [FlushWriter:1] 2011-02-08 09:43:56,780 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-2585-Data.db = (93346984 bytes) INFO [MutationStage:1] 2011-02-08 09:45:47,187 ColumnFamilyStore.java = (line 648) switching in a fresh Memtable for DeviceEventsByDevice at = CommitLogContext(file=3D'/var/lib/cassandra/commitlog/CommitLog-129717982= 6225.log', position=3D106463825) INFO [MutationStage:1] 2011-02-08 09:45:47,188 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-DeviceEventsByDevice@464767205(77940719 bytes, 1573098 = operations) INFO [FlushWriter:1] 2011-02-08 09:45:47,188 Memtable.java (line 155) = Writing Memtable-DeviceEventsByDevice@464767205(77940719 bytes, 1573098 = operations) INFO [FlushWriter:1] 2011-02-08 09:45:50,711 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-2586-Data.db = (84783198 bytes) INFO [COMMIT-LOG-WRITER] 2011-02-08 09:46:11,710 CommitLogSegment.java = (line 50) Creating new commitlog segment = /var/lib/cassandra/commitlog/CommitLog-1297179971710.log INFO [ScheduledTasks:1] 2011-02-08 09:46:47,583 Gossiper.java (line = 195) InetAddress /192.168.4.16 is now dead. INFO [HintedHandoff:1] 2011-02-08 09:46:54,322 = HintedHandOffManager.java (line 226) Could not complete hinted handoff = to /192.168.4.16 INFO [HintedHandoff:1] 2011-02-08 09:46:54,323 ColumnFamilyStore.java = (line 648) switching in a fresh Memtable for HintsColumnFamily at = CommitLogContext(file=3D'/var/lib/cassandra/commitlog/CommitLog-129717997= 1710.log', position=3D33082308) INFO [HintedHandoff:1] 2011-02-08 09:46:54,323 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-HintsColumnFamily@920453652(246377 bytes, 4568 operations) INFO [FlushWriter:1] 2011-02-08 09:46:54,323 Memtable.java (line 155) = Writing Memtable-HintsColumnFamily@920453652(246377 bytes, 4568 = operations) INFO [FlushWriter:1] 2011-02-08 09:46:54,583 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/system/HintsColumnFamily-e-686-Data.db (258974 = bytes) INFO [GossipStage:1] 2011-02-08 09:46:54,735 Gossiper.java (line 569) = InetAddress /192.168.4.16 is now UP INFO [MutationStage:2] 2011-02-08 09:47:31,339 ColumnFamilyStore.java = (line 648) switching in a fresh Memtable for DeviceEventsByDevice at = CommitLogContext(file=3D'/var/lib/cassandra/commitlog/CommitLog-129717997= 1710.log', position=3D71588101) INFO [MutationStage:2] 2011-02-08 09:47:31,339 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-DeviceEventsByDevice@13644265(78046441 bytes, 1572876 = operations) INFO [FlushWriter:1] 2011-02-08 09:47:31,340 Memtable.java (line 155) = Writing Memtable-DeviceEventsByDevice@13644265(78046441 bytes, 1572876 = operations) INFO [FlushWriter:1] 2011-02-08 09:47:36,128 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-2587-Data.db = (84303215 bytes) INFO [COMMIT-LOG-WRITER] 2011-02-08 09:48:21,681 CommitLogSegment.java = (line 50) Creating new commitlog segment = /var/lib/cassandra/commitlog/CommitLog-1297180101681.log INFO [MutationStage:1] 2011-02-08 09:48:43,808 ColumnFamilyStore.java = (line 648) switching in a fresh Memtable for DeviceEventsByDevice at = CommitLogContext(file=3D'/var/lib/cassandra/commitlog/CommitLog-129718010= 1681.log', position=3D30143206) INFO [MutationStage:1] 2011-02-08 09:48:43,808 ColumnFamilyStore.java = (line 952) Enqueuing flush of = Memtable-DeviceEventsByDevice@1355774386(77949187 bytes, 1573287 = operations) INFO [FlushWriter:1] 2011-02-08 09:48:43,809 Memtable.java (line 155) = Writing Memtable-DeviceEventsByDevice@1355774386(77949187 bytes, 1573287 = operations) INFO [ScheduledTasks:1] 2011-02-08 09:48:47,773 GCInspector.java (line = 133) GC for ParNew: 224 ms, 267050360 reclaimed leaving 4919650072 used; = max is 12747341824 INFO [FlushWriter:1] 2011-02-08 09:48:54,398 Memtable.java (line 162) = Completed flushing = /var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-2588-Data.db = (82775159 bytes) ERROR [CompactionExecutor:1] 2011-02-08 09:49:06,850 = AbstractCassandraDaemon.java (line 91) Fatal exception in thread = Thread[CompactionExecutor:1,1,main] java.io.IOError: java.io.IOException: File = :/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-2512-Data.db at = org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIde= ntityIterator.java:85) at = org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(S= STableScanner.java:178) at = org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(S= STableScanner.java:143) at = org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:1= 35) at = org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:3= 8) at = org.apache.commons.collections.iterators.CollatingIterator.set(CollatingI= terator.java:284) at = org.apache.commons.collections.iterators.CollatingIterator.least(Collatin= gIterator.java:326) at = org.apache.commons.collections.iterators.CollatingIterator.next(Collating= Iterator.java:230) at = org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.= java:68) at = com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor.java:136) at = com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131) at = org.apache.commons.collections.iterators.FilterIterator.setNextObject(Fil= terIterator.java:183) at = org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIte= rator.java:94) at = org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.= java:323) at = org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:1= 22) at = org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:9= 2) at = java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .java:886) at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: File = :/var/lib/cassandra/data/kikmetrics/DeviceEventsByDevice-e-2512-Data.db ... 21 more Caused by: java.io.EOFException: attempted to skip 1768190069 bytes but = only skipped 235475578 at = org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.j= ava:52) at = org.apache.cassandra.io.sstable.SSTableIdentityIterator.(SSTableIde= ntityIterator.java:69) ... 20 more ------=_NextPart_000_0025_01CBC87F.23BE7FC0 Content-Type: text/plain; name="Invalid colum name length.txt" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="Invalid colum name length.txt" Occuring on node 1=20 INFO [COMMIT-LOG-WRITER] 2011-02-08 08:45:35,832 CommitLogSegment.java = (line 50) Creating new commitlog segment = /var/lib/cassandra/commitlog/CommitLog-1297176335832.log INFO [ScheduledTasks:1] 2011-02-08 08:46:34,722 Gossiper.java (line = 195) InetAddress /192.168.4.16 is now dead. INFO [HintedHandoff:1] 2011-02-08 08:46:36,992 = HintedHandOffManager.java (line 192) Started hinted handoff for endpoint = /192.168.4.16 INFO [GossipStage:1] 2011-02-08 08:46:36,992 Gossiper.java (line 569) = InetAddress /192.168.4.16 is now UP ERROR [HintedHandoff:1] 2011-02-08 08:47:51,105 = AbstractCassandraDaemon.java (line 91) Fatal exception in thread = Thread[HintedHandoff:1,1,main] java.lang.RuntimeException: java.lang.RuntimeException: File = :/var/lib/cassandra/data/kikmetrics/UserEventsByUser-e-5906-Data.db at = org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at = java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .java:886) at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: File = :/var/lib/cassandra/data/kikmetrics/UserEventsByUser-e-5906-Data.db at = org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(Ind= exedSliceReader.java:131) at = org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(Ind= exedSliceReader.java:47) at = com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor.java:136) at = com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131) at = org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTab= leSliceIterator.java:108) at = org.apache.commons.collections.iterators.CollatingIterator.anyHasNext(Col= latingIterator.java:364) at = org.apache.commons.collections.iterators.CollatingIterator.hasNext(Collat= ingIterator.java:217) at = org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.= java:55) at = com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor.java:136) at = com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131) at = org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(Sli= ceQueryFilter.java:118) at = org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFi= lter.java:142) at = org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamily= Store.java:1230) at = org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilySto= re.java:1107) at = org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilySto= re.java:1077) at = org.apache.cassandra.db.HintedHandOffManager.sendMessage(HintedHandOffMan= ager.java:113) at = org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(Hinte= dHandOffManager.java:219) at = org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffMana= ger.java:81) at = org.apache.cassandra.db.HintedHandOffManager$1.runMayThrow(HintedHandOffM= anager.java:297) at = org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) ... 6 more Caused by: = org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid = column name length 0 at = org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.jav= a:68) at = org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.jav= a:364) at = org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.jav= a:313) at = org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFet= cher.getNextBlock(IndexedSliceReader.java:187) at = org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(Ind= exedSliceReader.java:119) ... 25 more ------=_NextPart_000_0025_01CBC87F.23BE7FC0 Content-Type: text/plain; name="Negative seek.txt" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="Negative seek.txt" Occuring on node 2 INFO [CompactionExecutor:1] 2011-02-06 18:38:18,036 = CompactionManager.java (line 272) Compacting = [org.apache.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra= /data/kikmetrics/UserEventsByEvent-e-1850-Data.db'),org.apache.cassandra.= io.sstable.SSTableReader(path=3D'/var/lib/cassandra/data/kikmetrics/UserE= ventsByEvent-e-1841-Data.db'),org.apache.cassandra.io.sstable.SSTableRead= er(path=3D'/var/lib/cassandra/data/kikmetrics/UserEventsByEvent-e-1830-Da= ta.db'),org.apache.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/ca= ssandra/data/kikmetrics/UserEventsByEvent-e-1851-Data.db'),org.apache.cas= sandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra/data/kikmetric= s/UserEventsByEvent-e-1838-Data.db'),org.apache.cassandra.io.sstable.SSTa= bleReader(path=3D'/var/lib/cassandra/data/kikmetrics/UserEventsByEvent-e-= 1848-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path=3D'/var= /lib/cassandra/data/kikmetrics/UserEventsByEvent-e-1697-Data.db'),org.apa= che.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra/data/ki= kmetrics/UserEventsByEvent-e-1842-Data.db'),org.apache.cassandra.io.sstab= le.SSTableReader(path=3D'/var/lib/cassandra/data/kikmetrics/UserEventsByE= vent-e-1844-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path=3D= '/var/lib/cassandra/data/kikmetrics/UserEventsByEvent-e-1633-Data.db'),or= g.apache.cassandra.io.sstable.SSTableReader(path=3D'/var/lib/cassandra/da= ta/kikmetrics/UserEventsByEvent-e-1760-Data.db'),org.apache.cassandra.io.= sstable.SSTableReader(path=3D'/var/lib/cassandra/data/kikmetrics/UserEven= tsByEvent-e-1849-Data.db')] ...=20 ... lots of the usual logging many 'compacting large row incrementally' = ... ...=20 ERROR [CompactionExecutor:1] 2011-02-06 19:04:44,296 = AbstractCassandraDaemon.java (line 91) Fatal exception in thread = Thread[CompactionExecutor:1,1,main] java.lang.RuntimeException: java.io.IOException: Negative seek offset at = org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(S= STableScanner.java:188) at = org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(S= STableScanner.java:143) at = org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:1= 35) at = org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:3= 8) at = org.apache.commons.collections.iterators.CollatingIterator.set(CollatingI= terator.java:284) at = org.apache.commons.collections.iterators.CollatingIterator.least(Collatin= gIterator.java:326) at = org.apache.commons.collections.iterators.CollatingIterator.next(Collating= Iterator.java:230) at = org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.= java:68) at = com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractItera= tor.java:136) at = com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:= 131) at = org.apache.commons.collections.iterators.FilterIterator.setNextObject(Fil= terIterator.java:183) at = org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIte= rator.java:94) at = org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.= java:323) at = org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:2= 16) at = java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor= .java:886) at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav= a:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Negative seek offset at java.io.RandomAccessFile.seek(Native Method) at = org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRa= ndomAccessFile.java:235) at = org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandom= AccessFile.java:265) at java.io.RandomAccessFile.readByte(RandomAccessFile.java:589) at = org.apache.cassandra.utils.FBUtilities.readShortLength(FBUtilities.java:3= 06) at = org.apache.cassandra.utils.FBUtilities.readShortByteArray(FBUtilities.jav= a:312) at = org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(S= STableScanner.java:169) ... 18 more ------=_NextPart_000_0025_01CBC87F.23BE7FC0--