I ran nodetool scrub on nodes in the less corrupted datacenter and tried nodetool rebuild from this datacenter.

This is the result:

2014-02-06 15:04:24.645+0100 [Thread-83] [ERROR] CassandraDaemon.java(191) org.apache.cassandra.service.CassandraDaemon: Exception in thread Thread[Thread-83,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:152)
at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:187)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:138)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:243)
at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:183)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:79)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:144)
... 5 more
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78)
at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:132)
at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:115)
at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:165)
at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:45)
at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:61)
at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:224)
at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:182)
at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:143)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:86)
at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:45)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:134)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:291)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1123)
at org.apache.cassandra.db.SliceQueryPager.next(SliceQueryPager.java:57)
at org.apache.cassandra.db.Table.indexRow(Table.java:424)
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:803)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
2014-02-06 15:04:24.646+0100 [CompactionExecutor:10] [ERROR] CassandraDaemon.java(191) org.apache.cassandra.service.CassandraDaemon: component=c4 Exception in thread Thread[CompactionExecutor:10,1,main]
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78)
at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:132)
at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:115)
at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:165)
at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:45)
at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:61)
at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:224)
at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:182)
at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:143)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:86)
at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:45)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:134)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:291)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1123)
at org.apache.cassandra.db.SliceQueryPager.next(SliceQueryPager.java:57)
at org.apache.cassandra.db.Table.indexRow(Table.java:424)
at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:803)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Should I file a bug report with all this?

regards,

ondrej cernos


On Thu, Feb 6, 2014 at 2:38 PM, Ondřej Černoš <cernoso@gmail.com> wrote:
Hi,

I am running a small 2 DC cluster of 3 nodes (each DC). I use 3 replicas in both DCs (all 6 nodes have everything) on Cassandra 1.2.11. I populated the cluster via cqlsh pipelined with a series of inserts. I use the cluster for tests, the dataset is pretty small (hundreds of thousands of records max).

The cluster was completely up during inserts. Inserts were done serially on one of the nodes.

The resulting load is uneven:

Datacenter: xxx
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
UN  ip  1.63 GB    256     100.0%            83ecd32a-3f2b-4cf6-b3c7-b316cb1986cc  default-rack
UN  ip  1.5 GB     256     100.0%            091ca530-2e95-4954-92c4-76f51fab0b66  default-rack
UN ip 1.44 GB 256 100.0% d94d335e-08bf-4a30-ad58-4c5acdc2ef45 default-rack
Datacenter: yyy =================== Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN ip 2.27 GB 256 100.0% e2584981-71f7-45b0-82f4-e08942c47585 1c
UN ip 2.27 GB 256 100.0% e5c6de9a-819e-4757-a420-55ec3ffaf131 1c
UN ip
2.27 GB 256 100.0% fa53f391-2dd3-4ec8-885d-8db6d453a708 1c

And 4 out of 6 nodes report corrupted sstables:

java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: mmap segment underflow; remaining is 239882945 but 1349280116 requested
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1618)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: mmap segment underflow; remaining is 239882945 but 1349280116 requested
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:119)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:68)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:44)
        at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:104)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:272)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1123)
        at org.apache.cassandra.db.Table.getRow(Table.java:347)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1062)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1614)
        ... 3 more
Caused by: java.io.IOException: mmap segment underflow; remaining is 239882945 but 1349280116 requested
        at org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:135)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
        at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:108)
        at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92)
        at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:73)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$SimpleBlockFetcher.<init>(IndexedSliceReader.java:477)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:94)

repair -pr hangs, rebuild from the less corrupted dc hangs.

The only interesting exception (besides the java.io.EOFException during repair) is the following:

org.apache.cassandra.db.marshal.MarshalException: invalid UTF8 bytes 52f2665b
	at org.apache.cassandra.db.marshal.UTF8Type.getString(UTF8Type.java:54)
	at org.apache.cassandra.db.index.AbstractSimplePerColumnSecondaryIndex.insert(AbstractSimplePerColumnSecondaryIndex.java:102)
	at org.apache.cassandra.db.index.SecondaryIndexManager.indexRow(SecondaryIndexManager.java:448)
	at org.apache.cassandra.db.Table.indexRow(Table.java:431)
	at org.apache.cassandra.db.index.SecondaryIndexBuilder.build(SecondaryIndexBuilder.java:62)
	at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:803)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

during rebuild and some repair -pr sessions.

Does anyone know what might have caused this?

regards,

ondrej cernos