Aaron,

Thanks for your help.

I ran 'nodetool scrub' and it finished after a couple of hours. But there is no information about
out of order rows in the logs, and compaction on the column family still raises the same
exception.

Using the row key from the exception I was able to identify some of the errant SSTables and removed
them during a node restart. On some nodes compaction is working for the moment, but there are likely
more corrupt data files, and then I would be back in the same situation as before.
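For reference, something along these lines can be used to find which SSTables contain a given row
key (this is only a sketch: it assumes the data directory from the exception and that sstablekeys
prints keys in hex; <row-key-in-hex> is the hex key from the DecoratedKey in the error):

    for f in /var/cassandra/data/AdServer/EventHistory/*-Data.db; do
        sstablekeys "$f" | grep -q <row-key-in-hex> && echo "$f"
    done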

So I still need some help to resolve this issue!

Cheers
Andre


2013/2/12 aaron morton <aaron@thelastpickle.com>
snapshot all nodes so you have a backup: nodetool snapshot -t corrupt
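Something like this runs it across the cluster in one go (the host list is just an example,
substitute your own nodes):

    for h in node1 node2 node3 node4 node5 node6; do
        nodetool -h "$h" snapshot -t corrupt
    done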

run nodetool scrub on the errant CF. 
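You can restrict it to the keyspace and column family from your stack trace, e.g. (names taken
from the data file path in your error):

    nodetool scrub AdServer EventHistory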

Look in the logs for messages such as:

"Out of order row detected…"
"%d out of order rows found while scrubbing %s; Those have been written (in order) to a new sstable (%s)"
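
A quick way to surface them is something like this (adjust the path if you have moved the logs,
/var/log/cassandra/system.log is just the default location):

    grep -i "out of order" /var/log/cassandra/system.log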

Cheers
  
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton

On 12/02/2013, at 6:13 AM, Andre Sprenger <andre.sprenger@getanet.de> wrote:

Hi,

I'm running a 6 node Cassandra 1.1.5 cluster on EC2. We switched to leveled compaction a couple of weeks ago,
and that went well. A few days ago 3 of the nodes started to log the following exception during compaction of
a particular column family:

ERROR [CompactionExecutor:726] 2013-02-11 13:02:26,582 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:726,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(84590743047470232854915142878708713938, 31333535333333383530323237303130313030303232313537303030303132393832)
>= current key DecoratedKey(28357704665244162161305918843747894551, 31333430313336313830333831303130313030303230313632303030303036363338)
writing into /var/cassandra/data/AdServer/EventHistory/Adserver-EventHistory-tmp-he-68638-Data.db
        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
        at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
        at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Compaction no longer happens for this column family, and read performance is getting worse because of the growing
number of data files accessed during reads. It looks like one or more of the data files are corrupt and have keys
that are stored out of order.
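
For what it's worth, the growing number of SSTables touched per read can be checked with something
like the following (cfhistograms reports SSTables per read among other statistics; keyspace and
column family names are the ones from the exception above):

    nodetool cfhistograms AdServer EventHistory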

Any help to resolve this situation would be greatly appreciated.

Thanks
Andre