Found a little more info in the logs. Looks like there was a lot of GC going on and during the CMS nodes were being marked as down, does this sound correct ? Then two nodes started doing handed hint off to each other. Just mentioning it in case it has some bearing on why the Keys must be written in ascending order error occurred. 

(thought I'd got the memory tuned in, back to the drawing board :) )

This is from the node that got the error (IP 34.26) At 00:06:02 it detects another node (34.28) is down for two minutes, when it comes back up it starts Hinted Handoff at 00:06:04. Then its' GC kicked it and logged at 00:06:23 that it took 15 secs to finish. 

Cheers
Aaron


NFO [Timer-0] 2010-09-01 00:06:02,227 Gossiper.java (line 188) InetAddress /192.168.34.28 is now dead.
 INFO [GOSSIP_STAGE:1] 2010-09-01 00:06:04,964 Gossiper.java (line 570) InetAddress /192.168.34.28 is now UP
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-01 00:06:04,964 HintedHandOffManager.java (line 191) Started hinted handoff for endpoint /192.168.34.28
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-01 00:06:04,983 ColumnFamilyStore.java (line 497) switching in a fresh Memtable for HintsColumnFamily at CommitLogContext(file='/local1/junkbox/cassandra/commitlog/Commi
tLog-1283256351959.log', position=3036151)
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-01 00:06:04,983 ColumnFamilyStore.java (line 781) Enqueuing flush of Memtable-HintsColumnFamily@2086669444(37587600 bytes, 654783 operations)
 INFO [CompactionExecutor:1] 2010-09-01 00:06:04,984 CompactionManager.java (line 298) Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/local1/junkbox/cassandra/data/system/HintsColumnFamily-
e-27-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/local1/junkbox/cassandra/data/system/HintsColumnFamily-e-26-Data.db')]
 INFO [FLUSH-WRITER-POOL:1] 2010-09-01 00:06:04,984 Memtable.java (line 150) Writing Memtable-HintsColumnFamily@2086669444(37587600 bytes, 654783 operations)
 INFO [FLUSH-WRITER-POOL:1] 2010-09-01 00:06:05,879 Memtable.java (line 157) Completed flushing /local1/junkbox/cassandra/data/system/HintsColumnFamily-e-28-Data.db
 INFO [COMMIT-LOG-WRITER] 2010-09-01 00:06:05,880 CommitLog.java (line 457) Discarding obsolete commit log:CommitLogSegment(/local1/junkbox/cassandra/commitlog/CommitLog-1283252786258.log)
 INFO [COMMIT-LOG-WRITER] 2010-09-01 00:06:05,880 CommitLog.java (line 457) Discarding obsolete commit log:CommitLogSegment(/local1/junkbox/cassandra/commitlog/CommitLog-1283252939645.log)
 INFO [COMMIT-LOG-WRITER] 2010-09-01 00:06:05,880 CommitLog.java (line 457) Discarding obsolete commit log:CommitLogSegment(/local1/junkbox/cassandra/commitlog/CommitLog-1283252992788.log)
 INFO [GC inspection] 2010-09-01 00:06:07,259 GCInspector.java (line 129) GC for ParNew: 320 ms, 13628224 reclaimed leaving 6219850816 used; max is 6563430400
 INFO [GC inspection] 2010-09-01 00:06:23,741 GCInspector.java (line 129) GC for ConcurrentMarkSweep: 15821 ms, 208220008 reclaimed leaving 6151222488 used; max is 6563430400
 INFO [GC inspection] 2010-09-01 00:06:23,742 GCInspector.java (line 150) Pool Name                    Active   Pending
 INFO [GC inspection] 2010-09-01 00:06:23,742 GCInspector.java (line 156) GOSSIP_STAGE                      1         0
 INFO [GC inspection] 2010-09-01 00:06:23,742 GCInspector.java (line 156) STREAM-STAGE                      0         0
 INFO [GC inspection] 2010-09-01 00:06:23,742 GCInspector.java (line 156) FILEUTILS-DELETE-POOL             0         0
 INFO [GC inspection] 2010-09-01 00:06:23,743 GCInspector.java (line 156) RESPONSE-STAGE                    0         0
 INFO [GC inspection] 2010-09-01 00:06:23,743 GCInspector.java (line 156) MIGRATION-STAGE                   0         0
 INFO [GC inspection] 2010-09-01 00:06:23,743 GCInspector.java (line 156) ROW-READ-STAGE                    0         0
 INFO [GC inspection] 2010-09-01 00:06:23,744 GCInspector.java (line 156) LB-OPERATIONS                     0         0
 INFO [GC inspection] 2010-09-01 00:06:23,744 GCInspector.java (line 156) MISCELLANEOUS-POOL                0         0
 INFO [GC inspection] 2010-09-01 00:06:23,744 GCInspector.java (line 156) CONSISTENCY-MANAGER               0         0
 INFO [GC inspection] 2010-09-01 00:06:23,744 GCInspector.java (line 156) LB-TARGET                         0         0
 INFO [GC inspection] 2010-09-01 00:06:23,745 GCInspector.java (line 156) ROW-MUTATION-STAGE                0         0
 INFO [GC inspection] 2010-09-01 00:06:23,745 GCInspector.java (line 156) MESSAGE-STREAMING-POOL            0         0
 INFO [GC inspection] 2010-09-01 00:06:23,745 GCInspector.java (line 156) LOAD-BALANCER-STAGE               0         0
 INFO [GC inspection] 2010-09-01 00:06:23,745 GCInspector.java (line 156) FLUSH-SORTER-POOL                 0         0
 INFO [GC inspection] 2010-09-01 00:06:23,745 GCInspector.java (line 156) MEMTABLE-POST-FLUSHER             0         0
 INFO [GC inspection] 2010-09-01 00:06:23,746 GCInspector.java (line 156) AE-SERVICE-STAGE                  0         0
 INFO [GC inspection] 2010-09-01 00:06:23,746 GCInspector.java (line 156) FLUSH-WRITER-POOL                 0         0
 INFO [GC inspection] 2010-09-01 00:06:23,746 GCInspector.java (line 156) HINTED-HANDOFF-POOL               1         1
 INFO [GC inspection] 2010-09-01 00:06:23,746 GCInspector.java (line 161) CompactionManager               n/a         0
 INFO [CompactionExecutor:1] 2010-09-01 00:06:24,966 SSTableWriter.java (line 72) Last written key : DecoratedKey(86578664653752589598792334348786543293, c0a8221d)
 INFO [CompactionExecutor:1] 2010-09-01 00:06:24,966 SSTableWriter.java (line 73) Current key : DecoratedKey(27007427795986317455504170376786785255, c0a8221c)
 INFO [CompactionExecutor:1] 2010-09-01 00:06:24,966 SSTableWriter.java (line 74) Writing into file /local1/junkbox/cassandra/data/system/HintsColumnFamily-tmp-e-29-Data.db
ERROR [HINTED-HANDOFF-POOL:1] 2010-09-01 00:06:24,993 CassandraDaemon.java (line 85) Uncaught exception in thread Thread[HINTED-HANDOFF-POOL:1,5,main]
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOException: Keys must be written in ascending order.


Over on the 34.28 node a similar series of events happened. It was CMSing (for 2 mintues finishing 34.26 detected it was back) , detected 34.26 was dead when it was doing CMS and  then started handed hint off (which had zero rows).

INFO [CompactionExecutor:1] 2010-09-01 00:05:51,100 CompactionManager.java (line 298) Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/local1/junkbox/cassandra/data/junkbox/Objects-e-1077-Da
ta.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/local1/junkbox/cassandra/data/junkbox/Objects-e-1078-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/local1/junkbox/cassandra/data/
junkbox/Objects-e-1079-Data.db'),org.apachecassandra.io.sstable.SSTableReader(path='/local1/junkbox/cassandra/data/junkbox/Objects-e-1080-Data.db')]
 INFO [GC inspection] 2010-09-01 00:06:04,951 GCInspector.java (line 129) GC for ConcurrentMarkSweep: 12003 ms, 164886960 reclaimed leaving 6225419328 used; max is 6563430400
 INFO [GC inspection] 2010-09-01 00:06:04,952 GCInspector.java (line 150) Pool Name                    Active   Pending
 INFO [GC inspection] 2010-09-01 00:06:04,952 GCInspector.java (line 156) GOSSIP_STAGE                      0         0
 INFO [GC inspection] 2010-09-01 00:06:04,953 GCInspector.java (line 156) STREAM-STAGE                      0         0
 INFO [GC inspection] 2010-09-01 00:06:04,953 GCInspector.java (line 156) FILEUTILS-DELETE-POOL             0         0
 INFO [GC inspection] 2010-09-01 00:06:04,953 GCInspector.java (line 156) RESPONSE-STAGE                    0         0
 INFO [GC inspection] 2010-09-01 00:06:04,953 GCInspector.java (line 156) MIGRATION-STAGE                   0         0
 INFO [GC inspection] 2010-09-01 00:06:04,954 GCInspector.java (line 156) ROW-READ-STAGE                    0         0
 INFO [GC inspection] 2010-09-01 00:06:04,954 GCInspectorjava (line 156) LB-OPERATIONS                     0         0
 INFO [GC inspection] 2010-09-01 00:06:04,954 GCInspector.java (line 156) MISCELLANEOUS-POOL                0         0
 INFO [GC inspection] 2010-09-01 00:06:04,954 GCInspector.java (line 156) CONSISTENCY-MANAGER               0         0
 INFO [GC inspection] 2010-09-01 00:06:04,954 GCInspector.java (line 156) LB-TARGET                         0         0
 INFO [GC inspection] 2010-09-01 00:06:04,955 GCInspector.java (line 156) ROW-MUTATION-STAGE                0         0
 INFO [GC inspection] 2010-09-01 00:06:04,955 GCInspector.java (line 156) MESSAGE-STREAMING-POOL            0         0
 INFO [GC inspection] 2010-09-01 00:06:04,955 GCInspector.java (line 156) LOAD-BALANCER-STAGE               0         0
 INFO [GC inspection] 2010-09-01 00:06:04,956 GCInspector.java (line 156) FLUSH-SORTER-POOL                 0         0
 INFO [GC inspection] 2010-09-01 00:06:04,956 GCInspector.java (line 156) MEMTABLE-POST-FLUSHER             0         0
 INFO [GC inspection] 2010-09-01 00:06:04,957 GCInspector.java (line 156) AE-SERVICE-STAGE                  0         0
 INFO [GC inspection] 2010-09-01 00:06:04,957 GCInspector.java (line 156) FLUSH-WRITER-POOL                 0         0
 INFO [GC inspection] 2010-09-01 00:06:04,957 GCInspector.java (line 156) HINTED-HANDOFF-POOL               0         0
 INFO [GC inspection] 2010-09-01 00:06:04,957 GCInspector.java (line 161) CompactionManager               n/a         1
 INFO [Timer-0] 2010-09-01 00:06:16,954 Gossiper.java (line 188) InetAddress /192.168.34.26 is now dead.
 INFO [GC inspection] 2010-09-01 00:06:16,954 GCInspector.java (line 129) GC for ConcurrentMarkSweep: 10603 ms, 139541288 reclaimed leaving 6225476304 used; max is 6563430400
 INFO [GC inspection] 2010-09-01 00:06:16,956 GCInspector.java (line 150) Pool Name                    Active   Pending
 INFO [GC inspection] 2010-09-01 00:06:16,956 GCInspector.java (line 156) GOSSIP_STAGE                      0         0
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-01 00:06:16,956 HintedHandOffManager.java (line 191) Started hinted handoff for endpoint /192.168.34.26
 INFO [GC inspection] 2010-09-01 00:06:16,956 GCInspector.java (line 156) STREAM-STAGE                      0         0
 INFO [GOSSIP_STAGE:1] 2010-09-01 00:06:16,956 Gossiper.java (line 570) InetAddress /192.168.34.26 is now UP
 INFO [GC inspection] 2010-09-01 00:06:16,957 GCInspector.java (line 156) FILEUTILS-DELETE-POOL             0         0
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-01 00:06:16,957 HintedHandOffManager.java (line 247) Finished hinted handoff of 0 rows to endpoint /192.168.34.26
 INFO [GC inspection] 2010-09-01 00:06:16,957 GCInspector.java (line 156) RESPONSE-STAGE                    0         0
 INFO [GC inspection] 2010-09-01 00:06:16,957 GCInspector.java (line 156) MIGRATION-STAGE                   0         0
 INFO [GC inspection] 2010-09-01 00:06:16,958 GCInspector.java (line 156) ROW-READ-STAGE                    0         0
 INFO [GC inspection] 2010-09-01 00:06:16,958 GCInspector.java (line 156) LB-OPERATIONS                     0         0
 INFO [GC inspection] 2010-09-01 00:06:16,958 GCInspector.java (line 156) MISCELLANEOUS-POOL                0         0
 INFO [GC inspection] 2010-09-01 00:06:16,958 GCInspector.java (line 156) CONSISTENCY-MANAGER               0         0
 INFO [GC inspection] 2010-09-01 00:06:16,958 GCInspector.java (line 156) LB-TARGET                         0         0
 INFO [GC inspection] 2010-09-01 00:06:16,959 GCInspector.java (line 156) ROW-MUTATION-STAGE                0         0
 INFO [GC inspection] 2010-09-01 00:06:16,959 GCInspector.java (line 156) MESSAGE-STREAMING-POOL            0         0
 INFO [GC inspection] 2010-09-01 00:06:16,959 GCInspector.java (line 156) LOAD-BALANCER-STAGE               0         0
 INFO [GC inspection] 2010-09-01 00:06:16,959 GCInspectorjava (line 156) FLUSH-SORTER-POOL                 0         0
 INFO [GC inspection] 2010-09-01 00:06:16,960 GCInspector.java (line 156) MEMTABLE-POST-FLUSHER             0         0
 INFO [GC inspection] 2010-09-01 00:06:16,960 GCInspector.java (line 156) AE-SERVICE-STAGE                  0         0
 INFO [GC inspection] 2010-09-01 00:06:16,960 GCInspector.java (line 156) FLUSH-WRITER-POOL                 0         0
 INFO [GC inspection] 2010-09-01 00:06:16,960 GCInspector.java (line 156) HINTED-HANDOFF-POOL               0         0
 INFO [GC inspection] 2010-09-01 00:06:16,960 GCInspector.java (line 161) CompactionManager               n/a         1
 INFO [Timer-0] 2010-09-01 00:06:31,251 Gossiper.java (line 188) InetAddress /192.168.34.26 is now dead.
  


On 01 Sep, 2010,at 08:38 AM, Aaron Morton <aaron@thelastpickle.com> wrote:

I'm running the nightly trunk build from 2010-08-26_13-52-57 and got a bunch of java.io.IOException: Keys must be written in ascending order errors. 

Am running a 4 node cluster, this error occured on a single nodes. Eventually the node went OOM, not sure if it has anything to do with the error. There was a constant work load of around 400 to 500 reads and write per second, all columns under 10k. 


 Below if the first occurrence, there were a lot more over the next 4 hours until it ran out of heap space around 4am.  I restarted this morning and got some more, they are below this one It appears to be functioning now and is doing some handed hint off. Once that is finished I will do some repairing.


Quick question, do threads in the HINTED-HANDOFF-POOL just do sending or also handle receiving?

Did a search on jira and could not find any mention of this. 

thanks
Aaron


(sorry forgot to turn up the logging, its only INFO mode)

INFO [CompactionExecutor:1] 2010-09-01 00:06:24,966 SSTableWriter.java (line 72) Last written key : DecoratedKey(86578664653752589598792334348786543293, c0a8221d)
 INFO [CompactionExecutor:1] 2010-09-01 00:06:24,966 SSTableWriter.java (line 73) Current key : DecoratedKey(27007427795986317455504170376786785255, c0a8221c)
 INFO [CompactionExecutor:1] 2010-09-01 00:06:24,966 SSTableWriter.java (line 74) Writing into file /local1/junkbox/cassandra/data/system/HintsColumnFamily-tmp-e-29-Data.db
ERROR [HINTED-HANDOFF-POOL:1] 2010-09-01 00:06:24,993 CassandraDaemon.java (line 85) Uncaught exception in thread Thread[HINTED-HANDOFF-POOL:1,5,main]
java.util.concurrent.ExecutionException: javalang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOException: Keys must be written in ascending order.
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutorjava:87)
      g  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOException: Keys must be written in ascending order.
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        ... 2 more
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOException: Keys must be written in ascending order.
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:243)
        at org.apache.cassandra.db.HintedHandOffManageraccess$000(HintedHandOffManager.java:78)
        at org.apache.cassandra.db.HintedHandOffManager$1.runMayThrow(HintedHandOffManager.java:296)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 6 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Keys must be written in ascending order.
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:239)
        ... 9 more
Caused by: java.io.IOException: Keys must be written in ascending order.
        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:75)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:92)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:352)
        at org.apachecassandra.db.CompactionManager$5.call(CompactionManager.java:229)
        ... 5 more

ERROR [CompactionExecutor:1] 2010-09-01 00:06:24,993 CassandraDaemon.java (line 85) Uncaught exception in thread Thread[CompactionExecutor:1,1,main]
java.util.concurrent.ExecutionException: java.io.IOException: Keys must be written in ascending order.
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:87)
        at org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(CompactionManager.java:651)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutorjava:888)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Keys must be written in ascending order.
        at org.apache.cassandraio.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:75)
        at org.apache.cassandra.io.sstableSSTableWriter.append(SSTableWriter.java:92)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:352)
        at org.apache.cassandra.db.CompactionManager$5.call(CompactionManager.java:229)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        ... 2 more
 INFO [Timer-0] 2010-09-01 00:06:41,764 Gossiper.java (line 188) InetAddress /192.168.34.28 is now dead.


================

Died 4 hours later, after the restart got more 

 INFO [CompactionExecutor:1] 2010-09-01 07:58:31,707 SSTableWriter.java (line 72) Last written key : DecoratedKey(86578664653752589598792334348786543293, c0a8221d)
 INFO [CompactionExecutor:1] 2010-09-01 07:58:31,708 SSTableWriter.java (line 73) Current key : DecoratedKey(27007427795986317455504170376786785255, c0a8221c)
 INFO [CompactionExecutor:1] 2010-09-01 07:58:31,708 SSTableWriter.java (line 74) Writing into file /local1/junkbox/cassandra/data/system/HintsColumnFamily-tmp-e-63-Data.db
ERROR [CompactionExecutor:1] 2010-09-01 07:58:31,709 CassandraDaemon.java (line 85) Uncaught exception in thread Thread[CompactionExecutor:1,1,main]
java.util.concurrent.ExecutionException: java.io.IOException: Keys must be written in ascending order.
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:87)
        at org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(CompactionManager.java:651)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
        at java.utilconcurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Keys must be written in ascending order.
        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:75)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:92)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:352)
        at orgapache.cassandra.db.CompactionManager$2.call(CompactionManagerjava:154)
        at org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:135)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        ... 2 more