No, I don't think the direct cause is running out of heap space. The JVM didn't leave any heap dump file, even with the option -XX:+HeapDumpOnOutOfMemoryError.

My system.log for the last few minutes is as follows (many GCs occur):

INFO [HINTED-HANDOFF-POOL:1] 2010-08-02 20:33:50,254 HintedHandOffManager.java (line 153) Started hinted handoff for endPoint /10.25.32.36
INFO [HINTED-HANDOFF-POOL:1] 2010-08-02 20:33:50,255 HintedHandOffManager.java (line 210) Finished hinted handoff of 0 rows to endpoint /10.25.32.36
INFO [GC inspection] 2010-08-02 20:34:01,919 GCInspector.java (line 110) GC for ParNew: 269 ms, 11161808 reclaimed leaving 5068544312 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:34:14,985 GCInspector.java (line 110) GC for ParNew: 208 ms, 12326736 reclaimed leaving 4044195008 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:34:15,986 GCInspector.java (line 110) GC for ParNew: 208 ms, 12283112 reclaimed leaving 2005777224 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:35:45,834 GCInspector.java (line 110) GC for ParNew: 229 ms, 13074080 reclaimed leaving 5374833480 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:35:46,836 GCInspector.java (line 110) GC for ParNew: 203 ms, 12529824 reclaimed leaving 5321733432 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:37:01,624 GCInspector.java (line 110) GC for ParNew: 206 ms, 11029656 reclaimed leaving 4473650352 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:38:19,064 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 4501 ms, 1057548400 reclaimed leaving 2461458096 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:41:20,446 GCInspector.java (line 110) GC for ParNew: 218 ms, 15072720 reclaimed leaving 5345683640 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:41:23,453 GCInspector.java (line 110) GC for ParNew: 234 ms, 16818048 reclaimed leaving 3937902088 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:42:15,229 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 8015 ms, 739534984 reclaimed leaving 3550138024 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:42:42,444 GCInspector.java (line 110) GC for ParNew: 203 ms, 14218928 reclaimed leaving 4398967608 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:42:43,565 GCInspector.java (line 110) GC for ParNew: 203 ms, 12274600 reclaimed leaving 1989854648 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:43:01,801 GCInspector.java (line 110) GC for ParNew: 212 ms, 10183184 reclaimed leaving 2337034168 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:44:01,215 GCInspector.java (line 110) GC for ParNew: 218 ms, 10402368 reclaimed leaving 4334140184 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:44:35,623 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 4424 ms, 3101007888 reclaimed leaving 2459621048 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:45:32,089 GCInspector.java (line 110) GC for ParNew: 227 ms, 27109720 reclaimed leaving 5410486832 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:45:35,095 GCInspector.java (line 110) GC for ParNew: 203 ms, 28235832 reclaimed leaving 3580093424 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:46:01,203 GCInspector.java (line 110) GC for ParNew: 257 ms, 12257744 reclaimed leaving 3469012312 used; max is 8719630336
INFO [GC inspection] 2010-08-02 20:46:51,060 GCInspector.java (line 110) GC for ParNew: 222 ms, 18473064 reclaimed leaving 5320004640 used; max is 8719630336
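
To put rough numbers on the lines above: the largest "used" value after any collection is 5410486832 bytes, about 62% of the 8719630336 byte maximum, so the heap never looks close to full in this window. Below is a minimal sketch of how such lines could be summarized. It is my own throwaway script, not part of Cassandra; the names GC_LINE and summarize_gc are hypothetical, and the pattern only assumes the GCInspector message format shown in this log.

```python
import re

# Matches the GCInspector message format seen in the log excerpt above.
# This pattern is an assumption based on those lines only.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, "
    r"(?P<reclaimed>\d+) reclaimed leaving (?P<used>\d+) used; "
    r"max is (?P<max>\d+)"
)


def summarize_gc(lines):
    """Return per-collector stats: count, total pause (ms), peak used/max ratio."""
    stats = {}
    for line in lines:
        m = GC_LINE.search(line)
        if m is None:
            continue  # skip lines that are not GC reports
        entry = stats.setdefault(
            m.group("collector"),
            {"count": 0, "pause_ms": 0, "peak_ratio": 0.0},
        )
        entry["count"] += 1
        entry["pause_ms"] += int(m.group("ms"))
        ratio = int(m.group("used")) / int(m.group("max"))
        entry["peak_ratio"] = max(entry["peak_ratio"], ratio)
    return stats


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python gc_summary.py < system.log
    for collector, entry in summarize_gc(sys.stdin).items():
        print(collector, entry)
```

A peak ratio well below 1.0 together with no heap dump file is consistent with the crash not being a plain OutOfMemoryError, although it does not rule out other memory problems.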





And the JRE crash log shows that the current thread was COMPACTION-POOL:1.

P.S. Sorry for the unnecessary earlier message; that was my mistake.




2010/8/3 Ilun Ahn <prozect.mail@gmail.com>


2010/8/2 Peter Schuller <peter.schuller@infidyne.com>

> First, Cassandra suddenly dies during compaction. Java core dump says that
> the last thread run was  "COMPACTION-POOL:1".
> I suspect that my business logic could lead size of columns in a column
> family per a row to be greater than two gigabytes. (but i couldn't confirm
> it yet)

Are you running out of memory (java heap)? If you're running cassandra
with default options, it will be running with
-XX:+HeapDumpOnOutOfMemoryError

Have you checked the cassandra system.log for garbage collection
messages? What is in the last minute or two of logs?

--
/ Peter Schuller