incubator-cassandra-user mailing list archives

From Aaron Morton <>
Subject Truncate + Snapshot + Cannot Allocate Memory == Timeout
Date Wed, 29 Sep 2010 04:25:53 GMT
Running on the current 0.7.0 beta2 #3. 

On my dev workstation (Ubuntu 10.04) I accidentally let Cassandra have 4GB, and free memory
got down to <100 MB. I was running some tests that added a few (< 100) rows to the
CFs in Keyspace1, then truncated the CFs for the next tests.

 I got the following error in the cassandra system.log 

DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,166 (line 954) truncating
Standard1 in Keyspace1
DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,166 (line 759) Starting a blocking
truncate operation on keyspace Keyspace1, CF 
DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,166 (line 774) Starting to
send truncate messages to hosts [localhost/]
DEBUG [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 (line 50) Applying
Truncation(keyspace='Keyspace1', cf='Standard1')
DEBUG [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 (line 630) Truncating...
 INFO [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 (line 459) switching
in a fresh Memtable for Standard1 at CommitLogContext(file='/local1/junkbox/cassandra/commitlog/CommitLog-1285728080403.log',
 INFO [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 (line 771) Enqueuing
flush of Memtable-Standard1@1701009673(40104 bytes, 1152 operations)
DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,167 (line 780) Sent all truncate
messages, now waiting for 1 responses
 INFO [FLUSH-WRITER-POOL:1] 2010-09-29 16:44:39,167 (line 150) Writing Memtable-Standard1@1701009673(40104
bytes, 1152 operations)
 INFO [FLUSH-WRITER-POOL:1] 2010-09-29 16:44:39,268 (line 157) Completed flushing
DEBUG [CompactionExecutor:1] 2010-09-29 16:44:39,268 (line 95) Checking
to see if compaction of Standard1 would be useful
DEBUG [COMMIT-LOG-WRITER] 2010-09-29 16:44:39,268 (line 422) discard completed
log segments for CommitLogContext(file='/local1/junkbox/cassandra/commitlog/CommitLog-1285728080403.log',
position=10022050), column family 1000.
DEBUG [COMMIT-LOG-WRITER] 2010-09-29 16:44:39,268 (line 448) Marking replay
position 10022050 on commit log CommitLogSegment(/local1/junkbox/cassandra/commitlog/CommitLog-1285728080403.log)
ERROR [MUTATION_STAGE:25] 2010-09-29 16:44:39,341 (line 88) Fatal
exception in thread Thread[MUTATION_STAGE:25,5,main] Cannot run program "ln": error=12,
Cannot allocate memory
        at org.apache.cassandra.db.ColumnFamilyStore.snapshot(
        at org.apache.cassandra.db.ColumnFamilyStore.truncate(
        at org.apache.cassandra.db.Table.truncate(
        at org.apache.cassandra.db.TruncateVerbHandler.doVerb(
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
        at java.util.concurrent.ThreadPoolExecutor$
Caused by: Cannot run program "ln": error=12, Cannot
allocate memory
        at java.lang.ProcessBuilder.start(
        at org.apache.cassandra.db.ColumnFamilyStore.snapshot(
        ... 7 more
Caused by: error=12, Cannot allocate memory
        at java.lang.UNIXProcess.<init>(
        at java.lang.ProcessImpl.start(
        at java.lang.ProcessBuilder.start(
        ... 10 more
DEBUG [GC inspection] 2010-09-29 16:44:39,539 (line 131) GC for ParNew: 43
ms, 29106728 reclaimed leaving 3567200784 used; max is 5498404864
DEBUG [pool-1-thread-44] 2010-09-29 16:44:49,189 (line 107) logged out: #<User
allow_all groups=[]>

Fair enough, but on the client it only showed up as a timeout (the client has a 10 second timeout):

  File "/tech/home//git_home/trojan/trojan/cassandra/", line 846, in truncate
  File "/tech/home//git_home/trojan/trojan/cassandra/", line 857, in recv_truncate
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "/tech/home//git_home/trojan/trojan/thrift/protocol/", line 126,
in readMessageBegin
    sz = self.readI32()
    chunk =
  File "/tech/home//git_home/trojan/trojan/thrift/transport/", line 92, in read
    buff = self.handle.recv(sz)
timeout: timed out

Here's the output of ulimit -a 

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Two questions:

1) Is the memory error just a result of me letting my machine run stupidly low on memory?

2) Should the client have received an ApplicationError or some such in this case, rather than
timing out? The code at ColumnFamilyStore:1368 catches the IOException from the call to
FileUtils.createHardLink and wraps it in an IOError. However, the code at TruncateVerbHandler:56
only looks for the IOException, so the IOError appears to escape it.
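
To illustrate what I mean in (2), here is a minimal sketch of the pattern, not the actual Cassandra code. All class and method names below are hypothetical stand-ins, and I am assuming the handler only has a catch clause for IOException. Since java.io.IOError extends Error rather than Exception, that clause never sees it, so no reply is produced and the caller can only time out.

```java
import java.io.IOError;
import java.io.IOException;

// Hypothetical stand-in classes; none of this is the real Cassandra source.
class TruncateErrorSketch {

    // Stand-in for the snapshot step: the checked IOException from the
    // failed hard link is wrapped in an unchecked java.io.IOError.
    static void snapshot() throws IOException {
        try {
            throw new IOException("Cannot run program \"ln\": error=12, Cannot allocate memory");
        } catch (IOException e) {
            throw new IOError(e);
        }
    }

    // Assumed shape of a handler that only catches IOException.
    // The IOError is not an Exception, so it propagates out of this method
    // and neither reply is ever returned.
    static String handlerCatchingIOException() {
        try {
            snapshot();
            return "success reply";
        } catch (IOException e) {
            return "failure reply";
        }
    }

    // One possible alternative: also catch IOError so a failure reply
    // can still be produced.
    static String handlerCatchingBoth() {
        try {
            snapshot();
            return "success reply";
        } catch (IOException | IOError e) {
            return "failure reply";
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(handlerCatchingIOException());
        } catch (IOError e) {
            System.out.println("IOError escaped the handler: " + e.getCause().getMessage());
        }
        System.out.println(handlerCatchingBoth());
    }
}
```

Whether catching the IOError in the verb handler is the right fix, or whether the wrapping in the snapshot code should change instead, is a design choice I have not looked into.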

