From user-return-9537-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Wed Sep 29 04:26:29 2010 Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 83583 invoked from network); 29 Sep 2010 04:26:29 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 29 Sep 2010 04:26:29 -0000 Received: (qmail 70330 invoked by uid 500); 29 Sep 2010 04:26:27 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 70033 invoked by uid 500); 29 Sep 2010 04:26:23 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 70025 invoked by uid 99); 29 Sep 2010 04:26:22 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Sep 2010 04:26:22 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=HTML_MESSAGE,MIME_QP_LONG_LINE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [208.113.200.5] (HELO homiemail-a55.g.dreamhost.com) (208.113.200.5) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Sep 2010 04:26:16 +0000 Received: from homiemail-a55.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a55.g.dreamhost.com (Postfix) with ESMTP id DC1F92C006A for ; Tue, 28 Sep 2010 21:25:54 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=thelastpickle.com; h=to:from :subject:date:message-id:content-type:mime-version; q=dns; s= thelastpickle.com; b=I9jMapkcWfWqX3thfFffGyULZibDysXb9q326SWHY8m RBHYBi67P6Yb2WqcJs+QKCz+lKxI84Ei/uKi/qwNyVXsFuMJnB4KZVTcsaFeEJgt 8cH7EOd4TbAdE7z9CzLDq2DwUUlT5Ge/e6ZeB7tkCQjVNEW5EdqLOnrKUumjMat4 = DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=thelastpickle.com; h=to :from:subject:date:message-id:content-type:mime-version; s= thelastpickle.com; bh=/rKlIQlQPrBe9obPD8uJ6shpuyw=; b=fmXBKEc9Il 5uYHZtczOxUymFxNSSMSBTQmJ1GthFW9sKMr7SlnWIluuNbu1JlGzNcFK8h6SJ8Y pUJD/Sl3fT5qZxAI7VNyPSgNsyysoQwN3Zjy4lpxtkhnlC+uydePoxwk6sTpa/jm s2RrbNe/RKQdKEJZkXzV7jsrVfYdkepps= Received: from localhost (webms.mac.com [17.148.16.116]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: aaron@thelastpickle.com) by homiemail-a55.g.dreamhost.com (Postfix) with ESMTPSA id CCCB22C0061 for ; Tue, 28 Sep 2010 21:25:54 -0700 (PDT) To: Cassandra User From: Aaron Morton Subject: Truncate + Snapshot + Cannot Allocate Memory == Timeout Date: Wed, 29 Sep 2010 04:25:53 GMT X-Mailer: MobileMe Mail (1C3203) Message-id: <53310a9c-73a1-efcb-9dd0-f96d3935d07a@me.com> Content-Type: multipart/alternative; boundary=Apple-Webmail-42--e02da8be-f009-ebc8-f7cd-835829a8da68 MIME-Version: 1.0 --Apple-Webmail-42--e02da8be-f009-ebc8-f7cd-835829a8da68 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=ISO-8859-1; format=flowed Running on the current 0.7.0 beta2 #3.=A0=0A=0AOn my dev workstation (unub= tu 10.4) I=A0accidentally=A0let cassandra have 4GB and free memory got dow= n to <100 MB . I was running some tests that added a few (< 100) rows to t= he CF's in Keyspace1, then Truncated the CF's for the next tests.=A0=0A=0A= =A0I got the following error in the cassandra system.log=A0=0A=0ADEBUG [po= ol-1-thread-44] 2010-09-29 16:44:39,166 CassandraServer.java (line 954) tr= uncating Standard1 in Keyspace1=0ADEBUG [pool-1-thread-44] 2010-09-29 16:4= 4:39,166 StorageProxy.java (line 759) Starting a blocking truncate operati= on on keyspace Keyspace1, CF=A0=0ADEBUG [pool-1-thread-44] 2010-09-29 16:4= 4:39,166 StorageProxy.java (line 774) Starting to send truncate messages t= o hosts [localhost/127.0.0.1]=0ADEBUG [MUTATION_STAGE:25] 2010-09-29 16:44= :39,166 TruncateVerbHandler.java (line 50) Applying Truncation(keyspace=3D= 'Keyspace1', cf=3D'Standard1')=0ADEBUG [MUTATION_STAGE:25] 2010-09-29 16:4= 4:39,166 Table.java (line 630) Truncating...=0A=A0INFO [MUTATION_STAGE:25]= 2010-09-29 16:44:39,166 ColumnFamilyStore.java (line 459) switching in a = fresh Memtable for Standard1 at CommitLogContext(file=3D'/local1/junkbox/c= assandra/commitlog/CommitLog-1285728080403.log', position=3D10022050)=0A=A0= INFO [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 ColumnFamilyStore.java (l= ine 771) Enqueuing flush of Memtable-Standard1@1701009673(40104 bytes, 115= 2 operations)=0ADEBUG [pool-1-thread-44] 2010-09-29 16:44:39,167 StoragePr= oxy.java (line 780) Sent all truncate messages, now waiting for 1 response= s=0A=A0INFO [FLUSH-WRITER-POOL:1] 2010-09-29 16:44:39,167 Memtable.java (l= ine 150) Writing Memtable-Standard1@1701009673(40104 bytes, 1152 operation= s)=0A=A0INFO [FLUSH-WRITER-POOL:1] 2010-09-29 16:44:39,268 Memtable.java (= line 157) Completed flushing /local1/junkbox/cassandra/data/Keyspace1/Stan= dard1-e-10-Data.db=0ADEBUG [CompactionExecutor:1] 2010-09-29 16:44:39,268 = CompactionManager.java (line 95) Checking to see if compaction of Standard= 1 would be useful=0ADEBUG [COMMIT-LOG-WRITER] 2010-09-29 16:44:39,268 Comm= itLog.java (line 422) discard completed log segments for CommitLogContext(= file=3D'/local1/junkbox/cassandra/commitlog/CommitLog-1285728080403.log', = position=3D10022050), column family 1000.=0ADEBUG [COMMIT-LOG-WRITER] 2010= -09-29 16:44:39,268 CommitLog.java (line 448) Marking replay position 1002= 2050 on commit log CommitLogSegment(/local1/junkbox/cassandra/commitlog/Co= mmitLog-1285728080403.log)=0AERROR [MUTATION_STAGE:25] 2010-09-29 16:44:39= ,341 AbstractCassandraDaemon.java (line 88) Fatal exception in thread Thre= ad[MUTATION_STAGE:25,5,main]=0Ajava.io.IOError: java.io.IOException: Canno= t run program "ln": java.io.IOException: error=3D12, Cannot allocate memor= y=0A=A0=A0 =A0 =A0 =A0at org.apache.cassandra.db.ColumnFamilyStore.snapsho= t(ColumnFamilyStore.java:1368)=0A=A0=A0 =A0 =A0 =A0at org.apache.cassandra= db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1511)=0A=A0=A0 =A0 =A0= =A0at org.apache.cassandra.db.Table.truncate(Table.java:633)=0A=A0=A0 =A0= =A0 =A0at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerb= Handler.java:54)=0A=A0=A0 =A0 =A0 =A0at org.apache.cassandra.net.MessageDe= liveryTask.run(MessageDeliveryTask.java:50)=0A=A0=A0 =A0 =A0 =A0at java.ut= il.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:88= 6)=0A=A0=A0 =A0 =A0 =A0at java.util.concurrent.ThreadPoolExecutor$Worker.r= un(ThreadPoolExecutor.java:908)=0A=A0=A0 =A0 =A0 =A0at java.lang.Thread.ru= n(Thread.java:619)=0ACaused by: java.io.IOException: Cannot run program "l= n": java.io.IOException: error=3D12, Cannot allocate memory=0A=A0=A0 =A0 =A0= =A0at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)=0A=A0=A0 =A0= =A0 =A0at org.apache.cassandra.io.util.FileUtils.createHardLinkWithExec(F= ileUtils.java:263)=0A=A0=A0 =A0 =A0 =A0at org.apache.cassandra.io.util.Fil= eUtils.createHardLink(FileUtils.java:229)=0A=A0=A0 =A0 =A0 =A0at org.apach= e.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1360)=0A=A0= =A0 =A0 =A0 =A0... 7 more=0ACaused by: java.io.IOException: java.io.IOExce= ption: error=3D12, Cannot allocate memory=0A=A0=A0 =A0 =A0 =A0at java.lang= UNIXProcess.(UNIXProcess.java:148)=0A=A0=A0 =A0 =A0 =A0at java.lang= ProcessImpl.start(ProcessImpl.java:65)=0A=A0=A0 =A0 =A0 =A0at java.lang.P= rocessBuilder.start(ProcessBuilder.java:453)=0A=A0=A0 =A0 =A0 =A0... 10 mo= re=0ADEBUG [GC inspection] 2010-09-29 16:44:39,539 GCInspector.java (line = 131) GC for ParNew: 43 ms, 29106728 reclaimed leaving 3567200784 used; max= is 5498404864=0ADEBUG [pool-1-thread-44] 2010-09-29 16:44:49,189 ClientSt= ate.java (line 107) logged out: #=0A=0AFair en= ough, though it appeared as a timeout on the client (with a 10 second time= out)=A0=0A=0A=A0=A0File "/tech/home//git_home/trojan/trojan/cassandra/Cass= andra.py", line 846, in truncate=0A=A0=A0 =A0self.recv_truncate()=0A=A0=A0= File "/tech/home//git_home/trojan/trojan/cassandra/Cassandra.py", line 857= , in recv_truncate=0A=A0=A0 =A0(fname, mtype, rseqid) =3D self._iprot.read= MessageBegin()=0A=A0=A0File "/tech/home//git_home/trojan/trojan/thrift/pro= tocol/TBinaryProtocol.py", line 126, in readMessageBegin=0A=A0=A0 =A0sz =3D= self.readI32()=0A=0A=A0=A0 =A0chunk =3D self.read(sz-have)=0A=A0=A0= File "/tech/home//git_home/trojan/trojan/thrift/transport/TSocket.py", lin= e 92, in read=0A=A0=A0 =A0buff =3D self.handle.recv(sz)=0Atimeout: timed o= ut=0A=0AHere's the output of ulimit -a=A0=0A=0Acore file size =A0 =A0 =A0 = =A0 =A0(blocks, -c) 0=0Adata seg size =A0 =A0 =A0 =A0 =A0 (kbytes, -d) unl= imited=0Ascheduling priority =A0 =A0 =A0 =A0 =A0 =A0 (-e) 20=0Afile size =A0= =A0 =A0 =A0 =A0 =A0 =A0 (blocks, -f) unlimited=0Apending signals =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 (-i) 16382=0Amax locked memory =A0 =A0 =A0 (kbytes= , -l) 64=0Amax memory size =A0 =A0 =A0 =A0 (kbytes, -m) unlimited=0Aopen f= iles =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0(-n) 65536=0Apipe size =A0= =A0 =A0 =A0 =A0 =A0(512 bytes, -p) 8=0APOSIX message queues =A0 =A0 (byte= s, -q) 819200=0Areal-time priority =A0 =A0 =A0 =A0 =A0 =A0 =A0(-r) 0=0Asta= ck size =A0 =A0 =A0 =A0 =A0 =A0 =A0(kbytes, -s) 8192=0Acpu time =A0 =A0 =A0= =A0 =A0 =A0 =A0 (seconds, -t) unlimited=0Amax user processes =A0 =A0 =A0 = =A0 =A0 =A0 =A0(-u) unlimited=0Avirtual memory =A0 =A0 =A0 =A0 =A0(kbytes,= -v) unlimited=0Afile locks =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0(-x= ) unlimited=0A=0A=0ATwo questions:=0A=0A1) Is the memory error just a resu= lt of me letting my machine run stupidly low on memory?=0A=0A2) Should it = have returned an ApplicationError or some such in this case? The code in C= olumnFamilyStore:1368 is catching the IOException from the call to FileUti= ls.createHardLink and wrapping it in an IOError. However the code in Trunc= ateVerbHandler:56 is looking for the IOException.=A0=0A=0ACheers=0AAaron=0A= =0A --Apple-Webmail-42--e02da8be-f009-ebc8-f7cd-835829a8da68 Content-Type: multipart/related; type="text/html"; boundary=Apple-Webmail-86--e02da8be-f009-ebc8-f7cd-835829a8da68 --Apple-Webmail-86--e02da8be-f009-ebc8-f7cd-835829a8da68 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=ISO-8859-1;
Running on the current 0.7.0 beta2 #3. 

On my dev workstation (unubtu 10.4) I accidentally let cassandr= a have 4GB and free memory got down to <100 MB . I was running some tes= ts that added a few (< 100) rows to the CF's in Keyspace1, then Truncat= ed the CF's for the next tests. 

 I got= the following error in the cassandra system.log 

DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,166 Cassandra= Server.java (line 954) truncating Standard1 in Keyspace1
DEBUG [= pool-1-thread-44] 2010-09-29 16:44:39,166 StorageProxy.java (line 759) Sta= rting a blocking truncate operation on keyspace Keyspace1, CF 
<= div>DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,166 StorageProxy.java (li= ne 774) Starting to send truncate messages to hosts [localhost/127.0.0.1]<= /div>
DEBUG [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 TruncateVerbHa= ndler.java (line 50) Applying Truncation(keyspace=3D'Keyspace1', cf=3D'Sta= ndard1')
DEBUG [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 Table= java (line 630) Truncating...
 INFO [MUTATION_STAGE:25] 20= 10-09-29 16:44:39,166 ColumnFamilyStore.java (line 459) switching in a fre= sh Memtable for Standard1 at CommitLogContext(file=3D'/local1/junkbox/cass= andra/commitlog/CommitLog-1285728080403.log', position=3D10022050)
 INFO [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 ColumnFamilyStor= e.java (line 771) Enqueuing flush of Memtable-Standard1@1701009673(40104 b= ytes, 1152 operations)
DEBUG [pool-1-thread-44] 2010-09-29 16:44= :39,167 StorageProxy.java (line 780) Sent all truncate messages, now waiti= ng for 1 responses
 INFO [FLUSH-WRITER-POOL:1] 2010-09-29 1= 6:44:39,167 Memtable.java (line 150) Writing Memtable-Standard1@1701009673= (40104 bytes, 1152 operations)
 INFO [FLUSH-WRITER-POOL:1] = 2010-09-29 16:44:39,268 Memtable.java (line 157) Completed flushing /local= 1/junkbox/cassandra/data/Keyspace1/Standard1-e-10-Data.db
DEBUG = [CompactionExecutor:1] 2010-09-29 16:44:39,268 CompactionManager.java (lin= e 95) Checking to see if compaction of Standard1 would be useful
DEBUG [COMMIT-LOG-WRITER] 2010-09-29 16:44:39,268 CommitLog.java (line 42= 2) discard completed log segments for CommitLogContext(file=3D'/local1/jun= kbox/cassandra/commitlog/CommitLog-1285728080403.log', position=3D10022050= ), column family 1000.
DEBUG [COMMIT-LOG-WRITER] 2010-09-29 16:4= 4:39,268 CommitLog.java (line 448) Marking replay position 10022050 on com= mit log CommitLogSegment(/local1/junkbox/cassandra/commitlog/CommitLog-128= 5728080403.log)
ERROR [MUTATION_STAGE:25] 2010-09-29 16:44= :39,341 AbstractCassandraDaemon.java (line 88) Fatal exception in thread T= hread[MUTATION_STAGE:25,5,main]
java.io.IOError: java.io.IOExcep= tion: Cannot run program "ln": java.io.IOException: error=3D12, Cannot all= ocate memory
        at org.apache.cass= andra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1368)
        at org.apache.cassandra.db.ColumnFamilyS= tore.truncate(ColumnFamilyStore.java:1511)
     &= nbsp;  at org.apache.cassandra.db.Table.truncate(Table.java:633)
        at org.apache.cassandra.db.Truncate= VerbHandler.doVerb(TruncateVerbHandler.java:54)
   &nb= sp;    at org.apache.cassandra.net.MessageDeliveryTask.run(Messa= geDeliveryTask.java:50)
        at java= util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java= :886)
        at java.util.concurrent.T= hreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 =        at java.lang.Thread.run(Thread.java:619)
<= div>Caused by: java.io.IOException: Cannot run program "ln": java.io.IOExc= eption: error=3D12, Cannot allocate memory
     &= nbsp;  at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.cassandra.io.util.Fi= leUtils.createHardLinkWithExec(FileUtils.java:263)
   =      at org.apache.cassandra.io.util.FileUtils.createHardLi= nk(FileUtils.java:229)
        at org.a= pache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1360)=
        ... 7 more
Caused by= : java.io.IOException: java.io.IOException: error=3D12, Cannot allocate me= mory
        at java.lang.UNIXProcess.&= lt;init>(UNIXProcess.java:148)
       &nb= sp;at java.lang.ProcessImpl.start(ProcessImpl.java:65)
 &nb= sp;      at java.lang.ProcessBuilder.start(ProcessBuilder.j= ava:453)
        ... 10 more
= DEBUG [GC inspection] 2010-09-29 16:44:39,539 GCInspector.java (line 131) = GC for ParNew: 43 ms, 29106728 reclaimed leaving 3567200784 used; max is 5= 498404864
DEBUG [pool-1-thread-44] 2010-09-29 16:44:49,189 Clien= tState.java (line 107) logged out: #<User allow_all groups=3D[]>

Fair enough, though it appeared as a timeout o= n the client (with a 10 second timeout) 

  File "/tech/home//git_home/trojan/trojan/cassandra/Cassandra= py", line 846, in truncate
    self.recv_truncat= e()
  File "/tech/home//git_home/trojan/trojan/cassand= ra/Cassandra.py", line 857, in recv_truncate
    = (fname, mtype, rseqid) =3D self._iprot.readMessageBegin()
 =  File "/tech/home//git_home/trojan/trojan/thrift/protocol/TBinaryProt= ocol.py", line 126, in readMessageBegin
    sz =3D= self.readI32()
<snip>
    chunk = =3D self.read(sz-have)
  File "/tech/home//git_home/tr= ojan/trojan/thrift/transport/TSocket.py", line 92, in read
 = ;   buff =3D self.handle.recv(sz)
timeout: timed out

Here's the output of ulimit -a 

core file size          (b= locks, -c) 0
data seg size           (k= bytes, -d) unlimited
scheduling priority       &n= bsp;     (-e) 20
file size        =       (blocks, -f) unlimited
pending signals &nb= sp;               (-i) 16382
= max locked memory       (kbytes, -l) 64
max memor= y size         (kbytes, -m) unlimited
open f= iles                    =  (-n) 65536
pipe size           &= nbsp;(512 bytes, -p) 8
POSIX message queues     (bytes= , -q) 819200
real-time priority         &nbs= p;    (-r) 0
stack size         &n= bsp;    (kbytes, -s) 8192
cpu time      = ;         (seconds, -t) unlimited
max user p= rocesses              (-u) unlimited
virtual memory          (kbytes, -v) unli= mited
file locks              = ;        (-x) unlimited


Two questions:

1) Is the memory e= rror just a result of me letting my machine run stupidly low on memory?

2) Should it have returned an ApplicationError or s= ome such in this case? The code in ColumnFamilyStore:1368 is catching the = IOException from the call to FileUtils.createHardLink and wrapping it in a= n IOError. However the code in TruncateVerbHandler:56 is looking for the I= OException. 

Cheers
Aaron

--Apple-Webmail-86--e02da8be-f009-ebc8-f7cd-835829a8da68-- --Apple-Webmail-42--e02da8be-f009-ebc8-f7cd-835829a8da68--