From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Thrift version and OOM errors
Date: Wed, 4 Jul 2012 22:54:52 +1200

> We are using Cassandra 1.0.7 on AWS on mediums (that is 3.8G RAM, 1 Core),
That's pretty small, try m1.xlarge.

> We are still not sure what version of thrift to use with Cassandra 1.0.7 (we are still getting the same message regarding the 'old client').
1.0.7 ships with thrift 0.6.
What client are you using? If you have rolled your own client, try using one of the pre-built ones to rule out errors in your code.

> org.apache.thrift.TException: Message length exceeded: 1970238464
mmm 1.83 GB message size. Something is not right there.
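To put that first number in perspective, the sketch below (plain JDK; the class and method names are made up for illustration) converts the rejected length of 1970238464 to GiB and prints its four bytes in big-endian order, which is how Thrift's binary protocol writes an i32. The bytes come out as 0x75 0x6F 0x74 0x00, i.e. the ASCII text "uot" followed by a NUL. That looks more like string data than a real size; one possible reading (an assumption, not established in this thread) is that the server was reading from the wrong position in the client's byte stream.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: inspects an int32 that Thrift rejected as a length,
// to judge whether it is plausibly a real size or misread payload bytes.
final class SuspiciousLength {

    // Bytes -> GiB (1 GiB = 1024^3 bytes).
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    // ByteBuffer defaults to big-endian, the byte order used for a binary-protocol i32.
    static byte[] bigEndianBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Hex dump plus printable ASCII ('.' for non-printable bytes).
    static String describe(int value) {
        StringBuilder hex = new StringBuilder();
        StringBuilder ascii = new StringBuilder();
        for (byte b : bigEndianBytes(value)) {
            int u = b & 0xFF;
            hex.append(String.format("%02X ", u));
            ascii.append(u >= 0x20 && u < 0x7F ? (char) u : '.');
        }
        return hex.toString().trim() + "  |" + ascii + "|";
    }

    public static void main(String[] args) {
        int reported = 1970238464; // from "Message length exceeded: 1970238464"
        System.out.printf("%.2f GiB%n", toGiB(reported)); // prints "1.83 GiB"
        System.out.println(describe(reported));           // prints "75 6F 74 00  |uot.|"
    }
}
```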


> org.apache.thrift.TException: Message length exceeded: 218104076
208 MB message size, which is too big (max is 16 MB), followed by out of memory.
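The same arithmetic for the second message, checked against the 16 MB cap mentioned above. In 1.0.x that cap is believed to come from `thrift_max_message_length_in_mb` in cassandra.yaml (default 16); treat the setting name and default as assumptions to confirm against your own config. The class below is a hypothetical illustration, not Cassandra code.

```java
// Hypothetical check: would a message of a given size pass a server-side
// maximum expressed in MB (taken here as MiB, i.e. MB * 1024 * 1024)?
final class MessageLimit {

    static final long MIB = 1024L * 1024L;

    static double toMiB(long bytes) {
        return bytes / (double) MIB;
    }

    // True if `bytes` fits under a cap of `limitMb` megabytes.
    static boolean withinLimit(long bytes, int limitMb) {
        return bytes <= limitMb * MIB;
    }

    public static void main(String[] args) {
        long reported = 218104076L; // from "Message length exceeded: 218104076"
        int limitMb = 16;           // assumed server-side cap, per the reply above
        System.out.printf("%.1f MiB, within %d MB limit: %b%n",
                toMiB(reported), limitMb, withinLimit(reported, limitMb));
        System.out.printf("%.1fx the limit%n", reported / (double) (limitMb * MIB));
    }
}
```

At roughly 13 times the cap, the server refusing this message is expected; the open question is why the client sent (or appeared to send) something that large.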

Do you get these errors with a stock 1.0.X install and a pre-built client?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/07/2012, at 9:57 AM, Vasileios Vlachos wrote:

Hello All,

We are using Cassandra 1.0.7 on AWS medium instances (3.8 GB RAM, 1 core), running Ubuntu 12.04. We have three nodes in the cluster, and our application connects to only one of them. Our Thrift version is 0.6.1: we changed from 0.8 because we thought there was a compatibility problem between Thrift and Cassandra ('old client', according to output.log). We are still not sure which version of Thrift to use with Cassandra 1.0.7, since we are still getting the same 'old client' message. I would appreciate any help on that, please.

Below are the errors we are getting in output.log. The first three do not cause the crash (only the OOM error does), but something seems to be really wrong there...

Error #1

ERROR 14:00:12,057 Thrift error occurred during processing of message.
org.apache.thrift.TException: Message length exceeded: 1970238464
    at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
    at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
    at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:102)
    at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
    at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
    at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
    at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:121)
    at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:60)
    at org.apache.cassandra.thrift.Mutation.read(Mutation.java:355)
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18966)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

Error #2

ERROR 14:03:48,004 Error occurred during processing of message.
java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111
    at java.lang.String.checkBounds(String.java:397)
    at java.lang.String.<init>(String.java:442)
    at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:210)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
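The negative "index" in Error #2 is worth decoding. As an unsigned 32-bit value, -2147418111 is 0x80010001. In Thrift's binary protocol a strict message header starts with the i32 `VERSION_1 | messageType`, where VERSION_1 is 0x80010000 and a CALL has type 1, so this is exactly the header word of a CALL message. The trace shows it was read as a string length inside readMessageBegin(); one possible interpretation (unconfirmed) is that the server met the start of another message where it expected a length, i.e. its position in the byte stream had drifted. The sketch below only does the bit arithmetic; the constant values mirror Thrift's, but the class itself is illustrative.

```java
// Illustrative decoder for a Thrift binary-protocol strict message header word.
// Constant values mirror TBinaryProtocol.VERSION_MASK / VERSION_1 and TMessageType.
final class HeaderWord {

    static final int VERSION_MASK = 0xffff0000;
    static final int VERSION_1 = 0x80010000;

    static String messageType(int type) {
        switch (type) {
            case 1: return "CALL";
            case 2: return "REPLY";
            case 3: return "EXCEPTION";
            case 4: return "ONEWAY";
            default: return "UNKNOWN(" + type + ")";
        }
    }

    // Describes `word` if its top 16 bits are the version-1 marker, else returns null.
    static String decode(int word) {
        if ((word & VERSION_MASK) != VERSION_1) {
            return null;
        }
        return "version 1 header, type " + messageType(word & 0xff);
    }

    public static void main(String[] args) {
        int reported = -2147418111; // from "String index out of range: -2147418111"
        // %X prints negative ints as unsigned two's-complement hex.
        System.out.printf("0x%08X -> %s%n", reported, decode(reported));
    }
}
```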

Error #3

ERROR 14:07:24,415 Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
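Error #3 comes from the first check in readMessageBegin(). As understood from Thrift's Java TBinaryProtocol, the leading i32 of a message is handled roughly like this: a negative value must carry the version-1 bits; a non-negative value is an old-style unversioned header, accepted only when strict reads are off, and otherwise rejected with "Missing version in readMessageBegin, old client?". So the message does not by itself prove the client library is old: any first four bytes that do not start with 0x80 0x01, including garbage or misaligned bytes, would trigger it on a strict server. The sketch below paraphrases that decision for illustration; it is not Thrift's source, and the class, enum and method names are hypothetical.

```java
// Illustrative paraphrase of how the first i32 of a message is classified
// by a binary-protocol reader. Not Thrift's actual implementation.
final class MessageBeginCheck {

    static final int VERSION_MASK = 0xffff0000;
    static final int VERSION_1 = 0x80010000;

    enum Outcome { VERSIONED_HEADER, BAD_VERSION, OLD_CLIENT_ACCEPTED, MISSING_VERSION }

    static Outcome classify(int firstWord, boolean strictRead) {
        if (firstWord < 0) {
            // High bit set: must be a versioned header with the version-1 marker.
            return (firstWord & VERSION_MASK) == VERSION_1
                    ? Outcome.VERSIONED_HEADER
                    : Outcome.BAD_VERSION;
        }
        // Non-negative: treated as an old-style header (a method-name length).
        return strictRead ? Outcome.MISSING_VERSION : Outcome.OLD_CLIENT_ACCEPTED;
    }

    public static void main(String[] args) {
        int[] samples = { 0x80010001, 0x0000000D, 1970238464 };
        for (int w : samples) {
            System.out.printf("0x%08X -> %s (strictRead=true)%n", w, classify(w, true));
        }
    }
}
```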

Error #4

ERROR 16:07:10,168 Thrift error occurred during processing of message.
org.apache.thrift.TException: Message length exceeded: 218104076
    at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
    at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:352)
    at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:347)
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/lib/cassandra/java_1341224307.hprof ...
INFO 16:07:18,882 GC for Copy: 886 ms for 1 collections, 2242700896 used; max is 2670985216
Java HotSpot(TM) 64-Bit Server VM warning: record is too large
Heap dump file created [4429997807 bytes in 95.755 secs]
INFO 16:08:54,749 GC for ConcurrentMarkSweep: 1157 ms for 4 collections, 2246857528 used; max is 2670985216
WARN 16:08:54,761 Heap is 0.8412092715978552 full. You may need to reduce memtable and/or cache sizes.
Cassandra will now flush up to the two largest memtables to free up memory.
Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
ERROR 16:08:54,761 Fatal exception in thread Thread[Thrift:446,5,main]
java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.<init>(HashMap.java:187)
    at java.util.HashMap.<init>(HashMap.java:199)
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18953)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
INFO 16:08:54,760 InetAddress /10.128.16.110 is now dead.
INFO 16:08:54,764 InetAddress /10.128.16.112 is now dead.
------------------------------------------------------------------------------------------
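The heap figures in the Error #4 log are consistent with one another: the WARN line's 0.8412092715978552 is simply used/max from the ConcurrentMarkSweep line (2246857528 / 2670985216), and the maximum heap of 2670985216 bytes is about 2.49 GiB on a 3.8 GB instance. The WARN text says Cassandra acts on this fraction according to flush_largest_memtables_at in cassandra.yaml; the 0.75 used below is an assumed default, so check your own file. A rough sketch of the arithmetic, not Cassandra's code:

```java
// Rough sketch of the heap-pressure arithmetic behind the WARN line above.
// The 0.75 threshold is an ASSUMED default for flush_largest_memtables_at.
final class HeapPressure {

    static double usedFraction(long usedBytes, long maxBytes) {
        return usedBytes / (double) maxBytes;
    }

    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    static boolean aboveThreshold(long usedBytes, long maxBytes, double threshold) {
        return usedFraction(usedBytes, maxBytes) > threshold;
    }

    public static void main(String[] args) {
        long used = 2246857528L; // "GC for ConcurrentMarkSweep: ... 2246857528 used"
        long max = 2670985216L;  // "max is 2670985216"
        double flushLargestMemtablesAt = 0.75; // assumed default; check cassandra.yaml
        System.out.printf("heap %.2f of %.2f GiB, fraction %.4f, above %.2f: %b%n",
                toGiB(used), toGiB(max), usedFraction(used, max),
                flushLargestMemtablesAt, aboveThreshold(used, max, flushLargestMemtablesAt));
    }
}
```

Flushing memtables frees memory only if memtables are what fills the heap; it cannot help when a single request (such as the oversized batch_mutate in the trace) allocates the space itself.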

The first three errors appear many times before error #4, which is the one that actually causes the crash. 10.128.16.110 is the node our application hits. Although the log suggests that 10.128.16.112 died, it did not: we ran 'nodetool ring' on 10.128.16.112 and only 10.128.16.110 appeared to be down.

Proper hardware might solve some of our problems, but we need a fair understanding of what is happening before we move on. At the moment we cannot keep the cluster stable for more than 12 hours; after that, 10.128.16.110 dies and output.log shows the same errors.

Any help would be much appreciated. Please let me know if you need more information to figure out what is going on.

Thank you in advance.

--
Kind Regards,

Vasilis
