Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Reply-To: user@cassandra.apache.org
Date: Tue, 3 Jul 2012 13:34:39 +0100
Subject: Re: Thrift version and OOM errors
From: Vasileios Vlachos
To: user@cassandra.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=90e6ba539eec54406504c3ec2388

--90e6ba539eec54406504c3ec2388
Content-Type: text/plain; charset=ISO-8859-1

Just an update to correct something...

The application hits 10.128.16.111. The last lines of Error #4 suggest that
10.128.16.110 and 10.128.16.112 were down because the Cassandra service was
down on 10.128.16.111 and it could not detect the cluster (I think it must be
gossip related, right?).

Thanks,

Vasilis

On Mon, Jul 2, 2012 at 10:57 PM, Vasileios Vlachos <vasileiosvlachos@gmail.com> wrote:

> Hello All,
>
> We are using Cassandra 1.0.7 on AWS medium instances (3.8G RAM, 1 core),
> running Ubuntu 12.04. We have three nodes in the cluster and we hit only
> one node from our application. The Thrift version is 0.6.1 (we changed from
> 0.8 because we thought there was a compatibility problem between Thrift and
> Cassandra, given the 'old client' message in output.log). We are still not
> sure which version of Thrift to use with Cassandra 1.0.7 (we are still
> getting the same 'old client' message). I would appreciate any help on that.
>
> Below, I am sharing the errors we are getting from the output.log file.
> The first three errors are not responsible for the crash, only the OOM
> error is, but something seems to be really wrong there...
>
> Error #1
>
> ERROR 14:00:12,057 Thrift error occurred during processing of message.
> org.apache.thrift.TException: Message length exceeded: 1970238464
>     at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
>     at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
>     at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:102)
>     at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
>     at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
>     at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:112)
>     at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:121)
>     at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:60)
>     at org.apache.cassandra.thrift.Mutation.read(Mutation.java:355)
>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18966)
>     at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
>     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> Error #2
>
> ERROR 14:03:48,004 Error occurred during processing of message.
> java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111
>     at java.lang.String.checkBounds(String.java:397)
>     at java.lang.String.<init>(String.java:442)
>     at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:210)
>     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> Error #3
>
> ERROR 14:07:24,415 Thrift error occurred during processing of message.
> org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
>     at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213)
>     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> Error #4
>
> ERROR 16:07:10,168 Thrift error occurred during processing of message.
> org.apache.thrift.TException: Message length exceeded: 218104076
>     at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
>     at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:352)
>     at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:347)
>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
>     at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
>     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to /var/lib/cassandra/java_1341224307.hprof ...
> INFO 16:07:18,882 GC for Copy: 886 ms for 1 collections, 2242700896 used; max is 2670985216
> Java HotSpot(TM) 64-Bit Server VM warning: record is too large
> Heap dump file created [4429997807 bytes in 95.755 secs]
> INFO 16:08:54,749 GC for ConcurrentMarkSweep: 1157 ms for 4 collections, 2246857528 used; max is 2670985216
> WARN 16:08:54,761 Heap is 0.8412092715978552 full. You may need to reduce memtable and/or cache sizes.
> Cassandra will now flush up to the two largest memtables to free up memory.
> Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
> ERROR 16:08:54,761 Fatal exception in thread Thread[Thrift:446,5,main]
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.HashMap.<init>(HashMap.java:187)
>     at java.util.HashMap.<init>(HashMap.java:199)
>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18953)
>     at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
>     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> INFO 16:08:54,760 InetAddress /10.128.16.110 is now dead.
> INFO 16:08:54,764 InetAddress /10.128.16.112 is now dead.
>
> ----------------------------------------------------------------------
>
> The first three errors appear many times before error #4, which actually
> causes the crash. 10.128.16.110 is the node our application hits. Although
> the log suggests that 10.128.16.112 died, it did not. We ran 'nodetool
> ring' on 10.128.16.112 and only 10.128.16.110 appeared to be down.
>
> Proper hardware might solve some of our problems, but we need a fair
> understanding before we move on. At the moment we cannot keep the cluster
> stable for more than 12 hours. After that, 10.128.16.110 dies and the
> output.log has the same errors.
>
> Any help would be much appreciated. Please let me know if you need more
> information in order to figure out what is going on.
>
> Thank you in advance.
>
> --
> Kind Regards,
>
> Vasilis

--90e6ba539eec54406504c3ec2388--
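
The "lengths" reported in Errors #1, #2 and #4 are far larger than any plausible batch, so it can help to look at them as raw bytes rather than as sizes. The sketch below is only an illustration, not part of the original report: it assumes the server read each value as a big-endian signed 32-bit integer (which is how Thrift's binary protocol encodes lengths), and the helper names `as_bytes` and `has_strict_version_header` are hypothetical. The two constants are the version mask and version 1 marker used by Thrift's `TBinaryProtocol`.

```python
import struct

# Constants from Thrift's TBinaryProtocol (strict message header).
VERSION_MASK = 0xFFFF0000
VERSION_1 = 0x80010000


def as_bytes(value: int) -> bytes:
    """Return the four big-endian bytes of a signed 32-bit integer."""
    return struct.pack(">i", value)


def has_strict_version_header(value: int) -> bool:
    """True if the value, seen as unsigned, carries the VERSION_1 marker."""
    unsigned = value & 0xFFFFFFFF
    return (unsigned & VERSION_MASK) == VERSION_1


# The three values reported in the log excerpts above.
reported = {
    "Error #1 length": 1970238464,
    "Error #2 index": -2147418111,
    "Error #4 length": 218104076,
}

for label, value in reported.items():
    raw = as_bytes(value)
    hex_bytes = " ".join(f"{b:02x}" for b in raw)
    print(f"{label}: {hex_bytes} strict_header={has_strict_version_header(value)}")
```

Under these assumptions, the value in Error #2 is exactly a strict binary-protocol header for a call message (`80 01 00 01`), and the value in Error #1 begins with printable ASCII. That suggests the server was reading a position in the stream where no length field actually sits. A transport mismatch (for example framed versus unframed) or a client connection shared between threads are commonly reported causes of this pattern, but the decoding alone does not prove either.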