Subject: Re: Cassandra 1.2.2 | Unexpected Connection Pool Shutdown
From: aaron morton
Date: Wed, 20 Mar 2013 22:21:05 +1300
To: user@cassandra.apache.org

> On average, this involves abandoning 20k mutations, for a total of 14Mb of data.

That's too many mutations to be practical. Each row mutation becomes a single task in the mutation thread pool. When you send so many, you risk flooding the mutation thread pool and starving other requests. By default each node has 32 threads for writes, so choose a batch size that makes sense for the number of nodes, the number of threads and the number of other clients making requests.

I also think you are running into the maximum message size for a Thrift frame; have a look at thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb in the yaml file.

> Should we reduce the size of the batch?

Yes, yup, sure thing. More is not always better.

> What is causing these errors, and how can we eliminate them?

I would start by using a much smaller batch size.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/03/2013, at 6:49 AM, radu.manolescu@barclays.com wrote:

> We have recently upgraded to C* 1.2.2 from 1.0.2, and we have started seeing errors such as the one below.
> Our app collects changes and then flushes them out to C* in a batch.
> Sometimes (at high volume) we see the following error:
>
> The log shows this error repeated for each host in the ring (total: eight) all within the same second:
>
> [03/19/13 10:33:37.286 ERROR] Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient (HThriftClient.java:124) in thread "MessageStorer-thread"
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
>     at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
>     at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
>     at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:122)
>     at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:38)
>     at me.prettyprint.cassandra.connection.HConnectionManager.closeClient(HConnectionManager.java:324)
>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:272)
>     at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
>     at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
>     at com.mycompany.some.package.DataWriter.handleInsert(DataWriter.java:283)
>     at com.mycompany.some.package.DataWriter.writeObjectsColumns(DataWriter.java:233)
>     at com.mycompany.some.package.DataWriter.persistFixMessages(DataWriter.java:140)
>     at com.mycompany.some.package.MessageStorer$Storer.run(MessageStorer.java:151)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.SocketException: Broken pipe
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
>     ... 12 more
> [03/19/13 10:33:37.289 ERROR] MARK HOST AS DOWN TRIGGERED for host someHost.mycompany.com(so.me.ip.add):9160 (HConnectionManager.java:422) in thread "MessageStorer-thread"
> [03/19/13 10:33:37.289 ERROR] Pool state on shutdown: :{someHost.mycompany.com(so.me.ip.add):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19 (HConnectionManager.java:426) in thread "MessageStorer-thread"
> [03/19/13 10:33:37.289 INFO ] Shutdown triggered on :{someHost.mycompany.com(so.me.ip.add):9160} (ConcurrentHClientPool.java:162) in thread "MessageStorer-thread"
> [03/19/13 10:33:37.302 INFO ] Shutdown complete on :{someHost.mycompany.com(so.me.ip.add):9160} (ConcurrentHClientPool.java:170) in thread "MessageStorer-thread"
> [03/19/13 10:33:37.302 INFO ] Host detected as down was added to retry queue: someHost.mycompany.com(so.me.ip.add):9160 (CassandraHostRetryService.java:68) in thread "MessageStorer-thread"
> [03/19/13 10:33:37.302 INFO ] Client CassandraClient released to inactive or dead pool. Closing. (HConnectionManager.java:408) in thread "MessageStorer-thread"
>
> Then the application abandons writing the batch, because it cannot write the changes (the client pool has shut down).
> On average, this involves abandoning 20k mutations, for a total of 14Mb of data.
>
> [03/19/13 10:33:37.302 ERROR] DataWriter write failure -- count:21413 byteSize:14155488 (DataWriter.java:286) in thread "MessageStorer-thread"
> me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
>     at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:264)
>     at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
>     at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
>     at com.mycompany.some.package.DataWriter.handleInsert(DataWriter.java:283)
>     at com.mycompany.some.package.DataWriter.writeObjectsColumns(DataWriter.java:233)
>     at com.mycompany.some.package.DataWriter.persistMessages(DataWriter.java:140)
>     at com.mycompany.some.package.MessageStorer$Storer.run(MessageStorer.java:151)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
>     at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
>     at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:157)
>     at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
>     at org.apache.cassandra.thrift.Cassandra$Client.send_batch_mutate(Cassandra.java:958)
>     at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:949)
>     at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
>     at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
>     at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:104)
>     at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
>     ... 7 more
> Caused by: java.net.SocketException: Broken pipe
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>     at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
>     ... 15 more
>
> Immediately after shutting down, the pool restarts, so the application continues writing data, but some data has been lost.
> We have reduced the max size of each batch from 14.4Mb to 13.5Mb, but we are still seeing the errors.
> Should we reduce the size of the batch?
>
> Our application is using the following JARs:
> libthrift-0.7.0.jar
> hector-core-1.1-2.jar
> cassandra-thrift-1.2.1.jar
> cassandra-javautils-0.7.1.jar
> cassandra-all-1.2.0.jar
>
> What is causing these errors, and how can we eliminate them?
>
> Best regards
> Radu Manolescu
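Aaron's advice to send much smaller batches can be sketched in Java, the language of the application in this thread. This is a minimal sketch under stated assumptions, not Hector API and not the poster's code: `BatchSplitter`, `PendingMutation`, its `sizeInBytes` estimate and the limits passed to `split` are all hypothetical. It splits the collected writes on two bounds at once: a mutation count (so one request does not flood the 32 write threads per node) and an approximate byte budget (so one request stays well under the Thrift frame size set by thrift_framed_transport_size_in_mb).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: splits a large set of pending writes into smaller batches,
 * bounded both by the number of mutations and by an approximate payload size.
 */
public class BatchSplitter {

    /** Hypothetical record of one row mutation and its estimated serialized size. */
    public static final class PendingMutation {
        final String rowKey;
        final long sizeInBytes;

        public PendingMutation(String rowKey, long sizeInBytes) {
            this.rowKey = rowKey;
            this.sizeInBytes = sizeInBytes;
        }
    }

    /**
     * Groups mutations, in order, into batches of at most maxCount entries and at most
     * maxBytes total estimated payload. A single mutation larger than maxBytes still
     * gets a batch of its own, so nothing is silently dropped.
     */
    public static List<List<PendingMutation>> split(List<PendingMutation> all,
                                                    int maxCount, long maxBytes) {
        if (maxCount <= 0 || maxBytes <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        List<List<PendingMutation>> batches = new ArrayList<>();
        List<PendingMutation> current = new ArrayList<>();
        long currentBytes = 0;
        for (PendingMutation m : all) {
            // Close the current batch if adding m would exceed either limit.
            boolean full = current.size() >= maxCount
                    || (!current.isEmpty() && currentBytes + m.sizeInBytes > maxBytes);
            if (full) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(m);
            currentBytes += m.sizeInBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 21,413 mutations of roughly 660 bytes each,
        // about the shape of the failed batch in the log above.
        List<PendingMutation> pending = new ArrayList<>();
        for (int i = 0; i < 21413; i++) {
            pending.add(new PendingMutation("row-" + i, 660));
        }
        // Hypothetical limits; tune them for node count, write threads and other clients.
        List<List<PendingMutation>> batches = split(pending, 500, 1_000_000L);
        System.out.println(batches.size() + " batches"); // prints "43 batches"
    }
}
```

Each resulting batch would then be written with its own Hector `Mutator` and `execute()` call. Bounding on both count and bytes means that neither many small rows nor a few wide rows can add up to one oversized request, and a failure only costs the small batch in flight rather than 20k mutations.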