From: Wei Zhu
To: user@cassandra.apache.org
Date: Thu, 21 Feb 2013 23:15:15 -0800 (PST)
Subject: Re: Mutation dropped

Thanks Aaron for the great information, as always. I just checked cfhistograms and only a handful of read latencies are above 100ms, but in proxyhistograms about ten times as many are above 100ms. We are reading at QUORUM with RF=3, and I understand the coordinator needs to get digests from the other nodes and do read repair on a mismatch, etc. But is it normal for the proxyhistograms latency to go beyond 100ms? Is there any way to improve it?

We are also tracking metrics from the client side, and the 95th percentile response time averages around 40ms, which is a bit high. Our 50th percentile is great, under 3ms.

Any suggestion is very much appreciated.

Thanks.
-Wei
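
For reference, the histograms discussed above come from nodetool. The keyspace and column family names below are placeholders, and the exact output columns vary between Cassandra versions, so treat this as a sketch rather than a recipe:

    # latency recorded on a single replica (local read/write path only)
    nodetool -h 10.x.x.x cfhistograms my_keyspace my_cf

    # latency recorded at the coordinator, including waiting on enough
    # replicas for the requested consistency level and any read repair
    nodetool -h 10.x.x.x proxyhistograms

The gap between the two is roughly the coordinator-side overhead (digest requests to the other replicas, read repair) being asked about here.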

----- Original Message -----
From: "aaron morton"
To: "Cassandra User"
Sent: Thursday, February 21, 2013 9:20:49 AM
Subject: Re: Mutation dropped

> What does rpc_timeout control? Only the reads/writes?
Yes.

> like data stream,
streaming_socket_timeout_in_ms in the yaml.

> merkle tree request?
Either no timeout or a number of days, cannot remember which right now.

> What is the side effect if it's set to a really small number, say 20ms?
You will probably get a lot more requests that fail with a TimedOutException.

rpc_timeout needs to be longer than the time it takes a node to process the message, plus the time it takes the coordinator to do its thing. You can look at cfhistograms and proxyhistograms to get a better idea of how long a request takes in your system.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 6:56 AM, Wei Zhu wrote:

> What does rpc_timeout control? Only the reads/writes? How about other inter-node communication, like data streaming and merkle tree requests? What is a reasonable value for rpc_timeout? The default of 10 seconds seems way too long. What is the side effect if it's set to a really small number, say 20ms?
>
> Thanks.
> -Wei
>
> From: aaron morton
> To: user@cassandra.apache.org
> Sent: Tuesday, February 19, 2013 7:32 PM
> Subject: Re: Mutation dropped
>
>> Does the rpc_timeout not control the client timeout ?
> No, it is how long a node will wait for a response from other nodes before raising a TimedOutException if fewer than CL nodes have responded.
> Set the client-side socket timeout using your preferred client.
>
>> Is there any param which is configurable to control the replication timeout between nodes ?
> There is no such thing.
> rpc_timeout is roughly like that, but it's not right to think about it that way.
> i.e. if a message to a replica times out but CL nodes have already responded, we are happy to call the request complete.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/02/2013, at 1:48 AM, Kanwar Sangha wrote:
>
>> Thanks Aaron.
>>
>> Does the rpc_timeout not control the client timeout ? Is there any param which is configurable to control the replication timeout between nodes ? Or is the same param used to control that, since the other node is also like a client?
>>
>> From: aaron morton [mailto:aaron@thelastpickle.com]
>> Sent: 17 February 2013 11:26
>> To: user@cassandra.apache.org
>> Subject: Re: Mutation dropped
>>
>> You are hitting the maximum throughput on the cluster.
>>
>> The messages are dropped because the node fails to start processing them before rpc_timeout.
>>
>> However, the request is still a success because the client-requested CL was achieved.
>>
>> Testing with RF 2 and CL 1 really just tests the disks on one local machine. Both nodes replicate each row, and writes are sent to each replica, so the only thing the client is waiting on is the local node writing to its commit log.
>>
>> Testing with (and running in prod) RF 3 and CL QUORUM is a more real-world scenario.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 15/02/2013, at 9:42 AM, Kanwar Sangha wrote:
>>
>> Hi – Is there a parameter which can be tuned to prevent the mutations from being dropped ? Is this logic correct ?
>>
>> Node A and B with RF=2, CL=1. Load balanced between the two.
>>
>> --  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.x.x.x  746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
>> UN  10.x.x.x  880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1
>>
>> Once we hit a very high TPS (around 50k/sec of inserts), the nodes start falling behind and we see the mutation dropped messages. But there are no failures on the client. Does that mean the other node is not able to persist the replicated data ? Is there some timeout associated with replicated data persistence ?
>>
>> Thanks,
>> Kanwar
>>
>> From: Kanwar Sangha [mailto:kanwar@mavenir.com]
>> Sent: 14 February 2013 09:08
>> To: user@cassandra.apache.org
>> Subject: Mutation dropped
>>
>> Hi – I am doing a load test using YCSB across 2 nodes in a cluster and am seeing a lot of mutation dropped messages. I understand that this is due to the replica not being written to the other node ? RF = 2, CL = 1.
>>
>> From the wiki:
>> For MUTATION messages this means that the mutation was not applied to all replicas it was sent to. The inconsistency will be repaired by Read Repair or Anti Entropy Repair.
>>
>> Thanks,
>> Kanwar
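
A reference note on where the settings discussed in this thread live (a sketch only; option names and defaults changed between releases, so verify against the cassandra.yaml shipped with your version):

    # cassandra.yaml -- coordinator-to-replica timeout discussed above.
    # A single rpc_timeout_in_ms setting on 1.1 and earlier; split per
    # operation (read_request_timeout_in_ms, write_request_timeout_in_ms,
    # request_timeout_in_ms, ...) from 1.2 onwards.
    rpc_timeout_in_ms: 10000

    # Socket timeout for streaming (bulk data transfer between nodes);
    # 0 means never time out.
    streaming_socket_timeout_in_ms: 0

Dropped MUTATION messages can be watched per node, for example:

    nodetool -h 10.x.x.x tpstats    # the dropped-message counts at the bottom
                                    # list MUTATION, READ, etc.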