From: Wei Zhu
To: user@cassandra.apache.org
Date: Thu, 21 Feb 2013 23:15:15 -0800 (PST)
Subject: Re: Mutation dropped

Thanks Aaron for the great information, as always. I just checked cfhistograms and only a handful of read latencies are above 100ms, but in proxyhistograms about ten times as many are above 100ms. We are reading at QUORUM with RF=3, and I understand the coordinator needs to get digests from the other nodes and do read repair on a mismatch, etc. But is it normal for the proxyhistograms latency to go beyond 100ms? Is there any way to improve it?

We are also tracking metrics from the client side, and the 95th percentile response time averages around 40ms, which is a bit high. Our 50th percentile is great, under 3ms.

Any suggestion is very much appreciated.

Thanks.
-Wei
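
For reference, the histograms discussed above come from nodetool. The keyspace and column family names below are placeholders, and the exact output columns vary between Cassandra versions, so treat this as a sketch rather than a recipe:

    # latency recorded on a single replica (local read/write path only)
    nodetool -h 10.x.x.x cfhistograms my_keyspace my_cf

    # latency recorded at the coordinator, including waiting on enough
    # replicas for the requested consistency level and any read repair
    nodetool -h 10.x.x.x proxyhistograms

The gap between the two is roughly the coordinator-side overhead (digest requests to the other replicas, read repair) being asked about here.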

----- Original Message -----
From: "aaron morton"
To: "Cassandra User"
Sent: Thursday, February 21, 2013 9:20:49 AM
Subject: Re: Mutation dropped

> What does rpc_timeout control? Only the reads/writes?
Yes.

> like data stream,
streaming_socket_timeout_in_ms in the yaml.

> merkle tree request?
Either no timeout or a number of days, cannot remember which right now.

> What is the side effect if it's set to a really small number, say 20ms?
You will probably get a lot more requests that fail with a TimedOutException.

rpc_timeout needs to be longer than the time it takes a node to process the message, plus the time it takes the coordinator to do its thing. You can look at cfhistograms and proxyhistograms to get a better idea of how long a request takes in your system.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 6:56 AM, Wei Zhu wrote:

> What does rpc_timeout control? Only the reads/writes? How about other inter-node communication, like data streaming and merkle tree requests? What is a reasonable value for rpc_timeout? The default of 10 seconds seems way too long. What is the side effect if it's set to a really small number, say 20ms?
>
> Thanks.
> -Wei
>
> From: aaron morton
> To: user@cassandra.apache.org
> Sent: Tuesday, February 19, 2013 7:32 PM
> Subject: Re: Mutation dropped
>
>> Does the rpc_timeout not control the client timeout ?
> No, it is how long a node will wait for a response from other nodes before raising a TimedOutException if fewer than CL nodes have responded.
> Set the client-side socket timeout using your preferred client.
>
>> Is there any param which is configurable to control the replication timeout between nodes ?
> There is no such thing.
> rpc_timeout is roughly like that, but it's not right to think about it that way.
> i.e. if a message to a replica times out but CL nodes have already responded, we are happy to call the request complete.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/02/2013, at 1:48 AM, Kanwar Sangha wrote:
>
>> Thanks Aaron.
>>
>> Does the rpc_timeout not control the client timeout ? Is there any param which is configurable to control the replication timeout between nodes ? Or is the same param used to control that, since the other node is also like a client?
>>
>> From: aaron morton [mailto:aaron@thelastpickle.com]
>> Sent: 17 February 2013 11:26
>> To: user@cassandra.apache.org
>> Subject: Re: Mutation dropped
>>
>> You are hitting the maximum throughput on the cluster.
>>
>> The messages are dropped because the node fails to start processing them before rpc_timeout.
>>
>> However, the request is still a success because the client-requested CL was achieved.
>>
>> Testing with RF 2 and CL 1 really just tests the disks on one local machine. Both nodes replicate each row, and writes are sent to each replica, so the only thing the client is waiting on is the local node writing to its commit log.
>>
>> Testing with (and running in prod) RF 3 and CL QUORUM is a more real-world scenario.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 15/02/2013, at 9:42 AM, Kanwar Sangha wrote:
>>
>> Hi – Is there a parameter which can be tuned to prevent the mutations from being dropped ? Is this logic correct ?
>>
>> Node A and B with RF=2, CL=1. Load balanced between the two.
>>
>> --  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.x.x.x  746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
>> UN  10.x.x.x  880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1
>>
>> Once we hit a very high TPS (around 50k/sec of inserts), the nodes start falling behind and we see the mutation dropped messages. But there are no failures on the client. Does that mean the other node is not able to persist the replicated data ? Is there some timeout associated with replicated data persistence ?
>>
>> Thanks,
>> Kanwar
>>
>> From: Kanwar Sangha [mailto:kanwar@mavenir.com]
>> Sent: 14 February 2013 09:08
>> To: user@cassandra.apache.org
>> Subject: Mutation dropped
>>
>> Hi – I am doing a load test using YCSB across 2 nodes in a cluster and am seeing a lot of mutation dropped messages. I understand that this is due to the replica not being written to the other node ? RF = 2, CL = 1.
>>
>> From the wiki:
>> For MUTATION messages this means that the mutation was not applied to all replicas it was sent to. The inconsistency will be repaired by Read Repair or Anti Entropy Repair.
>>
>> Thanks,
>> Kanwar
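
A reference note on where the settings discussed in this thread live (a sketch only; option names and defaults changed between releases, so verify against the cassandra.yaml shipped with your version):

    # cassandra.yaml -- coordinator-to-replica timeout discussed above.
    # A single rpc_timeout_in_ms setting on 1.1 and earlier; split per
    # operation (read_request_timeout_in_ms, write_request_timeout_in_ms,
    # request_timeout_in_ms, ...) from 1.2 onwards.
    rpc_timeout_in_ms: 10000

    # Socket timeout for streaming (bulk data transfer between nodes);
    # 0 means never time out.
    streaming_socket_timeout_in_ms: 0

Dropped MUTATION messages can be watched per node, for example:

    nodetool -h 10.x.x.x tpstats    # the dropped-message counts at the bottom
                                    # list MUTATION, READ, etc.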