incubator-cassandra-user mailing list archives

From Víctor Hugo Oliveira Molinar <vhmoli...@gmail.com>
Subject Re: Mutation dropped
Date Sat, 23 Feb 2013 17:41:13 GMT
Aaron, what did you mean by RF 3 CL QUORUM being "a more real world scenario"?
If there are only 2 nodes, where will the third replica be placed?
By increasing the CL, won't that decrease read/write performance and so
increase the TimedOutExceptions in the case mentioned?


On Fri, Feb 22, 2013 at 1:59 PM, aaron morton <aaron@thelastpickle.com> wrote:

> If you are running repair, using QUORUM, and there are no dropped writes,
> you should not be getting DigestMismatch during reads.
>
> If everything else looks good but the request latency is higher than the
> CF latency, I would check that client load is evenly distributed. Then start
> looking to see whether the request throughput is at its maximum for the
> cluster.
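>
> For example (the keyspace and column family names here are placeholders,
> substitute your own; run on each node):
>
>     nodetool proxyhistograms
>     nodetool cfhistograms MyKeyspace MyColumnFamily
>
> If proxyhistograms shows much higher latencies than cfhistograms, the
> extra time is going into coordination (waiting on replicas, read repair)
> rather than the local read path.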
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/02/2013, at 8:15 PM, Wei Zhu <wz1975@yahoo.com> wrote:
>
> Thanks Aaron for the great information as always. I just checked
> cfhistograms and only a handful of read latencies are bigger than 100ms, but
> in proxyhistograms ten times as many are greater than 100ms. We are
> using QUORUM for reads with RF=3, and I understand the coordinator needs to
> get the digest from the other nodes and read repair on a mismatch etc. But
> is it normal to see the latency from proxyhistograms go beyond 100ms? Is
> there any way to improve that?
> We are tracking metrics on the client side and we see the 95th
> percentile response time averaging 40ms, which is a bit high. Our 50th
> percentile was great, under 3ms.
>
> Any suggestion is very much appreciated.
>
> Thanks.
> -Wei
>
> ----- Original Message -----
> From: "aaron morton" <aaron@thelastpickle.com>
> To: "Cassandra User" <user@cassandra.apache.org>
> Sent: Thursday, February 21, 2013 9:20:49 AM
> Subject: Re: Mutation dropped
>
> What does rpc_timeout control? Only the reads/writes?
>
> Yes.
>
> like data stream,
>
> streaming_socket_timeout_in_ms in the yaml
>
> merkle tree request?
>
> Either no time out or a number of days, cannot remember which right now.
>
> What is the side effect if it's set to a really small number, say 20ms?
>
> You will probably get a lot more requests that fail with a
> TimedOutException.
>
> rpc_timeout needs to be longer than the time it takes a node to process
> the message, plus the time it takes the coordinator to do its thing. You
> can look at cfhistograms and proxyhistograms to get a better idea of how
> long a request takes in your system.
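>
> For reference, a minimal sketch of the relevant cassandra.yaml settings,
> assuming the 1.2 defaults (check the values in your own file):
>
>     # how long the coordinator waits for replica responses before
>     # raising a TimedOutException
>     rpc_timeout_in_ms: 10000
>
>     # socket timeout for streaming; 0 means never time out
>     streaming_socket_timeout_in_ms: 0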
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/02/2013, at 6:56 AM, Wei Zhu <wz1975@yahoo.com> wrote:
>
> What does rpc_timeout control? Only the reads/writes? How about other
> inter-node communication, like data streams and merkle tree requests? What is
> a reasonable value for rpc_timeout? The default value of 10 seconds seems
> way too long. What is the side effect if it's set to a really small number,
> say 20ms?
>
> Thanks.
> -Wei
>
> From: aaron morton <aaron@thelastpickle.com>
> To: user@cassandra.apache.org
> Sent: Tuesday, February 19, 2013 7:32 PM
> Subject: Re: Mutation dropped
>
> Does the rpc_timeout not control the client timeout?
>
> No, it is how long a node will wait for responses from other nodes before
> raising a TimedOutException if fewer than CL nodes have responded.
> Set the client side socket timeout using your preferred client.
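>
> For example with pycassa (one client among several; 'MyKeyspace' and the
> host are placeholders, and timeout is the socket timeout in seconds):
>
>     import pycassa
>
>     # client-side socket timeout, independent of the server's rpc_timeout
>     pool = pycassa.ConnectionPool('MyKeyspace',
>                                   server_list=['10.0.0.1:9160'],
>                                   timeout=0.5)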
>
> Is there any param which is configurable to control the replication
> timeout between nodes?
>
> There is no such thing.
> rpc_timeout is roughly like that, but it's not right to think about it
> that way.
> i.e. if a message to a replica times out but CL nodes have already
> responded, then we are happy to call the request complete.
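>
> A toy model of that rule (not Cassandra's real code, just the idea):
>
>     # Each entry is a replica's response latency in ms; None means the
>     # replica never responded (e.g. it dropped the mutation).
>     def coordinator_result(replica_latencies_ms, cl, rpc_timeout_ms=10000):
>         acks = sum(1 for t in replica_latencies_ms
>                    if t is not None and t <= rpc_timeout_ms)
>         return "success" if acks >= cl else "TimedOutException"
>
>     # RF=2, CL=1: the local replica acks in 3ms, the overloaded peer
>     # drops the mutation -- the client still sees success.
>     print(coordinator_result([3, None], cl=1))      # success
>     # RF=3, CL=QUORUM(2): one slow replica times out harmlessly.
>     print(coordinator_result([4, 7, 12000], cl=2))  # success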
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/02/2013, at 1:48 AM, Kanwar Sangha <kanwar@mavenir.com> wrote:
>
> Thanks Aaron.
>
> Does the rpc_timeout not control the client timeout? Is there any param
> which is configurable to control the replication timeout between nodes? Or
> is the same param used to control that, since the other node is also like a
> client?
>
>
>
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: 17 February 2013 11:26
> To: user@cassandra.apache.org
> Subject: Re: Mutation dropped
>
> You are hitting the maximum throughput on the cluster.
>
> The messages are dropped because the node fails to start processing them
> before rpc_timeout.
>
> However the request is still a success because the client-requested CL was
> achieved.
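>
> You can watch the drops with nodetool tpstats on each node; the dropped
> message counts appear at the end of the output (counts here are made up):
>
>     nodetool tpstats
>     ...
>     Message type           Dropped
>     MUTATION                 51204
>     READ                         0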
>
> Testing with RF 2 and CL 1 really just tests the disks on one local
> machine. Both nodes replicate each row, and writes are sent to each
> replica, so the only thing the client is waiting on is the local node
> writing to its commit log.
>
> Testing with (and running in prod) RF 3 and CL QUORUM is a more real-world
> scenario.
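>
> (For a given RF, QUORUM is floor(RF/2) + 1 replicas: with RF 3 that is 2
> of 3 nodes, so one slow or dead replica does not fail the request. With
> RF 2 a QUORUM is also 2, i.e. every replica must respond.)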
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/02/2013, at 9:42 AM, Kanwar Sangha <kanwar@mavenir.com> wrote:
>
>
> Hi – Is there a parameter which can be tuned to prevent the mutations from
> being dropped? Is this logic correct?
>
> Nodes A and B with RF=2, CL=1. Load balanced between the two.
>
> --  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.x.x.x   746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
> UN  10.x.x.x   880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1
>
> Once we hit a very high TPS (around 50k/sec of inserts), the nodes start
> falling behind and we see the mutation dropped messages. But there are no
> failures on the client. Does that mean the other node is not able to persist
> the replicated data? Is there some timeout associated with replicated-data
> persistence?
>
> Thanks,
> Kanwar
>
> From: Kanwar Sangha [mailto:kanwar@mavenir.com]
> Sent: 14 February 2013 09:08
> To: user@cassandra.apache.org
> Subject: Mutation dropped
>
> Hi – I am doing a load test using YCSB across 2 nodes in a cluster and
> seeing a lot of mutation dropped messages. I understand that this is due to
> the replica not being written to the other node? RF = 2, CL = 1.
>
> From the wiki -
> For MUTATION messages this means that the mutation was not applied to all
> replicas it was sent to. The inconsistency will be repaired by Read Repair
> or Anti Entropy Repair.
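> (Anti entropy repair is the process run by nodetool repair.)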
>
> Thanks,
> Kanwar
