From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: nodetool repair - when is it not needed ?
Date: Fri, 24 Aug 2012 10:15:03 +1200

> Also when hints are replayed they are sent off as mutations, which may
> still be dropped by the target if they are not serviced before
> rpc_timeout. Sending nodes throttle their requests so it's unlikely but
> possible.

My bad there. I thought the mutations were sent one way.

When a node is sending hints it waits the normal rpc_timeout. If there is a timeout, hint delivery for that endpoint is aborted. It will be retried in the next HH round, which is every 10 minutes.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2012, at 9:36 PM, aaron morton <aaron@thelastpickle.com> wrote:

> HH works to a point. Specifically, it only collects hints for the first
> hour the node is down, and it has a safety valve to avoid the node
> collecting hints getting overwhelmed. Looking at the code, it takes a bit
> for that to trip, and you would get a TimeoutException coming back.
>
> Also when hints are replayed they are sent off as mutations, which may
> still be dropped by the target if they are not serviced before
> rpc_timeout. Sending nodes throttle their requests so it's unlikely but
> possible.
>
> HH is much more robust, but AFAIK repair is still _the_ way to ensure
> on-disk consistency.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/08/2012, at 6:59 AM, Rob Coli <rcoli@palominodb.com> wrote:
>
>> On Wed, Aug 22, 2012 at 8:37 AM, Senthilvel Rangaswamy
>> <senthilvel@gmail.com> wrote:
>>> We are running Cassandra 1.1.2 on EC2. Our database is primarily all
>>> counters and we don't do any
>>> deletes.
>>>
>>> Does nodetool repair do anything for such a database? All the docs I read
>>> for nodetool repair suggest
>>> that nodetool repair is needed only if there are deletes.
>>
>> Since 1.0, repair is only needed if a node crashes. If a node crashes,
>> my understanding is that a cluster-wide repair (with -pr on each node)
>> is required, because the crashed node could have lost a hint for any
>> other node.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2034
>>
>> =Rob
>>
>> --
>> =Robert Coli
>> AIM&GTALK - rcoli@palominodb.com
>> YAHOO - rcoli.palominob
>> SKYPE - rcoli_palominodb
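
For anyone skimming the thread later, the delivery behaviour described in the reply (wait for an ack, abort the endpoint on a timeout, pick it up again in the next round) can be sketched roughly as below. This is only an illustration, not Cassandra's code: DeliveryTimeout, deliver_hints, run_rounds and flaky_send are made-up names, the send function is assumed to block until it gets an ack or times out, and the 10 minute wait between rounds is left out.

```python
from collections import deque


class DeliveryTimeout(Exception):
    """Hypothetical error: the target did not ack within the timeout."""


def deliver_hints(pending, send):
    """Deliver queued hints for one endpoint, in order.

    Stops at the first timeout and leaves the remaining hints queued.
    Returns True if the queue was drained, False if the round was aborted.
    """
    while pending:
        hint = pending[0]
        try:
            send(hint)  # assumed to block until ack or timeout
        except DeliveryTimeout:
            return False  # abort this endpoint for this round
        pending.popleft()  # drop the hint only after it was acknowledged
    return True


def run_rounds(pending, send, max_rounds):
    """Run one delivery attempt per round; return the rounds used, or None."""
    for round_no in range(1, max_rounds + 1):
        if deliver_hints(pending, send):
            return round_no
    return None


# Usage: a fake target that times out on the third send only.
calls = []


def flaky_send(hint):
    calls.append(hint)
    if len(calls) == 3:
        raise DeliveryTimeout(hint)


queue = deque(["h1", "h2", "h3", "h4"])
rounds = run_rounds(queue, flaky_send, max_rounds=5)
print(rounds, calls)
```

The point of the sketch is that a hint is removed from the queue only after the send returns, so a timed-out hint is sent again in the following round rather than being lost by the sender.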
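
On the "cluster-wide repair (with -pr on each node)" suggestion quoted above, here is a small sketch that only builds the per-node command lines and does not run anything. The host addresses are hypothetical placeholders, and repair_commands is a made-up helper name; the only thing taken from the thread is running repair with -pr against every node.

```python
def repair_commands(hosts):
    """Build one `nodetool repair -pr` command per host (nothing is executed)."""
    return [["nodetool", "-h", host, "repair", "-pr"] for host in hosts]


# Hypothetical node addresses, for illustration only.
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
commands = repair_commands(hosts)
for cmd in commands:
    print(" ".join(cmd))
```

With -pr each node repairs only its primary range, which is why the command has to be issued against every node to cover the whole ring.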