Date: Tue, 26 Jun 2012 10:20:09 -0700
Subject: Re: Multi datacenter, WAN hiccups and replication
From: Karthik N <karthik.nar@gmail.com>
To: user@cassandra.apache.org

I re-read my last post and don't think I did a good job of articulating my question.

Sorry! I'll try again...

Say I choose LOCAL_QUORUM with a Replication Factor of 3. Cassandra
stores three copies in my local datacenter. Therefore the cost
associated with "losing" one node is not very high locally, and I
usually disable hinted handoff (HH) and rely on read repair/nodetool
repair instead.
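
To spell out the arithmetic behind that claim, here is a minimal Python
sketch. It only assumes the usual quorum formula, floor(RF / 2) + 1; the
function names are mine and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must acknowledge: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


rf = 3
print(quorum(rf))              # 2
print(tolerated_failures(rf))  # 1
```

So with RF=3 a LOCAL_QUORUM write still succeeds with one local replica
down, which is why I am comfortable without HH inside the datacenter.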

However, over the WAN, network blips are quite normal and HH really
helps, all the more so because for WAN replication Cassandra sends only
one copy to a coordinator in the remote datacenter, and it's rather
vital for that copy to make it over to keep the two datacenters in sync.

Therefore I was wondering whether Cassandra already special-cases
HH over the WAN (since this is common) even when HH is disabled, or,
alternatively, whether there is a way to enable HH for WAN replication
only while disabling it for replicas in the local datacenter?
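
To make the question concrete, here is a toy Python model of the
behavior I am asking about. Nothing in it is Cassandra code or a real
Cassandra option: `ToyCoordinator` and `hint_dcs` are names I made up
purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCoordinator:
    """Toy stand-in for a coordinator node; not a Cassandra class."""
    local_dc: str
    hint_dcs: set  # hypothetical: datacenters for which hints are kept
    hints: list = field(default_factory=list)

    def write(self, key: str, target_dc: str, reachable: bool) -> str:
        if reachable:
            return "delivered"
        if target_dc in self.hint_dcs:
            # Keep a hint so the write can be replayed later.
            self.hints.append((target_dc, key))
            return "hinted"
        # No hint stored; left for read repair / nodetool repair.
        return "dropped"


coord = ToyCoordinator(local_dc="DC1", hint_dcs={"DC2"})
print(coord.write("k1", "DC1", reachable=False))  # dropped
print(coord.write("k1", "DC2", reachable=False))  # hinted
print(coord.hints)                                # [('DC2', 'k1')]
```

In other words: hints only for the remote datacenter, none for local
replicas.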

Thank you.
Thanks, Karthik


On Tue, Jun 26, 2012 at 10:14 AM, Karthik N <karthik.nar@gmail.com> wrote:
> Let me attempt to articulate my question a little better.
>
> Say I choose LOCAL_QUORUM with a Replication Factor of 3. Cassandra
> stores three copies in my local datacenter. Therefore the cost
> associated with "losing" one node is not very high locally, and I
> usually disable HH, and use read repair/nodetool repair instead.
>
> However, over the WAN, network blips are quite normal and HH really
> helps. More so because for WAN replication Cassandra sends only one
> copy to a coordinator in the remote datacenter.
>
> Therefore I was wondering whether Cassandra already intelligently
> optimizes for HH over the WAN (since this is common) or, alternatively,
> whether there's a way to enable HH for WAN replication?
>
> Thank you.
>
> On Tue, Jun 26, 2012 at 9:22 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>>
>>
>> On Tue, Jun 26, 2012 at 8:16 AM, Karthik N <karthik.nar@gmail.com> wrote:
>>>
>>> Since Cassandra optimizes and sends only one copy over the WAN, can I
>>> opt in to HH for WAN replication only and avoid HH for the local quorum
>>> (since I know I have more copies there)?
>>>
>>>
>>
>> I am not sure I understand your question. In general, I don't think you
>> can selectively decide on HH. Besides, HH should only be used when the
>> outage lasts minutes; for longer outages, using HH would only create
>> memory pressure.
>>>
>>> On Tuesday, June 26, 2012, Mohit Anchlia wrote:
>>>>
>>>>
>>>> On Tue, Jun 26, 2012 at 7:52 AM, Karthik N <karthik.nar@gmail.com> wrote:
>>>>>
>>>>> My Cassandra ring spans two DCs. I use local quorum with replication
>>>>> factor=3. I do a write in DC1 with local quorum. Data gets written to
>>>>> multiple nodes in DC1. For the same write to propagate to DC2 only one
>>>>> copy is sent from the coordinator node in DC1 to a coordinator node in
>>>>> DC2 for optimizing traffic over the WAN (from what I have read in the
>>>>> Cassandra documentation)
>>>>>
>>>>> Will a WAN hiccup result in a Hinted Handoff (HH) being created in
>>>>> DC1's coordinator for DC2, to be delivered when the WAN link is up
>>>>> again?
>>>>
>>>>
>>>> I have seen hinted handoff messages in the log files when the remote DC
>>>> is unreachable. But this mechanism is only used for the time window
>>>> defined in the cassandra.yaml file.
>>>
>>>
>>>
>>> --
>>> Thanks, Karthik
>>
>>