cassandra-user mailing list archives

From Ryan Svihla <rsvi...@datastax.com>
Subject Re: Multi DC informations (sync)
Date Mon, 22 Dec 2014 16:40:11 GMT
In effect you're saying "I require data centers to be consistent at write
time except when they can't be". Basically you've gotten the worst of both
worlds: bad performance during healthy times and less than desired
consistency during unhealthy times.

I believe you may have some misconceptions about availability. If you're
doing fall back, you may as well just use that fall back level. The point of
a consistency level is to specify what CL you need for your app to be
happy. In practice you will effectively be using LOCAL_QUORUM.

If you can consider an app "good" at LOCAL_QUORUM in a fall back scenario,
you may as well _always_ use LOCAL_QUORUM, and then whatever tradeoff
you're making in that disconnected state will be what you want to use
anyway.
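
To make the fall back point concrete, here is a minimal Python sketch. It
does not use a Cassandra driver; the function names, data center names and
replica counts are all made up for illustration, and "enough live replicas"
is a simplification of what the coordinator really checks (acknowledgements
within the write timeout). The only Cassandra fact it relies on is that a
quorum is floor(RF / 2) + 1.

```python
def quorum(rf):
    """Replicas that must acknowledge for a quorum: floor(rf / 2) + 1."""
    return rf // 2 + 1


def write_succeeds(cl, local_dc, rf_per_dc, live_per_dc):
    """Return True if enough replicas are alive to satisfy the given CL."""
    if cl == "LOCAL_QUORUM":
        return live_per_dc[local_dc] >= quorum(rf_per_dc[local_dc])
    if cl == "EACH_QUORUM":
        return all(live_per_dc[dc] >= quorum(rf) for dc, rf in rf_per_dc.items())
    raise ValueError("unsupported consistency level: " + cl)


def write_with_fallback(local_dc, rf_per_dc, live_per_dc):
    """Try EACH_QUORUM, then fall back to LOCAL_QUORUM; None means rejected."""
    for cl in ("EACH_QUORUM", "LOCAL_QUORUM"):
        if write_succeeds(cl, local_dc, rf_per_dc, live_per_dc):
            return cl
    return None


rf = {"us": 3, "eu": 3}            # hypothetical two-DC keyspace, RF 3 in each
healthy = {"us": 3, "eu": 3}       # every replica reachable
partitioned = {"us": 3, "eu": 0}   # the remote DC is unreachable

print(write_with_fallback("us", rf, healthy))      # EACH_QUORUM
print(write_with_fallback("us", rf, partitioned))  # LOCAL_QUORUM
```

The write is accepted in both cases, so the only guarantee the application
can count on is the LOCAL_QUORUM one. The EACH_QUORUM attempt just adds
cross-DC latency while things are healthy.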

Dropped mutations and HH (hinted handoff) are as close as you'll get to the
"replication lag" you're used to, but Cassandra is not Oracle RAC; there is
no background replication service copying data between data centers. There
are writes, retries of failed writes, and at most "repairs" of inconsistent
datasets.

To answer your question about Netflix: going by how they talk about their
usage in public, I'm certain they monitor dropped mutations and HH, along
with a number of other handy things like heap usage, node health, and load,
among other cluster health metrics.
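
As a rough idea of what watching dropped mutations can look like, here is a
small Python sketch that pulls the per-message-type dropped counts out of
`nodetool tpstats` output. The sample text below is illustrative, not real
output, and the exact column layout varies between Cassandra versions, so
treat the parsing rules as an assumption to check against your own nodes.
`dropped_counts` is a made-up helper name.

```python
# Illustrative sample only; real `nodetool tpstats` output has more rows and
# its layout differs between Cassandra versions.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0        1234567         0                 0

Message type           Dropped
READ                         0
MUTATION                    42
"""


def dropped_counts(tpstats_output):
    """Map message type -> dropped count from the 'Message type' section."""
    counts = {}
    in_dropped_section = False
    for line in tpstats_output.splitlines():
        fields = line.split()
        if fields[:2] == ["Message", "type"]:
            in_dropped_section = True   # rows after this header are counts
            continue
        if in_dropped_section and len(fields) >= 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


counts = dropped_counts(SAMPLE_TPSTATS)
if counts.get("MUTATION", 0) > 0:
    print("dropped mutations seen:", counts["MUTATION"])
```

These counters are cumulative on each node, so an alert would normally
compare successive samples rather than fire on any non-zero value.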

Every application has to be designed with some idea of the tradeoff
between data center level consistency and global consistency. Global
consistency is going to be more expensive, and in effect less available,
but there are CERTAINLY use cases that will not require this.

For example, say I want to log a click on a link in a webpage. Does that
need EACH_QUORUM? Probably not. But let's say I want to change a password.
Does that require EACH_QUORUM? If it's a security issue, yes, almost
certainly. Should I accept that change if all datacenters aren't up?
Probably not. Would I want to fall back? I don't know, you tell me.
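
One way to write that thought process down is a per-operation policy table.
The sketch below is plain Python with hypothetical operation names and a
hypothetical `decide` helper; the choices in the table are just the two
examples above, not a recommendation for any particular application.

```python
# Hypothetical policy table: which operations need a quorum in every DC.
POLICY = {
    "log_click": {"needs_all_dcs": False},       # local quorum is good enough
    "change_password": {"needs_all_dcs": True},  # must reach every DC
}


def decide(operation, local_quorum_ok, all_dcs_quorum_ok):
    """Return 'accept' or 'reject' for a write, with no silent fall back."""
    if POLICY[operation]["needs_all_dcs"]:
        return "accept" if all_dcs_quorum_ok else "reject"
    return "accept" if local_quorum_ok else "reject"


# During a partition the local DC still has a quorum but the remote DC does not.
print(decide("log_click", local_quorum_ok=True, all_dcs_quorum_ok=False))
print(decide("change_password", local_quorum_ok=True, all_dcs_quorum_ok=False))
```

The click is accepted and the password change is rejected, and each
operation behaves the same way whether the cluster is healthy or not.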

May I suggest you design any application with the same thought process I
discuss above, come to grips with monitoring your cluster for health, and
then design your application to behave in an expected fashion, in the same
way during both healthy times and bad.

On Mon, Dec 22, 2014 at 10:14 AM, Alain RODRIGUEZ <arodrime@gmail.com>
wrote:
>
> @Jonathan. I read my previous message. GC grace period is 10 days
> (default) not 10 sec, my bad. Repairs are run every 7 days. I should be
> fine regarding this.
>
> @Ryan
>
> Indeed I might want to use Each_Quorum with a customised fallback to
> local_quorum + alerting in case of partition (like a whole cluster down).
> Our writes are not blocking and we might use this to allow an Each_Quorum.
>
> I am going to discuss this internally and think about it.
>
> Though, I am still intrigued about how big companies like Netflix or Apple
> use and monitor their multi-DC environments. I can imagine that each_quorum
> is often not acceptable. In this case, I am curious to know how to make sure
> you're always synced (maybe alerting on dropped messages or HH indeed).
>
> Thanks for the information and for your patience :).
>
> See you around.
>
> 2014-12-19 20:35 GMT+01:00 Jonathan Haddad <jon@jonhaddad.com>:
>
>> Your gc grace should be longer than your repair schedule.  You're likely
>>  going to have deleted data resurface.
>>
>>
>> On Fri Dec 19 2014 at 8:31:13 AM Alain RODRIGUEZ <arodrime@gmail.com>
>> wrote:
>>
>>> All that you said matches the idea I had of how it works except this part:
>>>
>>> "The request blocks however until all CL is satisfied" --> Does this
>>> mean that the client will see an error if the local DC writes the data
>>> correctly (i.e. CL reached) but the remote DC fails? This is not the idea
>>> I had of something asynchronous...
>>>
>>> If it doesn't fail on the client side (truly asynchronous), is there a way
>>> to make sure the remote DC has indeed received the information? I mean if
>>> the cross-region throughput is too small, the write will fail and so will
>>> the HH, potentially. How do we detect that we are lacking cross-DC
>>> throughput, for example?
>>>
>>> Repairs are indeed a good thing (we run them as a weekly routine, GC
>>> grace period 10 sec), but having inconsistency for a week without knowing
>>> it is quite an issue.
>>>
>>> Thanks for this detailed information Ryan, I hope I am clear enough
>>> while expressing my doubts.
>>>
>>> C*heers
>>>
>>> Alain
>>>
>>> 2014-12-19 15:43 GMT+01:00 Ryan Svihla <rsvihla@datastax.com>:
>>>>
>>>> More accurately, the write path of Cassandra in a multi-DC sense is
>>>> kinda like the following:
>>>>
>>>> 1. write goes to a node which acts as coordinator
>>>> 2. writes go out to all replicas in that DC, and then one write per
>>>> remote DC goes out to another node which takes responsibility for writing
>>>> to all replicas in its data center. The request blocks however until all
>>>> CL is satisfied.
>>>> 3. if any of these writes fail, by default a hinted handoff is
>>>> generated.
>>>>
>>>> So as you can see, there is effectively no "lag" beyond raw
>>>> network latency + node speed and/or failed writes waiting on hint
>>>> replay to occur. Likewise repairs can be used to bring the data centers
>>>> back in sync, and in the case of substantial outages you will need repairs
>>>> to bring you back in sync. You're running repairs already, right?
>>>>
>>>> Think of Cassandra as a global write, and not a message queue, and
>>>> you've got the basic idea.
>>>>
>>>>
>>>> On Fri, Dec 19, 2014 at 7:54 AM, Alain RODRIGUEZ <arodrime@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Jens, thanks for your insight.
>>>>>
>>>>> Replication lag in Cassandra terms is probably “Hinted handoff” -->
>>>>> Well I think hinted handoffs are only used when a node is down, and are
>>>>> not even necessarily enabled. I guess that cross DC async replication is
>>>>> something else, that has nothing to do with hinted handoff, am I wrong?
>>>>>
>>>>> `nodetool status` is your friend. It will tell you whether the cluster
>>>>> considers other nodes reachable or not. Run it on a node in the datacenter
>>>>> that you’d like to test connectivity from. --> Connectivity ≠ write
>>>>> success
>>>>>
>>>>> Basically the two questions can be changed this way:
>>>>>
>>>>> 1 - How to monitor the async cross dc write latency ?
>>>>> 2 - What error should I look for when async write fails (if any) ? Or
>>>>> is there any other way to see that network throughput (for example) is
>>>>> too small for a given traffic.
>>>>>
>>>>> Hope this is clearer.
>>>>>
>>>>> C*heers,
>>>>>
>>>>> Alain
>>>>>
>>>>> 2014-12-19 11:44 GMT+01:00 Jens Rantil <jens.rantil@tink.se>:
>>>>>>
>>>>>> Alain,
>>>>>>
>>>>>> AFAIK, the DC replication is not linearizable. That is, writes are
>>>>>> not replicated according to a binlog or similar like MySQL. They are
>>>>>> replicated concurrently.
>>>>>>
>>>>>> To answer your questions:
>>>>>> 1 - Replication lag in Cassandra terms is probably “Hinted handoff”.
>>>>>> You’d want to check the status of that.
>>>>>> 2 - `nodetool status` is your friend. It will tell you whether the
>>>>>> cluster considers other nodes reachable or not. Run it on a node in the
>>>>>> datacenter that you’d like to test connectivity from.
>>>>>>
>>>>>> Cheers,
>>>>>> Jens
>>>>>>
>>>>>> ——— Jens Rantil Backend engineer Tink AB Email: jens.rantil@tink.se
>>>>>> Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 19, 2014 at 11:16 AM, Alain RODRIGUEZ <arodrime@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> We expanded our cluster to a multiple DC configuration.
>>>>>>>
>>>>>>> Now I am wondering if there is any way to know:
>>>>>>>
>>>>>>> 1 - The replication lag between these 2 DC (Opscenter, nodetool,
>>>>>>> other ?)
>>>>>>> 2 - Make sure that sync is ok at any time
>>>>>>>
>>>>>>> I guess big companies running Cassandra are interested in this kind
>>>>>>> of info, so I think something exists but I am not aware of it.
>>>>>>>
>>>>>>> Any other important information or advice you can give me about best
>>>>>>> practices or tricks while running a multi DC (cross regions US <-> EU)
>>>>>>> is welcome of course !
>>>>>>>
>>>>>>> cheers,
>>>>>>>
>>>>>>> Alain
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> [image: datastax_logo.png] <http://www.datastax.com/>
>>>>
>>>> Ryan Svihla
>>>>
>>>> Solution Architect
>>>>
>>>> [image: twitter.png] <https://twitter.com/foundev> [image:
>>>> linkedin.png] <http://www.linkedin.com/pub/ryan-svihla/12/621/727/>
>>>>
>>>> DataStax is the fastest, most scalable distributed database technology,
>>>> delivering Apache Cassandra to the world’s most innovative enterprises.
>>>> Datastax is built to be agile, always-on, and predictably scalable to any
>>>> size. With more than 500 customers in 45 countries, DataStax is the
>>>> database technology and transactional backbone of choice for the worlds
>>>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>>>
>>>>
>

-- 

Ryan Svihla

Solution Architect
