cassandra-user mailing list archives

From Riyad Kalla <rka...@gmail.com>
Subject Re: Will writes with < ALL consistency eventually propagate?
Date Mon, 07 Nov 2011 20:24:23 GMT
Ahh, I see your point.

Thanks for the help Stephen.

On Mon, Nov 7, 2011 at 12:43 PM, Stephen Connolly <
stephen.alan.connolly@gmail.com> wrote:

> At that point, your cluster will either have so much data on each node
> that you will need to split them, keeping rf=5 so you have 10 nodes... or
> the intra-cluster traffic will swamp you and you will split each node,
> keeping rf=5 so you have 10 nodes again.
>
> The safest thing is not to design with the assumption that rf=n.
>
> - Stephen
>
> ---
> Sent from my Android phone, so random spelling mistakes, random nonsense
> words and other nonsense are a direct result of using swype to type on the
> screen
> On 7 Nov 2011 17:47, "Riyad Kalla" <rkalla@gmail.com> wrote:
>
>> Stephen,
>>
>> I appreciate you making the point more strongly; I won't make this
>> decision lightly given the stress you are putting on it, but the technical
>> aspects of this make me curious...
>>
>> If I start with RF=N (number of nodes) now, and in 2 years
>> (hypothetically) my dataset is too large and I say to myself "Dangit,
>> Stephen was right...", couldn't I just change the RF to some smaller
>> value, say "3", at that point? Or would the Cassandra ring not rebalance
>> the data set nicely?
>>
>> More specifically, would it not know how best to slowly remove extraneous
>> copies from the nodes and make the data more sparse among the ring members?
>>
>> Thanks for the hand-holding; it is helping me understand the operational
>> landscape quickly.
>>
>> -R
>>
>> On Mon, Nov 7, 2011 at 10:18 AM, Stephen Connolly <
>> stephen.alan.connolly@gmail.com> wrote:
>>
>>> Plan for the future....
>>>
>>> At some point your data set will become too big for the node that it
>>> is running on, or your load will force you to split nodes.... once you
>>> do that, RF < N.
>>>
>>> To solve performance issues with C*, the solution is to add more nodes.
>>>
>>> To solve storage issues with C*, the solution is to add more nodes.
>>>
>>> In most cases, the solution in C* is to add more nodes.
>>>
>>> Don't assume RF = number of nodes as a core design decision of your
>>> application, and you will not have your ass bitten
>>>
>>> ;-)
>>>
>>> -Stephen
>>> P.S. making the point more extreme to make it clear
>>>
>>> On 7 November 2011 15:04, Riyad Kalla <rkalla@gmail.com> wrote:
>>> > Stephen,
>>> > Excellent breakdown; I appreciate all the detail.
>>> > Your last comment about RF being smaller than N (number of nodes) --
>>> > in my particular case my data set isn't particularly large (a few
>>> > GB) and is distributed globally across a handful of data centers.
>>> > What I am utilizing Cassandra for is the replication, in order to
>>> > minimize latency for requests.
>>> > So when a request comes into any location, I want each node in the
>>> > ring to contain the full data set so it never needs to defer to
>>> > another member of the ring to answer a question (even if this means
>>> > eventual consistency; that is alright in my case).
>>> > Given that, the way I've understood this discussion so far is I
>>> > would have an RF of N (my total node count), but my Consistency
>>> > Level with all my writes will *likely* be QUORUM -- I think that is
>>> > a good/safe default for me to use, as writes aren't the scenario I
>>> > need to optimize for latency; that being said, I also don't want to
>>> > wait for a ConsistencyLevel of ALL to complete before my code
>>> > continues.
>>> > Would you agree with this assessment, or am I missing the boat on
>>> > something?
>>> > Best,
>>> > Riyad
>>> >
>>> > On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly
>>> > <stephen.alan.connolly@gmail.com> wrote:
>>> >>
>>> >> Consistency Level is a pseudo-enum...
>>> >>
>>> >> you have the choice between
>>> >>
>>> >> ONE
>>> >> Quorum (and there are different types of this)
>>> >> ALL
>>> >>
>>> >> At CL=ONE, only one node is guaranteed to have received the write if
>>> >> the operation is a success.
>>> >> At CL=ALL, all nodes that the RF says it should be stored at must
>>> >> confirm the write before the operation succeeds, but a partial write
>>> >> will succeed eventually if at least one node recorded the write.
>>> >> At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for
>>> >> the operation to succeed, otherwise it fails; but a partial write
>>> >> will succeed eventually if at least one node recorded the write.
>>> >>
>>> >> Read repair will eventually ensure that the write is replicated across
>>> >> all RF nodes in the cluster.
>>> >>
>>> >> The N in QUORUM above depends on the type of QUORUM you choose; in
>>> >> general, think N=RF unless you choose a fancy QUORUM.
>>> >>
>>> >> To have a consistent read, the number of replicas written plus the
>>> >> number of replicas read must be > RF...
>>> >>
>>> >> Write at ONE, read at ONE => may not get the most recent write if
>>> >> RF > 1 [fastest write, fastest read] {data loss possible if node
>>> >> lost before read repair}
>>> >> Write at QUORUM, read at ONE => may not get the most recent write
>>> >> if RF > 2 [moderate write, fastest read] {multiple nodes must be
>>> >> lost for data loss to be possible}
>>> >> Write at ALL, read at ONE => consistent read, writes may be blocked
>>> >> if any node fails [slowest write, fastest read]
>>> >>
>>> >> Write at ONE, read at QUORUM => may not get the most recent write
>>> >> if RF > 2 [fastest write, moderate read] {data loss possible if
>>> >> node lost before read repair}
>>> >> Write at QUORUM, read at QUORUM => consistent read [moderate write,
>>> >> moderate read] {multiple nodes must be lost for data loss to be
>>> >> possible}
>>> >> Write at ALL, read at QUORUM => consistent read, writes may be
>>> >> blocked if any node fails [slowest write, moderate read]
>>> >>
>>> >> Write at ONE, read at ALL => consistent read, reads may fail if any
>>> >> node fails [fastest write, slowest read] {data loss possible if
>>> >> node lost before read repair}
>>> >> Write at QUORUM, read at ALL => consistent read, reads may fail if
>>> >> any node fails [moderate write, slowest read] {multiple nodes must
>>> >> be lost for data loss to be possible}
>>> >> Write at ALL, read at ALL => consistent read, writes may be blocked
>>> >> if any node fails, reads may fail if any node fails [slowest write,
>>> >> slowest read]
>>> >> Note: You can choose the CL for each and every operation. This is
>>> >> something that you should design into your application (unless you
>>> >> exclusively use QUORUM for all operations, in which case you can
>>> >> simply bake that choice in, and designing it in is less necessary).
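
To make the per-operation choice concrete, here is a short sketch using the
pycassa Python client; the keyspace and column family names are
hypothetical, and the keyword arguments are assumed from pycassa's
ColumnFamily API:

    from pycassa import ConnectionPool, ColumnFamily, ConsistencyLevel

    # Hypothetical keyspace and column family names.
    pool = ConnectionPool('MyKeyspace', server_list=['127.0.0.1:9160'])
    users = ColumnFamily(pool, 'Users')

    # The CL is chosen per operation: a QUORUM write paired with a
    # QUORUM read satisfies W + R > RF, so the read sees the write.
    users.insert('riyad', {'email': 'rkalla@gmail.com'},
                 write_consistency_level=ConsistencyLevel.QUORUM)
    row = users.get('riyad', read_consistency_level=ConsistencyLevel.QUORUM)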
>>> >>
>>> >> The other thing to remember is that RF does not have to equal the
>>> >> number of nodes in your cluster... in fact I would recommend designing
>>> >> your app on the basis that RF < number of nodes in your cluster...
>>> >> because at some point, when your data set grows big enough, you will
>>> >> end up with RF < number of nodes.
>>> >>
>>> >> -Stephen
>>> >>
>>> >> On 7 November 2011 13:03, Riyad Kalla <rkalla@gmail.com> wrote:
>>> >> > Ah! Ok, I was interpreting what you were saying to mean that if
>>> >> > my RF was too high, then the ring would die if I lost one node.
>>> >> > Ultimately what I want (I think) is:
>>> >> > Replication Factor: 5 (aka "all of my nodes")
>>> >> > Consistency Level: 2
>>> >> > Put another way, when I write a value, I want it to exist on at
>>> >> > least two servers before I consider that write "successful" enough
>>> >> > for my code to continue, but in the background I would like
>>> >> > Cassandra to keep copying that value around at its leisure until
>>> >> > all the ring nodes know about it.
>>> >> > This sounds like what I need. Thanks for pointing me in the right
>>> >> > direction.
>>> >> > Best,
>>> >> > Riyad
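
That plan maps directly onto a write at ConsistencyLevel.TWO. A hedged
sketch, again with pycassa and hypothetical names, assuming the keyspace
was created with replication_factor 5:

    from pycassa import ConnectionPool, ColumnFamily, ConsistencyLevel

    # Keyspace assumed to be defined with replication_factor = 5.
    pool = ConnectionPool('MyKeyspace', server_list=['127.0.0.1:9160'])
    values = ColumnFamily(pool, 'Values')

    # Blocks until 2 of the 5 replicas acknowledge the write; the
    # remaining 3 receive it in the background (replication, hinted
    # handoff, read repair).
    values.insert('some-key', {'value': 'v1'},
                  write_consistency_level=ConsistencyLevel.TWO)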
>>> >> >
>>> >> > On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda
>>> >> > <anthony.ikeda.dev@gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Riyad, I'm also just getting to know the different settings and
>>> >> >> values myself :)
>>> >> >> I believe, and it also depends on your config, CL.ONE should
>>> >> >> ignore the loss of a node if your RF is 5; once you increase the
>>> >> >> CL, then if you lose a node the CL is not met and you will get
>>> >> >> exceptions returned.
>>> >> >> Sent from my iPhone
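
What that failure looks like from client code, sketched with pycassa; the
exception import path is an assumption based on pycassa's bundled
Thrift-generated types:

    from pycassa import ConnectionPool, ColumnFamily, ConsistencyLevel
    from pycassa.cassandra.ttypes import UnavailableException

    pool = ConnectionPool('MyKeyspace', server_list=['127.0.0.1:9160'])
    cf = ColumnFamily(pool, 'Values')

    try:
        # With RF=5 and one node down, a CL.ONE write still succeeds,
        # but CL.ALL cannot reach all 5 replicas.
        cf.insert('k', {'v': '1'},
                  write_consistency_level=ConsistencyLevel.ALL)
    except UnavailableException:
        # Not enough live replicas to satisfy the requested CL.
        pass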
>>> >> >> On 07/11/2011, at 4:32, Riyad Kalla <rkalla@gmail.com> wrote:
>>> >> >>
>>> >> >> Anthony and Jaydeep, thank you for weighing in. I am glad to see
>>> >> >> that they are two different values (makes more sense mentally to
>>> >> >> me).
>>> >> >> Anthony, what you said caught my attention: "to ensure all nodes
>>> >> >> have a copy you may not be able to survive the loss of a single
>>> >> >> node." -- why would this be the case?
>>> >> >> I assumed (incorrectly?) that a node would simply disappear off
>>> >> >> the map until I could bring it back up again, at which point it
>>> >> >> would slowly retrieve all the values it missed while it was down
>>> >> >> from other members of the ring. Is this the wrong understanding?
>>> >> >> If forcing a replication factor equal to the number of nodes in
>>> >> >> my ring will cause a hard-stop when one node goes down (as I
>>> >> >> understood your comment to mean), it seems to me I should go with
>>> >> >> a much lower replication factor... something along the lines of
>>> >> >> 3, or roughly ceiling(N / 2), and just deal with the latency when
>>> >> >> one of the nodes has to route a request to another server because
>>> >> >> it doesn't contain the value.
>>> >> >> Is there a better way to accomplish what I want, or is keeping
>>> >> >> the replication factor that aggressively high generally a bad
>>> >> >> thing and using Cassandra in the "wrong" way?
>>> >> >> Thank you for the help.
>>> >> >> -Riyad
>>> >> >>
>>> >> >> On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep
>>> >> >> <chovatia_jaydeep@yahoo.co.in> wrote:
>>> >> >>>
>>> >> >>> Hi Riyad,
>>> >> >>> You can set replication = 5 (number of replicas) and write with
>>> >> >>> CL = ONE. There is no hard requirement from Cassandra to write
>>> >> >>> with CL=ALL to replicate the data unless you need it.
>>> >> >>> Considering your example, if you write with CL=ONE then it will
>>> >> >>> still replicate your data to all 5 replicas eventually.
>>> >> >>> Thank you,
>>> >> >>> Jaydeep
>>> >> >>> ________________________________
>>> >> >>> From: Riyad Kalla <rkalla@gmail.com>
>>> >> >>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> >> >>> Sent: Sunday, 6 November 2011 9:50 PM
>>> >> >>> Subject: Will writes with < ALL consistency eventually propagate?
>>> >> >>>
>>> >> >>> I am new to Cassandra and was curious about the following
>>> >> >>> scenario...
>>> >> >>>
>>> >> >>> Let's say I have a ring of 5 servers. Ultimately I would like
>>> >> >>> each server to be a full replication of the next
>>> >> >>> (master-master-*).
>>> >> >>>
>>> >> >>> In a presentation I watched today on Cassandra, the presenter
>>> >> >>> mentioned that the ring members will shard data and route your
>>> >> >>> requests to the right host when they come in to a server that
>>> >> >>> doesn't physically contain the value you wanted. To the
>>> >> >>> requesting client this is seamless, except for the added
>>> >> >>> latency.
>>> >> >>>
>>> >> >>> If I wanted to avoid the routing and latency and ensure every
>>> >> >>> server had the full data set, do I have to write with a
>>> >> >>> consistency level of ALL and wait for all of those writes to
>>> >> >>> return in my code, or can I write with a CL of 1 or 2 and let
>>> >> >>> the ring propagate the rest of the copies to the other servers
>>> >> >>> in the background after my code has continued executing?
>>> >> >>>
>>> >> >>> I don't mind eventual consistency in my case, but I do
>>> >> >>> (eventually) want all nodes to have all values, and I cannot
>>> >> >>> tell if this is the default behavior, or if sharding is the
>>> >> >>> default and I can only force duplicates onto the other servers
>>> >> >>> explicitly with a CL of ALL.
>>> >> >>>
>>> >> >>> Best,
>>> >> >>> Riyad
>>> >> >>>
>>> >> >>
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
