cassandra-user mailing list archives

From Benjamin Roth <benjamin.r...@jaumo.com>
Subject Re: How does cassandra achieve Linearizability?
Date Fri, 10 Feb 2017 10:31:44 GMT
Hi Kant,

If you read the published papers about Paxos, you will most probably
recognize that there is no way to "do it better". This is a conceptual
limitation that follows from the nature of distributed systems and the CAP
theorem. If you want A+P in the triangle, then C is very expensive. Cassandra
is made mostly for A+P, with tunable C. ACID databases are a completely
different thing, as they are mostly either not partition tolerant, not highly
available, or not scalable (in a distributed manner, not speaking of
"monolithic super servers").

There is no free lunch ...
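
To put the "tunable C" trade-off in concrete terms, here is a minimal sketch
(plain Python, not Cassandra code; the replication factor N = 3 and the helper
names are assumptions made up for illustration). It brute-force checks the
usual rule of thumb that a read is guaranteed to touch at least one replica
that acknowledged the latest write only when R + W > N:

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Majority quorum size for n replicas."""
    return n // 2 + 1


def always_overlap(n: int, w: int, r: int) -> bool:
    """True if every set of w write replicas intersects every set of r read replicas."""
    replicas = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )


N = 3  # replication factor, assumed for illustration
levels = [
    ("ONE/ONE", 1, 1),
    ("QUORUM/QUORUM", quorum(N), quorum(N)),
    ("ALL/ONE", N, 1),
]
for name, w, r in levels:
    # The brute-force check and the R + W > N rule should agree.
    print(f"{name}: W={w} R={r} overlap={always_overlap(N, w, r)} rule={w + r > N}")
```

The stronger levels wait for more replicas on every request, which is where
the cost comes from. Overlapping quorums alone still do not give you
linearizability; that is what the extra Paxos round trips of an LWT pay for.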


2017-02-10 11:09 GMT+01:00 Kant Kodali <kant@peernova.com>:

> "That’s the safety blanket everyone wants but is extremely expensive,
> especially in Cassandra."
>
> Yes, LWTs are expensive. Are there any plans to make this better?
>
> On Fri, Feb 10, 2017 at 12:17 AM, Kant Kodali <kant@peernova.com> wrote:
>
>> Hi Jon,
>>
>> Thanks a lot for your response. I am well aware that LWW != LWT, but I
>> was asking about LWW with respect to LWTs, which I believe
>> you answered. So thanks much!
>>
>> kant
>>
>> On Thu, Feb 9, 2017 at 6:01 PM, Jon Haddad <jonathan.haddad@gmail.com>
>> wrote:
>>
>>> LWT != Last Write Wins.  They are totally different.
>>>
>>> LWTs give you (assuming you also read at SERIAL) “atomic consistency”,
>>> meaning you are able to perform operations atomically and in isolation.
>>> That’s the safety blanket everyone wants but is extremely expensive,
>>> especially in Cassandra.  The lightweight part, btw, may be a little
>>> optimistic, especially if a key is under contention.  With regard to the
>>> “last write” part you’re asking about - w/ LWT Cassandra provides the
>>> timestamp and manages it as part of the ballot, and it always is
>>> increasing.  See
>>> org.apache.cassandra.service.ClientState#getTimestampForPaxos.  From the code:
>>>
>>>  * Returns a timestamp suitable for paxos given the timestamp of the last
>>>  * known commit (or in progress update).
>>>  * Paxos ensures that the timestamp it uses for commits respects the serial
>>>  * order of those commits. It does so by having each replica reject any
>>>  * proposal whose timestamp is not strictly greater than the last proposal it
>>>  * accepted. So in practice, which timestamp we use for a given proposal
>>>  * doesn't affect correctness but it does affect the chance of making
>>>  * progress (if we pick a timestamp lower than what has been proposed before,
>>>  * our new proposal will just get rejected).
>>>
>>> Effectively paxos removes the ability to use custom timestamps and
>>> addresses clock variance by rejecting ballots with timestamps less than
>>> what was last seen.  You can learn more by reading through the other
>>> comments and code in that file.
>>>
>>> Last write wins is a free for all that guarantees you *nothing* except
>>> the timestamp is used as a tiebreaker.  Here we acknowledge things like the
>>> speed of light as being a real problem that isn’t going away anytime soon.
>>> This problem is sometimes addressed with event sourcing rather than
>>> mutating in place.
>>>
>>> Hope this helps.
>>>
>>> Jon
>>>
>>>
>>> On Feb 9, 2017, at 5:21 PM, Kant Kodali <kant@peernova.com> wrote:
>>>
>>> @Justin I read this article:
>>> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
>>> It clearly says linearizable consistency can be achieved with LWTs. So
>>> should I assume the linearizability in the context of the above article is
>>> possible with LWTs plus synchronization of clocks through ntpd? Because
>>> LWTs also follow Last Write Wins, don't they? Another question: do most
>>> production clusters set up ntpd? If so, how long does it take to sync?
>>> Any idea?
>>>
>>> @Michael Shuler Are you referring to something like TrueTime, as in the
>>> Spanner paper?
>>> https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
>>> Actually, I had never heard of setting up GPS modules and how that can be
>>> helpful. Let me research that, but good point.
>>>
>>> On Thu, Feb 9, 2017 at 5:09 PM, Michael Shuler <michael@pbandjelly.org>
>>> wrote:
>>>
>>>> If you require the best precision you can get, setting up a pair of
>>>> stratum 1 ntpd masters in each data center location with GPS modules
>>>> is not terribly complex. Low latency and jitter on servers you manage.
>>>> 140ms is a long way away network-wise, and I would suggest that was a
>>>> poor choice of upstream (probably stratum 2 or 3) source.
>>>>
>>>> As Jonathan mentioned, there's no guarantee from Cassandra, but if you
>>>> need as close as you can get, you'll probably need to do it yourself.
>>>>
>>>> (I run several stratum 2 ntpd servers for pool.ntp.org)
>>>>
>>>> --
>>>> Kind regards,
>>>> Michael
>>>>
>>>> On 02/09/2017 06:47 PM, Kant Kodali wrote:
>>>> > Hi Justin,
>>>> >
>>>> > There are a bunch of issues w.r.t. synchronization of clocks when we
>>>> > used ntpd. Also, the time it took to sync the clocks was approx. 140ms
>>>> > (don't quote me on it though, because it is reported by our devops :)
>>>> >
>>>> > We have multiple clients (for example, a bunch of microservices are
>>>> > reading from Cassandra). I am not sure how one can achieve
>>>> > linearizability by setting timestamps on the clients, since there is no
>>>> > total ordering across multiple clients.
>>>> >
>>>> > Thanks!
>>>> >
>>>> >
>>>> > On Thu, Feb 9, 2017 at 4:16 PM, Justin Cameron <justin@instaclustr.com> wrote:
>>>> >
>>>> >     Hi Kant,
>>>> >
>>>> >     Clock synchronization is important - you should ensure that ntpd is
>>>> >     properly configured on all nodes. If your particular use case is
>>>> >     especially sensitive to out-of-order mutations it is possible to set
>>>> >     timestamps on the client side using the drivers:
>>>> >     https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/
>>>> >
>>>> >     We use our own NTP cluster to reduce clock drift as much as
>>>> >     possible, but public NTP servers are good enough for most uses:
>>>> >     https://www.instaclustr.com/blog/2015/11/05/apache-cassandra-synchronization/
>>>> >
>>>> >     Cheers,
>>>> >     Justin
>>>> >
>>>> >     On Thu, 9 Feb 2017 at 16:09 Kant Kodali <kant@peernova.com> wrote:
>>>> >
>>>> >         How does Cassandra achieve Linearizability with “Last write
>>>> >         wins” (conflict resolution methods based on time-of-day clocks)?
>>>> >
>>>> >         Relying on synchronized clocks is almost certainly
>>>> >         non-linearizable, because clock timestamps cannot be guaranteed
>>>> >         to be consistent with actual event ordering due to clock skew.
>>>> >         Isn't it?
>>>> >
>>>> >         Thanks!
>>>> >
>>>> >     --
>>>> >
>>>> >     Justin Cameron
>>>> >
>>>> >     Senior Software Engineer | Instaclustr
>>>> >
>>>> >
>>>>
>>>>
>>>
>>>
>>
>
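
P.S. A toy sketch of the difference Jon describes in the quoted message
(plain Python, not Cassandra code; the class and function names are made up
for illustration, and the prepare/promise rounds across a quorum that real
Paxos needs are left out). It assumes two clients whose clocks disagree: with
last write wins the later write silently loses, while the ballot rule rejects
the stale timestamp so the client can retry with a larger one:

```python
class LwwRegister:
    """Last-write-wins cell: the highest timestamp wins, whatever the real order."""

    def __init__(self):
        self.timestamp = -1
        self.value = None

    def write(self, timestamp: int, value: str) -> None:
        # An older timestamp is dropped without telling the caller.
        if timestamp > self.timestamp:
            self.timestamp, self.value = timestamp, value


class BallotReplica:
    """Toy replica that rejects any proposal not strictly newer than the last accepted."""

    def __init__(self):
        self.last_accepted = -1
        self.value = None

    def propose(self, timestamp: int, value: str) -> bool:
        if timestamp <= self.last_accepted:
            return False  # the caller sees the rejection and can retry
        self.last_accepted, self.value = timestamp, value
        return True


def next_ballot_timestamp(clock_now: int, last_seen: int) -> int:
    """Pick a timestamp strictly greater than the last seen one, even if the clock lags."""
    return max(clock_now, last_seen + 1)


# Client A's clock runs fast; client B writes later in real time with a lagging clock.
lww = LwwRegister()
lww.write(200, "first")
lww.write(150, "second")
print(lww.value)  # first (B's later write was silently discarded)

replica = BallotReplica()
replica.propose(200, "first")
accepted = replica.propose(150, "second")  # rejected: 150 <= 200
retry_ts = next_ballot_timestamp(clock_now=150, last_seen=replica.last_accepted)
replica.propose(retry_ts, "second")
print(accepted, retry_ts, replica.value)  # False 201 second
```

The retry is also where the cost shows up: under contention, proposals keep
getting rejected and have to go around again.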


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
