cassandra-user mailing list archives

From Scott McCarty <scott.mcca...@pana.ma>
Subject Re: Basic question on distributed delete
Date Wed, 19 Jan 2011 21:26:25 GMT
Okay, this helps.  Cassandra works as I expected in the theoretically "pure"
case (writing to the rest of the replicas in a background thread).

I asked the question because we've been struggling to understand why we're
seeing inconsistencies when we haven't had nodes go down, etc.  (However,
even though they've not gone down, I definitely can't say that there haven't
been communication hiccups that might account for some of the
inconsistencies.)

We're definitely now aware that we need to run nodetool repair periodically,
but we're seeing some apparently permanent inconsistencies when we haven't
had any known node failures, so I wanted to check on my basic understanding
of the distributed delete operation.

Thanks,
  Scott

On Wed, Jan 19, 2011 at 2:05 PM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > When we do a delete on a column in the above configuration, the call
> > to the server won't return until 2 of the 3 replicas are written to
> > unless there's an error.  That part is well-documented and understood.
> > The question I have is whether or not the last of the 3 replica nodes
> > gets the delete request (which would result in a tombstone being added
> > to it).  Does the code finish up writing to the rest of the replicas
> > in a background thread?
>
> Yes, it does. But this is strictly true only in the theoretical,
> unrealistic case where absolutely all nodes are always up and working
> and there are no network glitches.
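
A minimal sketch of such a delete at QUORUM, using today's DataStax Java
driver (which postdates this thread); the keyspace, table, and key are
hypothetical, for illustration only:

    import com.datastax.oss.driver.api.core.ConsistencyLevel;
    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;

    public class QuorumDelete {
        public static void main(String[] args) {
            try (CqlSession session = CqlSession.builder().build()) {
                // With RF=3, QUORUM returns once 2 replicas acknowledge
                // the tombstone; the coordinator still forwards the delete
                // to the third replica asynchronously (or stores a hint if
                // that replica is marked down).
                SimpleStatement delete = SimpleStatement
                        // Hypothetical keyspace/table/key, for illustration.
                        .newInstance("DELETE FROM my_ks.my_table WHERE id = ?",
                                     "row-42")
                        .setConsistencyLevel(ConsistencyLevel.QUORUM);
                session.execute(delete);
            }
        }
    }

The consistency level only changes how many acknowledgements the
coordinator waits for before returning, not how many replicas it sends
the delete to.
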
>
> > What I've read implies that the 3 replicas will get the delete
> > request, but it's not explicit.
> > If there's no guarantee that the 3rd replica will get it (assuming
> > that there aren't node problems that would preclude it from receiving
> > the delete request), then that implies to me that I should follow each
> > delete with a read so a ReadRepair will pick up the discrepancy and
> > fix it (or do a nodetool repair).  Or am I missing something?
>
> No need for that. The QUORUM consistency level is about just that -
> consistency, in terms of what you are or are not guaranteed to see on
> a subsequent read that is also at QUORUM. It is not meant to limit
> the number of nodes that receive the data. That is always determined
> by the replication strategy, though for any individual request the
> set of replicas actually written is effectively filtered by real-life
> concerns (e.g., RPC messages timing out, nodes being marked as down).
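
Concretely: with RF=3 a quorum is floor(3/2) + 1 = 2, so W + R = 2 + 2 =
4 > 3, and every QUORUM read must consult at least one replica that
acknowledged the QUORUM delete. A small self-contained sketch of that
arithmetic:

    public class QuorumMath {
        // Cassandra's quorum size: floor(RF / 2) + 1
        static int quorum(int replicationFactor) {
            return replicationFactor / 2 + 1;
        }

        public static void main(String[] args) {
            int rf = 3;
            int w = quorum(rf); // replicas that must ack a QUORUM write: 2
            int r = quorum(rf); // replicas consulted by a QUORUM read:  2
            // W + R > RF guarantees every read quorum overlaps every
            // write quorum, so at least one replica in the read set
            // already holds the tombstone written at QUORUM.
            System.out.printf("RF=%d, W=%d, R=%d, overlap guaranteed: %b%n",
                    rf, w, r, w + r > rf);
        }
    }
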
>
> > An underlying question is whether anything special needs to be done
> > when nothing "out of the ordinary" happens to the cluster (e.g., node
> > goes down, network problems, etc.) in order to have eventual
> > consistency.
>
> Unless your situation is special and you really know what you're
> doing, the critical point is that you should run 'nodetool repair'
> often enough relative to GCGraceSeconds:
>
>   http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
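
As an illustrative schedule (an assumption, not advice from the thread):
with the default GCGraceSeconds of 864000 seconds (10 days), a weekly
repair on each node stays safely inside the window:

    # Illustrative cron entry, assuming the default GCGraceSeconds of
    # 864000 seconds (10 days); a weekly repair stays well inside that
    # window so tombstones reach all replicas before they can be purged.
    0 3 * * 0  nodetool repair
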
>
> --
> / Peter Schuller
>
