incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: R and N
Date Sun, 20 Feb 2011 19:25:00 GMT
My understanding:

1) Read repair involves the coordinator sending a full data read to the CL nodes, resolving the
differences and sending writes back. For CL ONE this happens after returning to the client; for
higher CLs it happens before. (My understanding of the internals of RR is a little rough, though.)

2) Not sure.

3) RR is not used on writes; hinted handoff is.

4) The node responsible for the key is often the node asked for the full data of the request;
the other nodes are asked for a digest of their response. However, the dynamic snitch can re-order
the nodes based on load. It's also the starting point when the partitioner is working out which
nodes replicas should be stored on. It's not a point of failure.

5) The partitioner knows where the data was written to. http://thelastpickle.com/2011/02/07/Introduction-to-Cassandra/
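
To make point 1 a bit more concrete, here is a rough Python sketch of the "resolve the differences and send writes back" step. It is not Cassandra's code: the `Column` class, the `resolve` and `repair_writes` helpers and the node names are all made up for illustration, and it assumes differences are resolved per column with the highest timestamp winning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical stand-in for a stored column; not a Cassandra class.
    name: str
    value: str
    timestamp: int  # supplied by the writer


def resolve(replica_rows):
    """Merge the rows returned by the replicas; per column, the highest timestamp wins."""
    merged = {}
    for row in replica_rows.values():
        for col in row.values():
            current = merged.get(col.name)
            if current is None or col.timestamp > current.timestamp:
                merged[col.name] = col
    return merged


def repair_writes(replica_rows, merged):
    """For each replica, list the columns it is missing or holds in a stale version."""
    repairs = {}
    for node, row in replica_rows.items():
        stale = [col for name, col in merged.items() if row.get(name) != col]
        if stale:
            repairs[node] = stale
    return repairs


# Three replicas of one row; node_b is stale and node_c missed a write.
replicas = {
    "node_a": {"email": Column("email", "new@example.com", 20),
               "name": Column("name", "AJ", 5)},
    "node_b": {"email": Column("email", "old@example.com", 10),
               "name": Column("name", "AJ", 5)},
    "node_c": {"name": Column("name", "AJ", 5)},
}

merged = resolve(replicas)                 # what the client would be given
repairs = repair_writes(replicas, merged)  # writes sent back to out-of-date replicas
```

In this example only node_b and node_c would receive a repair write, and only for the `email` column; node_a already holds the merged result.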

Aaron

On 19/02/2011, at 6:28 AM, Anthony John <chirayithaj@gmail.com> wrote:

> K - let me state the facts first (as I see/know them)
> - I do not know the inner workings, so interpret my response with that caveat. Although,
> at an architectural level, one should be able to keep detailed implementation at bay
> - Quorum is (N+1)/2 where N is the Replication Factor (RF)
> - And consistency is guaranteed if R(ead) + W(rite) > RF (which Quorum gives you,
> but can be achieved via other permutations, depending on whether Read or Write performance
> is desired)
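
A small Python sketch of the arithmetic in the two facts above. The function names are made up for illustration. Quorum is computed here as floor(RF / 2) + 1, which is how Cassandra defines it; that gives the same number as (N+1)/2 when RF is odd and one more when RF is even.

```python
def quorum(rf: int) -> int:
    """Smallest number of replicas that forms a majority of rf."""
    return rf // 2 + 1


def read_sees_latest_write(r: int, w: int, rf: int) -> bool:
    """True when every read set of size r must overlap every write set of size w."""
    return r + w > rf


rf = 3
q = quorum(rf)  # 2 replicas for RF = 3

for r, w in [(q, q), (1, rf), (rf, 1), (1, 1)]:
    print(f"R={r} W={w} RF={rf} overlap guaranteed: {read_sees_latest_write(r, w, rf)}")
```

With RF = 3, QUORUM reads plus QUORUM writes overlap, as do R=1 with W=3 and R=3 with W=1; R=1 with W=1 does not.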
> 
> Now getting to your questions:
> 1. If Read at Q is nondeterministic, it would likely have to read the other (RF-Q) nodes
> to achieve Quorum on a deterministic value. At which point - sync'ing all with writes should
> not be that expensive. But at what point precisely the read is returned - do not know - you
> will have to look at the code. IMO - at this level it should not matter.
> 2. Should be at the granularity of data divergence
> 3. Read Repair or Nodetool (whichever comes first)
> 4. All peer - there is no primary. There might be a connected node - but no special role/privileges
> 5. Tries to Q - returns on deterministic read. If not - see (1)
> 6. Writer supplies timestamp value - can be any value that makes sense within the scope
> of data/application.
> 
> HTH,
> 
> -JA
> 
> On Fri, Feb 18, 2011 at 10:28 AM, A J <s5alye@gmail.com> wrote:
> Couple of more related questions:
> 
> 5. For reads, does Cassandra first read N nodes or just the R nodes it
> selects ? I am thinking unless it reads all the N nodes, how will it
> know which node has the latest write.
> 
> 6. Who decides the timestamp that gets inserted into the timestamp
> field of every column. I would guess the coordinator node picks up its
> system's timestamp.  If that is true, the clocks on all the nodes
> should be synchronized, right ? Otherwise conflict resolution cannot
> be done correctly.
> For a distributed system, this is not always possible. How do folks
> get around this issue ?
> 
> Thanks.
> 
> 
> 
> On Fri, Feb 18, 2011 at 10:23 AM, A J <s5alye@gmail.com> wrote:
> > Questions about R and N (and W):
> > 1. If I set R to Quorum and cassandra identifies a need for read
> > repair before returning, would the read repair happen on R nodes (I
> > mean subset of R that needs repair) or N nodes before the data is
> > delivered to the client ?
> > 2. Also does the repair happen at level of row (key) or at level of column ?
> >
> > 3. During write, if W is met but N-W is not met for some reason; would
> > cassandra try to repair N-W nodes in the background as and when it
> > can. Or the N-W are only repaired when a read is issued ?
> >
> > 4. What is the significance of the 'primary' replica for writes from
> > usage point ? Writes to primary and non-primary replicas all happen
> > simultaneously. Ensuring W is decided irrespective of it being primary
> > or not. Ensuring R is decided by any of the R nodes out of N.
> > I know the tokens are divided per the primary replica. But other than
> > that, for read and write operations, does the primary replica play any
> > special role ?
> >
> > Thanks.
> >
> 
