incubator-cassandra-user mailing list archives

From Daniel Doubleday <daniel.double...@gmx.net>
Subject Re: org.apache.cassandra.service.ReadResponseResolver question
Date Wed, 15 Dec 2010 17:23:21 GMT

On Dec 14, 2010, at 9:20 PM, Jonathan Ellis wrote:

> Correct.  https://issues.apache.org/jira/browse/CASSANDRA-1830 is open to fix that.
> If you'd like to review the patch there, that would be very helpful. :)

That patch looks good to me :-) Should have checked jira first ...

Speaking of which, https://issues.apache.org/jira/browse/CASSANDRA-982 is referenced there
and seems to be pretty close to something I have been trying to do over the last 2 days.

I'm not going to repeat the reasoning here, but it's in this thread: http://thread.gmane.org/gmane.comp.db.cassandra.user/10927/focus=10977

Just wanted to mention that I implemented my idea and did some functional testing and load
testing. Though certainly not enough ...

But I was able to test 
- normal read behavior (all nodes up, two nodes up)
- normal failure behavior (not enough nodes up)
- behavior when the environment changes (affecting scores by controlling latency in the ReadVerbHandler,
timeouts in the read path, exceptions in the read path of a selected node, nodes going down during
a read)

So far it looks pretty promising. Everything worked as expected.

The only real drawback I found is when a read fails on a selected node (for example, with an exception).
As far as I understand it, there's no way to signal the readresolvehandler to return early
in this case. Thus you have to wait for the timeout before the rest of the nodes are consulted.
But I hope that failure detection + scores should be good enough to prevent this from happening
too often.
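
To make the drawback concrete, here is a minimal Java sketch of what an early-return signal could look like. It is not Cassandra code and not part of the patch: the class EarlyFailHandlerSketch, its methods onResponse / onFailure / await, and the Outcome enum are all hypothetical names made up for illustration. The caller blocks until enough responses arrive, a failure is signalled, or the timeout expires.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not a Cassandra class: a response handler that can be
// woken up early when a selected node reports a failure.
class EarlyFailHandlerSketch {
    enum Outcome { SUCCESS, FAILED_EARLY, TIMED_OUT }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private final int required;   // number of responses needed, e.g. the quorum
    private int received = 0;
    private boolean failed = false;

    EarlyFailHandlerSketch(int required) {
        this.required = required;
    }

    // Called when a selected node answers successfully.
    void onResponse() {
        lock.lock();
        try {
            received++;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Called when a selected node reports an error (assumed signal).
    void onFailure() {
        lock.lock();
        try {
            failed = true;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Blocks until enough responses arrived, a failure was signalled,
    // or the timeout elapsed, whichever comes first.
    Outcome await(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (received < required && !failed) {
                if (nanos <= 0) {
                    return Outcome.TIMED_OUT;
                }
                nanos = changed.awaitNanos(nanos);
            }
            return received >= required ? Outcome.SUCCESS : Outcome.FAILED_EARLY;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            lock.unlock();
        }
    }
}
```

With something along these lines, a FAILED_EARLY outcome would let the caller consult the remaining nodes right away instead of waiting out the full timeout.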

I did some load testing and compared with vanilla cassandra. The test models one of the use cases we
have in production: a chat app. So it writes and reads messages and offline notifications.
It's of limited use though, since I was not able to reproduce our IO overload yet.

But to give a first impression: In this rather cpu bound test the patched version did ~20-25%
more tests. The test ran on 3 nodes with rf 3 and quorum reads / writes, and the result was
reproduced many times.

I am currently working on a load test to reproduce the problem we had in our production environment
last week.

If someone's interested (note that this only makes sense - if at all - for quorum reads with
the dynamic snitch):

This is the patch I made against 0.6.8:  https://gist.github.com/742280

And of course I'd be glad to get feedback if someone feels that I am about to lose my job...


Thanks,

Daniel
smeet.com, Berlin


> On Tue, Dec 14, 2010 at 1:55 PM, Daniel Doubleday <daniel.doubleday@gmx.net> wrote:
> Hi
> 
> I'm sorry - don't want to be a pain in the neck with source questions. So please just
> ignore me if this is stupid:
> 
> Isn't org.apache.cassandra.service.ReadResponseResolver supposed to throw a DigestMismatchException
> if it receives a digest which does not match the digest of a read message?
> 
> If the messages contain multiple digest responses it will drop all but one. So if any of
> the dropped digests does not match the version, that mismatch is simply ignored.
> It can cope with multiple reads (versions) but not with multiple digests, and that's what
> it gets from quorum reads.
> 
> It might be an edge case, but I think that would break the quorum promise with rf > 3
> because you could have 1 broken data message, 1 broken digest message and 2 good digest messages.
> If the 2 good messages were dropped, then the quorum read that should have triggered repair
> and conflict resolution would return old data.
> 
> I just can't see what I'm not seeing here.
> 
> Cheers,
> Daniel
> 
> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
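
The digest concern in the quoted message can be illustrated with a small Java sketch. This is neither the ReadResponseResolver source nor the CASSANDRA-1830 patch: the class DigestCheckSketch, its methods, the DigestMismatch exception and the use of MD5 over a string payload are assumptions made purely for illustration. The point is that every digest response is compared against the digest of the data response, so no mismatch can be dropped silently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Cassandra code: check all digests, not just one.
class DigestCheckSketch {

    // Stand-in for DigestMismatchException (unchecked here for brevity).
    static class DigestMismatch extends RuntimeException {
        DigestMismatch(String message) {
            super(message);
        }
    }

    // Assumed digest function: MD5 over the UTF-8 bytes of the payload.
    static byte[] digest(String data) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(data.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the data response only if every digest response matches it;
    // a single stale digest is enough to signal a mismatch.
    static String resolve(String dataResponse, List<byte[]> digestResponses) {
        byte[] expected = digest(dataResponse);
        for (byte[] received : digestResponses) {
            if (!Arrays.equals(expected, received)) {
                throw new DigestMismatch("digest does not match data response");
            }
        }
        return dataResponse;
    }
}
```

In the scenario from the quoted message, a resolver that kept only one digest could compare the stale data against the one stale digest and never see the two good ones; the loop above would throw as soon as it reached a good digest that differs.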

