hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: ConnectionLoss (node too big?)
Date Wed, 03 Jun 2009 17:09:48 GMT
wrt bandwidth the issue there is when you do a write you end up copying 
the data between the servers in the quorum:

1) client setdata("largedata") -> follower ZK server (copy data)
2) follower ZK server forwards the proposal to the ZK server leader 
(copy data)
3) ZK server leader does atomic broadcast to all followers - ie sends
individual copies of the data to all the followers (copy * (x-1 servers))
4) majority of followers ack, leader commits, follower responds to 
client, done
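The four steps above can be sketched as a back-of-the-envelope bandwidth model. This is a simplified illustration of the flow described in the email (it assumes the client is connected to a follower; if it connects to the leader, step 2 drops out), not an exact accounting of ZooKeeper's wire protocol:

```python
# Rough model of the per-write data copies described above, assuming the
# simplified flow: client -> follower -> leader -> all other followers.
def copies_per_write(ensemble_size):
    """Return how many times the payload crosses the wire for one write."""
    client_to_follower = 1                # step 1: client sends the data
    follower_to_leader = 1                # step 2: follower forwards the proposal
    leader_broadcast = ensemble_size - 1  # step 3: leader sends a copy to each follower
    return client_to_follower + follower_to_leader + leader_broadcast

def bandwidth_per_write(payload_bytes, ensemble_size):
    """Total bytes moved between machines for a single setData call."""
    return payload_bytes * copies_per_write(ensemble_size)

# A 10 MB znode in a 5-node ensemble moves ~60 MB per write:
print(bandwidth_per_write(10 * 1024 * 1024, 5))  # 62914560
```

So the cost of a large znode is multiplied by roughly the ensemble size on every write, which is why a few big writes are tolerable but heavy use scales badly.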

Again, if you have a handful of nodes it's not a big deal... but as/if 
you expand your use you end up with a potential issue.

Of course if you care about reliability/availability of the data then 
choice of "third party data store" is important... this really depends 
on your requirements. Perhaps storing in ZK makes sense... it really 
depends on your use case/requirements.


Eric Bowman wrote:
> Thanks for the quick reply Henry & Patrick.
> I understand the importance of "small things" from a common use case
> point of view; I don't think my case is so common, but it's also not that
> big a deal to just write the data to an NFS volume and put its path in ZK. 
> I was kind of hoping to avoid that, but I have to do that anyhow for
> other things, so this doesn't do much damage. :)
> At some point I'll spend some time understanding how this really affects
> latency in my case ... I'm keeping just a handful of things that are
> about 10M in the ensemble, so the memory footprint is no problem.  But
> the network bandwidth could be ... I'll check it out.
> Thanks,
> Eric
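The "write the blob elsewhere, keep its path in ZK" pattern Eric mentions can be sketched as follows. This is a minimal illustration only: the `zk` dict stands in for a real ZooKeeper client (e.g. the Java client or Python's kazoo), and the names `store_large`/`load_large` are hypothetical:

```python
import os
import tempfile

# Stand-in for a ZooKeeper client: the dict just models the znode -> value
# mapping. A real deployment would use an actual client library.
zk = {}

def store_large(name, data, storage_dir):
    """Write the blob to shared storage (e.g. an NFS volume) and keep only
    its path in the (simulated) ZK ensemble."""
    path = os.path.join(storage_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    zk["/blobs/" + name] = path.encode()  # the znode holds a tiny pointer
    return path

def load_large(name):
    """Resolve the pointer stored in ZK and read the blob from shared storage."""
    path = zk["/blobs/" + name].decode()
    with open(path, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as shared:
    store_large("big.bin", b"x" * (10 * 1024 * 1024), shared)
    assert load_large("big.bin") == b"x" * (10 * 1024 * 1024)
```

This keeps each znode to a few bytes regardless of payload size, so the quorum replication cost described earlier in the thread applies only to the pointer, not the 10M payload.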
