zookeeper-user mailing list archives

From Ivan Kelly <iv...@apache.org>
Subject Re: zookeeper max no of children of znode
Date Sat, 27 Jun 2015 14:37:44 GMT
Responses inline

On Sat, Jun 27, 2015 at 4:12 PM Shushant Arora <shushantarora09@gmail.com> wrote:

> when client reconnects to another node, watches get lost? Or do watches
> stay and the in-between update notification is lost, but the client will get
> the next update without adding the watch again?
When a client reconnects, watches are lost. They're not lost silently,
though. The watcher will receive a Disconnected event and, once reconnected,
a Connected event, which allows the client to check whether any changes
occurred.
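A minimal sketch of that pattern, in plain Java with no ZooKeeper dependency (the class and enum names are mine; the two states mirror ZooKeeper's Disconnected/SyncConnected connection events):

```java
// Sketch only: models how a client might react to the
// Disconnected -> Connected sequence by re-reading state instead of
// trusting its cache. ConnState and StateCheckingClient are
// hypothetical names, not ZooKeeper API.
public class StateCheckingClient {
    enum ConnState { SYNC_CONNECTED, DISCONNECTED }

    private boolean needsRecheck = false;
    private int recheckCount = 0;

    // Analogous to a Watcher receiving a connection-state event.
    public void onConnectionEvent(ConnState state) {
        if (state == ConnState.DISCONNECTED) {
            // Update notifications may be missed from now on.
            needsRecheck = true;
        } else if (state == ConnState.SYNC_CONNECTED && needsRecheck) {
            // Back in sync: re-read the znodes of interest and
            // re-register watches rather than trusting cached state.
            recheckCount++;
            needsRecheck = false;
        }
    }

    public int getRecheckCount() { return recheckCount; }
}
```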

> I just wanted to understand how hbase and kafka use zookeeper reliably
> when it does not have strict consistency, since multiple clients can have
> different states of the same node at a time.
The important thing to note is that zookeeper locks are advisory. Just
because zookeeper has told a client it has a lock on a resource doesn't
mean that, by the time the message with this information reaches the client
in question, it still holds the lock.

Locks have to be enforced by the underlying storage.
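One common way storage can enforce this is a fencing token: each successive lock holder carries a higher epoch number, and the storage layer rejects writes from any epoch older than the newest it has seen. A hypothetical sketch (the class name and epoch scheme are mine, not from ZooKeeper or HBase):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of storage-side lock enforcement via a fencing
// token. A stale lock holder's writes are rejected even if it still
// believes it holds the lock.
public class FencedStorage {
    private long highestEpochSeen = -1;
    private final List<String> log = new ArrayList<>();

    // Returns true if the write was accepted.
    public synchronized boolean write(long epoch, String entry) {
        if (epoch < highestEpochSeen) {
            return false; // stale holder: fenced off by a newer one
        }
        highestEpochSeen = epoch;
        log.add(entry);
        return true;
    }

    public synchronized int size() { return log.size(); }
}
```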

In the case of the hbase master, the master region server writes to a WAL,
which is an HDFS appendable file. The region server does an fsync on the
file (which incidentally doesn't guarantee it hits disk) after any write.
This fsync involves a message to the HDFS namenode, which guarantees that
the WAL only has one writer at any time. While clients may have an
incorrect view of who the real hbase master is, any attempt to write to the
false master will result in an error which tells them to check their
assumptions. In a non-broken system, they will eventually be informed of
the correct master, and thus be able to continue making progress.

For kafka it's a little more complex because they have the concept of an
ISR, and if the required number of acknowledgements for a write is less
than half the ISR + 1, then indeed writes can be lost. But assuming that
the required number of acknowledgements for a write is at least (ISR/2) +
1, then no write can succeed without hitting a majority of the nodes in the
ISR. Therefore if there are conflicting writes to a certain sequence, the
conflicting writes must overlap on at least one node, and the system can
resolve the conflict from there. I'm sketchy on the details, though.

In summary, it's fine for the clients to be out of sync to a degree, since
it's unavoidable until we have some sort of quantum-entanglement Star Trek
computers. But they shouldn't be able to do any damage with out-of-date
information, and this is why the storage medium needs to be aware of locks.

> In the Hbase master selection process, how is a node 100% sure that a
> master is created? Does it have to create the /master node, and if that
> node already exists it will throw a node-exists exception? Since by only
> reading (ls /) it may get stale data and see that the node does not
> exist, when in actuality /master was present.
Actually, I possibly went into too much detail above. Node-exists
exceptions like this should be expected.
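The key point is that the contender attempts the create and handles the node-exists case, rather than reading first (a read may be stale; the atomic create cannot be). A hypothetical in-memory stand-in for the /master znode (real code would call ZooKeeper's create and catch its node-exists exception):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the /master znode: create is atomic, so
// exactly one contender wins and the rest get the equivalent of a
// node-exists exception. Not real ZooKeeper API.
public class MasterElection {
    private final AtomicReference<String> masterNode =
        new AtomicReference<>(null);

    // Returns true if this candidate became master; false plays the
    // role of the node-exists exception. Note there is no
    // read-then-create race: the create itself decides.
    public boolean tryBecomeMaster(String candidateId) {
        return masterNode.compareAndSet(null, candidateId);
    }

    public String currentMaster() { return masterNode.get(); }
}
```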

> Do these applications read everything on reconnect of a session?
Nope, they don't read anything unless the client explicitly asks for it.

> And also, is zookeeper not fit for high write rates? Since writes go via
> the leader node, how does kafka maintain offsets using zookeeper update
> operations very efficiently?
I think zk can do about 10k writes per second. For most kafka usecases
this should be ok. If there are a lot of subscribers you may feel a
squeeze, though. Best to ask the Kafka folks about this directly; I'm sure
they've run into the problem at LinkedIn.

