zookeeper-user mailing list archives

From Ben Bangert <...@groovie.org>
Subject Re: Zookeeper protocol weirdness and pure Python kazoo client
Date Fri, 31 Aug 2012 04:00:15 GMT
On Aug 30, 2012, at 5:10 PM, Henry Robinson <henry@cloudera.com> wrote:

> FWIW, this is my only reservation about a pure Python client - there isn't
> a spec, and three separate implementations that might have subtly different
> behaviours can be a nightmare to maintain. Ben - if you're able to turn any
> of your efforts towards documenting your observations about how the
> protocol actually works, that would be awesome.

Given that most people don't use the client directly and instead program against 'helpers' such as
Curator, I'm not sure that's such a big deal. The C style of completion callbacks already wasn't
terribly useful, given that Python has its own basic threading model and many people use gevent
for an async style (or Twisted, which is also quite different).

The C binding falls apart badly with gevent, since gevent has issues talking across the C
thread to the gevent hub. This necessitates a lot of very hacky Python code to try and bridge
it, which isn't needed when it's pure Python. And trying to finagle the zookeeper logging
stream into proper Python logging was another set of hacks.

Right now, the code for the protocol handling looks very close to the Java client (Alan Cabrera
did some great work on pookeeper which I refactored into kazoo). I don't see any reason to
deviate too heavily, though of course there are some things that are nicer to implement for
a more Pythonic feel. Keeping the 'feel' from a user perspective similar to the existing Zookeeper
Programmers Guide is best just to avoid having to replicate docs.

On a happy note, Kazoo actually has a good amount of docs:
- http://kazoo.readthedocs.org/

Unfortunately the C lib only has doc strings, which make the Java API docs look like a documentation
heaven in comparison. And you can't even get to the C API docs online... I finally got tired
of digging them out of the source and using doxygen so I copied them up to my own server here:

Some notes on the implementation...

Pure Python Zookeeper implementation:
- Approx. 280 lines of code for the socket response/request handling
- Approx. 250 lines of code for the request/response serialization/deserialization
- Anyone that knows Python reasonably well can trouble-shoot and contribute
- Can be used in gevent, Pypy, Jython (Jython doesn't even have a GIL!)
- Worst case... an exception bubbles up that you might have to catch

zkpython (Python binding to C lib):
- Approx. 1500 lines of C (not including the C lib itself, which is another ~8000 lines)
- Anyone that knows and wants to read the C library *and* Python *and* the Python C binding
tricks can trouble-shoot and contribute. And maybe the patches will actually be accepted and
incorporated at some point...
- Only usable with CPython
- Worst case... Python segfaults

I really don't know many (hardly any) Python developers that know C well enough to debug it
or dive into it. If their only Zookeeper experience is marred by bugs in the 'black box' of
C, they'll move on to something else. Which saddens me cause I think Zookeeper is pretty awesome.

It took a week for us to figure out why our test suite failed on rare occasions. This wasn't
helped by the fact that when you supply a bad session id/password, Zookeeper doesn't tell you
what every other app known to mankind would (bad password or username)... it
tells you SESSION_EXPIRED. Which is insanely confusing when you see your other client using
that id/password happily connected still. We had to debug the Java server, use gdb and such
to debug two C libs, etc. I really really don't want to ever repeat that experience, it was
that bad. :)

On a side-note, why on Earth does Zookeeper not give you an AUTH_FAILED when you fail the
auth for the session ID/password on connect?
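
A rough sketch of why a client cannot tell the two cases apart, not taken from kazoo. It assumes the connect reply follows the Java client's ConnectResponse record (protocol version, negotiated timeout, session id, password buffer) and that a non-positive negotiated timeout is the server's only rejection signal, which is how the Java client appears to treat it. SessionExpired and parse_connect_response are hypothetical names.

```python
import struct

# Layout assumed from the Java client's ConnectResponse record:
# protocolVersion (int), timeOut (int), sessionId (long), then passwd as a
# length-prefixed buffer. All values are big-endian.
_RESPONSE_HEADER = struct.Struct("!iiq")


class SessionExpired(Exception):
    """Hypothetical exception for a rejected or expired session."""


def parse_connect_response(payload):
    """Return (timeout_ms, session_id, passwd) or raise SessionExpired."""
    _version, timeout_ms, session_id = _RESPONSE_HEADER.unpack_from(payload, 0)
    offset = _RESPONSE_HEADER.size
    (passwd_len,) = struct.unpack_from("!i", payload, offset)
    offset += 4
    # A negative length encodes a null buffer, so clamp it to zero.
    passwd = payload[offset:offset + max(passwd_len, 0)]
    if timeout_ms <= 0:
        # The reply carries no field saying *why* the session was refused,
        # so a wrong password looks exactly like a genuine expiry.
        raise SessionExpired("server refused the session")
    return timeout_ms, session_id, passwd
```

With a reply shaped like this, there is nothing for the client to map to AUTH_FAILED unless the server adds a distinct error.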

I'd be happy to document and post more implementation details I've found about the actual
protocol. I think it makes sense that powerful dynamic languages implement the protocol directly
in a manner that's documented by the Zookeeper project rather than being crippled by using
the C lib, and suffering segfaults as a result. Already for kazoo, I've been posting implementation
details about how kazoo handles the C lib and bridging it to gevent/threads to help avoid
common errors:

I can update that for protocol details, though it'd prolly be more useful to have a page on
the Zookeeper site itself that discusses and documents the protocol and how it should be implemented
for consistency.

> And as regards the unapplied Python patches - that's my bad, I should be
> committing them much more often. Can you give me a list of those you've
> found useful, and in return for your excellent work I'll get them committed
> as soon as I can?

Well, we've been maintaining a static zookeeper python library here:

We've been adding critical patches to it as we've found them on Jira and in our own tests.
Each Jira bug ticket is linked to on there. Several of those are patched in the custom ubuntu
compiled distro of the python-zookeeper bindings as well.

But obviously at some point it becomes futile. We'd like to use the read-only feature, but
there's no hope of that getting into the Python binding since it's still not in the C lib:

There have been patches for that since 2010... and it's still not resolved. That's pretty discouraging,
and given the lack of online generated C docs there's definitely a "we don't care much about
the C lib" message being broadcast. It's very obvious the Java client is what gets the support.
Searching Jira for 'zkpython' and seeing the various unresolved memory leaks and segfault
issues is also sad. Kazoo is already getting use at several companies, and we all want this
thing to be solid, to not seg-fault our Python, and to be able to easily trouble-shoot it
without going through C gymnastics. :)
