incubator-kafka-dev mailing list archives

From Taylor Gautier <tgaut...@tagged.com>
Subject Re: Guide to Writing a Client for Kafka
Date Fri, 02 Dec 2011 03:22:24 GMT
One thing we should make clear somewhere is that while Kafka has a history
mechanism, it doesn't provide an index.

I had probably been moving forward with my selection and implementation
of Kafka for 3-4 weeks before realizing that I would not be able to
efficiently query Kafka for the (N-1000)th message, i.e. the message 1000
back from the latest.

This was nearly a deal killer for us, but there are several available
workarounds/solutions:

   1. Keep the history somewhere outside of Kafka, e.g. in a DB, memcache,
   in memory, whatever, if you need to rewind N messages.  This kind of
   assumes you have clients that are always making forward progress and
   working against the Kafka stream.  If you have ephemeral clients that
   come and go, and don't have history with the stream, it doesn't work
   so well.
   2. Make a minor modification to Kafka to have it implement a reverse
   linked list, where each message also stores the offset of the previous
   message.
   3. Make a medium change to Kafka to have it store an index of message
   offsets in a secondary topic.

We went with option #3...
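
To illustrate the idea behind option #3 (an index of message offsets kept
in a secondary topic), here is a minimal sketch in Python. It is a toy
in-memory model, not Kafka's API and not any particular implementation:
ToyLog, IndexedTopic, publish and nth_from_end are hypothetical names made
up for this example. It assumes messages are addressed by byte offset into
an append-only log, which is why there is no cheap way to step back N
messages without some kind of index.

```python
import struct


class ToyLog:
    """Hypothetical append-only byte log, addressed by byte offset."""

    def __init__(self):
        self._data = bytearray()

    def append(self, payload: bytes) -> int:
        """Append a length-prefixed message; return its starting byte offset."""
        offset = len(self._data)
        self._data += struct.pack(">I", len(payload)) + payload
        return offset

    def read(self, offset: int) -> bytes:
        """Read the message that starts at the given byte offset."""
        (size,) = struct.unpack_from(">I", self._data, offset)
        start = offset + 4
        return bytes(self._data[start:start + size])

    def size(self) -> int:
        """Total number of bytes in the log."""
        return len(self._data)


class IndexedTopic:
    """A message log plus a secondary log holding one fixed-size entry per message."""

    _ENTRY = struct.Struct(">Q")    # each index entry is one 8-byte offset
    _ENTRY_SIZE = 4 + _ENTRY.size   # length prefix + payload, always 12 bytes

    def __init__(self):
        self.messages = ToyLog()
        self.index = ToyLog()       # stands in for the secondary "index" topic

    def publish(self, payload: bytes) -> int:
        """Append a message, then record its offset in the index log."""
        offset = self.messages.append(payload)
        self.index.append(self._ENTRY.pack(offset))
        return offset

    def count(self) -> int:
        """Number of messages, derived from the size of the index log."""
        return self.index.size() // self._ENTRY_SIZE

    def nth_from_end(self, n: int) -> bytes:
        """Return the message n back from the latest (n=0 is the latest)."""
        k = self.count() - 1 - n
        if n < 0 or k < 0:
            raise IndexError("not enough history")
        # Index entries are fixed-size, so entry k sits at a computable offset.
        entry = self.index.read(k * self._ENTRY_SIZE)
        (offset,) = self._ENTRY.unpack(entry)
        return self.messages.read(offset)


if __name__ == "__main__":
    topic = IndexedTopic()
    for i in range(5000):
        topic.publish(("event %d" % i).encode())
    print(topic.nth_from_end(1000))
```

Because every index entry has the same size, finding the message 1000
back from the latest takes one read of the index and one read of the
message log, instead of a scan over variable-length messages.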

On Tue, Nov 29, 2011 at 9:06 AM, David Ormsbee <dave@datadoghq.com> wrote:

> Hi Taylor,
>
> Yeah, Joe brought up the need for this distinction as well. When I
> move the doc over to the wiki, I'll try to consistently use "driver"
> to clear up ambiguities. The higher-level, client-oriented bits are
> really just there for context, to explain why the network
> protocol is what it is. Things like the fetch and offsets requests are
> much easier to explain if you show how they connect to the
> implementation on the back end. I wanted to create a single document
> that would take people 90% of the way to writing a driver while
> assuming minimal prior knowledge, because it's the document I really
> wish I had last month.
>
> I always intended to write a separate document that would more
> comprehensively cover how to use our Python driver, but I imagine that
> part will vary substantially from one implementation to the next. I
> haven't started on that one yet just because our driver's API likely
> won't stabilize for another couple of weeks.
>
> Thank you.
>
> Dave
>
>
> On Tue, Nov 29, 2011 at 10:40 AM, Taylor Gautier <tgautier@tagged.com>
> wrote:
> > Just wanted to add my $0.02 - I'm glad David wrote this - excellent
> > job sir!
> >
> > My comment is this (I think it might have already been mentioned,
> > however I will reiterate it):  the document as is covers two
> > audiences - those that are writing Kafka "drivers" and those that are
> > writing clients that publish and consume to Kafka (using a "driver").
> > Most of the document is geared toward the former, however there are
> > some bits that are meant for, or are also useful to, the latter.
> >
> > I would like to suggest that we split the document up and address each
> > audience separately.  As great as it is that David wrote a lot of great
> > information for the "driver" writers, the need for that will slowly
> > decline, as the drivers slowly become more available and more stable
> > (there are only so many languages in the world).
> >
> > On the other hand, people will be writing their own "clients" using
> > the drivers far more often, so the need to serve the latter audience
> > will, assuming Kafka becomes wildly successful, increase.  Beefing up
> > this part of the document, by focusing on that audience, will be
> > incredibly useful to new adopters.
> >
> > Incidentally, it might behoove us as a community to have strong
> > language that separates these two activities.  I used "driver" and
> > "client" - I am not necessarily advocating for these terms but rather
> > just that there is a need for terms that are distinct - it is
> > important to separate the concepts using language/syntax so that
> > people do not get confused.
> >
> > On Tue, Nov 29, 2011 at 7:27 AM, David Ormsbee <dave@datadoghq.com>
> > wrote:
> >
> >> Hi Jay,
> >>
> >> >   1. Would you be willing to add this to the kafka wiki so we
> >> >   could make this the official howto doc?
> >>
> >> Absolutely.
> >>
> >> >   2. It might be good to add a "how to contribute your client"
> >> >   section. This would be hard to write right now because we haven't
> >> >   given anyone any guidelines for doing it. We have been pretty
> >> >   liberal in accepting clients, kind of proceeding on the "something
> >> >   is better than nothing" theory. But this leads to clients of mixed
> >> >   quality and little documentation, as you and Joe noted. I will
> >> >   break this into a separate thread to broaden the discussion.
> >>
> >> I'll be happy to add it as soon as we have consensus on what the
> >> guidelines should be.
> >>
> >> Thank you.
> >>
> >> Dave
> >>
> >
>
