distributedlog-dev mailing list archives

From Khurrum Nasim <khurrumnas...@gmail.com>
Subject Re: Distributed Log as Kafka's backend
Date Tue, 23 Aug 2016 16:38:16 GMT
Hi All,

After reading the DL code, we have a better idea of how to use
DistributedLog as a backend for Kafka. There are two approaches: one is to
use the distributedlog-core library directly in the Kafka broker, while the
other is to use all the DL components.

The first approach basically replaces the storage layer of the Kafka broker
with BookKeeper. The good part is that the Kafka wire protocol remains
unchanged. However, it might take longer to implement and would make
operations more complicated.

The second approach is to implement Kafka's publisher and subscriber APIs
on top of DL. It would be much faster to build and simpler to operate (we
would only need to operate the DL backend). However, it would only support
the Java client.

We discussed this internally and felt that the second approach is good
enough for us and easier to achieve, so we will start with it. If anyone is
interested in the first approach, we'd like to participate and help with
that too.

Here is an outline of our changes:

* Kafka Namespace: as I replied in the other email thread, we want to
lay out the streams in the following format:

namespace/topic/partitions : storing all the partitions
namespace/topic/partitions/N : storing the given partition `N`
namespace/topic/subscriptions : storing all the subscriptions
namespace/topic/subscriptions/S : storing the information of subscription
`S`

Both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
are DL streams.
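
A rough sketch of how the layout above could be turned into stream names.
`KafkaStreamNames` and its methods are hypothetical names used only for
illustration; they are not part of DistributedLog or Kafka, and the real
change might build these paths differently.

```java
// Hypothetical helper for the proposed layout; not part of DistributedLog.
final class KafkaStreamNames {
    private KafkaStreamNames() {}

    // namespace/topic/partitions : parent of all partition streams
    static String partitions(String namespace, String topic) {
        return namespace + "/" + topic + "/partitions";
    }

    // namespace/topic/partitions/N : the DL stream for partition N
    static String partition(String namespace, String topic, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("partition must be >= 0: " + n);
        }
        return partitions(namespace, topic) + "/" + n;
    }

    // namespace/topic/subscriptions : parent of all subscription streams
    static String subscriptions(String namespace, String topic) {
        return namespace + "/" + topic + "/subscriptions";
    }

    // namespace/topic/subscriptions/S : the DL stream for subscription S
    static String subscription(String namespace, String topic, String s) {
        if (s == null || s.isEmpty() || s.contains("/")) {
            throw new IllegalArgumentException("invalid subscription name: " + s);
        }
        return subscriptions(namespace, topic) + "/" + s;
    }
}
```

Keeping the name construction in one place means the publisher and the
subscriber cannot disagree about where a partition or subscription lives.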

* Offset Sequencer: we want to assign `offset` as the transaction id
instead of `timestamp`. We will add an `OffsetSequencer` and allow the
write proxy to load `OffsetSequencer` instead of `TimeSequencer`.
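
To make the sequencer idea concrete, here is a minimal sketch. The
`Sequencer` interface below is a local stand-in declared only for this
example (an assumption, not copied from the DL code), and seeding the
sequencer from the last offset already written to the stream is also an
assumption about how recovery would work.

```java
import java.util.concurrent.atomic.AtomicLong;

// Local stand-in for whatever sequencer abstraction the write proxy loads.
// Declared here only so the sketch compiles on its own (assumption).
interface Sequencer {
    long nextId();
}

// Hands out dense, monotonically increasing ids so that the transaction id
// of each record can double as a Kafka-style offset.
final class OffsetSequencer implements Sequencer {
    private final AtomicLong next;

    // lastOffset is the highest offset already written to the stream,
    // or -1 if the stream is empty (assumed recovery convention).
    OffsetSequencer(long lastOffset) {
        if (lastOffset < -1) {
            throw new IllegalArgumentException("lastOffset must be >= -1");
        }
        this.next = new AtomicLong(lastOffset + 1);
    }

    @Override
    public long nextId() {
        return next.getAndIncrement();
    }
}
```

Unlike a timestamp-based id, consecutive records get consecutive ids, which
is what a Kafka consumer expects when it seeks to "offset + 1".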

* Use separate DL streams to store the information of a subscription, such
as offsets and consumer load-balancing information.
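
One way a subscription stream could be read back, sketched under
assumptions: each record in the stream is an offset commit for one
partition, and the latest record per partition wins. `OffsetCommit` and
`SubscriptionState` are hypothetical names, and the in-memory list stands in
for records read from the DL stream; load-balancing records are left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record format for a subscription stream: one offset commit
// for one partition.
final class OffsetCommit {
    final int partition;
    final long offset;

    OffsetCommit(int partition, long offset) {
        this.partition = partition;
        this.offset = offset;
    }
}

final class SubscriptionState {
    private SubscriptionState() {}

    // Replays commit records in stream order; the last commit seen for a
    // partition is treated as its current committed offset.
    static Map<Integer, Long> replay(List<OffsetCommit> records) {
        Map<Integer, Long> committed = new HashMap<>();
        for (OffsetCommit record : records) {
            committed.put(record.partition, record.offset);
        }
        return committed;
    }
}
```

Because the stream is append-only, committing an offset is just a write, and
a restarted consumer rebuilds its position by replaying the stream.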

Do you see any concerns here?


- KN

On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <sijie@apache.org> wrote:

> Thanks Khurrum.
>
> At this point, we don't have any specific process to follow for big
> features. We were discussing one under
> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
>
> But ideally, let's use the mailing list for discussion and a Confluence
> page for turning the discussions into a design doc.
>
> If you already have a confluence account (if not, please create one),
> please email me your account. I can grant the permission to you, then you
> can edit.
>
> - Sijie
>
> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
> wrote:
>
> > Sijie,
> >
> > Thank you so much for your quick reply. We are using Kafka now and we are
> > interested in the features in DL like durability and handling slow
> > machines.
> >
> > If it is okay with the community, we'd like to give it a try and
> > evaluate the solution. Is there any process that I should follow?
> >
> > KN
> >
> > On Sunday, July 31, 2016, Sijie Guo <sijie@apache.org> wrote:
> >
> > > Khurrum,
> > >
> > > Interesting. Thank you for your interest in DistributedLog.
> > >
> > > Three years ago, when we started the project internally at Twitter, we
> > > did have a plan to use it as a backend for both kestrel (Twitter's
> > > in-house queue system) and Kafka. However, we didn't go down that
> > > direction. Instead, we built a similar self-serve pub/sub system over
> > > DistributedLog to consolidate our kestrel and Kafka. So we don't have a
> > > concrete plan to build Kafka's interface over DistributedLog. The
> > > module under tutorials is mostly there to give people an idea of how
> > > DL can be used to build a partition-based pub/sub system.
> > >
> > > However, I don't have any strong preference here. If you think it would
> > > be useful to other people, you are welcome to contribute. We'd be happy
> > > to guide you and offer any help.
> > >
> > > Also, it would be good if you could explain more about what you are
> > > planning to do. Other people in the community can then chime in and
> > > discuss.
> > >
> > > Please let us know your thoughts. You are very welcome to make any
> > > contributions.
> > >
> > > - Sijie
> > >
> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <khurrumnasimm@gmail.com>
> > > wrote:
> > >
> > > > Hello folks,
> > > >
> > > > I saw there is a 'distributedlog-kafka' module in tutorials, but it
> > > > seems incomplete. I am wondering if there is a plan to fully
> > > > implement Kafka's interface. It would be great if we could use
> > > > Kafka's interface to access DistributedLog. I'd like to contribute
> > > > if there is a plan.
> > > >
> > > > Thanks,
> > > > KN
> > > >
> > >
> >
>
