bookkeeper-distributedlog-dev mailing list archives

From Khurrum Nasim <khurrumnas...@gmail.com>
Subject Re: Distributed Log as Kafka's backend
Date Thu, 25 Aug 2016 10:24:36 GMT
I sent out another pull request to improve the Kafka publisher in the
tutorial: https://github.com/apache/incubator-distributedlog/pull/16

We tried to reuse the existing Kafka configuration, key/value serializers, and
partitioner as much as possible, so that we don't need to rewrite our existing
services to adopt DistributedLog.
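
To illustrate how a Kafka-style partitioner could line up with the stream layout proposed in the quoted message below (`namespace/topic/partitions/N`), here is a minimal sketch. It is not the code in the pull request: the function names are hypothetical, and the CRC32 hash is only a stand-in, since Kafka's default partitioner hashes the serialized key with murmur2.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a serialized key.

    Assumption: CRC32 is a stand-in hash for illustration only; it will not
    produce the same partition assignment as Kafka's default partitioner.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


def partition_stream(namespace: str, topic: str, partition: int) -> str:
    """Name of the DL stream that stores partition N of a topic."""
    return f"{namespace}/{topic}/partitions/{partition}"


def subscription_stream(namespace: str, topic: str, subscription: str) -> str:
    """Name of the DL stream that stores the state of subscription S."""
    return f"{namespace}/{topic}/subscriptions/{subscription}"


def stream_for_record(namespace: str, topic: str, key: bytes, num_partitions: int) -> str:
    """Resolve the DL stream a keyed record would be written to."""
    partition = partition_for(key, num_partitions)
    return partition_stream(namespace, topic, partition)
```

With this shape, the publisher keeps its existing serializer and partitioner settings, and only the last step (turning a partition number into a stream name) is specific to DistributedLog.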

Although the pull request is still WIP, we'd like to know whether we are using
DistributedLog in the right way. In particular, we are thinking of changing
the write proxy to also return either the transaction id or the sequence id
on write requests.
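
As a rough illustration of the idea, the sketch below contrasts a time-based sequencer with an offset-based one and shows a write call that returns the assigned transaction id. All class and method names here are hypothetical and written for this example; they are not the DistributedLog or write proxy APIs.

```python
import time


class TimeSequencer:
    """Assigns transaction ids from wall-clock milliseconds, never going backwards."""

    def __init__(self) -> None:
        self._last_id = -1

    def next_id(self) -> int:
        self._last_id = max(self._last_id, int(time.time() * 1000))
        return self._last_id


class OffsetSequencer:
    """Assigns dense, strictly increasing ids, like Kafka offsets."""

    def __init__(self, last_id: int = -1) -> None:
        # last_id is the last offset already written; -1 means an empty stream.
        self._last_id = last_id

    def next_id(self) -> int:
        self._last_id += 1
        return self._last_id


class StreamWriter:
    """Toy in-memory writer that returns the transaction id of each write."""

    def __init__(self, sequencer) -> None:
        self._sequencer = sequencer
        self.records = []

    def write(self, payload: bytes) -> int:
        txid = self._sequencer.next_id()
        self.records.append((txid, payload))
        return txid
```

With an `OffsetSequencer`, the value returned from `write` can be handed back to a Kafka-style producer as the record offset, which is why returning the transaction id on write requests matters for this use case.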

We appreciate your help.

- KN



On Thu, Aug 25, 2016 at 1:28 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
wrote:

> I sent out a pull request for the offset sequencer:
> https://github.com/apache/incubator-distributedlog/pull/15
>
> I am not sure whether there are any coding guidelines to follow. I tried my
> best to follow the existing code style. If I did anything wrong, please help
> me fix it.
>
> - KN
>
>
>
>
> On Tue, Aug 23, 2016 at 9:38 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
> wrote:
>
>> Hi All,
>>
>> After reading the DL code, we have a better idea of how to use
>> DistributedLog as the Kafka implementation. There are two approaches: one
>> is to use the distributedlog-core library directly in the Kafka broker; the
>> other is to use all the DL components.
>>
>> The first approach is basically to replace the storage of the Kafka broker
>> with BookKeeper. The good part is that all the Kafka wire protocols will
>> remain unchanged. But it might take longer and also make operations more
>> complicated.
>>
>> The second approach is to implement Kafka's publisher and subscriber API
>> using DL. It would be much faster to build and more consistent to operate
>> (we would only need to operate the DL backend). However, it would only
>> support the Java client.
>>
>> We discussed this internally. We felt the second approach is good enough
>> for us and easier to achieve, so we will start with it. If anyone is
>> interested in the first approach, we'd like to participate and help too.
>>
>> Here is an outline of our changes:
>>
>> * Kafka Namespace: as I replied in the other email thread, we want to
>> lay out the streams in the following format:
>>
>> namespace/topic/partitions : storing all the partitions
>> namespace/topic/partitions/N : storing the given partition `N`
>> namespace/topic/subscriptions : storing all the subscriptions
>> namespace/topic/subscriptions/S : storing the information of
>> subscription `S`
>>
>> Both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
>> are DL streams.
>>
>> * Offset Sequencer: we want to assign the `offset` as the transaction id
>> instead of the `timestamp`. We will add an `OffsetSequencer` and allow the
>> write proxy to load `OffsetSequencer` instead of `TimeSequencer`.
>>
>> * Use separate DL streams to store the information of a subscription,
>> such as offsets and consumer load-balancing information.
>>
>> Do you see any concerns here?
>>
>>
>> - KN
>>
>> On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <sijie@apache.org> wrote:
>>
>>> Thanks Khurrum.
>>>
>>> At this point, we don't have any specific process to follow for big
>>> features. We were discussing one under
>>> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
>>>
>>> But ideally, let's use the mailing list for discussion and use a Confluence
>>> page to capture the discussions in a design doc.
>>>
>>> If you already have a confluence account (if not, please create one),
>>> please email me your account. I can grant the permission to you, then you
>>> can edit.
>>>
>>> - Sijie
>>>
>>> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
>>> wrote:
>>>
>>> > Sijie,
>>> >
>>> > Thank you so much for your quick reply. We are using Kafka now and we are
>>> > interested in the features in DL like durability and handling slow
>>> > machines.
>>> >
>>> > If it is okay with the community, we'd like to give it a try and evaluate
>>> > the solution. Is there any process that I should follow?
>>> >
>>> > KN
>>> >
>>> > On Sunday, July 31, 2016, Sijie Guo <sijie@apache.org> wrote:
>>> >
>>> > > Khurrum,
>>> > >
>>> > > Interesting. Thank you for your interest in DistributedLog.
>>> > >
>>> > > Three years ago, when we started the project internally at Twitter, we
>>> > > did have a plan to use it as a backend for both kestrel (Twitter's
>>> > > in-house queue system) and Kafka. However, we didn't go in that
>>> > > direction. Instead, we built a similar self-serve pub/sub system over
>>> > > DistributedLog to consolidate our kestrel and Kafka. So we don't have a
>>> > > concrete plan to build Kafka's interface over DistributedLog. The module
>>> > > under tutorials is mostly there to give people an idea of how it can be
>>> > > used to build a partition-based pub/sub system.
>>> > >
>>> > > However, I don't have any strong preference here. If you think it would
>>> > > be useful to other people, you are welcome to contribute. We'd be happy
>>> > > to guide you and offer any help.
>>> > >
>>> > > Also, it might be good if you can explain more about what you are
>>> > > planning to do. Other people in the community can chime in and discuss.
>>> > >
>>> > > Please let us know your thoughts. You are very welcome to make any
>>> > > contributions.
>>> > >
>>> > > - Sijie
>>> > >
>>> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <khurrumnasimm@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Hello folks,
>>> > > >
>>> > > > I saw there is a 'distributedlog-kafka' module in tutorials, but it
>>> > > > does not seem complete yet. I am wondering if there is a plan to fully
>>> > > > implement Kafka's interface. It would be great if we could use Kafka's
>>> > > > interface to access DistributedLog. I'd like to contribute if there is
>>> > > > a plan.
>>> > > >
>>> > > > Thanks,
>>> > > > KN
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
