distributedlog-dev mailing list archives

From Leigh Stewart <lstew...@twitter.com.INVALID>
Subject Re: Distributed Log as Kafka's backend
Date Mon, 29 Aug 2016 14:45:37 GMT
I agree with Sijie. I think this is exciting work, and I didn't mean to cut
off your options. My objection was just about code organization.

A contribs project seems like a good compromise for now, until we can think
of a better place to put the code.

Sijie's right, though: if we want to fully productionize this and make it
reusable, this might not be the best long-term location.

What are your thoughts, Khurrum? Does the code organization/layering
argument make sense?

Thanks!

On Fri, Aug 26, 2016 at 8:24 PM, Sijie Guo <sijie@apache.org> wrote:

> + Leigh
>
> Khurrum,
>
> Thanks for your hard work on this. The approach in general looks good
> to me.
>
> However, I tend to agree with what Leigh commented on the pull request.
> Ideally we want DL to focus more on single streams themselves, such as
> their durability, consistency and performance, because different
> applications might use streams in different ways to produce different
> data/consumption models. For example, you can use a set of streams to
> build a Kafka-like partitioned pubsub, while other people can use a set of
> streams to build a queue-like messaging system or a database.
>
> On the other hand, it is very interesting to see a good Kafka client
> integration using DL streams as partitions rather than just an incomplete
> tutorial. I don't want to discourage your hard work. Probably a tradeoff
> here is making a distributedlog-contribs module and moving the
> distributedlog-kafka module under it. The distributedlog-contribs module
> would host any integration-related contributions. This would help avoid
> any confusion. Any thoughts, Leigh?
>
> Also, Khurrum, did you talk with the Kafka community? I am not sure if DL
> is the right repo to host this. Does anyone else have better suggestions
> on this?
>
> - Sijie
>
> On Thursday, August 25, 2016, Khurrum Nasim <khurrumnasimm@gmail.com>
> wrote:
>
>> I sent out another pull request to improve the kafka publisher in the
>> tutorial : https://github.com/apache/incubator-distributedlog/pull/16
>>
>> We tried to reuse the existing kafka configuration, key/value serializers
>> and partitioner as much as we can, so we don't need to rewrite our
>> existing services to adopt distributedlog.
>>
>> Although the pull request is still WIP, we'd like to know if we are using
>> distributed log in the right way. In particular, we are thinking of
>> changing the write proxy to also return either the transaction id or the
>> sequence id on write requests.
>>
>> Appreciate your help.
>>
>> - KN
>>
>>
>>
>> On Thu, Aug 25, 2016 at 1:28 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
>> wrote:
>>
>> > I sent out a pull request about the offset sequencer:
>> > https://github.com/apache/incubator-distributedlog/pull/15
>> >
>> > I am not sure if there are any code guidelines to follow. I tried my
>> > best to follow the existing code style. If I did anything wrong, please
>> > help me fix it.
>> >
>> > - KN
>> >
>> >
>> >
>> >
>> > On Tue, Aug 23, 2016 at 9:38 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
>> > wrote:
>> >
>> >> Hi All,
>> >>
>> >> After reading the DL code, we have a better idea of how to use
>> >> distributed log as the kafka implementation. There are two approaches:
>> >> one is to use the distributedlog-core library directly in the kafka
>> >> broker, while the other is to use all the DL components.
>> >>
>> >> The first approach is basically to replace the storage of the kafka
>> >> broker with bookkeeper. The good part is that all the kafka
>> >> wire-protocols will remain unchanged. But it might take longer and also
>> >> make operations complicated.
>> >>
>> >> The second approach is to implement Kafka's publisher and subscriber
>> >> API using DL. It would be much faster to build and more consistent to
>> >> operate (we only need to operate the DL backend). However, it would
>> >> only support the java client.
>> >>
>> >> We discussed this internally. We felt the second approach is good
>> >> enough for us and is easier to achieve, so we will start with it. If
>> >> anyone is interested in the first approach, we'd like to participate
>> >> and help too.
>> >>
>> >> Here is the outline of our changes:
>> >>
>> >> * Kafka Namespace: as I replied in the other email thread, we want to
>> >> lay out the streams in the following format:
>> >>
>> >> namespace/topic/partitions : storing all the partitions
>> >> namespace/topic/partitions/N : storing the given partition `N`
>> >> namespace/topic/subscriptions : storing all the subscriptions
>> >> namespace/topic/subscriptions/S : storing the information of
>> >> subscription `S`
>> >>
>> >> Both `namespace/topic/partitions/N` and
>> >> `namespace/topic/subscriptions/S` are DL streams.
>> >>
>> >> * Offset Sequencer: we want to assign `offset` as the transaction id
>> >> instead of `timestamp`. We will add an `OffsetSequencer` and allow the
>> >> write proxy to load `OffsetSequencer` instead of `TimeSequencer`.
>> >>
>> >> * Use separate DL streams to store the information of a subscription,
>> >> such as offsets and consumer load balancing information.
>> >>
>> >> Do you see any concerns here?
>> >>
>> >>
>> >> - KN
>> >>
>> >> On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <sijie@apache.org> wrote:
>> >>
>> >>> Thanks Khurrum.
>> >>>
>> >>> At this point, we don't have any specific process to follow for big
>> >>> features. We were discussing one under
>> >>> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
>> >>>
>> >>> But ideally, let's use the mailing list for discussion and use a
>> >>> confluence page to capture the discussions in a design doc.
>> >>>
>> >>> If you already have a confluence account (if not, please create one),
>> >>> please email me your account name. I can grant you permission so that
>> >>> you can edit.
>> >>>
>> >>> - Sijie
>> >>>
>> >>> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Sijie,
>> >>> >
>> >>> > Thank you so much for your quick reply. We are using Kafka now and we
>> >>> > are interested in the features in DL like durability and handling slow
>> >>> > machines.
>> >>> >
>> >>> > If it is okay with the community, we'd like to give it a try and
>> >>> > evaluate the solution. Is there any process that I should follow?
>> >>> >
>> >>> > KN
>> >>> >
>> >>> > On Sunday, July 31, 2016, Sijie Guo <sijie@apache.org> wrote:
>> >>> >
>> >>> > > Khurrum,
>> >>> > >
>> >>> > > Interesting. Thank you for your interest in DistributedLog.
>> >>> > >
>> >>> > > Three years ago, when we started the project internally at Twitter, we
>> >>> > > did have a plan to use it as a backend for both kestrel (Twitter's
>> >>> > > in-house queue system) and Kafka. However, we didn't go down that
>> >>> > > direction. Instead, we built a similar self-serve pub/sub system over
>> >>> > > DistributedLog to consolidate our kestrel and kafka. So we don't have a
>> >>> > > concrete plan to build kafka's interface over DistributedLog. The
>> >>> > > module was put under tutorials mostly to give people an idea of how it
>> >>> > > can be used for building a partition-based pub/sub system.
>> >>> > >
>> >>> > > However, I don't have any strong preference here. If you think it
>> >>> > > would be useful to other people, you are welcome to contribute. We'd
>> >>> > > be happy to guide you and offer any help.
>> >>> > >
>> >>> > > Also, it might be good if you can explain more about what you are
>> >>> > > planning to do. Other people in the community can chime in and discuss.
>> >>> > >
>> >>> > > Please let us know your thoughts. You are very welcome to make any
>> >>> > > contributions.
>> >>> > >
>> >>> > > - Sijie
>> >>> > >
>> >>> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim
>> >>> > > <khurrumnasimm@gmail.com> wrote:
>> >>> > >
>> >>> > > > Hello folks,
>> >>> > > >
>> >>> > > > I saw there is a 'distributedlog-kafka' module in tutorials, but it
>> >>> > > > does not seem complete yet. I am wondering if there is a plan to
>> >>> > > > fully implement kafka's interface. It would be great if we could use
>> >>> > > > kafka's interface to access distributed log. I'd like to contribute
>> >>> > > > if there is a plan.
>> >>> > > >
>> >>> > > > Thanks,
>> >>> > > > KN
>> >>> > > >
>> >>> > >
>> >>> >
>> >>>
>> >>
>> >>
>> >
>>
>
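
The stream layout, the offset-as-transaction-id idea, and the per-subscription
stream that Khurrum outlines in the quoted thread can be sketched roughly as
below. This is only a toy, in-memory illustration in Python, not the
DistributedLog or Kafka API: `InMemoryStream`, `commit_offset`,
`last_committed`, the `partition:offset` record encoding and the helper names
are all assumptions made up for this sketch, and the `OffsetSequencer` here is
a stand-in for the class proposed in the thread, not its actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def partition_stream(namespace: str, topic: str, partition: int) -> str:
    """Stream name for partition N, following the layout in the thread."""
    return f"{namespace}/{topic}/partitions/{partition}"


def subscription_stream(namespace: str, topic: str, subscription: str) -> str:
    """Stream name for subscription S, following the layout in the thread."""
    return f"{namespace}/{topic}/subscriptions/{subscription}"


@dataclass
class OffsetSequencer:
    """Toy stand-in: hands out consecutive offsets to use as transaction ids
    (instead of timestamps)."""
    next_offset: int = 0

    def next_id(self) -> int:
        current = self.next_offset
        self.next_offset += 1
        return current


@dataclass
class InMemoryStream:
    """Hypothetical append-only log standing in for a single DL stream."""
    sequencer: OffsetSequencer = field(default_factory=OffsetSequencer)
    records: List[Tuple[int, bytes]] = field(default_factory=list)

    def write(self, payload: bytes) -> int:
        # Return the assigned transaction id to the caller, mirroring the
        # idea of the write proxy returning an id on write requests.
        txid = self.sequencer.next_id()
        self.records.append((txid, payload))
        return txid


def commit_offset(sub: InMemoryStream, partition: int, offset: int) -> int:
    """Append a consumed-offset record to a subscription stream.
    The 'partition:offset' encoding is an assumption of this sketch."""
    return sub.write(f"{partition}:{offset}".encode())


def last_committed(sub: InMemoryStream) -> Dict[int, int]:
    """Replay the subscription stream; the latest record per partition wins."""
    committed: Dict[int, int] = {}
    for _, payload in sub.records:
        partition, offset = payload.decode().split(":")
        committed[int(partition)] = int(offset)
    return committed
```

In this sketch each partition stream has its own sequencer, so offsets are
dense and start at 0 per partition, which is what a Kafka-style consumer would
expect. Recovering a subscription's position is just a replay of its stream.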
