distributedlog-dev mailing list archives

From Gerrit Sundaram <gerritsunda...@gmail.com>
Subject Re: [DISCUSS] DL Stream Operation Primitives
Date Wed, 23 Nov 2016 04:58:49 GMT
I also created the proposal page -
https://cwiki.apache.org/confluence/display/DL/DP-3+-+DistributedLog+Stream+Operation+Primitives

Let me know if you have any comments. If there are no objections, I'd like
to send out the pull requests soon.

- Gerrit

On Tue, Nov 22, 2016 at 8:11 PM, Gerrit Sundaram <gerritsundaram@gmail.com>
wrote:

> Thanks Sijie!
>
> I also created https://issues.apache.org/jira/browse/DL-72 for tracking
> the implementation of the discussion here. I will create the proposal soon.
>
> - Gerrit
>
> On Tue, Nov 22, 2016 at 7:58 PM, Sijie Guo <sijie@apache.org> wrote:
>
>> Done. Gerrit, I just granted the permissions to you. Let me know if it
>> works.
>>
>> - Sijie
>>
>> On Tue, Nov 22, 2016 at 7:51 PM, Gerrit Sundaram <gerritsundaram@gmail.com> wrote:
>>
>> > Sijie,
>> >
>> > Sorry for the late response. Here is my wiki account:
>> > https://cwiki.apache.org/confluence/display/~gerritsundaram
>> >
>> > Thanks in advance.
>> >
>> > -  Gerrit
>> >
>> > On Thu, Nov 17, 2016 at 10:03 AM, Sijie Guo <sijieg@twitter.com.invalid> wrote:
>> >
>> > > Gerrit,
>> > >
>> > > Can you send me your wiki account?
>> > >
>> > > - Sijie
>> > >
>> > > On Thu, Nov 17, 2016 at 1:38 AM, Gerrit Sundaram <gerritsundaram@gmail.com> wrote:
>> > >
>> > > > Can you grant me the permissions for editing the wiki page?
>> > > >
>> > > > - Gerrit
>> > > >
>> > > > On Thu, Nov 17, 2016 at 1:37 AM, Gerrit Sundaram <gerritsundaram@gmail.com> wrote:
>> > > >
>> > > > >
>> > > > >
>> > > > > On Tue, Nov 15, 2016 at 2:14 AM, Sijie Guo <sijie@apache.org> wrote:
>> > > > >
>> > > > >> On Sat, Nov 12, 2016 at 2:30 AM, Gerrit Sundaram <gerritsundaram@gmail.com> wrote:
>> > > > >>
>> > > > >> > On Fri, Nov 11, 2016 at 1:09 PM, Sijie Guo <sijie@apache.org> wrote:
>> > > > >> >
>> > > > >> > > I like this topic. A better name might be 'stream storage primitives',
>> > > > >> > > as we treat DL as stream storage. Comments inline.
>> > > > >> > >
>> > > > >> > > On Wed, Nov 9, 2016 at 3:09 AM, Gerrit Sundaram <gerritsundaram@gmail.com> wrote:
>> > > > >> > >
>> > > > >> > > > As Sijie suggested in the other email thread, I started this thread
>> > > > >> > > > to discuss the stream operation primitives.
>> > > > >> > > >
>> > > > >> > > > The stream operations that I am aware DL supports are:
>> > > > >> > > >
>> > > > >> > > > * Open a distributedlog stream
>> > > > >> > > > * Delete a distributedlog stream
>> > > > >> > > > * List all the distributedlog streams under a namespace
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > Are you also looking for listing streams under a 'sub-namespace' (or
>> > > > >> > > streams that share a common prefix)? (Based on my understanding of your
>> > > > >> > > proposal, you might need this for a filesystem-like API.)
>> > > > >> > >
>> > > > >> >
>> > > > >> > Yes. However, it seems like DL is designed more around a flat namespace
>> > > > >> > with just streams.
>> > > > >>
>> > > > >>
>> > > > >> Ah, yes. The original thought was to tie a namespace to a user or an
>> > > > >> application. Under a namespace, the application can manage its streams
>> > > > >> on its own. That's why it was designed with a flat namespace.
>> > > > >>
>> > > > >>
>> > > > >> > There is no concept of a 'sub-namespace', although I can probably hack
>> > > > >> > it by naming the streams in a filesystem path-like way.
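A minimal sketch of the naming-convention workaround described above, assuming stream names use "/" as a path separator. `list_streams_with_prefix` and `list_children` are hypothetical helpers, not DistributedLog APIs; they only filter a full flat listing on the client side.

```python
# Hypothetical helpers: DL namespaces are flat, so "sub-namespaces" are emulated
# by naming streams like filesystem paths ("app1/events/2016-11").

def _as_dir(prefix, sep="/"):
    """Treat prefix as a directory: ensure it ends with the separator."""
    return prefix if not prefix or prefix.endswith(sep) else prefix + sep


def list_streams_with_prefix(stream_names, prefix, sep="/"):
    """Return every stream whose name lives under prefix."""
    p = _as_dir(prefix, sep)
    return sorted(name for name in stream_names if name.startswith(p))


def list_children(stream_names, prefix, sep="/"):
    """Return the immediate children of prefix, like listing one directory level."""
    p = _as_dir(prefix, sep)
    children = set()
    for name in stream_names:
        if name.startswith(p):
            children.add(name[len(p):].split(sep, 1)[0])
    return sorted(children)


streams = ["app1/events/2016-11", "app1/events/2016-12", "app1/index", "app2/log"]
print(list_streams_with_prefix(streams, "app1/events"))
# ['app1/events/2016-11', 'app1/events/2016-12']
print(list_children(streams, "app1"))
# ['events', 'index']
```

Every call scans the whole namespace listing, so the cost grows with the total number of streams; that is one reason a native hierarchy could help for large namespaces.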
>> > > > >> >
>> > > > >> > However, I am still curious whether you want to introduce any sort of
>> > > > >> > naming hierarchy within a namespace. For example, can you have a
>> > > > >> > 'StreamSet', which is a set of streams (like a directory in a filesystem,
>> > > > >> > which has a list of children)? If you have a similar hierarchy, it will
>> > > > >> > definitely simplify my work.
>> > > > >> >
>> > > > >>
>> > > > >> In the write proxy, we have a concept similar to 'StreamSet' that groups
>> > > > >> some physical DL streams into one virtual stream. However, that was
>> > > > >> mostly used for exporting metrics for grouped virtual streams. We don't
>> > > > >> emphasize the concept of a 'virtual stream' much in DL, as we tend to
>> > > > >> let the application decide what the virtual stream looks like.
>> > > > >>
>> > > > >> However, for metadata organization and management, it might make sense
>> > > > >> to think about such a hierarchy.
>> > > > >>
>> > > > >> What do you have in mind for 'StreamSet'? Can you explain a little more?
>> > > > >
>> > > > >
>> > > > > I was thinking of a group of streams that might be used by the same
>> > > > > application but store different parts of the data. It is like the
>> > > > > 'virtual' stream that you mentioned.
>> > > > >
>> > > > > - Gerrit
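A rough model of the 'StreamSet' / virtual-stream idea discussed above, assuming membership is used only to roll up per-stream metrics (the write-proxy use case Sijie mentions) while each physical stream stays independent. The `StreamSet` class, its methods, and the metric names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StreamSet:
    """Hypothetical grouping of physical streams into one virtual stream.

    Membership is only used to roll up per-stream metrics; each physical
    stream is still opened, written, and truncated independently.
    """
    name: str
    members: Set[str] = field(default_factory=set)

    def add(self, stream_name: str) -> None:
        self.members.add(stream_name)

    def aggregate(self, per_stream: Dict[str, Dict[str, int]]) -> Dict[str, int]:
        """Sum metric counters over member streams, ignoring non-members."""
        total: Counter = Counter()
        for stream, counters in per_stream.items():
            if stream in self.members:
                total.update(counters)
        return dict(total)


orders = StreamSet("orders", {"orders-p0"})
orders.add("orders-p1")
metrics = {
    "orders-p0": {"writes": 10, "bytes": 1000},
    "orders-p1": {"writes": 5, "bytes": 400},
    "audit": {"writes": 7},
}
print(orders.aggregate(metrics))  # {'writes': 15, 'bytes': 1400}
```

Keeping the grouping outside the streams themselves matches the "let the application decide" stance; a metadata-level hierarchy would instead make the set a first-class object in the namespace.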
>> > > > >
>> > > > >
>> > > > >>
>> > > > >> >
>> > > > >> >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > > * Seal a distributedlog stream
>> > > > >> > > > * Truncate a distributedlog stream
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > Just to clarify: 'truncate' in DL trims the head of the stream, not the
>> > > > >> > > tail. 'truncate' in the filesystem world cuts a file to a size of
>> > > > >> > > precisely *length* bytes, i.e., it truncates the tail.
>> > > > >> > >
>> > > > >> > > Make sure we have clarified this and are on the same page.
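To make the distinction concrete, a small model of the two meanings of "truncate", assuming a stream is an ordered list of (position, record) pairs and a file is a byte string. Integer positions stand in for DL record positions, and both function names are hypothetical; this is not the DistributedLog truncation API.

```python
def dl_truncate(log, truncate_at):
    """DL-style truncate: drop records *before* truncate_at (trims the head)."""
    return [(pos, rec) for pos, rec in log if pos >= truncate_at]


def fs_truncate(data: bytes, length: int) -> bytes:
    """Filesystem-style truncate: keep exactly `length` bytes (trims the tail).

    Like POSIX truncate(2), a length past the end pads with zero bytes.
    """
    return data[:length] + b"\0" * max(0, length - len(data))


log = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
print(dl_truncate(log, 2))       # [(2, 'c'), (3, 'd')]: head removed, positions kept
print(fs_truncate(b"abcd", 2))   # b'ab': tail removed
```

After a DL-style truncate the surviving records keep their original positions, whereas a filesystem truncate keeps offsets starting at 0 and discards the end.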
>> > > > >> > >
>> > > > >> >
>> > > > >> > Yes, we are on the same page.
>> > > > >> >
>> > > > >> >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > >
>> > > > >> > > > I am looking for a more filesystem-like API, for example:
>> > > > >> > > >
>> > > > >> > > > * Get the status/attributes of a stream (like stat in a filesystem)
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > +1 for stream status/attributes. I think we might actually already have
>> > > > >> > > this in DL, since in Kestrel we use it for storing customized metadata.
>> > > > >> > > It might make sense to formalize it into 'stream status'.
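One possible shape for a formalized 'stream status', sketched under the assumption that it combines a few fixed fields with free-form application metadata (like the customized metadata mentioned above). `StreamStatus`, `stat_stream`, and every field name are hypothetical, computed here from an in-memory model rather than from DL metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class StreamStatus:
    """Hypothetical 'stat' result for a stream; all field names are assumptions."""
    name: str
    sealed: bool
    first_position: Optional[int]  # None when the stream is empty
    last_position: Optional[int]
    record_count: int
    metadata: Dict[str, str] = field(default_factory=dict)  # application-defined attributes


def stat_stream(name: str, log: List[Tuple[int, str]], sealed: bool = False,
                metadata: Optional[Dict[str, str]] = None) -> StreamStatus:
    """Build a StreamStatus from an in-memory list of (position, record) pairs."""
    positions = [pos for pos, _ in log]
    return StreamStatus(
        name=name,
        sealed=sealed,
        first_position=min(positions) if positions else None,
        last_position=max(positions) if positions else None,
        record_count=len(positions),
        metadata=dict(metadata or {}),
    )


status = stat_stream("app1/index", [(2, "c"), (3, "d")], metadata={"owner": "app1"})
print(status.first_position, status.last_position, status.record_count)  # 2 3 2
```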
>> > > > >> > >
>> > > > >> >
>> > > > >> > Gotcha.
>> > > > >> >
>> > > > >> >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > > * Rename a stream
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > We've talked about this for a while. +1.
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > > * Symlink a stream
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > Symlinking a stream is probably easy to do. +1, we've thought about that
>> > > > >> > > to get the flexibility of moving streams between different storage
>> > > > >> > > backends. Symlinks would help with this.
>> > > > >> > >
>> > > > >> > > But a more fundamental thought here is symlinks for log segments, so that
>> > > > >> > > when a symlinked stream is deleted, the underlying log segments might not
>> > > > >> > > be deleted until their link count decreases to zero.
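A minimal sketch of the link-count idea above, assuming each stream is just an ordered list of segment ids. `SegmentStore` and its methods are hypothetical. This models reference-counted (hard-link-like) sharing at the segment level, and `link` snapshots the target's segments at link time, so segments added to the target later are not shared; a real design would need to decide how links follow new segments.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class SegmentStore:
    """Hypothetical model of streams that reference shared, link-counted log segments."""

    def __init__(self) -> None:
        self.streams: Dict[str, List[str]] = {}             # stream name -> segment ids
        self.link_count: Dict[str, int] = defaultdict(int)  # segment id -> referencing streams
        self.reclaimed: List[str] = []                      # segments whose storage was freed

    def create(self, name: str, segment_ids: Iterable[str]) -> None:
        self.streams[name] = list(segment_ids)
        for seg in self.streams[name]:
            self.link_count[seg] += 1

    def link(self, link_name: str, target: str) -> None:
        # The new name shares the target's current segments; no data is copied.
        self.create(link_name, self.streams[target])

    def delete(self, name: str) -> None:
        for seg in self.streams.pop(name):
            self.link_count[seg] -= 1
            if self.link_count[seg] == 0:
                del self.link_count[seg]
                self.reclaimed.append(seg)  # last reference gone: safe to free


store = SegmentStore()
store.create("orders", ["seg-1", "seg-2"])
store.link("orders-alias", "orders")
store.delete("orders")
print(store.reclaimed)  # []: segments still reachable through "orders-alias"
store.delete("orders-alias")
print(store.reclaimed)  # ['seg-1', 'seg-2']
```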
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > >
>> > > > >> > > > Other operations that I can think of that might be useful:
>> > > > >> > > >
>> > > > >> > > > * Split/Fork a stream (it can be useful for dynamic data partitioning)
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > Splitting and forking a stream sounds interesting, but it sounds more like
>> > > > >> > > a high-level feature than a storage primitive. Actually, it might be a
>> > > > >> > > good feature for a separate discussion.
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > > * Merge/Concat streams
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > I think there is already an outstanding JIRA for concatenating two DL
>> > > > >> > > streams. Jia and Arvind are working on that.
>> > > > >> > >
>> > > > >> > > https://issues.apache.org/jira/browse/DL-46
>> > > > >> >
>> > > > >> >
>> > > > >> > I will watch that JIRA.
>> > > > >> >
>> > > > >> >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > >
>> > > > >> > > > The above operations are based on my knowledge of DL. Feel free to add
>> > > > >> > > > more.
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > >
>> > > > >> > > > - Gerrit
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
