bookkeeper-distributedlog-dev mailing list archives

From Gerrit Sundaram <gerritsunda...@gmail.com>
Subject Re: [DISCUSS] DL Stream Operation Primitives
Date Wed, 23 Nov 2016 03:51:31 GMT
Sijie,

Sorry for the late response. Here is my wiki account:
https://cwiki.apache.org/confluence/display/~gerritsundaram

Thanks in advance.

-  Gerrit

On Thu, Nov 17, 2016 at 10:03 AM, Sijie Guo <sijieg@twitter.com.invalid>
wrote:

> Gerrit,
>
> Can you send me your wiki account?
>
> - Sijie
>
> On Thu, Nov 17, 2016 at 1:38 AM, Gerrit Sundaram <gerritsundaram@gmail.com>
> wrote:
>
> > Can you grant me the permissions for editing the wiki page?
> >
> > - Gerrit
> >
> > On Thu, Nov 17, 2016 at 1:37 AM, Gerrit Sundaram <gerritsundaram@gmail.com>
> > wrote:
> >
> > >
> > >
> > > On Tue, Nov 15, 2016 at 2:14 AM, Sijie Guo <sijie@apache.org> wrote:
> > >
> > >> On Sat, Nov 12, 2016 at 2:30 AM, Gerrit Sundaram <gerritsundaram@gmail.com>
> > >> wrote:
> > >>
> > >> > On Fri, Nov 11, 2016 at 1:09 PM, Sijie Guo <sijie@apache.org> wrote:
> > >> >
> > >> > > I liked this topic. A better name might be 'stream storage primitives',
> > >> > > as we treat DL as a stream storage. Comments inline.
> > >> > >
> > >> > > On Wed, Nov 9, 2016 at 3:09 AM, Gerrit Sundaram <gerritsundaram@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > As Sijie suggested in the other email thread, I started this email
> > >> > > > thread to discuss the stream operation primitives.
> > >> > > >
> > >> > > > The stream operations that I am aware of that DL supports are:
> > >> > > >
> > >> > > > * Open a distributedlog stream
> > >> > > > * Delete a distributedlog stream
> > >> > > > * List all the distributedlog streams under a namespace
> > >> > > >
> > >> > >
> > >> > > Are you also looking for listing streams under a 'sub-namespace' (or
> > >> > > streams that have a common prefix)? (Based on my understanding of your
> > >> > > proposal, you might need this for a filesystem-like API?)
> > >> > >
> > >> >
> > >> > Yes. However, it seems like DL is designed around a flat namespace
> > >> > that contains just streams.
> > >>
> > >>
> > >> Ah, yes. The original thought was to tie a namespace to a user or an
> > >> application. Under a namespace, an application can manage the streams on
> > >> its own. That's why it was designed with a flat namespace.
> > >>
> > >>
> > >> > There is no concept of a 'sub-namespace', although I
> > >> > can probably hack it by naming the streams in a filesystem
> > >> > path-like way.
> > >> >
> > >> > However, I am still curious whether you want to introduce any sort of
> > >> > naming hierarchy within a namespace. For example, could you have a
> > >> > 'StreamSet', which is a set of streams (like a directory in a filesystem,
> > >> > which has a list of children)? If you had a similar hierarchy, it would
> > >> > definitely simplify my work.
> > >> >
> > >>
> > >> In the write proxy, we have a concept similar to 'StreamSet' that groups
> > >> some physical DL streams into one virtual stream. However, that was mostly
> > >> used for exporting metrics for grouped virtual streams. We don't really
> > >> emphasize the concept of a 'virtual stream' in DL, as we tended to let the
> > >> application decide what the virtual stream looks like.
> > >>
> > >> However, for metadata organization and management, it might make sense to
> > >> think about such a hierarchy.
> > >>
> > >> What do you have in mind for 'StreamSet'? Can you explain a little
> > >> more?
> > >
> > >
> > > I was thinking of a group of streams that might be used by the same
> > > application but store different parts of the data. It is like the
> > > 'virtual' stream that you mentioned.
> > >
> > > - Gerrit
> > >
> > >
> > >>
> > >> >
> > >> >
> > >> > >
> > >> > >
> > >> > > > * Seal a distributedlog stream
> > >> > > > * Truncate a distributedlog stream
> > >> > > >
> > >> > >
> > >> > > Just to clarify this, 'truncate' in DL trims the head of the
> > >> > > stream, not the tail.
> > >> > > 'truncate' in the filesystem world truncates a file to a size of
> > >> > > precisely *length* bytes; it truncates the tail.
> > >> > >
> > >> > > Let's make sure we have clarified this and are on the same page.
> > >> > >
> > >> >
> > >> > Yes, we are on the same page.
> > >> >
> > >> >
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > I am looking for a more filesystem-like API. For example,
> > >> > > >
> > >> > > > * Get the status/attributes of a stream (like stat in a filesystem)
> > >> > > >
> > >> > >
> > >> > > +1 for stream status/attributes. I think we might actually already have
> > >> > > this in DL, since in kestrel we use that for storing customized metadata.
> > >> > > It might make sense to formalize it into 'stream status'.
> > >> > >
> > >> >
> > >> > Gotcha.
> > >> >
> > >> >
> > >> > >
> > >> > >
> > >> > > > * Rename a stream
> > >> > > >
> > >> > >
> > >> > > we've talked about this for a while. +1.
> > >> > >
> > >> > >
> > >> > > > * Symlink a stream
> > >> > >
> > >> > >
> > >> > > Symlinking a stream is probably easy to do. +1. We've thought about that
> > >> > > for having the flexibility to move a stream between different storage
> > >> > > backends. Symlinks would help with this.
> > >> > >
> > >> > > But a more fundamental thought here is symlinks for log segments, so that
> > >> > > when a symlinked stream is deleted, the underlying log segments might not
> > >> > > be deleted until their link count drops to zero.
> > >> > >
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > Other operations that I can think of that might be useful:
> > >> > > >
> > >> > > > * Split/Fork a stream (it can be useful for dynamic data partitioning)
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > Splitting and forking a stream sounds interesting, but it sounds like a
> > >> > > higher-level feature rather than a storage primitive. Actually, it might
> > >> > > be a good topic for a separate discussion.
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > > * Merge/Concat streams
> > >> > > >
> > >> > >
> > >> > >
> > >> > > I think there is already an outstanding JIRA for concatenating two DL
> > >> > > streams. Jia and Arvind are working on that.
> > >> > >
> > >> > > https://issues.apache.org/jira/browse/DL-46
> > >> >
> > >> >
> > >> > I will watch that JIRA.
> > >> >
> > >> >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > The above operations are based on my knowledge of DL. Feel free to
> > >> > > > add more.
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > - Gerrit
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
