bookkeeper-distributedlog-dev mailing list archives

From Sijie Guo <si...@apache.org>
Subject Re: [DISCUSS] DL Stream Operation Primitives
Date Tue, 15 Nov 2016 10:14:14 GMT
On Sat, Nov 12, 2016 at 2:30 AM, Gerrit Sundaram <gerritsundaram@gmail.com>
wrote:

> On Fri, Nov 11, 2016 at 1:09 PM, Sijie Guo <sijie@apache.org> wrote:
>
> > I like this topic. A better name might be 'stream storage primitives', as
> > we treat DL as stream storage. Comments inline.
> >
> > On Wed, Nov 9, 2016 at 3:09 AM, Gerrit Sundaram <gerritsundaram@gmail.com>
> > wrote:
> >
> > > As Sijie suggested in the other email thread, I am starting this thread
> > > to discuss the stream operation primitives.
> > >
> > > The stream operations that I am aware of that DL supports are
> > >
> > > * Open a distributedlog stream
> > > * Delete a distributedlog stream
> > > * List all the distributedlog streams under a namespace
> > >
> >
> > Are you also looking to list streams under a 'sub-namespace' (or streams
> > that share a common prefix)? (Based on my understanding of your proposal,
> > you might need this for a filesystem-like API?)
> >
>
> Yes. However, it seems DL is designed around a flat namespace that contains
> only streams.


Ah, yes. The original thought was to tie a namespace to a user or an
application. Under a namespace, the application can manage its streams on its
own. That's why it was designed with a flat namespace.


> There is no concept of a 'sub-namespace', although I
> can probably hack it by naming the streams in a filesystem
> path-like way.
>
> However, I am still curious whether you want to introduce any sort of naming
> hierarchy within a namespace. For example, can you have a
> 'StreamSet', which is a set of streams (like a directory in a filesystem,
> which has a list of children)? If you have a similar hierarchy, it will
> definitely simplify my work.
>

In the write proxy, we have a concept similar to 'StreamSet' that groups
some physical DL streams into one virtual stream. However, that was mostly
used for exporting metrics for grouped virtual streams. We don't really
emphasize the concept of a 'virtual stream' in DL, as we tended to let the
application decide what the virtual stream looks like.

However, for metadata organization and management, it might make sense to
think about such a hierarchy.

What do you have in mind for 'StreamSet'? Can you explain a little
more?
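
To illustrate the path-like naming idea discussed above, here is a minimal sketch of listing the "children" of a prefix over a flat set of stream names. It is not the DL API: `list_children`, the `/` separator, and the sample stream names are all assumptions made for illustration.

```python
def list_children(stream_names, prefix):
    """Return the immediate children of `prefix` among flat, path-like names.

    Hypothetical helper: it only filters names on the client side and
    assumes '/' is used as the path separator in stream names.
    """
    base = prefix.rstrip("/") + "/"
    children = set()
    for name in stream_names:
        if name.startswith(base):
            rest = name[len(base):]
            if rest:
                # Keep only the first path component below the prefix.
                children.add(rest.split("/", 1)[0])
    return sorted(children)


# Made-up stream names living in one flat namespace.
streams = ["app/orders/0", "app/orders/1", "app/users/0", "other/logs"]
print(list_children(streams, "app"))
print(list_children(streams, "app/orders"))
```

A sketch like this needs no metadata change, but every listing scans all stream names in the namespace, which is one reason a real hierarchy might be preferable.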


>
>
> >
> >
> > > * Seal a distributedlog stream
> > > * Truncate a distributedlog stream
> > >
> >
> > Just to clarify: 'truncate' in DL trims the head of the stream, not the
> > tail. 'truncate' in the filesystem world cuts a file to precisely *length*
> > bytes, so it truncates the tail.
> >
> > I want to make sure we are clear on this and on the same page.
> >
>
> Yes, we are on the same page.
>
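
The difference between the two meanings of 'truncate' can be sketched on a plain list of records. This is only an illustration: the function names are hypothetical, and it counts records, whereas the filesystem call works in bytes.

```python
def trim_head(records, up_to):
    """DL-style truncate (as described above): drop the oldest `up_to` records."""
    return records[up_to:]


def truncate_tail(records, length):
    """Filesystem-style truncate: keep only the first `length` records."""
    return records[:length]


log = ["r0", "r1", "r2", "r3", "r4"]
print(trim_head(log, 2))      # the oldest records are removed
print(truncate_tail(log, 2))  # the newest records are removed
```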
>
> >
> >
> > >
> > > I am looking for a more filesystem-like API. For example,
> > >
> > > * Get the status/attributes of a stream (like stat in filesystem)
> > >
> >
> > +1 for stream status/attributes. I think we might actually already have
> > this in DL, since in kestrel we use it for storing customized metadata.
> > It might make sense to formalize it into 'stream status'.
> >
>
> Gotcha.
>
>
> >
> >
> > > * Rename a stream
> > >
> >
> > We've talked about this for a while. +1.
> >
> >
> > > * Symlink a stream
> >
> >
> > Symlinking a stream is probably easy to do. +1, we've thought about it as a
> > way to get the flexibility to move streams between different storage
> > backends. Symlinks would help with this.
> >
> > But a more fundamental thought here is symlinks for log segments, so that
> > when a symlinked stream is deleted, the underlying log segments might not
> > be deleted until their link count drops to zero.
> >
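
The link-count idea for log segments can be sketched with simple reference counting. This is a toy in-memory model under stated assumptions, not DL's metadata layout: `SegmentStore` and all of its methods are hypothetical names.

```python
class SegmentStore:
    """Toy model: streams reference log segments, segments carry a link count."""

    def __init__(self):
        self.link_count = {}  # segment id -> number of streams referencing it
        self.streams = {}     # stream name -> list of segment ids

    def create_stream(self, name, segment_ids):
        self.streams[name] = list(segment_ids)
        for seg in segment_ids:
            self.link_count[seg] = self.link_count.get(seg, 0) + 1

    def symlink(self, target, link_name):
        # The link shares the target's segments, so each count goes up by one.
        self.create_stream(link_name, self.streams[target])

    def delete_stream(self, name):
        """Delete a stream and return the segments that can now be reclaimed."""
        reclaimed = []
        for seg in self.streams.pop(name):
            self.link_count[seg] -= 1
            if self.link_count[seg] == 0:
                del self.link_count[seg]
                reclaimed.append(seg)
        return reclaimed


store = SegmentStore()
store.create_stream("a", ["s1", "s2"])
store.symlink("a", "b")
first = store.delete_stream("a")   # segments are still referenced by "b"
second = store.delete_stream("b")  # last reference is gone
print(first, second)
```

In this model, deleting either name leaves the segments in place as long as the other still references them; only the final delete returns them for reclamation.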
> >
> >
> > >
> > > Other operations that I think might be useful:
> > >
> > > * Split/Fork a stream (it can be useful for dynamic data partitioning)
> > >
> >
> >
> >
> > Splitting and forking a stream sounds interesting, but it sounds more like a
> > high-level feature than a storage primitive. Actually, it might be a good
> > topic for a separate discussion.
> >
> >
> >
> >
> > > * Merge/Concat streams
> > >
> >
> >
> > I think there is already one outstanding JIRA for concatenating two DL
> > streams. Jia and Arvind are working on that.
> >
> > https://issues.apache.org/jira/browse/DL-46
>
>
> I will watch that JIRA.
>
>
> >
> >
> >
> >
> > >
> > > The above operations are based on my knowledge about DL. Feel free to add
> > > more.
> >
> >
> > >
> > > - Gerrit
> > >
> >
>
