bookkeeper-distributedlog-dev mailing list archives

From Gerrit Sundaram <gerritsunda...@gmail.com>
Subject Re: [DISCUSS] DL Stream Operation Primitives
Date Thu, 17 Nov 2016 09:38:53 GMT
Can you grant me permission to edit the wiki page?

- Gerrit

On Thu, Nov 17, 2016 at 1:37 AM, Gerrit Sundaram <gerritsundaram@gmail.com>
wrote:

>
>
> On Tue, Nov 15, 2016 at 2:14 AM, Sijie Guo <sijie@apache.org> wrote:
>
>> On Sat, Nov 12, 2016 at 2:30 AM, Gerrit Sundaram <gerritsundaram@gmail.com>
>> wrote:
>>
>> > On Fri, Nov 11, 2016 at 1:09 PM, Sijie Guo <sijie@apache.org> wrote:
>> >
>> > > I like this topic. A better name might be 'stream storage primitives', as
>> > > we treat DL as stream storage. Comments inline.
>> > >
>> > > On Wed, Nov 9, 2016 at 3:09 AM, Gerrit Sundaram <gerritsundaram@gmail.com>
>> > > wrote:
>> > >
>> > > > As Sijie suggested in the other email thread, I am starting this thread
>> > > > to discuss the stream operation primitives.
>> > > >
>> > > > The stream operations that I know DL supports are:
>> > > >
>> > > > * Open a distributedlog stream
>> > > > * Delete a distributedlog stream
>> > > > * List all the distributedlog streams under a namespace
>> > > >
>> > >
>> > > Are you also looking for listing streams under a 'sub-namespace' (or streams
>> > > that share a common prefix)? Based on my understanding of your proposal,
>> > > you might need this for a filesystem-like API.
>> > >
>> >
>> > Yes. However, it seems like DL is designed more around a flat namespace of
>> > streams.
>>
>>
>> Ah, yes. The original idea was to tie a namespace to a user or an
>> application. Under a namespace, the application can manage its streams on
>> its own. That's why it was designed with a flat namespace.
>>
>>
>> > There is no concept of a 'sub-namespace', although I can probably hack
>> > around it by naming the streams like filesystem paths.
>> >
>> > However, I am still curious whether you want to introduce any sort of
>> > naming hierarchy within a namespace. For example, could you have a
>> > 'StreamSet', which is a set of streams (like a directory in a filesystem,
>> > which has a list of children)? If you had a similar hierarchy, it would
>> > definitely simplify my work.
>> >
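One way to emulate the path-like naming described above on top of DL's flat namespace is a prefix scan over stream names. This is a minimal sketch, assuming stream names use "/" as a separator; `list_children` is a hypothetical helper, not part of the DL API, and the stream names are made up for illustration.

```python
from typing import Iterable, List, Set


def list_children(stream_names: Iterable[str], prefix: str, sep: str = "/") -> List[str]:
    """Emulate a directory listing over a flat set of stream names.

    Hypothetical helper: DL has no sub-namespace concept; this only assumes
    that streams are named like filesystem paths, e.g. "app/part-0".
    """
    if prefix and not prefix.endswith(sep):
        prefix += sep
    children: Set[str] = set()
    for name in stream_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            continue
        # Keep only the first path component below the prefix; deeper names
        # collapse into a "directory" entry that ends with the separator.
        head, found, _ = rest.partition(sep)
        children.add(head + sep if found else head)
    return sorted(children)


streams = ["app/part-0", "app/part-1", "app/index/meta", "other/log"]
print(list_children(streams, "app"))  # ['index/', 'part-0', 'part-1']
```

Under this scheme a 'StreamSet' is just a shared name prefix, and listing it costs a scan over the namespace unless the metadata store indexes names by prefix.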
>>
>> In the write proxy, we have a concept similar to 'StreamSet' that groups
>> some physical DL streams into one virtual stream. However, it was mostly
>> used for exporting metrics for grouped virtual streams. We don't emphasize
>> the concept of a 'virtual stream' much in DL, as we tend to let the
>> application decide what a virtual stream looks like.
>>
>> However, for metadata organization and management, it might make sense to
>> think about such a hierarchy.
>>
>> What do you have in mind for 'StreamSet'? Can you explain a little
>> more?
>
>
> I was thinking of a group of streams that are used by the same application
> but store different parts of the data. It is like the 'virtual' stream that
> you mentioned.
>
> - Gerrit
>
>
>>
>> >
>> >
>> > >
>> > >
>> > > > * Seal a distributedlog stream
>> > > > * Truncate a distributedlog stream
>> > > >
>> > >
>> > > Just to clarify: 'truncate' in DL trims the head of the stream, not the
>> > > tail. In the filesystem world, 'truncate' cuts a file to a size of
>> > > precisely *length* bytes, i.e. it truncates the tail.
>> > >
>> > > Just making sure we are on the same page.
>> > >
>> >
>> > Yes, we are on the same page.
>> >
>> >
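The head-versus-tail distinction in the 'truncate' exchange above can be made concrete with a toy model. This is a minimal sketch, assuming a stream is just a list of (position, payload) records; the record layout and both function names are illustrative only, and real DL tracks positions and log segments differently.

```python
from typing import List, Tuple

# Toy model (an assumption, not DL's storage format): a stream is a list of
# (position, payload) records in append order.
Record = Tuple[int, bytes]


def truncate_head(records: List[Record], position: int) -> List[Record]:
    """DL-style truncate: drop every record before `position`, keep the tail."""
    return [rec for rec in records if rec[0] >= position]


def truncate_tail(data: bytes, length: int) -> bytes:
    """Filesystem-style truncate: keep exactly the first `length` bytes.

    POSIX truncate() also zero-extends a shorter file; that case is omitted.
    """
    return data[:length]


log = [(0, b"a"), (1, b"b"), (2, b"c")]
print(truncate_head(log, 2))     # [(2, b'c')]
print(truncate_tail(b"abc", 2))  # b'ab'
```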
>> > >
>> > >
>> > > >
>> > > > I am looking for a more filesystem-like API, for example:
>> > > >
>> > > > * Get the status/attributes of a stream (like stat in filesystem)
>> > > >
>> > >
>> > > +1 for stream status/attributes. I think we might actually already have
>> > > this in DL, since in Kestrel we use it for storing customized metadata.
>> > > It might make sense to formalize it into 'stream status'.
>> > >
>> >
>> > Gotcha.
>> >
>> >
>> > >
>> > >
>> > > > * Rename a stream
>> > > >
>> > >
>> > > We've talked about this for a while. +1.
>> > >
>> > >
>> > > > * Symlink a stream
>> > >
>> > >
>> > > Symlinking a stream is probably easy to do. +1, we've thought about it as
>> > > a way to get the flexibility to move streams between different storage
>> > > backends. Symlinks would help with this.
>> > >
>> > > But a more fundamental idea here is symlinks for log segments, so that
>> > > when a symlinked stream is deleted, the underlying log segments would not
>> > > be deleted until their link count drops to zero.
>> > >
>> > >
>> > >
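The link-count idea for log segments described above amounts to reference counting in the stream metadata. This is a minimal sketch under stated assumptions: `SegmentRegistry`, its methods, and the segment ids are hypothetical and do not correspond to DL classes; each stream holds a list of segment ids, and a segment is reclaimed only when its count drops to zero.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SegmentRegistry:
    """Hypothetical metadata model: streams reference shared log segments.

    A segment is reclaimed only once no stream (original or link) refers
    to it. This illustrates the link-count idea; it is not the DL API.
    """

    def __init__(self) -> None:
        self.streams: Dict[str, List[str]] = {}
        self.link_count: Dict[str, int] = defaultdict(int)
        self.reclaimed: Set[str] = set()

    def create(self, stream: str, segments: List[str]) -> None:
        self.streams[stream] = list(segments)
        for seg in segments:
            self.link_count[seg] += 1

    def link(self, target: str, link_name: str) -> None:
        # The link shares the target's current segments and bumps each count.
        self.create(link_name, self.streams[target])

    def delete(self, stream: str) -> None:
        for seg in self.streams.pop(stream):
            self.link_count[seg] -= 1
            if self.link_count[seg] == 0:
                del self.link_count[seg]
                self.reclaimed.add(seg)  # now safe to delete the underlying data


reg = SegmentRegistry()
reg.create("orders", ["seg-1", "seg-2"])
reg.link("orders", "orders-link")
reg.delete("orders")
print(sorted(reg.reclaimed))  # []
reg.delete("orders-link")
print(sorted(reg.reclaimed))  # ['seg-1', 'seg-2']
```

One design choice worth noting: this `link` copies the target's segment list at link time, so segments appended to the target afterwards are not shared; a real design would have to decide whether a link follows the live stream.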
>> > > >
>> > > > Other operations that I can think of that might be useful:
>> > > >
>> > > > * Split/Fork a stream (it can be useful for dynamic data partitioning)
>> > > >
>> > >
>> > >
>> > >
>> > > Splitting and forking a stream sound interesting, but they seem like
>> > > higher-level features rather than storage primitives. They might be a
>> > > good topic for a separate discussion.
>> > >
>> > >
>> > >
>> > >
>> > > > * Merge/Concat streams
>> > > >
>> > >
>> > >
>> > > I think there is already an outstanding JIRA for concatenating two DL
>> > > streams. Jia and Arvind are working on that.
>> > >
>> > > https://issues.apache.org/jira/browse/DL-46
>> >
>> >
>> > I will watch that JIRA.
>> >
>> >
>> > >
>> > >
>> > >
>> > >
>> > > >
>> > > > The above operations are based on my knowledge of DL. Feel free to add
>> > > > more.
>> > >
>> > >
>> > > >
>> > > > - Gerrit
>> > > >
>> > >
>> >
>>
>
>
