kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Richard Yu <yohan.richard...@gmail.com>
Subject Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore
Date Sat, 14 Oct 2017 21:32:49 GMT
Thanks for the clarifications, Xavier.
I have removed most of the methods except for keys() and all() which has
been renamed to Guozhang Wang's suggestions.

Hope this helps.

On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xavier@confluent.io> wrote:

> Thanks for the KIP Richard, this is a very useful addition!
>
> As far as the API changes, I just have a few comments on the methods that
> don't seem directly related to the KIP title, and naming of course :).
> On the implementation, see my notes further down that will hopefully
> clarify a few things.
>
> Regarding the "bonus" methods:
> I agree with Guozhang that the KIP lacks proper motivation for adding the
> min, max, and allLatest methods.
> It is also not clear to me what min and max would really mean, what
> ordering do we refer to here? Are we first ordering by time, then key, or
> first by key, then time?
> The allLatest method might be useful, but I don't really see how it would
> be used in practice if we have to scan the entire range of keys for all the
> state stores, every single time.
>
> Maybe we could flesh the motivation behind those extra methods, but in the
> interest of time, and moving the KIP forward it might make sense to file a
> follow-up once we have more concrete use-cases.
>
> On naming:
> I also agree with Guozhang that "keys()" should be renamed. It feels a bit
> of a misnomer, since it not only returns keys, but also the values.
>
> As far as what to rename it to, I would argue we already have some
> discrepancy between key-value stores using range() vs. window stores using
> fetch().
> I assume we called the window method "fetch" instead of "get" because you
> might get back more than one window for the requested key.
>
> If we wanted to make things consistent with both existing key-value store
> naming and window store naming, we could do the following:
> Decide that "all" always refers to the entire range of keys, independent of
> the window and similarly "range" always refers to a particular range of
> keys, irrespective of the window.
> We can then prefix methods with "fetch" to indicate that more than one
> window may be returned for each key in the range.
>
> This would give us:
> - a new fetchAll() method for all the keys, which makes it clear that you
> might get back the same key in different windows
> - a new fetchAll(timeFrom, timeTo) method to get all the keys in a given
> time range, again with possibly more than one window per key
> - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K, long,
> long)  and deprecate the old one to indicate a range of keys
>
> One inconsistency I noted: the "Proposed Changes" section in your KIP talks
> about a "range(timeFrom, timeTo)" method, I think you meant to refer to the
> all(from, to) method, but I'm sure you'll fix that once we decide on
> naming.
>
> On the implementation side:
> You mentioned that caching and rocksdb store have very different key/value
> structures, and while it appears to be that way on the surface, the
> structure between the two is actually very similar. Keys in the cache are
> prefixed with a segment ID to ensure the ordering in the cache stays
> consistent with the rocksdb implementation, which maintains multiple
> rocksdb instances, one for each segment. So we just "artificially" mirror
> the segment structure in the cache.
>
> The reason for keeping the ordering consistent is pretty simple: keep in
> mind that when we query a cached window store we are effectively querying
> both the cache and the persistent rocksdb store at the same time, merging
> results from both. To make that merge as painless as possible, we ensure
> the ordering is consistent when querying a range of keys in both stores.
>
> Also keep in mind CompositeReadonlyWindowStore, which wraps multiple window
> stores within a topology.
>
> Hope this clarifies some of the less trivial parts of caching window store.
>
> Cheers,
> Xavier
>
> On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wangguoz@gmail.com> wrote:
>
> > Richard, Matthias:
> >
> > 0. Could you describe a bit what are the possible use cases of
> `allLatest`,
> > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a minimum
> > necessary amount, to avoid a swamp of new APIs that no one would really
> use
> > but just complicated the internal code base.
> >
> > 1. One minor comment on the other two new APIs: could we rename `keys` to
> > `all` and `all` to `range` to be consistent with the other store's APIs?
> >
> > 2. One meta comment on the implementation details: since both `keys` and
> > `all` would likely touch multiple segments, we may need to use the
> internal
> > `SegmentIterator` class, but currently it always requires a Bytes from
> and
> > Bytes to for its constructor. What changes we need to make for that
> class?
> >
> >
> > Guozhang
> >
> >
> > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <yohan.richard.yu@gmail.com>
> > wrote:
> >
> > > We should split KAFKA-4499 into several sub-issues with 4499 being the
> > > parent issue.
> > > Adding the implementation to CachingWindowStore, RocksDBWindowStore,
> etc
> > > will each require the addition of a test and implementing the methods
> > which
> > > is not trivial.
> > > This way, it should be easier to manage the progress of the KIP.
> > >
> > >
> > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <matthias@confluent.io
> >
> > > wrote:
> > >
> > > > Thanks for driving this and sorry for late response. With release
> > > > deadline it was pretty busy lately.
> > > >
> > > > Can you please add a description for the suggested method, what they
> > are
> > > > going to return? It's a little unclear to me atm.
> > > >
> > > > It would also be helpful to discuss, for which use case each method
> is
> > > > useful. This might also help to identify potential gaps for which
> > > > another API might be more helpful.
> > > >
> > > > Also, we should talk about provided guarantees when using those APIs
> > > > with regard to consistency -- not saying that we need to provide
> strong
> > > > guarantees, but he KIP should describe what user can expect.
> > > >
> > > >
> > > > -Matthias
> > > >
> > > > On 9/24/17 8:11 PM, Richard Yu wrote:
> > > > > Hello, I would like to solicit review and comment on this issue
> (link
> > > > > below):
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> > > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message