kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sophie Blee-Goldman <sop...@confluent.io>
Subject Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams
Date Tue, 04 Jun 2019 22:05:35 GMT
Hey Bruno,

I tend to agree with Guozhang on this matter although you do bring up some
good points that should be addressed. Regarding 1) I think it is probably
fairly uncommon in practice for users to leverage the individual store
names passed to RocksDBConfigSetter#setConfig in order to specify options
on a per-store basis. When this actually is used, it does seem likely that
users would be doing something like pattern matching the physical store
name prefix in order to apply configs to all physical stores (segments)
within a single logical RocksDBStore. As you mention this is something of a
hassle already as physical stores are created/deleted, while most likely
all anyone cares about is the prefix corresponding to the logical store. It
seems like rather than persist this hassle to the metric layer, we should
consider refactoring RocksDBConfigSetter to apply to a logical store rather
than a specific physical segment. Or maybe providing some kind of tooling
to at least make this easier on users, but that's definitely outside the
scope of this KIP.

Regarding 2) can you clarify your point about accessing stores uniformly?
While I agree there will definitely be variance in the access pattern of
different segments, I think it's unlikely that it will vary in any kind of
predictable or deterministic way, hence it is not that useful to know in
hindsight the difference reflected by the metrics.

Cheers,
Sophie

On Tue, Jun 4, 2019 at 2:09 PM Bruno Cadonna <bruno@confluent.io> wrote:

> Hi Guozhang,
>
> After some thoughts, I tend to be in favour of the option with metrics
> for each physical RocksDB instance for the following reasons:
>
> 1) A user already needs to be aware of segmented state stores when
> providing a custom RocksDBConfigSetter. In RocksDBConfigSetter one can
> specify settings for a store depending on the name of the store. Since
> segments (i.e. state store) in a segmented state store have names that
> share a prefix but have suffixes that are created at runtime, increase
> with time and are theoretically unbounded, a user needs to take
> account of the segments to provide the settings for all (i.e. matching
> the common prefix) or some (i.e. matching the common prefix and for
> example suffixes according to a specific pattern) of the segments of a
> specific segmented state store.
> 2) Currently settings for RocksDB can only be specified by a user per
> physical instance and not per logical instance. Deriving good settings
> for physical instances from metrics for a logical instance can be hard
> if the physical instances are not accessed uniformly. In segmented
> state stores segments are not accessed uniformly.
> 3) Simpler to implement and to get things done.
>
> Any thoughts on this from anybody?
>
> Best,
> Bruno
>
> On Thu, May 30, 2019 at 8:33 PM Guozhang Wang <wangguoz@gmail.com> wrote:
> >
> > Hi Bruno:
> >
> > Regarding 2) I think either way has some shortcomings: exposing the
> metrics
> > per rocksDB instance for window / session stores exposed some
> > implementation internals (that we use segmented stores) to enforce users
> to
> > be aware of them. E.g. what if we want to silently change the internal
> > implementation by walking away from the segmented approach? On the other
> > hand, coalescing multiple rocksDB instances' metrics into a single one
> per
> > each logical store also has some concerns as I mentioned above. What I'm
> > thinking is actually that, if we can customize the aggregation logic to
> > still has one set of metrics per each logical store which may be composed
> > of multiple rocksDB ones -- e.g. for `bytes-written-rate` we sum them
> > across rocksDBs, while for `memtable-hit-rate` we do weighted average?
> >
> > Regarding logging levels, I think have DEBUG is fine, but also that means
> > without manually turning it on users would not get it. Maybe we should
> > consider adding some dynamic setting mechanisms in the future to allow
> > users turn it on / off during run-time.
> >
> >
> > Guozhang
> >
> >
> >
> > On Tue, May 28, 2019 at 6:23 AM Bruno Cadonna <bruno@confluent.io>
> wrote:
> >
> > > Hi,
> > >
> > > Thank you for your comments.
> > >
> > > @Bill:
> > >
> > > 1. It is like Guozhang wrote:
> > > - rocksdb-state-id is for key-value stores
> > > - rocksdb-session-state-id is for session stores
> > > - rocksdb-window-state-id is for window stores
> > > These tags are defined in the corresponding store builders and I think
> > > it is a good idea to re-use them.
> > >
> > > 2. I could not find any exposed ticker or histogram to get the total
> > > and average number of compactions, although RocksDB dumps the number
> > > of compactions between levels in its log files. There is the
> > > NUM_SUBCOMPACTIONS_SCHEDULED histogram that gives you statistics about
> > > the number of subcompactions actually scheduled during a compaction,
> > > but that is not that what you are looking for. If they will expose the
> > > number of compaction in the future, we can still add the metrics you
> > > propose. I guess, the metric in this KIP that would indirectly be
> > > influenced by the number of L0 files would be write-stall-duration. If
> > > there are too many compactions this duration should increase. However,
> > > this metric is also influenced by memtable flushes.
> > >
> > > @John:
> > >
> > > I think it is a good idea to prefix the flush-related metrics with
> > > memtable to avoid ambiguity. I changed the KIP accordingly.
> > >
> > > @Dongjin:
> > >
> > > For the tag and compaction-related comments, please see my answers to
> Bill.
> > >
> > > I cannot follow your second paragraph. Are you saying that a tuning
> > > guide for RocksDB within Streams based on the metrics in this KIP is
> > > out of scope? I also think that it doesn't need to be included in this
> > > KIP, but it is worth to work on it afterwards.
> > >
> > > @Guozhang:
> > >
> > > 1. Thank you for the explanation. I missed that. I modified the KIP
> > > accordingly.
> > >
> > > 2. No, my plan is to record and expose the set of metrics for each
> > > RocksDB store separately. Each set of metrics can then be
> > > distinguished by its store ID. Do I miss something here?
> > >
> > > 3. I agree with you and Sophie about user education and that we should
> > > work on it after this KIP.
> > >
> > > 4. I agree also on the user API. However, I would like to open a
> > > separate KIP for it because I still need a bit of thinking to get it.
> > > I also think it is a good idea to do one step after the other to get
> > > at least the built-in RocksDB metrics into the next release.
> > > Do you think I chose too many metrics as built-in metrics for this
> > > KIP? What do others think?
> > >
> > > @Sophie:
> > >
> > > I tend to DEBUG level, but I do not have heart feelings about it. Do
> > > you mean to turn it on/off RocksDB metrics in the Streams
> > > configuration?
> > >
> > > Best,
> > > Bruno
> > >
> > > On Tue, May 21, 2019 at 8:02 PM Sophie Blee-Goldman <
> sophie@confluent.io>
> > > wrote:
> > > >
> > > > I definitely agree with Guozhang's "meta" comment: if it's possible
> to
> > > > allow users to pick and choose individual RocksDB metrics that would
> be
> > > > ideal. One further question is whether these will be debug or info
> level
> > > > metrics, or a separate level altogether? If there is a nontrivial
> > > overhead
> > > > associated with attaching RocksDB metrics it would probably be good
> to be
> > > > able to independently turn on/off Rocks metrics
> > > >
> > > > On Tue, May 21, 2019 at 9:00 AM Guozhang Wang <wangguoz@gmail.com>
> > > wrote:
> > > >
> > > > > Hello Bruno,
> > > > >
> > > > > Thanks for the KIP, I have a few minor comments and a meta one
> which
> > > are
> > > > > relatively aligned with other folks:
> > > > >
> > > > > Minor:
> > > > >
> > > > > 1) Regarding the "rocksdb-state-id = [store ID]", to be consistent
> with
> > > > > other state store metrics (see
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams
> > > ),
> > > > > this tag should be either "rocksdb-window-state-id",
> > > > > "rocksdb-session-state-id" or "rocksdb-state-id". For window /
> session
> > > > > store backed by rocksDB, the tags should not be "rocksdb-state-id".
> > > > >
> > > > > 2) Also for window / session store, my take is that you plan to
> have
> > > > > multiple rocksDB behind the scene to report to the same set of
> > > metrics, is
> > > > > that right? My concern is that for such types of state stores,
> most of
> > > the
> > > > > access would be on the first segment rocksDB instance, and hence
> > > coalescing
> > > > > all of instances as a single set of metrics may dilute it.
> > > > >
> > > > > 3) I agree with @sophie@confluent.io <sophie@confluent.io>
that we
> > > should
> > > > > better have some documentation educating users what to do when see
> what
> > > > > anomalies in metrics; though I think this is a long endeavoring
> effort
> > > that
> > > > > may go beyond this KIP's scope, let's keep that as a separate
> umbrella
> > > JIRA
> > > > > to accumulate knowledge over time.
> > > > >
> > > > >
> > > > > Meta:
> > > > >
> > > > > 4) Instead of trying to enumerate all the ones that might be
> helpful,
> > > I'd
> > > > > recommend that we expose a user-friendly API in StreamsMetrics to
> allow
> > > > > users to add more metrics from RocksDB they'd like to have, while
> only
> > > > > keeping a small set of most-meaningful ones that are ubiquitously
> > > useful
> > > > > out-of-the-box. WDYT?
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > >
> > > > >
> > > > > On Tue, May 21, 2019 at 8:04 AM Dongjin Lee <dongjin@apache.org>
> > > wrote:
> > > > >
> > > > >> Hi Bruno,
> > > > >>
> > > > >> I just read the KIP. I think this feature is great. As far as
I
> know,
> > > most
> > > > >> Kafka users monitor the host resources, JVM resources, and Kafka
> > > metrics
> > > > >> only, not RocksDB for configuring the statistics feature is a
> little
> > > bit
> > > > >> tiresome. Since RocksDB impacts the performance of Kafka Streams,
> I
> > > > >> believe
> > > > >> providing these metrics with other metrics in one place is much
> > > better.
> > > > >>
> > > > >> However, I am a little bit not assured for how much information
> > > should be
> > > > >> provided to the users with the documentation - how the user can
> > > control
> > > > >> the
> > > > >>  RocksDB may on the boundary of the scope. How do you think?
> > > > >>
> > > > >> +1. I entirely agree to Bill's comments; I also think
> > > `rocksdb-store-id`
> > > > >> is
> > > > >> better than `rocksdb-state-id` and metrics on total compactions
> and an
> > > > >> average number of compactions is also needed.
> > > > >>
> > > > >> Regards,
> > > > >> Dongjin
> > > > >>
> > > > >> On Tue, May 21, 2019 at 2:48 AM John Roesler <john@confluent.io>
> > > wrote:
> > > > >>
> > > > >> > Hi Bruno,
> > > > >> >
> > > > >> > Looks really good overall. This is going to be an awesome
> addition.
> > > > >> >
> > > > >> > My only thought was that we have "bytes-flushed-(rate|total)
and
> > > > >> > flush-time-(avg|min|max)" metrics, and the description states
> that
> > > > >> > these are specifically for Memtable flush operations. What
do
> you
> > > > >> > think about calling it "memtable-bytes-flushed... (etc)"?
On one
> > > hand,
> > > > >> > I could see this being redundant, since that's the only
thing
> that
> > > > >> > gets "flushed" inside of Rocks. But on the other, we have
an
> > > > >> > independent "flush" operation in streams, which might be
> confusing.
> > > > >> > Plus, it might help people who are looking at the full "menu"
of
> > > > >> > hundreds of metrics. They can't read and remember every
> description
> > > > >> > while trying to understand the full list of metrics, so
going
> for
> > > > >> > maximum self-documentation in the name seems nice.
> > > > >> >
> > > > >> > But that's a minor thought. Modulo the others' comments,
this
> looks
> > > good
> > > > >> > to me.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > -John
> > > > >> >
> > > > >> > On Mon, May 20, 2019 at 11:07 AM Bill Bejeck <bbejeck@gmail.com
> >
> > > wrote:
> > > > >> > >
> > > > >> > > Hi Bruno,
> > > > >> > >
> > > > >> > > Thanks for the KIP, this will be a useful addition.
> > > > >> > >
> > > > >> > > Overall the KIP looks good to me, and I have two minor
> comments.
> > > > >> > >
> > > > >> > > 1. For the tags should, I'm wondering if rocksdb-state-id
> should
> > > be
> > > > >> > > rocksdb-store-id
> > > > >> > > instead?
> > > > >> > >
> > > > >> > > 2. With the compaction metrics, would it be possible
to add
> total
> > > > >> > > compactions and an average number of compactions? 
I've taken
> a
> > > look
> > > > >> at
> > > > >> > the
> > > > >> > > available RocksDB metrics, and I'm not sure.  But users
can
> > > control
> > > > >> how
> > > > >> > > many L0 files it takes to trigger compaction so if
it is
> > > possible; it
> > > > >> may
> > > > >> > > be useful.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Bill
> > > > >> > >
> > > > >> > >
> > > > >> > > On Mon, May 20, 2019 at 9:15 AM Bruno Cadonna <
> bruno@confluent.io
> > > >
> > > > >> > wrote:
> > > > >> > >
> > > > >> > > > Hi Sophie,
> > > > >> > > >
> > > > >> > > > Thank you for your comments.
> > > > >> > > >
> > > > >> > > > It's a good idea to supplement the metrics with
> configuration
> > > option
> > > > >> > > > to change the metrics. I also had some thoughts
about it.
> > > However, I
> > > > >> > > > think I need some experimentation to get this
right.
> > > > >> > > >
> > > > >> > > > I added the block cache hit rates for index and
filter
> blocks
> > > to the
> > > > >> > > > KIP. As far as I understood, they should stay
at zero, if
> users
> > > do
> > > > >> not
> > > > >> > > > configure RocksDB to include index and filter
blocks into
> the
> > > block
> > > > >> > > > cache. Did you also understand this similarly?
I guess also
> in
> > > this
> > > > >> > > > case some experimentation would be good to be
sure.
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Bruno
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Sat, May 18, 2019 at 2:29 AM Sophie Blee-Goldman
<
> > > > >> > sophie@confluent.io>
> > > > >> > > > wrote:
> > > > >> > > > >
> > > > >> > > > > Actually I wonder if it might be useful to
users to be
> able to
> > > > >> break
> > > > >> > up
> > > > >> > > > the
> > > > >> > > > > cache hit stats by type? Some people may
choose to store
> > > index and
> > > > >> > filter
> > > > >> > > > > blocks alongside data blocks, and it would
probably be
> very
> > > > >> helpful
> > > > >> > for
> > > > >> > > > > them to know who is making more effective
use of the
> cache in
> > > > >> order
> > > > >> > to
> > > > >> > > > tune
> > > > >> > > > > how much of it is allocated to each. I'm
not sure how
> common
> > > this
> > > > >> > really
> > > > >> > > > is
> > > > >> > > > > but I think it would be invaluable to those
who do.
> RocksDB
> > > > >> > performance
> > > > >> > > > can
> > > > >> > > > > be quite opaque..
> > > > >> > > > >
> > > > >> > > > > Cheers,
> > > > >> > > > > Sophie
> > > > >> > > > >
> > > > >> > > > > On Fri, May 17, 2019 at 5:01 PM Sophie Blee-Goldman
<
> > > > >> > sophie@confluent.io
> > > > >> > > > >
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Hey Bruno!
> > > > >> > > > > >
> > > > >> > > > > > This all looks pretty good to me, but
one suggestion I
> have
> > > is
> > > > >> to
> > > > >> > > > > > supplement each of the metrics with
some info on how the
> > > user
> > > > >> can
> > > > >> > > > control
> > > > >> > > > > > them. In other words, which options
could/should they
> set in
> > > > >> > > > > > RocksDBConfigSetter should they discover
a particular
> > > > >> bottleneck?
> > > > >> > > > > >
> > > > >> > > > > > I don't think this necessarily needs
to go into the KIP,
> > > but I
> > > > >> do
> > > > >> > > > think it
> > > > >> > > > > > should be included in the docs somewhere
(happy to help
> > > build up
> > > > >> > the
> > > > >> > > > list
> > > > >> > > > > > of associated options when the time
comes)
> > > > >> > > > > >
> > > > >> > > > > > Thanks!
> > > > >> > > > > > Sophie
> > > > >> > > > > >
> > > > >> > > > > > On Fri, May 17, 2019 at 2:54 PM Bruno
Cadonna <
> > > > >> bruno@confluent.io>
> > > > >> > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > >> Hi all,
> > > > >> > > > > >>
> > > > >> > > > > >> this KIP describes the extension
of the Kafka Streams'
> > > metrics
> > > > >> to
> > > > >> > > > include
> > > > >> > > > > >> RocksDB's internal statistics.
> > > > >> > > > > >>
> > > > >> > > > > >> Please have a look at it and let
me know what you
> think.
> > > Since
> > > > >> I
> > > > >> > am
> > > > >> > > > not a
> > > > >> > > > > >> RocksDB expert, I am thankful for
any additional pair
> of
> > > eyes
> > > > >> that
> > > > >> > > > > >> evaluates this KIP.
> > > > >> > > > > >>
> > > > >> > > > > >>
> > > > >> > > > > >>
> > > > >> > > >
> > > > >> >
> > > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471:+Expose+RocksDB+Metrics+in+Kafka+Streams
> > > > >> > > > > >>
> > > > >> > > > > >> Best regards,
> > > > >> > > > > >> Bruno
> > > > >> > > > > >>
> > > > >> > > > > >
> > > > >> > > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >> --
> > > > >> *Dongjin Lee*
> > > > >>
> > > > >> *A hitchhiker in the mathematical world.*
> > > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> <https://github.com/dongjinleekr>linkedin:
> > > > >> kr.linkedin.com/in/dongjinleekr
> > > > >> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> speakerdeck.com/dongjin
> > > > >> <https://speakerdeck.com/dongjin>*
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > >
> >
> >
> > --
> > -- Guozhang
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message