kafka-dev mailing list archives

From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3740) Add configs for RocksDBStores
Date Fri, 27 May 2016 02:15:13 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303345#comment-15303345 ]

Guozhang Wang commented on KAFKA-3740:
--------------------------------------

Some thoughts on the default config values:

I think there are at least three use cases of RocksDB whose default configs need to be treated
differently:

1. For a pure key-value store with put / get / delete, used for KTable aggregation and
KStream aggregation (note that windowed KStream aggregation currently uses a range query,
which is sub-optimal; we should really change it to multiple gets to avoid flushing the cache).

2. For append-only puts and range queries, used for windowed KStream joins.

3. For update puts and range queries, used for non-key KTable-KTable joins: we are about to add
this support, and I am writing up a design proposal for it.

For example, case 1) should usually be write-heavy, assuming we have a good cache hit
rate on top of RocksDB, so we should consider setting a smaller number of levels to
reduce write amplification; for 2) and 3), we should turn off the bloom filter by default
since it does not help with range queries.
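
To make the idea concrete, here is a rough sketch of what per-use-case defaults could look
like. Everything in it is an assumption for illustration only: the class, the enum, the config
key names and the concrete level counts are hypothetical and are not existing Kafka Streams or
RocksDB APIs. The baseline of 7 levels is what I believe the stock RocksDB default to be, and
the smaller value for case 1) is just a placeholder that would need benchmarking.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Kafka Streams or RocksDB.
public class RocksDbUseCaseDefaults {

    // The three access patterns discussed in this comment.
    enum UseCase {
        KEY_VALUE,     // 1) put / get / delete (KTable / KStream aggregation)
        APPEND_RANGE,  // 2) append-only puts + range queries (windowed KStream joins)
        UPDATE_RANGE   // 3) update puts + range queries (non-key KTable-KTable joins)
    }

    // Illustrative key names only, not real config keys.
    static final String NUM_LEVELS = "num.levels";
    static final String BLOOM_FILTER_ENABLED = "bloom.filter.enabled";

    // Assumed stock RocksDB level count, used here as the baseline.
    static final int ASSUMED_ROCKSDB_NUM_LEVELS = 7;

    static Map<String, Object> defaultsFor(UseCase useCase) {
        Map<String, Object> cfg = new HashMap<>();
        switch (useCase) {
            case KEY_VALUE:
                // Write-heavy behind a cache: fewer levels to reduce write amplification.
                // The value 4 is a placeholder, not a tuned recommendation.
                cfg.put(NUM_LEVELS, 4);
                cfg.put(BLOOM_FILTER_ENABLED, true);
                break;
            case APPEND_RANGE:
            case UPDATE_RANGE:
                // Range queries do not benefit from bloom filters, so turn them off.
                cfg.put(NUM_LEVELS, ASSUMED_ROCKSDB_NUM_LEVELS);
                cfg.put(BLOOM_FILTER_ENABLED, false);
                break;
        }
        return Collections.unmodifiableMap(cfg);
    }

    public static void main(String[] args) {
        for (UseCase useCase : UseCase.values()) {
            System.out.println(useCase + " -> " + defaultsFor(useCase));
        }
    }
}
```

In a real implementation these values would be translated into the corresponding RocksDB
options when the store is opened; the map is only there to show how the defaults differ.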

References:

https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
https://vimeo.com/album/2920922/video/98428203

> Add configs for RocksDBStores
> -----------------------------
>
>                 Key: KAFKA-3740
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3740
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Henry Cai
>              Labels: api, newbie
>
> Today most of the RocksDB configs are hard-coded inside {{RocksDBStore}}, or the default
> values are used directly. We need to make them configurable for advanced users. For example,
> some default values may not work well for some scenarios: https://github.com/HenryCaiHaiying/kafka/commit/ccc4e25b110cd33eea47b40a2f6bf17ba0924576
>
> One way of doing that is to introduce a "RocksDBStoreConfigs" object, similar to "StreamsConfig",
> which defines all related RocksDB option configs and can be passed as key-value pairs to
> "StreamsConfig".

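The "RocksDBStoreConfigs" idea from the description could be sketched roughly as follows.
This is only an illustration under stated assumptions: the class name, the "rocksdb." key
prefix, the two option names and their default values are all hypothetical placeholders, not
the values currently used inside RocksDBStore and not an existing Kafka API. It shows store
options being picked out of the same key-value pairs a user would hand to "StreamsConfig",
with defaults applied for anything not overridden.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "RocksDBStoreConfigs"-style object; not an existing Kafka class.
public class RocksDBStoreConfigsSketch {

    // Illustrative key names and prefix only.
    static final String PREFIX = "rocksdb.";
    static final String WRITE_BUFFER_SIZE = PREFIX + "write.buffer.size";
    static final String NUM_LEVELS = PREFIX + "num.levels";

    private final Map<String, Object> values = new HashMap<>();

    public RocksDBStoreConfigsSketch(Map<String, ?> props) {
        // Placeholder defaults, chosen for the example only.
        values.put(WRITE_BUFFER_SIZE, 32L * 1024 * 1024);
        values.put(NUM_LEVELS, 7);
        // Override defaults with user-supplied values; unrelated keys are ignored.
        for (Map.Entry<String, ?> entry : props.entrySet()) {
            if (values.containsKey(entry.getKey())) {
                values.put(entry.getKey(), entry.getValue());
            }
        }
    }

    // Accepts both numeric values and their string form, as properties files provide strings.
    public long getLong(String key) {
        Object value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Unknown config: " + key);
        }
        return Long.parseLong(String.valueOf(value));
    }

    public int getInt(String key) {
        return Math.toIntExact(getLong(key));
    }

    public static void main(String[] args) {
        Map<String, Object> props = new HashMap<>();
        props.put("application.id", "my-app"); // unrelated key, ignored by this sketch
        props.put(NUM_LEVELS, "4");            // user override, given as a string
        RocksDBStoreConfigsSketch configs = new RocksDBStoreConfigsSketch(props);
        System.out.println(NUM_LEVELS + " = " + configs.getInt(NUM_LEVELS));
        System.out.println(WRITE_BUFFER_SIZE + " = " + configs.getLong(WRITE_BUFFER_SIZE));
    }
}
```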


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
