kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: How to calculate the optimal value of `maintenance_manager_num_threads`
Date Mon, 27 Mar 2017 18:16:54 GMT
Hi Jason,

On Fri, Mar 24, 2017 at 1:39 AM, Jason Heo <jason.heo.sde@gmail.com> wrote:

> Hi,
> I'm using Apache Kudu 1.2 on CDH 5.10.
> Recently, after reading "Bulk write performance improvements for Kudu 1.4
> <https://docs.google.com/document/d/1U1IXS1XD2erZyq8_qG81A1gZaCeHcq2i0unea_eEf5c/edit>"
> I've noticed that `maintenance_manager_num_threads` is 4 for the 5
> spinning disks.
Yes, but I wouldn't take that as necessarily optimal. I'm now doing some
tests with 8 threads as a comparison point.

> In my cluster, each node has 10 SATA disks with RAID 1+0 (WAL and Data
> directory located in the same partition). As Todd suggested, bulk loading
> is done in PK-sorted order. CPU usage and system load on my cluster are
> not high at the moment, so I think the thread count could be increased a
> little.
> Could someone please suggest a value for my environment?

Increasing the number of maintenance threads may help if you are falling
behind on compaction and flushes. For compaction, you can tell if you are
falling behind by looking at the "bloom_lookups_per_op" metric. For
flushes, you may be falling behind if you see a lot of "memory pressure
rejections". One area for improvement in our tooling is adding some more
scripts and tools to make these types of diagnosis easier.
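
As a rough illustration of that kind of check, here is a minimal Python sketch
that summarizes the two signals from a tablet server metrics dump. It makes
several assumptions that are not stated in this thread: that the dump is a JSON
list of entities, each with a "metrics" list; that histogram metrics carry
"total_count" and "mean" fields; and that the rejection counters have names
containing "memory_pressure_rejections". The function name is hypothetical.
Verify the metric names and layout against your own server's output first.

```python
import json


def summarize_maintenance_signals(metrics_json):
    """Summarize compaction and flush backlog hints from a metrics dump.

    Assumes (not confirmed by the thread) a JSON list of entities, each
    holding a "metrics" list of dicts keyed by "name".
    """
    entities = json.loads(metrics_json)
    bloom_means = []
    rejections = 0
    for entity in entities:
        for metric in entity.get("metrics", []):
            name = metric.get("name", "")
            # Histogram metric: skip entities that have recorded no samples.
            if name == "bloom_lookups_per_op" and metric.get("total_count", 0) > 0:
                bloom_means.append(metric["mean"])
            # Counter metrics: the substring match is an assumption about naming.
            elif "memory_pressure_rejections" in name:
                rejections += metric.get("value", 0)
    return {
        "max_bloom_lookups_per_op_mean": max(bloom_means, default=None),
        "memory_pressure_rejections": rejections,
    }
```

The sketch only reports numbers and implies no threshold. The useful signal is
likely the trend over time: a bloom lookup mean that keeps rising, or a
rejection count that keeps growing during a load.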

In general, it's a tradeoff: more MM threads mean more resource
consumption, but possibly better performance. The tradeoff may be
non-linear, though (i.e., doubling MM threads won't double performance!)

As Kudu is still a young project, we're still gathering operational
experience from users around topics like this. It would be great if you could
share any results you find with the community.


Todd Lipcon
Software Engineer, Cloudera
