kudu-user mailing list archives

From Paul Brannan <paul.bran...@thesystech.com>
Subject Re: Recommended bulk size ?
Date Tue, 28 Feb 2017 16:22:51 GMT
As you said, I expect it depends on many variables.  I ran a quick & dirty
experiment when first evaluating Kudu 1.0 to see how flushing at varying
intervals affected insert rates.  I had one master and one tserver, each in
the default configuration, on an ext4 filesystem on a spinning disk.  The
table had two string columns "key" and "value", both part of the primary
key, each less than 30 bytes.  Here were the results:

Manual flush every insert: 100K inserts in 14.5s (~7K/s)
Manual flush every 100K: 1M inserts in 4.7s (~215K/s, w/ warnings about
"blocked reactor thread")
Manual flush every 10K: 1M inserts in 4.2s (~240K/s)
Auto flush background, no explicit flush: 1M inserts in 4.8s (~210K/s, w/
warnings about "blocked reactor thread" and "thread stuck")
Auto flush background, explicit flush every 10K inserts: 1M inserts in 4.2s
(~240K/s)
Async flush every 10K inserts: 1M inserts in 2.8s (~350K/s)
Async flush every 1K inserts: 1M inserts in 2.7s (~370K/s)
Async flush every 100: 1M inserts in 3.3s (~300K/s)
Async flush every 10: 1M inserts in 10.6s (~95K/s)
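
The approximate rates quoted above are simply inserts divided by elapsed seconds. A minimal Java sketch that recomputes them from the listed counts and times follows; the class and method names (`FlushRates`, `perSecond`, `measurements`) are made up for illustration and are not part of any Kudu client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: recomputes inserts/second from the runs listed above.
final class FlushRates {
    /** Inserts per second for a run of `inserts` rows that took `seconds`. */
    static double perSecond(long inserts, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return inserts / seconds;
    }

    /** Measurements copied from the list above: label -> {inserts, seconds}. */
    static Map<String, double[]> measurements() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("manual flush every insert", new double[] {100_000, 14.5});
        m.put("manual flush every 100K", new double[] {1_000_000, 4.7});
        m.put("manual flush every 10K", new double[] {1_000_000, 4.2});
        m.put("auto flush background, no explicit flush", new double[] {1_000_000, 4.8});
        m.put("auto flush background, flush every 10K", new double[] {1_000_000, 4.2});
        m.put("async flush every 10K", new double[] {1_000_000, 2.8});
        m.put("async flush every 1K", new double[] {1_000_000, 2.7});
        m.put("async flush every 100", new double[] {1_000_000, 3.3});
        m.put("async flush every 10", new double[] {1_000_000, 10.6});
        return m;
    }

    public static void main(String[] args) {
        // Print one line per run so the rates can be compared side by side.
        for (Map.Entry<String, double[]> e : measurements().entrySet()) {
            double rate = perSecond((long) e.getValue()[0], e.getValue()[1]);
            System.out.printf("%-42s %8.0f inserts/s%n", e.getKey(), rate);
        }
    }
}
```

The two auto-flush runs have no rate in the list; the same division gives them the same order of magnitude as the manual-flush runs at 10K and 100K.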

Based on this experiment, I chose async flush with a 1K interval: larger
intervals showed diminishing returns, and I don't want to run out of
mutation buffer space.
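
The "flush every N inserts" pattern itself can be sketched independently of the client library. The Java below is only an illustration under stated assumptions: `BufferedSink`, `IntervalFlusher`, and `CountingSink` are hypothetical names, not Kudu classes, and the `flush()` here is a plain synchronous call. With a real Kudu session, whether the flush blocks or returns immediately depends on the flush mode and client used, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a client session; this is NOT the Kudu API.
interface BufferedSink<T> {
    void apply(T op); // buffer one operation
    void flush();     // send everything buffered so far
}

// Applies operations to a sink and triggers a flush every `interval` inserts.
final class IntervalFlusher<T> {
    private final BufferedSink<T> sink;
    private final int interval;
    private int pending = 0;

    IntervalFlusher(BufferedSink<T> sink, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.sink = sink;
        this.interval = interval;
    }

    void insert(T op) {
        sink.apply(op);
        pending++;
        if (pending >= interval) {
            // Keep the interval below the buffer capacity so the buffer never fills.
            sink.flush();
            pending = 0;
        }
    }

    // Flush whatever is left over after the last full interval.
    void close() {
        if (pending > 0) {
            sink.flush();
            pending = 0;
        }
    }
}

// In-memory sink that only counts flushes, so the pattern can be exercised.
final class CountingSink<T> implements BufferedSink<T> {
    final List<T> buffer = new ArrayList<>();
    int flushes = 0;
    long flushedOps = 0;

    @Override
    public void apply(T op) {
        buffer.add(op);
    }

    @Override
    public void flush() {
        flushes++;
        flushedOps += buffer.size();
        buffer.clear();
    }
}
```

For example, feeding 2,500 operations through an `IntervalFlusher` with an interval of 1,000 and then calling `close()` produces three flushes: two full batches and one partial batch of 500.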

On Tue, Feb 28, 2017 at 6:29 AM, Nicolas Fouché <nfouche@onfocus.io> wrote:

> Hi. Is there any recommendation on the number of operations per batch
> with AUTO_FLUSH_BACKGROUND? I guess it depends heavily on the cluster size,
> the number of partitions hit by the operations, etc. But are there any
> guidelines out there?
> Looking at the code of the kudu client, it seems that the default size is
> 1000: `private int mutationBufferSpace = 1000;`.
> - Nicolas
