From: Paul Brannan
Date: Tue, 28 Feb 2017 11:22:51 -0500
Subject: Re: Recommended bulk size ?
To: user@kudu.apache.org
In-Reply-To: <1459AEE0-965D-47F5-9318-2C788407BD7F@onfocus.io>

As you said, I expect it depends on many variables. I ran a quick & dirty experiment when first evaluating Kudu 1.0 to see how flushing at varying intervals affected insert rates. I had one master and one tserver, each in the default configuration, on an ext4 filesystem on a spinning disk. The table had two string columns, "key" and "value", both part of the primary key, each less than 30 bytes. Here were the results:

Manual flush every insert: 100K inserts in 14.5s (~7K/s)
Manual flush every 100K: 1M inserts in 4.7s (~215K/s, w/ warnings about "blocked reactor thread")
Manual flush every 10K: 1M inserts in 4.2s (~240K/s)
Auto flush background, no explicit flush: 1M inserts in 4.8s (w/ warnings about "blocked reactor thread" and "thread stuck")
Auto flush background, explicit flush every 10K inserts: 1M inserts in 4.2s (~240K/s)
Async flush every 10K inserts: 1M inserts in 2.8s (~350K/s)
Async flush every 1K inserts: 1M inserts in 2.7s (~370K/s)
Async flush every 100: 1M inserts in 3.3s (~300K/s)
Async flush every 10: 1M inserts in 10.6s (~95K/s)

Based on this experiment, I chose async flush with a 1K interval, because beyond that there is diminishing return, and I don't want to run out of mutation space.

On Tue, Feb 28, 2017 at 6:29 AM, Nicolas Fouché <nfouche@onfocus.io> wrote:

> Hi. Is there any recommendation on the number of operations in
> bulk/AUTO_FLUSH_BACKGROUND? I guess it highly depends on the cluster size,
> the number of partitions hit by the operations, etc. But there could be
> some guidelines out there?
>
> Looking at the code of the kudu client, it seems that the default size is
> 1000: `private int mutationBufferSpace = 1000;`.
>
> - Nicolas
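
As a sanity check on the rates quoted in the results above, here is a small Python sketch that recomputes inserts per second from the reported insert counts and wall-clock times. Only the counts and durations come from the message; the labels are shorthand for the test cases and are not Kudu API names, and the helper names (`inserts_per_second`, `fastest`) are made up for this illustration.

```python
# (label, number of inserts, elapsed seconds), copied from the results
# listed in the message. Labels are informal shorthand, not Kudu identifiers.
RESULTS = [
    ("manual flush every insert", 100_000, 14.5),
    ("manual flush every 100K", 1_000_000, 4.7),
    ("manual flush every 10K", 1_000_000, 4.2),
    ("auto flush background, no explicit flush", 1_000_000, 4.8),
    ("auto flush background, explicit flush every 10K", 1_000_000, 4.2),
    ("async flush every 10K", 1_000_000, 2.8),
    ("async flush every 1K", 1_000_000, 2.7),
    ("async flush every 100", 1_000_000, 3.3),
    ("async flush every 10", 1_000_000, 10.6),
]


def inserts_per_second(inserts, seconds):
    """Average insert rate over the whole run."""
    return inserts / seconds


def fastest(results):
    """Return the (label, inserts, seconds) entry with the highest rate."""
    return max(results, key=lambda r: inserts_per_second(r[1], r[2]))


if __name__ == "__main__":
    for label, inserts, seconds in RESULTS:
        rate_k = inserts_per_second(inserts, seconds) / 1000
        print(f"{label}: {rate_k:.0f}K inserts/s")
    print("fastest:", fastest(RESULTS)[0])
```

The recomputed values agree with the rounded figures in the message, and the one case reported without a rate (auto flush background with no explicit flush, 1M inserts in 4.8s) works out to a little over 200K inserts/s. These are averages from a single run on one machine, so they are best read as a rough ordering rather than a benchmark.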