From: David Alves
Date: Mon, 24 Apr 2017 11:06:02 -0700
Subject: Re: Bad insert performance of java kudu-client
To: user@kudu.apache.org
I just became aware that AUTOFLUSH_BACKGROUND has been available on the Java client for longer than I thought, so it is likely not the cause after all. Never mind my suggestion.

-david

On Mon, Apr 24, 2017 at 10:56 AM, Todd Lipcon wrote:
> I vaguely recall a bug in earlier versions of the Java client where
> 'shutdown' wouldn't properly block on the data being flushed. So it's
> possible that in 1.0.x and below you're not actually measuring the full
> time to write all the data, whereas once the bug is fixed, you are.
>
> I'll see if I can repro this locally as well using your code.
>
> -Todd
>
> On Mon, Apr 24, 2017 at 10:49 AM, David Alves wrote:
>
>> Hi Pavel
>>
>> Interesting, thanks for sharing those numbers.
>> I assume you weren't using AUTOFLUSH_BACKGROUND for the first versions
>> you tested (I don't think it was available then, IIRC).
>> Could you try without it in the latest version and see how the numbers
>> compare?
>> We'd be happy to help track down the reason for this perf regression.
>>
>> Best
>> David
>>
>> On Mon, Apr 24, 2017 at 4:58 AM, Pavel Martynov wrote:
>>
>>> Hi, I found that I could not achieve high insertion speed, so I
>>> started experimenting with
>>> https://github.com/cloudera/kudu-examples/tree/master/java/insert-loadgen.
>>> My slightly modified code (table recreation on startup + duration
>>> measurement):
>>> https://gist.github.com/xkrt/9405a2eeb98a56288b7c5a7d817097b4.
>>> On every run I change the kudu-client version; results:
>>>
>>> kudu-client-ver  perf
>>> 0.10             Duration: 626 ms,   79872 inserts/sec
>>> 1.0.0            Duration: 622 ms,   80385 inserts/sec
>>> 1.0.1            Duration: 630 ms,   79365 inserts/sec
>>> 1.1.0            Duration: 11703 ms,  4272 inserts/sec
>>> 1.3.1            Duration: 12317 ms,  4059 inserts/sec
>>>
>>> As you can see, there was a severe degradation between 1.0.1 and 1.1.0
>>> (roughly 20x!).
>>> What could the problem be, and how can I fix it? (I'm actually
>>> interested in kudu-spark, so pinning kudu-client to 1.0.1 is probably
>>> not the right solution.)
>>>
>>> My test cluster: 3 hosts with a master and a tserver on each (3 masters
>>> and 3 tservers overall).
>>> No extra settings; flags used:
>>> fs_wal_dir
>>> fs_data_dirs
>>> master_addresses
>>> tserver_master_addrs
>>>
>>> --
>>> with best regards, Pavel Martynov
>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
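[Editor's note: Pavel's inserts/sec figures are internally consistent with a fixed run of roughly 50,000 rows; the row count is not stated in the thread, it is inferred here from rate × duration. A quick sanity check of that arithmetic, in plain Java with no Kudu dependency:]

```java
// Cross-checks the reported throughput against the reported durations,
// assuming a ~50,000-row run (inferred from the numbers, not stated in
// the thread).
public class ThroughputCheck {

    // inserts/sec = rows / (durationMs / 1000), in integer arithmetic
    static long insertsPerSec(long rows, long durationMs) {
        return rows * 1000 / durationMs;
    }

    public static void main(String[] args) {
        long rows = 50_000;
        String[] versions  = {"0.10", "1.0.0", "1.0.1", "1.1.0", "1.3.1"};
        long[] durationsMs = {626, 622, 630, 11703, 12317};
        for (int i = 0; i < versions.length; i++) {
            System.out.printf("%-6s %6d ms  %5d inserts/sec%n",
                    versions[i], durationsMs[i],
                    insertsPerSec(rows, durationsMs[i]));
        }
        // The 1.0.1 -> 1.1.0 drop comes out to ~18x in integer math,
        // in line with Pavel's "roughly 20x" estimate.
        System.out.println("slowdown: "
                + insertsPerSec(rows, 630) / insertsPerSec(rows, 11703) + "x");
    }
}
```

[Under the 50,000-row assumption, every computed rate reproduces the table exactly (e.g. 50,000 × 1000 / 626 = 79872), so the regression is in wall-clock duration, not in how the rate was derived.]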