From: Todd Lipcon <todd@cloudera.com>
Date: Tue, 31 Oct 2017 23:25:32 -0700
Subject: Re: Low ingestion rate from Kafka
To: user@kudu.apache.org


On Tue, Oct 31, 2017 at 11:14 PM, Chao Sun <sunchao@uber.com> wrote:
> Thanks Zhen and Todd.
>
> Yes increasing the # of consumers will definitely help, but we also want to test the best throughput we can get from Kudu.

Sure, but increasing the number of consumers can increase the throughput (without increasing the number of Kudu tablet servers).

Currently, if you run 'top' on the TS nodes, do you see them using a high amount of CPU? Similar question for 'iostat -dxm 1' - high IO utilization? My guess is that at 15k/sec you are hardly utilizing the nodes, and you're mostly bound by round trip latencies, etc.

> I think the default batch size is 1000 rows?

In manual flush mode, it's up to you to determine how big your batches are. It will buffer until you call 'Flush()'. So you could wait until you've accumulated way more than 1000 to flush.
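
For example, a minimal sketch with the Java client (assuming an open 'client' and 'table'; the batch size, the record loop, and the fillRow() helper are illustrative, not taken from your code):

    // uses org.apache.kudu.client: KuduSession, SessionConfiguration, Insert
    KuduSession session = client.newSession();
    session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH);
    session.setMutationBufferSpace(100000);    // leave room for large batches

    int pending = 0;
    for (SomeRecord rec : recordsFromKafka) {  // stand-in for your consumer loop
        Insert insert = table.newInsert();
        fillRow(insert.getRow(), rec);         // hypothetical helper that sets the columns
        session.apply(insert);                 // buffered locally; no RPC yet
        if (++pending >= 50000) {
            session.flush();                   // one batched round trip per tablet server
            pending = 0;
        }
    }
    session.flush();                           // flush the remainder

In MANUAL_FLUSH mode apply() just buffers; the OperationResponses come back from flush(), so check those for row errors.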

> I tested with a few different options between 1000 and 200000, but always got some number between 15K to 20K per sec. Also tried flush background mode and 32 hash partitions but results are similar.

In your AUTO_FLUSH test, were you still calling Flush()?
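
For reference, with AUTO_FLUSH_BACKGROUND no per-batch Flush() call is needed; a short sketch of that setup, same session as above:

    session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);
    // apply() returns quickly and flushing happens on a background thread;
    // call session.flush() once at the end and check session.getPendingErrors().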

> The primary key is UUID + some string column though - they always come in batches, e.g., 300 rows for uuid1 followed by 400 rows for uuid2, etc.

Given this, are you hash-partitioning on just the UUID portion of the PK? I.e., if your PK is (uuid, timestamp), you could hash-partition on the UUID. This should ensure that you get pretty good batching of the writes.
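
Something like this at table-creation time (a sketch; the table name, column name, and bucket count are illustrative):

    // uses java.util.Arrays and org.apache.kudu.client.CreateTableOptions
    CreateTableOptions opts = new CreateTableOptions()
        .addHashPartitions(Arrays.asList("uuid"), 32)  // hash on the UUID column only
        .setNumReplicas(3);
    client.createTable("events", schema, opts);

Rows sharing a UUID then land in the same tablet, so each flush turns into fewer, larger per-tablet RPCs.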

Todd

> On Tue, Oct 31, 2017 at 6:25 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> In addition to what Zhen suggests, I'm also curious how you are sizing your batches in manual-flush mode? With 128 hash partitions, each batch is generating 128 RPCs, so if for example you are only batching 1000 rows at a time, you'll end up with a lot of fixed overhead in each RPC to insert just 1000/128 = ~8 rows.
>>
>> Generally I would expect an 8-node cluster (even with HDDs) to be able to sustain several hundred thousand rows/second insert rate. Of course, it depends on the size of the rows and also the primary key you've chosen. If your primary key is generally increasing (such as the Kafka sequence number) then you should have very little compaction and good performance.
>>
>> -Todd
>>
>> On Tue, Oct 31, 2017 at 6:20 PM, Zhen Zhang <zhquake@gmail.com> wrote:
>>
>>> Maybe you can increase your consumer count? In my opinion, more insert threads can give better throughput.
>>>
>>> 2017-10-31 15:07 GMT+08:00 Chao Sun <sunchao@uber.com>:
>>>> OK. Thanks! I changed to manual flush mode and it increased to ~15K / sec. :)
>>>>
>>>> Is there any other tuning I can do to further improve this? and also, how much would SSD help in this case (only upsert)?
>>>>
>>>> Thanks again,
>>>> Chao
>>>>
>>>> On Mon, Oct 30, 2017 at 11:42 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>>>
>>>>> If you want to manage batching yourself you can use the manual flush mode. Easiest would be the auto flush background mode.
>>>>>
>>>>> Todd
>>>>>
>>>>> On Oct 30, 2017 11:10 PM, "Chao Sun" <sunchao@uber.com> wrote:
>>>>>> Hi Todd,
>>>>>>
>>>>>> Thanks for the reply! I used a single Kafka consumer to pull the data.
>>>>>> For Kudu, I was doing something very simple that basically just follows the example here.
>>>>>> Specifically:
>>>>>>
>>>>>> loop {  // for each record consumed from Kafka
>>>>>>   Insert insert = kuduTable.newInsert();
>>>>>>   PartialRow row = insert.getRow();
>>>>>>   // fill the columns on 'row'
>>>>>>   kuduSession.apply(insert);
>>>>>> }
>>>>>>
>>>>>> I didn't specify the flushing mode, so it will pick up AUTO_FLUSH_SYNC as the default?
>>>>>> Should I use MANUAL_FLUSH?
>>>>>>
>>>>>> Thanks,
>>>>>> Chao
>>>>>>
>>>>>> On Mon, Oct 30, 2017 at 10:39 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>>> Hey Chao,
>>>>>>>
>>>>>>> Nice to hear you are checking out Kudu.
>>>>>>>
>>>>>>> What are you using to consume from Kafka and write to Kudu? Is it possible that it is Java code and you are using the SYNC flush mode? That would result in a separate round trip for each record and thus very low throughput.
>>>>>>>
>>>>>>> Todd
>>>>>>>
>>>>>>> On Oct 30, 2017 10:23 PM, "Chao Sun" <sunchao@uber.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are evaluating Kudu (version kudu 1.3.0-cdh5.11.1, revision af02f3ea6d9a1807dcac0ec75bfbca79a01a5cab) on an 8-node cluster.
>>>>>>> The data are coming from Kafka at a rate of around 30K / sec, and hash-partitioned into 128 buckets. However, with default settings, Kudu can only consume the topics at a rate of around 1.5K / second. This is a direct ingest with no transformation on the data.
>>>>>>>
>>>>>>> Could this be because I was using the default configurations? Also, we are using Kudu on HDD - could that also be related?
>>>>>>>
>>>>>>> Any help would be appreciated. Thanks.
>>>>>>>
>>>>>>> Best,
>>>>>>> Chao
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
--
Todd Lipcon
Software Engineer, Cloudera