Subject: Re: kafkaspout is very slow
From: Michael Rose <michael@fullcontact.com>
To: user@storm.apache.org
Date: Wed, 4 Feb 2015 20:30:26 -0700

How does your CPU look at 23000 tuples/s? Still low?

Have you profiled to see if anything is blocking? Is your spout constantly
doing work?

Michael Rose
Senior Platform Engineer
FullContact | fullcontact.com
m: +1.720.837.1357 | t: @xorlev
All Your Contacts, Updated and In One Place. Try FullContact for Free

On Wed, Feb 4, 2015 at 8:20 PM, clay teahouse <clayteahouse@gmail.com> wrote:

> I bumped the kafka buffer/fetch sizes to
>
> kafka.fetch.size.bytes: 12582912
> kafka.buffer.size.bytes: 12582912
>
> The throughput almost doubled (to about 23000 un-acked tuples/second).
> Increasing these two parameters any further does not seem to improve the
> performance. Is there anything else that I can try?
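If you are on the storm-kafka SpoutConfig, the equivalent knobs are the
fetchSizeBytes and bufferSizeBytes fields, set before the spout is built.
A minimal sketch, assuming that spout; the ZooKeeper connect string, topic,
zkRoot and consumer id below are placeholders:

    import backtype.storm.spout.SchemeAsMultiScheme;
    import storm.kafka.BrokerHosts;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;

    public class KafkaSpoutSizing {
        public static KafkaSpout buildSpout() {
            // Placeholder ZooKeeper connect string, topic, zkRoot and consumer id.
            BrokerHosts hosts = new ZkHosts("zk1:2181");
            SpoutConfig spoutConfig = new SpoutConfig(hosts, "my-topic", "/kafkaspout", "spout-id");

            // Fetch and buffer sizes; both fields default to 1 MB in storm-kafka,
            // 12582912 is the 12 MB value tried above.
            spoutConfig.fetchSizeBytes = 12582912;
            spoutConfig.bufferSizeBytes = 12582912;
            spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

            return new KafkaSpout(spoutConfig);
        }
    }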
> On Wed, Feb 4, 2015 at 6:51 PM, clay teahouse <clayteahouse@gmail.com> wrote:
>
>> 100,000 records is about 12MB.
>> I'll try bumping the numbers by 100-fold to see if it makes any difference.
>> thanks,
>> -Clay
>>
>> On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura <filipa.mendesmoura@gmail.com> wrote:
>>
>>> I would bump these numbers up by a lot:
>>>
>>> kafka.fetch.size.bytes: 102400
>>> kafka.buffer.size.bytes: 102400
>>>
>>> Say 10 or 100 times that, or more. I don't know by heart how much I
>>> increased those numbers on my topology.
>>>
>>> How many bytes are you writing per minute to kafka? Try dumping 1
>>> minute of messages to a file to figure out how many bytes that is.
>>>
>>> I am reading (sending data to the topic) about 100,000 records per
>>> second. My kafka consumer can consume the 3 million records in less
>>> than 50 seconds. I have disabled the ack. But with the ack enabled, I
>>> won't even get 1500 records per second from the topology. With the ack
>>> disabled, I get about 12000/second.
>>> I don't lose any data, it is just that the data is emitted from the
>>> spout to the bolt very slowly.
>>>
>>> I did bump my buffer sizes, but I am not sure if they are sufficient:
>>>
>>> topology.transfer.buffer.size: 2048
>>> topology.executor.buffer.size: 65536
>>> topology.receiver.buffer.size: 16
>>> topology.executor.send.buffer.size: 65536
>>>
>>> kafka.fetch.size.bytes: 102400
>>> kafka.buffer.size.bytes: 102400
>>>
>>> thanks
>>> Clay
>>>
>>> On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <filipa.mendesmoura@gmail.com> wrote:
>>>
>>>> Can you share a screenshot of the Storm UI for your spout?
>>>>
>>>> On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <clayteahouse@gmail.com> wrote:
>>>>
>>>>> I have this issue with any amount of load. Different max spout
>>>>> pending values do not seem to make much of a difference. I've lowered
>>>>> this parameter to 100; still only a little difference. At this point
>>>>> the bolt consuming the data does no processing.
>>>>>
>>>>> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <haralds@evilezh.net> wrote:
>>>>>
>>>>>> I'm not sure that I understand your problem, but here are a few points:
>>>>>> If you have a large max spout pending and slow processing, you will
>>>>>> probably see large latency at the kafka spout. The spout emits a
>>>>>> message, it sits in the queue for a long time (which adds latency),
>>>>>> and finally it is processed and the ack is received. You will see
>>>>>> queue time + processing time in the kafka spout latency.
>>>>>> Take a look at the load factors of your bolts - are they close to 1
>>>>>> or more? And at the load factor of the kafka spout.
>>>>>>
>>>>>> On 4 February 2015 at 21:19, Andrey Yegorov <andrey.yegorov@gmail.com> wrote:
>>>>>>
>>>>>>> Have you tried increasing the max spout pending parameter for the spout?
>>>>>>>
>>>>>>> builder.setSpout("kafka",
>>>>>>>                  new KafkaSpout(spoutConfig),
>>>>>>>                  TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>        .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>>>>>        // the maximum parallelism you can have on a KafkaSpout is
>>>>>>>        // the number of partitions
>>>>>>>        .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>>>>>>
>>>>>>> ----------
>>>>>>> Andrey Yegorov
>>>>>>>
>>>>>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <clayteahouse@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> In my topology, the kafka spout is responsible for over 85% of the
>>>>>>>> latency. I have tried different max spout pending values and played
>>>>>>>> with the buffer size and fetch size, still no luck. Any hint on how
>>>>>>>> to optimize the spout? The issue doesn't seem to be on the kafka
>>>>>>>> side, as I see high throughput with the simple kafka consumer.
>>>>>>>>
>>>>>>>> thank you for your feedback
>>>>>>>> Clay
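For completeness, the topology-level settings quoted in the thread are
normally applied through backtype.storm.Config when the topology is
submitted. A minimal sketch under that assumption, using the 0.9.x constant
names; the max-spout-pending value is a placeholder, and the thread's
"topology.executor.buffer.size" is presumably topology.executor.receive.buffer.size:

    import backtype.storm.Config;

    public class TopologyTuning {
        public static Config buildConf() {
            Config conf = new Config();

            // topology.max.spout.pending -- placeholder value; the thread tried
            // values down to 100 with little effect.
            conf.setMaxSpoutPending(1000);

            // Buffer sizes as listed in the thread, keyed by the 0.9.x constants.
            conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 2048);
            conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);
            conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 65536);
            conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 16);

            return conf;
        }
    }

The resulting conf would then be passed to StormSubmitter.submitTopology
together with the TopologyBuilder shown in Andrey's snippet above.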