Date: Wed, 4 Feb 2015 21:58:54 +0000
Subject: Re: kafkaspout is very slow
From: Filipa Moura
To: user@storm.apache.org

How many messages are you reading per second?

I had a few problems with my spout originally, each with a different cause:

1) I was not acking the messages, so because of max pending they were never removed from the "queue".
2) The buffer size and fetch size were too small. Have you tried to figure out how many bytes you read from Kafka and increase the sizes to match? This helped in my case.
3) I was trying to read too far back in the past when I restarted the topology, so I ended up consuming only from the latest offset.

With the above tweaks I was able to increase my throughput to 9 times what it was. It obviously depends on the size of the messages, but this helped me.

As Haralds suggested, have a look at the dashboard and try to understand where the problem is.

On Wed, Feb 4, 2015 at 9:26 PM, Haralds Ulmanis wrote:

> I'm not sure that I understand your problem, but here are a few points:
>
> If you have a large pending spout size and slow processing, you will
> probably see large latency at the kafka spout. The spout emits a message,
> it stays in the queue for a long time (that adds latency), and finally it
> is processed and the ack is received. You will see queue time + processing
> time in the kafka spout latency.
>
> Take a look at the load factors of your bolts: are they close to 1 or
> more? Also check the load factor of the kafka spout.
>
> On 4 February 2015 at 21:19, Andrey Yegorov wrote:
>
>> Have you tried increasing the max spout pending parameter for the spout?
>>
>> builder.setSpout("kafka",
>>         new KafkaSpout(spoutConfig),
>>         TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>     .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>     // the maximum parallelism you can have on a KafkaSpout is the
>>     // number of partitions
>>     .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>
>> ----------
>> Andrey Yegorov
>>
>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse wrote:
>>
>>> Hi all,
>>>
>>> In my topology, the kafka spout is responsible for over 85% of the
>>> latency. I have tried different spout max pending values and played
>>> with the buffer size and fetch size, still no luck. Any hint on how to
>>> optimize the spout? The issue doesn't seem to be on the kafka side, as
>>> I see high throughput with the simple kafka consumer.
>>>
>>> Thank you for your feedback,
>>> Clay
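
A rough way to put numbers on the "queue time + processing time" point in this thread is Little's law (W = L / lambda). The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical and not part of Storm or storm-kafka, the topology is assumed to be in steady state with the spout keeping its pending window full, and max spout pending is treated as a single total (in Storm the setting applies per spout task, so multiply by the number of tasks if there are several).

```java
// Hypothetical helper for illustration only; not part of Storm or storm-kafka.
final class SpoutLatencyEstimate {
    private SpoutLatencyEstimate() {}

    /**
     * Little's law, W = L / lambda: average seconds a tuple stays pending
     * (queue time + processing time) when maxSpoutPending tuples are in
     * flight and the topology acks acksPerSecond tuples per second.
     */
    static double pendingSeconds(int maxSpoutPending, double acksPerSecond) {
        if (maxSpoutPending <= 0 || acksPerSecond <= 0.0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return maxSpoutPending / acksPerSecond;
    }

    /**
     * The same law rearranged, L = lambda * W: the largest pending window
     * that keeps the average latency at or below targetSeconds.
     */
    static int maxPendingFor(double acksPerSecond, double targetSeconds) {
        if (acksPerSecond <= 0.0 || targetSeconds <= 0.0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return (int) Math.floor(acksPerSecond * targetSeconds);
    }

    public static void main(String[] args) {
        // Example figures are made up, not measurements from this thread.
        System.out.println(pendingSeconds(10_000, 2_000.0));
        System.out.println(maxPendingFor(2_000.0, 0.5));
    }
}
```

Under these assumptions, a large pending window combined with slow bolts shows up as high spout latency even when the spout itself is fast, so raising max spout pending can increase the reported latency without improving throughput.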