Mailing-List: contact user-help@storm.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@storm.apache.org
Received-SPF: pass (nike.apache.org: domain of filipa.mendesmoura@gmail.com
 designates 209.85.215.54 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAJ8fQQaTD1siO_KToQ9DK3PW9o-C-B8C1NJBKQ+UR7NdMp2u8Q@mail.gmail.com>
References: 
 <CABcMBhBLPkqz7kK_PH8R0KGn+mhG35-hd3-owtNCrzBRhO1xFQ@mail.gmail.com>
	<CAJuQM_5H5o5cauMwwYA+d9xzFHEzmqjn5SrognQ+uTogzhJ7Yg@mail.gmail.com>
	<CAJ8fQQaTD1siO_KToQ9DK3PW9o-C-B8C1NJBKQ+UR7NdMp2u8Q@mail.gmail.com>
Date: Wed, 4 Feb 2015 21:58:54 +0000
Message-ID: 
 <CA+ngJuAAwM6rztQ6dNnZs62F2FaM_wYPsTOvobvmeBAFDUr07w@mail.gmail.com>
Subject: Re: kafkaspout is very slow
From: Filipa Moura <filipa.mendesmoura@gmail.com>
To: user@storm.apache.org
Content-Type: multipart/alternative; boundary=001a113456d4266d2c050e4a4bd5

--001a113456d4266d2c050e4a4bd5
Content-Type: text/plain; charset=UTF-8

How many messages are you reading per second?
I had a few problems with my spout originally, and each one came down to
one of the following:
1) I was not acking the messages, so because of max pending they weren't
being removed from the "queue".
2) The buffer size and fetch size were too small. Have you tried to figure
out how many bytes you are reading from Kafka and increasing the sizes to
match? This helped in my case.
3) I was trying to read too far back into the past when I restarted the
topology, so I ended up consuming from the latest offset only.
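
As a rough illustration of point 2, here is a minimal sketch of the sizing
arithmetic. The class name, the method names and the sample numbers are all
made up for the example; plug in figures measured on your own topic. In the
storm-kafka version I was using, I believe the resulting value goes into the
fetchSizeBytes and bufferSizeBytes fields of the spout config, but please
check that against your version.

```java
// Rough sizing sketch; every name and number here is illustrative only.
final class FetchSizeEstimate {

    /** Bytes a single fetch must hold to carry the given number of messages. */
    static long bytesPerFetch(long avgMessageBytes, long messagesPerFetch) {
        return Math.multiplyExact(avgMessageBytes, messagesPerFetch);
    }

    /** Rounds up to a whole number of MiB so the setting is easy to read. */
    static long roundUpToMiB(long bytes) {
        long mib = 1024L * 1024L;
        return ((bytes + mib - 1) / mib) * mib;
    }

    public static void main(String[] args) {
        long avgMessageBytes = 2_000;   // assumed average message size
        long messagesPerFetch = 5_000;  // assumed messages wanted per fetch

        long needed = bytesPerFetch(avgMessageBytes, messagesPerFetch);
        System.out.println("raw bytes per fetch: " + needed);
        System.out.println("rounded to MiB:      " + roundUpToMiB(needed));
    }
}
```

With these sample numbers a fetch needs roughly 10 MB, so a fetch size of
around 1 MB would force about ten round trips for the same data.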

With the above tweaks I was able to increase my throughput 9-fold. It
obviously depends on the size of the messages, but this helped me.
As Haralds suggested, have a look at the dashboard and try to understand
where the problem is.
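
To put rough numbers on Haralds' point about queue time, here is a small
sketch based on Little's law (average time in the system = items in flight
/ throughput). It assumes the spout is saturated, so the number of un-acked
tuples sits at max spout pending. The class, the method names and the
figures are hypothetical, not something taken from Storm itself.

```java
// Back-of-the-envelope latency estimate; names and numbers are illustrative.
final class SpoutLatencyEstimate {

    /** Little's law: average latency = tuples in flight / acked tuples per second. */
    static double completeLatencyMs(int pendingTuples, double ackedPerSecond) {
        if (ackedPerSecond <= 0) {
            throw new IllegalArgumentException("ackedPerSecond must be positive");
        }
        return pendingTuples / ackedPerSecond * 1000.0;
    }

    /** Part of the latency spent waiting in queues rather than being processed. */
    static double queueTimeMs(double completeLatencyMs, double processingMs) {
        return Math.max(0.0, completeLatencyMs - processingMs);
    }

    public static void main(String[] args) {
        double ackedPerSecond = 2_000.0; // assumed throughput per spout task
        double processingMs = 50.0;      // assumed time the bolts actually work on a tuple

        for (int pending : new int[] {500, 10_000}) {
            double total = completeLatencyMs(pending, ackedPerSecond);
            System.out.println("max pending " + pending
                    + ": latency ms = " + total
                    + ", of which queue ms = " + queueTimeMs(total, processingMs));
        }
    }
}
```

Under these assumptions, raising max spout pending from 500 to 10,000
without any gain in throughput moves the estimated latency from 250 ms to
5,000 ms, and almost all of the difference is time spent waiting in the
queue.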


On Wed, Feb 4, 2015 at 9:26 PM, Haralds Ulmanis <haralds@evilezh.net> wrote:

> I'm not sure that I understand your problem, but here are a few points:
> If you have a large max spout pending and slow processing, you will
> probably see large latency at the Kafka spout. The spout emits a message,
> it stays in the queue for a long time (which adds latency), and finally it
> is processed and the ack is received. You will see queue time + processing
> time in the Kafka spout latency.
> Take a look at the load factors of your bolts: are they close to 1 or more?
> Also check the load factor of the Kafka spout.
>
> On 4 February 2015 at 21:19, Andrey Yegorov <andrey.yegorov@gmail.com>
> wrote:
>
>> Have you tried increasing the max spout pending parameter for the spout?
>>
>> builder.setSpout("kafka",
>>                        new KafkaSpout(spoutConfig),
>>                        TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>           // the maximum parallelism you can have on a KafkaSpout is the
>>           // number of partitions
>>           .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>           .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>
>> ----------
>> Andrey Yegorov
>>
>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <clayteahouse@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> In my topology, the Kafka spout is responsible for over 85% of the
>>> latency. I have tried different max spout pending values and played with
>>> the buffer size and fetch size, but still no luck. Any hints on how to
>>> optimize the spout? The issue doesn't seem to be on the Kafka side, as I
>>> see high throughput with the simple Kafka consumer.
>>>
>>> thank you for your feedback
>>> Clay
>>>
>>>
>>
>

--001a113456d4266d2c050e4a4bd5--