From: Igor Galić <i.galic@brainsware.org>
To: users@trafficserver.apache.org
Date: Thu, 21 Mar 2013 22:23:00 +0000 (UTC)
Subject: Re: ATS performs poorly proxying larger files

This may be useful:
http://kerneltrap.org/mailarchive/linux-netdev/2010/4/15/6274814/thread

----- Original Message -----
> Hi Yongming,
>
> I haven't changed the networking configuration, but I've also noticed
> that once the first core is at 100% utilization the server no longer
> answers all ping requests and shows packet loss. Could this be a sign
> that all network traffic is being handled by the first core?
>
> You can find a screenshot of the per-thread output of top here:
> http://i.imgur.com/X3te2Ru.png
>
> Best Regards
> Philip
>
> 2013/3/21 Yongming Zhao <ming.zym@gmail.com>
>
> > Well, given the high network traffic, have you balanced the 10GbE
> > NIC IRQs across multiple CPUs?
> >
> > And can you show us the per-thread CPU usage in top?
> >
> > Thanks
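(For reference: the linked netdev thread discusses Receive Packet
Steering, i.e. spreading receive processing across CPUs in software.
Below is a minimal sketch of how one might check and adjust NIC
IRQ/RPS distribution on Linux; the interface name eth0, the IRQ number
placeholder and the CPU masks are only examples and have to be adapted
to the actual machine.)

    # Which CPUs are the NIC's interrupts landing on?
    grep eth0 /proc/interrupts
    cat /proc/irq/<IRQ_NUMBER>/smp_affinity

    # Pin a NIC IRQ to a set of CPUs (hex CPU mask; 'f' = CPUs 0-3)
    echo f > /proc/irq/<IRQ_NUMBER>/smp_affinity

    # Or enable RPS on a receive queue so softirq work is spread
    # across CPUs 0-7 ('ff'); repeat per rx queue as needed
    echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus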
> > On 2013-3-21, at 7:42 PM, Philip <flips01@gmail.com> wrote:
> >
> > > I've just upgraded to ATS 3.3.1-dev. The problem is still the
> > > same: http://i.imgur.com/1pHWQy7.png
> > >
> > > The load goes to one core. (The server is only running ATS.)
> > >
> > > 2013/3/21 Philip <flips01@gmail.com>
> > >
> > > > Hi Igor,
> > > >
> > > > I am using ATS 3.2.4, Debian 6 (Squeeze) and a 3.2.13 kernel.
> > > >
> > > > I was using the "traffic_line -r" command to see the number of
> > > > origin connections growing, and htop/atop to see that only one
> > > > core is 100% utilized. I've already tested the following
> > > > changes to the configuration:
> > > >
> > > > proxy.config.accept_threads -> 0
> > > > proxy.config.exec_thread.autoconfig -> 0
> > > > proxy.config.exec_thread.limit -> 120
> > > >
> > > > They had no effect: there is still one core that becomes 100%
> > > > utilized and turns out to be the bottleneck.
> > > >
> > > > Best Regards
> > > > Philip
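(A minimal sketch of how the settings quoted above would look in
records.config. The values are the ones Philip lists; the exec_thread
settings are generally only picked up at start-up, so a restart is
needed for them to take effect. The restart command and process name
are assumptions and may differ per installation.)

    CONFIG proxy.config.accept_threads INT 0
    CONFIG proxy.config.exec_thread.autoconfig INT 0
    CONFIG proxy.config.exec_thread.limit INT 120

    # After editing records.config, restart ATS, e.g. via the init
    # script or "traffic_line -L", then watch per-thread CPU again:
    top -H -p "$(pidof traffic_server)"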
> > > > 2013/3/21 Igor Galić <i.galic@brainsware.org>
> > > >
> > > > > Hi Philip,
> > > > >
> > > > > Let's start with some simple data mining:
> > > > >
> > > > > Which version of ATS are you running?
> > > > > What OS/distro/version are you running it on?
> > > > >
> > > > > Are you looking at stats_over_http's output to determine
> > > > > what's going on in ATS?
> > > > >
> > > > > -- i
> > > > >
> > > > > > I have noticed the following strange behavior: once the
> > > > > > number of origin connections starts to increase and the
> > > > > > proxying speed collapses, the first core is at 100%
> > > > > > utilization while the others are not even close to that.
> > > > > > It seems like the origin requests are handled by the first
> > > > > > core only. Is this expected behavior that can be changed
> > > > > > through the configuration, or is this a bug?
> > > > > >
> > > > > > 2013/3/20 Philip <flips01@gmail.com>
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am running ATS on a pretty large server with two
> > > > > > > physical 6-core XEON CPUs and 22 raw-device disks. I want
> > > > > > > to use that server as a frontend for several file
> > > > > > > servers. It is currently configured to sit in front of
> > > > > > > two file servers. The load on the ATS server is pretty
> > > > > > > low: about 1-4% disk utilization and 500 Mbps of outgoing
> > > > > > > traffic.
> > > > > > >
> > > > > > > Once I direct the traffic of the third file server
> > > > > > > towards ATS, something strange happens:
> > > > > > >
> > > > > > > - The number of origin connections increases continually.
> > > > > > > - Requests that hit ATS and are not cached are served
> > > > > > >   really slowly to the client (about 35 kB/s), while
> > > > > > >   requests served from the cache are blazingly fast.
> > > > > > >
> > > > > > > The ATS server has a dedicated 10 Gbps port that is not
> > > > > > > maxed out, no CPU core is maxed out, there is no
> > > > > > > swapping, there are no error logs, and the origin servers
> > > > > > > are not heavily utilized either. It feels like there are
> > > > > > > not enough workers to process the origin requests.
> > > > > > >
> > > > > > > Is there anything I can do to check whether my theory is
> > > > > > > right, and a way to increase the number of origin
> > > > > > > workers?
> > > > > > >
> > > > > > > Best Regards
> > > > > > > Philip

--
Igor Galić

Tel: +43 (0) 664 886 22 883
Mail: i.galic@brainsware.org
URL: http://brainsware.org/
GPG: 6880 4155 74BD FD7C B515  2EA5 4B1D 9E08 A097 C9AE
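(A rough way to test the "not enough origin workers" theory is the
per-thread view in top together with ATS's own counters. A sketch,
assuming the stats_over_http plugin is loaded and ATS answers on port
8080; the port and the exact metric names are assumptions and should
be checked against the running version.)

    # Are all of traffic_server's event threads busy, or just one?
    top -H -p "$(pidof traffic_server)"

    # Current and cumulative origin-server connections
    traffic_line -r proxy.process.http.current_server_connections
    traffic_line -r proxy.process.http.total_server_connections

    # Full stats snapshot via the stats_over_http plugin
    curl http://localhost:8080/_stats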