From: gang li <portl4t.cn@gmail.com>
To: users@trafficserver.apache.org
Date: Mon, 12 Jan 2015 11:49:59 +0800
Subject: Re: Interim cache - High CPU usage

I don't think it is a good idea to use the interim cache when the cached
objects are very large. Since the SSD is only 120 GB, objects on the HDD
will be migrated to the SSD frequently; the SSD storage becomes almost
meaningless because it is overwritten so quickly, and the churn increases
both CPU and I/O consumption.
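
To put rough numbers on it (a back-of-envelope sketch, using the figures
quoted below):

    120 GB SSD / 200 MB average object -> only ~600 objects fit at a time
    120 GB * 8 / 250 Mbps              -> ~3840 s, so at full line rate the
                                          whole SSD could be rewritten in
                                          about an hour

So even if only a fraction of the traffic is promoted to the interim cache,
the device turns over very quickly.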

Can you give more information from perf, such as a call graph?
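
Something like this should capture one (a rough sketch; the exact flags
depend on your perf version, and the pidof lookup assumes a single
traffic_server process):

    # sample the running traffic_server with call-graph recording for ~30s
    perf record -g -p $(pidof traffic_server) -- sleep 30
    # then inspect the recorded call graph
    perf report -g

The mangled symbol names from perf top can also be made readable with
c++filt, e.g.:

    $ echo _Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread | c++filt
    write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*)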


On Mon, Jan 12, 2015 at 4:52 AM, Daniel Biazus <daniel.biazus@azion.com> wrote:

> Hi Guys,
>
> We've been using ATS as a reverse proxy, and a few weeks ago we started to
> use the interim cache feature in a more intensive way, caching objects with
> an average size of 200 MB and a max size of 1 GB.
>
> We have a ~1 TB HDD as the default storage:
>
> cat /etc/trafficserver/storage.config
>
> # ATS - Storage
> /dev/sda6 volume=1

> And also a 120 GB SSD as the interim cache:
>
> LOCAL proxy.config.cache.interim.storage STRING /dev/sdc1
>
> After 20-30 minutes in production with this configuration, we noticed a
> sudden spike in CPU usage, up to 65%, whereas our regular usage is about
> 10%. The throughput, however, stayed stable at 250 Mbps per box.
>
> We've found the following behavior using the perf top tool:
>
>   88.18%  traffic_server   [.] _Z15write_to_net_ioP10NetHandlerP18UnixNetVConnectionP7EThread
>    0.32%  traffic_server   [.] _ZN10NetHandler12mainNetEventEiP5Event
>    0.30%  [kernel]         [k] update_sd_lb_stats
>    0.29%  [e1000e]         [k] e1000e_check_ltr_demote
>    0.25%  [kernel]         [k] __ticket_spin_lock
>    0.24%  traffic_server   [.] _ZN7EThread13process_eventEP5Eventi
>    0.21%  [kernel]         [k] timerqueue_add
>    0.17%  libc-2.12.so     [.] epoll_wait
>    0.17%  libpcre.so.0.0.  [.] 0x00000000000100dd
>    0.14%  [kernel]         [k] __schedule

> 1) This behavior is easily reproduced when caching large objects with the
>    interim cache active.
> 2) With the interim cache disabled, the behavior is not reproduced.
>
> As you can see in the perf top output, the write_to_net_io function is
> responsible for this heavy CPU usage. We would like to hear from you guys
> whether anyone has faced an issue like this, or whether you have any clues
> about this possible bug.
>
> Thanks & Regards,
>
> --
> Daniel Biazus
> Infrastructure Engineering
> Azion Technologies
> Porto Alegre, Brasil +55 51 3012 3005 | +55 51 82279032
> Miami, USA +1 305 704 8816
>
> Any information in this e-mail and its attachments may be confidential and
> privileged, protected by legal confidentiality. The use of this document
> requires authorization by the issuer, subject to the applicable penalties.