Subject: Re: Need some help tuning for large file performance
From: Pranav Desai <pranavadesai@gmail.com>
To: Leif Hedstrom
Cc: users@trafficserver.apache.org
Date: Thu, 16 Sep 2010 19:09:18 -0700

On Thu, Sep 16, 2010 at 6:39 PM, Leif Hedstrom wrote:
> On 09/16/2010 05:04 PM, Pranav Desai wrote:
>>
>> Hi!
>>
>> I am running some performance tests with large files. As mentioned in
>> one of the earlier threads, I am using curl-loader for testing, with
>> randomization in the URL to stress the cache.
>>
>> Version: 2.0.1
>>
>> Config:
>> CONFIG proxy.config.cache.ram_cache.size LLONG 2097152000
>> CONFIG proxy.config.cache.ram_cache_cutoff LLONG 100048576
>
> First thing, can you make sure that when serving a single 15MB object out
> of cache, it serves it out of RAM cache and doesn't hit the disk at all
> (other than logs, but you might want to turn those off, to make sure the
> only disk I/O is cache)? We had a problem in the past where it'd hit the
> disk for certain large objects even though they should fit in RAM (that
> should be fixed / gone, though).

Initially, with the default cutoff value of 1MB, I didn't see any 'Bytes
Used' under RAM in cache-stats, so I figured the cutoff might be the
maximum object size that gets put into the RAM cache. After increasing it
to 100MB, I started seeing those values go up. I will still reconfirm
that everything is served from RAM.
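To reconfirm, I plan to watch the RAM cache counters and the disks while
re-fetching the object. Something like this should do it (the stat names
are my best guess; they may be named differently in 2.0):

    # RAM cache stats, read before and after a warm re-fetch
    traffic_line -r proxy.process.cache.ram_cache.hits
    traffic_line -r proxy.process.cache.ram_cache.misses

    # and watch the cache disks for read activity during the re-fetch
    iostat -x 5

If the hit counter climbs and iostat shows no reads on the cache disks,
the object should be coming entirely out of RAM.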
>
>> storage.config
>> /mnt/cache/trafficserver 60368709120
>> the file is 15MB in size.
>
> The first thing I'd recommend (which holds true for all ATS versions) is
> to switch to the raw device cache. The on-filesystem cache is primarily
> intended for testing / development; real usage should use a raw device
> (for direct I/O). The raw device cache should be superior in performance
> and reliability.
>

Done. Using 2 disks. Do you recommend a RAID config for better performance?

>> * The URL randomness is just a number within that range in the URL.
>> * There are 500 clients, each accessing the URL 50 times.
>>
>> * So in the best-case scenario with only a single URL, I can get 700+
>> Mbps, and I think I can get more if I use 2 client machines and more
>> network cards. Currently the testbed is limited to 1Gbps.
>> * As I increase the randomness, so that there are essentially 2000
>> unique URLs, the performance drops significantly.
>
> This is not entirely surprising. This version of ATS (v2.0.x) partitions
> the disk(s) into 8GB partitions, and each such partition has its own disk
> position "pointer". It'd be interesting to see if you get the same
> performance up to 8GB cache size, and then notice a drop in performance
> when going from (say) 8GB to 15GB. This "problem" is completely eliminated
> in ATS v2.1.x (where each partition can be up to 0.5PB).
>

I see. So does it have to seek more between those partitions then?

>
> Yes, you definitely want to increase that. I'd recommend trying maybe
> 16-24 I/O threads per disk (spindle), and see if it makes a noticeable
> difference. Make sure that if your disk is RAIDed (e.g. RAID1), you adjust
> the I/O threads accordingly (ATS has no way of knowing how many spindles
> are actually behind a RAIDed disk, so it treats it as one). The setting
> would be
>
>     CONFIG proxy.config.cache.threads_per_disk INT 16
>
> (for example). I don't think it's in the default records.config, so you'll
> have to add it manually. Another interesting configuration is
>
>     proxy.config.cache.min_average_object_size
>
> (the default is 8000), which doesn't really affect performance, but if you
> know that your cache is going to hold much larger objects than that,
> increasing it can save a large amount of memory (since it reduces the
> in-memory directory size).
>
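I have added both to records.config; the min_average_object_size value
below is just my guess for a cache holding mostly ~15MB files:

    CONFIG proxy.config.cache.threads_per_disk INT 16
    CONFIG proxy.config.cache.min_average_object_size INT 1048576

If I understand the directory sizing right, for my 60GB cache the default
of 8000 works out to roughly 60e9 / 8000 = 7.5 million directory entries,
while 1MB cuts it to around 60,000, so the in-memory directory should
shrink by about two orders of magnitude.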
>
> There might also be some network-related kernel tuning that could improve
> the situation a bit. I'd expect you to be able to drive the full GigE
> unless disk is becoming the bottleneck.
>

Here are my TCP memory parameters. Req/sec isn't a concern here, so I
should be OK with the listen queue and backlogs. If you have any
particular setting in mind, please let me know.

net.ipv4.tcp_mem = 1339776 1786368 2679552
net.ipv4.tcp_wmem = 4096 87380 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608

>
> We really have to fix TS-441 though. If you can figure out some reliable
> (and hopefully easy) way of reproducing it, that would help tremendously.
>

I think I can reproduce it, but only under load, so it might be a bit
difficult to debug, especially with all the threads. I will try to get to
a simpler test case that reproduces it. Maybe I can run traffic_server
alone with a single network and I/O thread? How do you guys normally
debug it?

> Cheers,
>
> -- leif
>
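P.S. For the single-threaded run, I was thinking of something along these
lines in records.config (I am not sure all of these knobs exist or behave
this way in 2.0, so treat it as a guess):

    # pin the worker threads to 1 instead of autosizing by CPU count
    CONFIG proxy.config.exec_thread.autoconfig INT 0
    CONFIG proxy.config.exec_thread.limit INT 1
    # a single I/O thread on the cache disk
    CONFIG proxy.config.cache.threads_per_disk INT 1
    # enable debug output for the cache subsystem
    CONFIG proxy.config.diags.debug.enabled INT 1
    CONFIG proxy.config.diags.debug.tags STRING cache

That should make it much easier to follow a single request through
traffic_server under gdb.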