trafficserver-users mailing list archives

From Pranav Desai <>
Subject Re: Need some help tuning for large file performance
Date Fri, 17 Sep 2010 02:09:18 GMT
On Thu, Sep 16, 2010 at 6:39 PM, Leif Hedstrom <> wrote:
>  On 09/16/2010 05:04 PM, Pranav Desai wrote:
>> Hi!
>> I am running some performance test with large files. As mentioned in
>> one of the earlier threads I am using curl-loader for testing with
>> randomization in the URL to stress the cache.
>> Version: 2.0.1
>> Config:
>> CONFIG proxy.config.cache.ram_cache.size LLONG 2097152000
>> CONFIG proxy.config.cache.ram_cache_cutoff LLONG 100048576
> First thing, when serving a single 15MB object out of cache, can you make
> sure that it's served out of the RAM cache and doesn't hit the disk at all
> (other than logs, but you might want to turn those off, to make sure the
> only disk I/O is the cache)? We had a problem in the past where it'd hit
> the disk for certain large objects even though they should fit in RAM
> (that should be fixed / gone by now though).

Initially, with the default cutoff value of 1MB, I didn't see any 'Bytes
Used' under RAM in cache-stats. So I figured the cutoff is effectively the
maximum object size eligible for the RAM cache. After increasing it to
100MB, I started seeing those values go up.
I will still reconfirm that everything is served from RAM.
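To double-check, I plan to watch the RAM cache stats and the disk while
re-fetching the object, along these lines (the stat names are my guess
for 2.0.x):

traffic_line -r proxy.process.cache.ram_cache.bytes_used
traffic_line -r proxy.process.cache.ram_cache.hits
iostat -x 1    # the cache disk should show no reads if RAM serves it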

>> storage.config
>> /mnt/cache/trafficserver 60368709120
>> the file is 15MB in size.
> The first thing I'd recommend (which holds true for all ATS versions) is to
> switch to the raw device cache. The on-filesystem cache is primarily
> intended for testing / development, real usage should use a raw device (for
> direct I/O). The raw device cache should be superior in performance and
> reliability.

Done. I'm using 2 disks. Do you recommend a RAID config for better performance?
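
For reference, this is roughly what my storage.config looks like now
(the device names are just examples from my setup):

/dev/sdb
/dev/sdc

I also made sure the raw devices are owned by the user traffic_server
runs as, so it can open them for direct I/O.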

>> * The URL randomness is just a number within a range in the URL.
>> * There are 500 clients, each accessing the URL 50 times.
>> * So in the best-case scenario with only a single URL, I can get 700+
>> Mbps, and I think I could get more if I used 2 client machines and more
>> network cards. Currently the testbed is limited to 1Gbps.
>> * As I increase the randomness, so that there are essentially 2000
>> unique URLs, the performance drops significantly.
> This is not entirely surprising. This version of ATS (v2.0.x) partitions the
> disk(s) into 8GB partitions, and each such partition has its own disk
> position "pointer". It'd be interesting to see if you get the same
> performance up to an 8GB cache size, and then notice a drop in performance
> when going from (say) 8GB to 15GB. This "problem" is completely eliminated
> in ATS v2.1.x (where each partition can be up to 0.5PB).

I see. So does it have to seek across these 8GB partitions as the working
set grows?

> Yes, you definitely want to increase that. I'd recommend trying maybe 16 -
> 24 I/O threads per disk (spindle), and see if it makes a noticeable
> difference. Make sure that if your disk is RAIDed (e.g. RAID1), you adjust
> the I/O threads accordingly (ATS has no way of knowing how many spindles
> are actually behind a RAIDed disk, so it treats it as one). The setting
> would be
>    CONFIG proxy.config.cache.threads_per_disk INT 16
> (for example). I don't think it's in the default records.config, so you'll
> have to add it manually. Another interesting configuration is
>    proxy.config.cache.min_average_object_size
> (default is 8000), which doesn't really affect performance, but if you
> know that your cache is going to hold much larger objects than that,
> increasing it can save a large amount of memory (since it reduces the
> in-memory directory size).
> There might also be some network-related kernel tuning that could improve
> the situation a bit; I'd expect you to be able to drive the full GigE
> unless disk is becoming the bottleneck.
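
Thanks, I will add both settings; something like the following (assuming
no RAID for now, and an average object size well above the 8000 default):

CONFIG proxy.config.cache.threads_per_disk INT 16
CONFIG proxy.config.cache.min_average_object_size INT 1048576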

Here are my TCP memory parameters. Req/sec isn't the concern here, so I
should be OK with the listen queue and backlogs. If you have any
particular settings in mind, please let me know.

net.ipv4.tcp_mem = 1339776      1786368 2679552
net.ipv4.tcp_wmem = 4096        87380   8388608
net.ipv4.tcp_rmem = 4096        87380   8388608
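
If it would help, I can also try raising the socket buffer caps, e.g.
(the values are just a first guess for a GigE link):

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216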

> We really have to fix TS-441 though, if you can figure out some reliable
> (and hopefully easy) way of reproducing it, that would help tremendously.

I think I can reproduce it, but only under load, so it might be a bit
difficult to debug, especially with all the threads. I will try to
get to a simpler test case that reproduces it. Maybe I can run
traffic_server alone with a single network and I/O thread? How do you
guys normally debug it?
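
For example, maybe something like this in records.config to pin it down
to a single net thread with debug output (I'm guessing at the 2.0.x knob
names here):

CONFIG proxy.config.exec_thread.autoconfig INT 0
CONFIG proxy.config.exec_thread.limit INT 1
CONFIG proxy.config.cache.threads_per_disk INT 1
CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING cache.*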

> Cheers,
> -- leif
