trafficserver-users mailing list archives

From Miles Libbey <mlib...@apache.org>
Subject Re: thundering herd best practises
Date Thu, 09 Jul 2015 17:43:02 GMT
Thanks, Sudheer. I read through the comments in TS-3549, but I don't grok what we are supposed
to do in ATS 5.3.x+ to get the almost-Stale-While-Revalidate behavior configured. Seems like this
would be a great place to modify the HTTP Proxy Caching — Apache Traffic Server 6.0.0 documentation
(and probably also to document any new options in records.config — Apache Traffic Server 6.0.0
documentation).
[Link preview: HTTP Proxy Caching — Apache Traffic Server 6.0.0 documentation: "Fuzzy Revalidation:
Traffic Server can be set to attempt to revalidate an object before it becomes stale in cache.
records.config contains the settings." View on docs.trafficserver.apache.org]

[Link preview: records.config — Apache Traffic Server 6.0.0 documentation: "The records.config
file (by default, located in /usr/local/etc/trafficserver/) is a list of configurable variables
used by the Traffic Server software." View on docs.trafficserver.apache.org]


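For anyone following along, the Fuzzy Revalidation settings that page describes appear to be
these records.config entries (defaults shown as I read them from the 6.0.0 docs; treat this as a
sketch rather than a verified recipe):

CONFIG proxy.config.http.cache.fuzz.time INT 240
CONFIG proxy.config.http.cache.fuzz.min_time INT 0
CONFIG proxy.config.http.cache.fuzz.probability FLOAT 0.005
# fuzz.time: seconds before an object goes stale during which a request
#   may trigger an early (fuzzy) revalidation
# fuzz.min_time: lower bound of that fuzz window
# fuzz.probability: chance that any one eligible request triggers the
#   early revalidation, so one client refreshes the object while the
#   rest are still served from cache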
miles
 


     On Thursday, July 9, 2015 7:57 AM, Sudheer Vinukonda <sudheerv@yahoo-inc.com> wrote:
   

There's no way to completely avoid multiple concurrent requests to the origin without using
something like the SWR (Stale-While-Revalidate) solution. You may want to take a look at Stale-While-Revalidate-in-the-core.
ATS 5.3.x+ supports an almost-SWR-like solution with TS-3549. A complete SWR solution (in
the ATS core) is planned to be implemented with TS-3587 (Support stale-while-revalidate
in the core - ASF JIRA). There are a number of timers and other settings that are relevant
to the issues you mentioned (e.g. TS-3622).
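Roughly, what TS-3549 adds is a records.config knob controlling what a transaction does when it
fails to get the cache write lock (i.e. another request is already fetching the object). A minimal
sketch; the value here is illustrative, and the exact value semantics are in the records.config
documentation:

CONFIG proxy.config.http.cache.open_write_fail_action INT 2
# non-zero values make a request that loses the cache write lock return
# an error or a stale copy instead of falling through to the origin;
# see the records.config docs for the full value table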
If you absolutely do not care about latency, you may try the existing stale-while-revalidate plugin.
I've not used it myself (we have a more efficient internal version of the same plugin), but
I've heard that the plugin doesn't work as desired.
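If you do try it, it's loaded like any other global plugin via plugin.config. A minimal sketch,
assuming the build installs the experimental plugin under its default name:

stale_while_revalidate.so
# one line in plugin.config; check the plugin's README for any options it accepts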
(PS: you may need to be careful, since with read-while-write we've experienced requests taking
longer than 60 seconds without the above optimizations, which is absolutely ridiculous for any
kind of request, let alone the HLS use case.)
Thanks,
Sudheer




 


     On Thursday, July 9, 2015 4:17 AM, Mateusz Zajakala <zajakala@gmail.com> wrote:
   

 Hi everyone,

I'd like to get some insight into how I can configure and fine-tune ATS to stop flooding the
origin server with requests on TCP_MISS, and to make sure I understand what I'm doing.

I hope this is the right place to ask :)

Case: we have an origin server serving HLS video chunks + playlists. What this means for ATS
is:
- we know that every request is cacheable
- expiry time for playlists is very short (10 s); video chunks live a little longer (this is
set by the origin)
- we know the size of objects (1-2 MB per video file)
- we do all of our caching in RAM

We use ATS as a reverse proxy with the following records.config:
CONFIG proxy.config.http.cache.required_headers INT 0
- does this make ATS cache everything?
CONFIG proxy.config.cache.enable_read_while_writer INT 1
- we don't want to wait until a chunk is fully served to one client; we want to serve clients in parallel
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
- according to the docs, these allow a download to the cache to finish even if the client that initiated it disconnects
CONFIG proxy.config.http.cache.max_open_read_retries INT 5
CONFIG proxy.config.http.cache.open_read_retry_time INT 100
- this is KEY - we need to have collapsed forwarding!
CONFIG proxy.config.cache.ram_cache.size INT 20G
- put everything in RAM

All others are defaults. Now, with these settings we are getting a respectable 99.1% hit ratio.
However, there are cases when increasing the number of incoming requests to ATS causes it to
flood the origin on TCP_MISS (the origin responds with 200, so If-Modified-Since is not part of
the request).

Now, I would imagine that setting max_open_read_retries + open_read_retry_time would make
ALL clients requesting a file (except the first one) wait until the first one retrieves the headers,
and that because of enable_read_while_writer they would then be served the retrieved file. However,
I'm seeing in squid.blog that sometimes, within a span of 100 ms or more, there are multiple
TCP_MISS entries and origin server requests for the same file. I tried tweaking the values of the
open_read retry time and retries, but without success.
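If I read the docs right, those two settings bound the wait at max_open_read_retries x
open_read_retry_time, i.e. 5 x 100 ms = 500 ms with my values, after which a request gives up on
the read lock and goes to the origin anyway. A sketch of a longer window sized for my ~5 s latency
tolerance (values illustrative, not tested):

CONFIG proxy.config.http.cache.max_open_read_retries INT 50
CONFIG proxy.config.http.cache.open_read_retry_time INT 100
# 50 retries x 100 ms = up to 5 s waiting on the cache read lock
# before a request falls through to the origin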

Request serving time on TCP_MISS is usually less than 10ms. We have a good link to origin.


My goal would be to have "perfect" collapsed forwarding. I don't care about latency (I can
make the client wait even 5 s if necessary), but I don't want to hit the origin. Is this possible?
Do I need to adjust the settings? Or is there some reason this cannot be achieved at a high
number of requests?

I would greatly appreciate any suggestions!

Thanks
Mateusz

PS. We are using CentOS 6 + the official EPEL 6 ATS 3.0.4 (ancient!) on a 40-core, 64 GB RAM
machine with 2x10 Gbps Ethernet. No observable load problems at >1K requests/s.

   

  