trafficserver-users mailing list archives

From "Torluemke, Mark" <>
Subject Re: thundering herd best practises
Date Fri, 17 Jul 2015 13:28:12 GMT
Hi Mat,

I don’t think I saw it mentioned below, so I have to bring it up — read_while_writer does
not work properly in ATS v3.0.x. Additionally, you really want the other concurrency fixes
that I believe just went into ATS v5.3.0. All the other advice Sudheer gave is good, but we’ve
found that RWW functioning properly makes the biggest difference.


From: Sudheer Vinukonda <<>>
Reply-To: "<>" <<>>
Date: Wednesday, July 15, 2015 at 2:11 PM
To: "<>" <<>>
Subject: Re: thundering herd best practises

"The settings for open_read_retry come into play only when open read fails (i.e. before the
dirent for the cache object is created)."

Correction/Clarification - Before a dirent is created, the scenario is just a regular
cache miss - open read retry is not performed on a regular cache miss.

The scenario where open read retry applies is slightly more subtle: the dirent has been
created, but the open read fails because the write_vc is still not closed. TS-3622<>
and TS-3767<> are important fixes to include for these scenarios. Without these fixes,
we've observed that read-while-writer can get stuck indefinitely (until an eventual
inactivity timeout at the txn level fires, which could be quite far away).



On Friday, July 10, 2015 9:51 AM, Sudheer Vinukonda <<>>

A "dirent" (proxy.process.cache.direntries.used) is basically the index for the object's location
in the cache (similar to inode).

The settings for open_read_retry come into play only when open read fails (i.e. before the
dirent for the cache object is created).

The behavior you described ("I'd expect the second Txn to see that there is write lock (so
object is being fetched) and WAIT - not go to origin") is precisely what read-while-writer
(rww) does, but, like I wrote in the last email, it doesn't kick in until the object's response
headers are validated. There's a small window before rww kicks in, during which one of the
following could occur for multiple concurrent requests for the same object:

  a) open read fails --> open_read_retry would help in this case
  b) open read is successful, but open write fails:
       *) rww has not kicked in yet --> use open_write_fail_action (max_open_write_retries
may also help (not sure))
       *) rww has kicked in --> rww collapses the connections
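
For case (a), the corresponding records.config knobs look like this (the values are
illustrative assumptions on my part, not recommendations):

```
# Case (a): retry the cache open read before declaring a miss.
# Values are illustrative; tune against your own traffic.
CONFIG proxy.config.http.cache.max_open_read_retries INT 5
CONFIG proxy.config.http.cache.open_read_retry_time INT 100
# Case (b) before rww kicks in: see open_write_fail_action (newer releases only).
```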



On Friday, July 10, 2015 9:19 AM, Mateusz Zajakala <<>>

Still, your comments are very helpful and much appreciated! Your explanation is interesting,
but contrary to my expectations of "open read retry".

Docs state:
"While an object is being fetched from the origin server, subsequent requests would wait proxy.config.http.cache.open_read_retry_time<>
milliseconds before checking if the object can be served from cache. If the object is still
being fetched, the subsequent requests will retry proxy.config.http.cache.max_open_read_retries<> times."

So I'd expect the second Txn to see that there is a write lock (so the object is being fetched)
and WAIT - not go to origin. You say, however, that the second Txn will be successful in obtaining
the read lock (because the "dirent" is available - what is a dirent?). This could explain the
leakage, but then I don't understand under what circumstances "open_read_retry" would kick in
(if at all).
On Fri, Jul 10, 2015 at 6:07 PM, Sudheer Vinukonda <<>>
Here's my understanding based on what I've noticed in my code reading and tests:

When a request is received, the Txn (transaction) associated with it first tries a cache
open read (basically, a simple lookup for the dirent). If the open read fails (on a cache
miss), the Txn tries an open write (basically, takes the write lock for the object) and goes
to the origin to download the object. At this point the dirent for the object is created
and the write lock is held by this Txn.

If a second request comes in at this point, the Txn associated with it tries an open read,
and it doesn't fail (since the dirent is already available). However, the object in
cache is not yet in a state for read-while-writer to kick in. Without the write lock, the Txn
would then simply disable cache and go to the origin. The logic for a cache stale is more
or less similar.

This is where the new feature "open_write_fail_action" comes into play, to either return an
error (or a stale copy, if one is available). We haven't experimented with max_open_write_retries,
but perhaps that might make things better too.
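
The flow above can be sketched as a toy simulation (all names here are illustrative, not
actual ATS internals; this is just my reading of the behavior, not ATS code):

```python
# Toy simulation of the flow described above: between dirent creation and
# response-header validation there is a window in which a second request can
# neither read from cache nor use read-while-writer, so it leaks to origin
# (or, with open_write_fail_action, fails fast instead). None of these names
# are real ATS internals.

class ToyCache:
    def __init__(self):
        # url -> {"writer": bool, "headers_ok": bool}; an entry is the "dirent"
        self.dirents = {}

    def open_read(self, url):
        return url in self.dirents          # dirent lookup

    def open_write(self, url):
        ent = self.dirents.setdefault(url, {"writer": False, "headers_ok": False})
        if ent["writer"]:
            return False                    # another Txn holds the write lock
        ent["writer"] = True
        return True

def handle(cache, url, fail_fast=False):
    if not cache.open_read(url):            # no dirent: regular cache miss
        cache.open_write(url)               # take the write lock, create dirent
        return "FETCH_ORIGIN"
    if cache.dirents[url]["headers_ok"]:
        return "READ_WHILE_WRITER"          # rww collapses the request
    if cache.open_write(url):               # e.g. the previous writer gave up
        return "FETCH_ORIGIN"
    # the problematic window: dirent exists, headers not validated, no write lock
    return "FAIL_FAST" if fail_fast else "LEAK_TO_ORIGIN"

cache = ToyCache()
print(handle(cache, "/seg1.ts"))            # FETCH_ORIGIN (first Txn)
print(handle(cache, "/seg1.ts"))            # LEAK_TO_ORIGIN (the window)
cache.dirents["/seg1.ts"]["headers_ok"] = True
print(handle(cache, "/seg1.ts"))            # READ_WHILE_WRITER
```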



Disclaimer: I'm *not* an expert on ATS cache internals, so, I could well be stating something
that may not be entirely accurate.

On Friday, July 10, 2015 8:37 AM, Mateusz Zajakala <<>>

Thanks Sudheer!

However, I'm still not sure about what happens under the hood. Let's say we have 2 clients
requesting a file for the first time.

1) client 1, TCP_MISS, go to origin
2) very soon after - client 2, TCP_MISS. Now, if 1) has already managed to get the headers, then
we can serve the file (read-while-writer). But if NOT, then there should be an open read retry,
so we wait retries x timeout (I tried setting it to as much as 20 x 200 ms). During this time 1)
should finish downloading the file, or at least get the headers to allow read-while-writer.
3) the same scenario as in 2) should apply to any other incoming client requests for the same
file.

Is this not the expected behaviour? Maybe I'm missing something, but it seems that after one
connection starts retrieval of origin data others should not repeat this. However, with very
high loads I still see leakage of requests to origin, and I'm not sure how exactly this happens.

Could it happen because client 2 arrives after client 1, but still before client 1 managed
to open read session to origin, so "open read" does not kick in? I have no idea how synchronization
is done between multiple requests for the same file, but I imagine one of them has to start
reading as the first one and this info would be available to others trying to read (and they
would then be stopped on open_read_retry)?

On Fri, Jul 10, 2015 at 5:12 PM, Sudheer Vinukonda <<>>
You may want to read through the below:

"While some other HTTP proxies permit clients to begin reading the response immediately upon
the proxy receiving data from the origin server, ATS does not begin allowing clients to read
until after the complete HTTP response headers have been read and processed. This is a side-effect
of ATS making no distinction between a cache refresh and a cold cache, which prevents knowing
whether a response is going to be cacheable.

As non-cacheable responses from an origin server are generally due to that content being unique
to different client requests, ATS will not enable read-while-writer functionality until it
has determined that it will be able to cache the object."

As explained in that doc, read-while-writer doesn't kick in until the response headers
for an object are received and validated. For a live streaming scenario, this leaves a window
that, although tiny, is large enough (due to the large number of concurrent requests) to leak
more than a single request to the origin, despite read-while-writer being enabled.

The open read retry settings do help to reduce this problem to a large extent, by attempting
to retry the read. There's also a setting <proxy.config.http.cache.max_open_write_retries>
that can be tuned to further improve this situation.

However, despite all the above tuning, we still noticed multiple requests leaking (although
significantly lower than without the tuning). Hence the need for the new feature Open Write
Fail Action<>.
With this setting, you can configure ATS to return a 502 error on a cache miss when there's
an ongoing concurrent request for the same object. This lets the client (player) reattempt
the request, by which time the original concurrent request should have filled the cache. With this
feature, we don't see TCP_MISS more than once at any given instant for the same object anymore.
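
If your build has it, enabling this looks roughly like the following in records.config
(the action value shown is an assumption on my part, and the meaning of each value is
version-dependent, so check the documentation for your release):

```
# Hedged example: make a cache miss fail fast (e.g. with a 502) when another
# transaction already holds the write lock for the same object, instead of
# leaking the request to origin. Verify the action value for your release.
CONFIG proxy.config.http.cache.open_write_fail_action INT 1
```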

Let me know if you have more questions.



On Friday, July 10, 2015 12:19 AM, Mateusz Zajakala <<>>

Thanks for the explanation. While SWR does seem like a very useful feature, I don't think it
can help in my specific case.

In HLS the only object that expires often is the playlist manifest, which is very small (hundreds
of bytes). I don't think we're having a problem with revalidation of these files. However,
sometimes we see the origin flooded with requests for video segments (1-2 MB). These are
never revalidations; according to the logs these are all TCP_MISS.

Take for example the following log:

1436442291.878 60 TCP_MISS/200 668669 GET
- DIRECT/<> video/m2pt -
1436442292.095 12 TCP_MISS/200 668669 GET
- DIRECT/<> video/m2pt -
1436442292.133 17 TCP_MISS/200 668669 GET
- DIRECT/<> video/m2pt -

As you can see we have three consecutive requests for the same file. Each of them takes a short
time to process and they are separated in time, yet all of them are TCP_MISS. With my settings
I'd expect a TCP_MISS on the first retrieval, and then clean TCP_HITs. And this is how it
usually works (even under high load); only once in a while we see more requests getting through
to origin. When this happens the origin slows down, processing time gets longer, more requests are
TCP_MISS, and very soon we're killing the origin with enormous traffic.

Is there any way to avoid this? Shouldn't open_read_retry take care of this?

I'm quite new to ATS and caching in general, so correct me if I misunderstood something..


On Fri, Jul 10, 2015 at 4:36 AM, Sudheer Vinukonda <<>>
I've updated the settings and the feature description in the relevant places. Also, it looks
like these are available in 6.0.0 (and are not in 5.3.x).



On Thursday, July 9, 2015 10:44 AM, Miles Libbey <<>>

Thanks Sudheer-
I read through the comments in TS-3549<>,
but I don't grok what we are supposed to do in ATS 5.3.x+ to get the almost-Stale-While-Revalidate
configured. Seems like this would be a great place to modify -- HTTP Proxy Caching — Apache
Traffic Server 6.0.0 documentation<>
(and probably also need any new options in records.config — Apache Traffic Server 6.0.0
documentation<>).



On Thursday, July 9, 2015 7:57 AM, Sudheer Vinukonda <<>>

There's no way to completely avoid multiple concurrent requests to the origin, without using
something like the SWR (Stale-While-Revalidate) solution. You may want to take a look at Stale-While-Revalidate-in-the-core<>.

ATS 5.3.x+ supports an almost-SWR like solution with TS-3549<>.
A complete SWR solution (in the core ATS) is planned to be implemented with [TS-3587] Support
stale-while-revalidate in the core - ASF JIRA<>.
There are a number of timers and other settings that are relevant to the issues you mentioned
(e.g. TS-3622<>).

If you absolutely do not care about latency, you may try the existing stale-while-revalidate<>
plugin. I've not used it myself (we have a more efficient internal version of the same plugin),
but I've heard that the plugin doesn't work as desired.

(PS: you may need to be careful: with read-while-writer, we've experienced requests taking
longer than 60 seconds without the above optimizations, which is absolutely ridiculous for any
kind of request, let alone the HLS use case).



On Thursday, July 9, 2015 4:17 AM, Mateusz Zajakala <<>>

Hi everyone,

I'd like to get some insight into how I can configure and fine-tune ATS to eliminate flooding
the origin server with requests on TCP_MISS, and to make sure I understand what I'm doing.

I hope this is the right place to ask :)

Case: we have an origin server serving HLS video chunks + playlists. What this means for ATS:
- we know that every request is cacheable
- expiry time for playlists is very short (10s), for video chunks a little longer (this is set
by origin)
- we know the size of the objects (1-2MB per video file)
- we do all of our caching in RAM

We use ATS as reverse proxy with the following records config:
CONFIG proxy.config.http.cache.required_headers INT 0
- does this make ATS cache everything?
CONFIG proxy.config.cache.enable_read_while_writer INT 1
- we don't want to wait until chunk is served to one client, we want to serve them in parallel
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
- according to the docs these allow a download to the cache to finish even if the client that initiated it disconnects
CONFIG proxy.config.http.cache.max_open_read_retries INT 5
CONFIG proxy.config.http.cache.open_read_retry_time INT 100
- this is KEY - we need to have collapsed forwarding!
CONFIG proxy.config.cache.ram_cache.size INT 20G
- put everything in RAM
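
As a sanity check on those two retry settings, here is the maximum open-read wait they give
(a back-of-envelope sketch; the multiplication is my own reasoning, not from the docs):

```python
# Maximum time a request can spend in open-read retries before it gives up
# and goes to origin, given the settings above.
max_open_read_retries = 5       # proxy.config.http.cache.max_open_read_retries
open_read_retry_time_ms = 100   # proxy.config.http.cache.open_read_retry_time
max_wait_ms = max_open_read_retries * open_read_retry_time_ms
print(max_wait_ms)  # 500 (ms) - far longer than the ~10 ms TCP_MISS fetch time,
                    # but it only helps when the open *read* actually fails
```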

All others are defaults. Now with these settings we are getting a respectable 99.1% hit ratio.
However, there are cases when increasing the number of incoming requests to ATS causes it to
flood the origin on TCP_MISS (origin responds with 200, so if-modified-since is not part of the
picture).
Now, I would imagine that setting max_open_read_retries + open_read_retry_time would make
ALL clients requesting a file (except the first one) wait until the first one retrieves the
headers, and because of enable_read_while_writer they would then be served the retrieved file.
However, I'm seeing in the logs that sometimes during 100ms or more there are multiple TCP_MISS
and origin server requests for the same file. I tried tweaking the values of the open_read
timeout and retries, but without success.

Request serving time on TCP_MISS is usually less than 10ms. We have a good link to origin.

My goal would be to have a "perfect" collapsed forwarding. I don't care about latency (I can
make client wait even 5s if necessary), but I don't want to hit origin. Is this possible?
Do I need to adjust the settings? Or is there some reason that this cannot be achieved on
high number of requests?

I would greatly appreciate any suggestions!


Ps. We are using the official CentOS 6 + EPEL 6 ATS 3.0.4 (ancient!) on a 40-core, 64-GB RAM
machine with 2x10Gbps eth. No observable load problems with >1K requests/s.
