trafficserver-users mailing list archives

From Leif Hedstrom <zw...@apache.org>
Subject Re: proxy.config.cache.ram_cache.size query from eBay
Date Fri, 21 Nov 2014 17:47:36 GMT

> On Nov 21, 2014, at 10:11 AM, Lerner, Steve <slerner@ebay.com> wrote:
> 
> Leif,
>  
> For this eval config:
>  
> I am referring to Full Clustering https://docs.trafficserver.apache.org/en/latest/admin/cluster-howto.en.html
> 
> We have two of these, 11 machines each.
>  
> AND we are using load balancing to ‘stripe’ URLs across the 22 machines, so each
> one only gets a fixed named ‘range’ of URLs, i.e. A-B goes on machine 1, C-D on machine
> 2, etc…
>  
> The clustering should prevent duplicate objects from happening despite load balancing


Interesting. What sort of hardware load balancing do you do? Hardware SLBs used to be notoriously
poor at dealing with failures: they would rebalance the entire dataset (i.e. no consistent hashing).
If that’s the case, I’d be concerned about what happens when an ATS box goes down and the SLB
rebalances everything. Is that the problem case that you are trying to address with the
ATS cache clustering? (If so, just don’t do L7 SLB hashing :).

It sounds like, in your setup, you don’t care which ATS box the SLB sends each ‘range’
to, as long as it always goes to the same one? I mean, there’s really no (easy) way for your
SLB to know whether URL A-B is actually cached on machine 1 or not. This means that your (potentially)
expensive L7 URL load balancing in the SLB has little or no value: it’s no better than just
sending each request to a random box in the cluster. Your effective cache hit ratio would be
roughly the same. Now, if you could somehow coerce the SLB into hashing URLs the same way
ATS does, then you’d be in good shape.
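
To put a rough number on that, here is a small sketch in Python. Everything in it is an assumption made for illustration: the `bucket` helper, the salts and the `/item/N` URLs are hypothetical, and neither hash is what a real SLB or ATS cache clustering actually uses. It only models two layers that each pick one of 22 boxes per URL, either independently or with the same function.

```python
import hashlib


def bucket(key: str, salt: str, n: int) -> int:
    """Map a key to one of n boxes. Different salts stand in for unrelated hash functions."""
    digest = hashlib.sha256((salt + key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


def local_hit_fraction(num_urls: int, n: int, slb_salt: str, cluster_salt: str) -> float:
    """Fraction of URLs that the front layer sends straight to the box that owns them."""
    hits = 0
    for i in range(num_urls):
        url = f"/item/{i}"  # made-up URLs, for illustration only
        slb_choice = bucket(url, slb_salt, n)   # where the load balancer sends the request
        owner = bucket(url, cluster_salt, n)    # which box the cluster layer says owns the object
        hits += slb_choice == owner
    return hits / num_urls


N = 22
independent = local_hit_fraction(100_000, N, "slb", "cluster")
aligned = local_hit_fraction(1_000, N, "same", "same")
print(f"independent hashes: {independent:.3f} (1/N = {1 / N:.3f})")
print(f"aligned hashes:     {aligned:.3f}")
```

With unrelated hashes the front layer lands on the owning box about 1/N of the time, which is what a random pick would give you. Only when both layers agree does every request land on the owner.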

To summarize, if you are set on using SLB and L7 URL hashing (which can get very expensive
and resource intensive), I’d probably just stick to that, and not use ATS cache clustering
at all. This would in fact give you a much better direct cache hit ratio, and less backend
traffic between the ATS proxies. If you do, also make sure that the SLB is not rehashing the
entire data set on a single host failure. If it does, you’re in a heap of trouble every
time a box dies.
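
The rehashing concern can be sketched the same way. This is a toy comparison under stated assumptions, not a description of any particular SLB: `modulo_pick` stands in for a naive "hash mod number of hosts" scheme, and `rendezvous_pick` uses rendezvous (highest-random-weight) hashing as one example of a consistent scheme. The host names and URLs are made up.

```python
import hashlib


def h(s: str) -> int:
    """64-bit hash of a string, taken from SHA-256."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")


def modulo_pick(url: str, nodes: list) -> str:
    """Naive scheme: index into the host list by hash modulo the host count."""
    return nodes[h(url) % len(nodes)]


def rendezvous_pick(url: str, nodes: list) -> str:
    """Rendezvous hashing: each host scores the URL, and the highest score wins."""
    return max(nodes, key=lambda node: h(f"{node}|{url}"))


def moved_fraction(pick, urls: list, nodes: list, dead: str) -> float:
    """Fraction of URLs whose assigned host changes when `dead` leaves the pool."""
    survivors = [n for n in nodes if n != dead]
    moved = sum(1 for u in urls if pick(u, nodes) != pick(u, survivors))
    return moved / len(urls)


nodes = [f"ats{i:02d}" for i in range(22)]      # hypothetical host names
urls = [f"/item/{i}" for i in range(10_000)]    # hypothetical URLs

modulo_moved = moved_fraction(modulo_pick, urls, nodes, "ats07")
rendezvous_moved = moved_fraction(rendezvous_pick, urls, nodes, "ats07")
print(f"modulo hashing:     {modulo_moved:.1%} of URLs change host")
print(f"rendezvous hashing: {rendezvous_moved:.1%} of URLs change host")
```

In this model the modulo scheme reassigns the large majority of URLs when one host out of 22 disappears, while the consistent scheme only moves the URLs that lived on the dead host (about 1/22 of them).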

If you decide to use ATS cache clustering, then I’d probably just turn off the SLB, or turn
it down to a simple L3 round-robin. This will give you the same overall cache hit ratio, but
at the expense of having more backend traffic between the ATS boxes.

I honestly can’t think of a reason why hashing on both layers would yield any better
cache results. In the worst case it’d probably be worse than using either one of
the two alone. But if you have such results, it’d be really interesting if you could share them!

Cheers,

— leif

