hc-dev mailing list archives

From "Jon Moore (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HTTPCLIENT-1395) Call the storage implementation only once on a cache miss
Date Tue, 03 Sep 2013 13:38:52 GMT

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756612#comment-13756612 ]

Jon Moore commented on HTTPCLIENT-1395:

Hi Nikola,

I agree that minimizing the number of calls to cache storage would be a useful improvement.
I did want to note, however, that the current code does not expect zero latency to the cache
storage layer. In fact, the memcached storage implementation expects the cache to be located
across the network and the ehcache implementation expects that cache entries might be spilled
to disk.

The reason there are multiple calls to the cache storage layer is explicitly *because* some
of the processing can take a while, and the cache may have been updated since we last
checked it. This matters particularly in the cache miss case, where some *other* request may
have filled in the cache while we were waiting for an origin request to complete. The caching
module doesn't do any synchronization between requests itself; the only synchronization point
is the cache storage implementation, which is external. This allows multiple application
servers (for example) to share a common cache storage (e.g. a memcached farm) while
maintaining proper HTTP caching semantics.
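
To illustrate the race described above, here is a minimal sketch. It is not the CachingHttpClient code: the class, the `Entry` type and the `handleMiss` method are hypothetical stand-ins, a `ConcurrentMap` plays the role of the shared cache storage, and entries are compared by a plain timestamp rather than by parsed HTTP headers.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch: re-checking shared storage after a slow origin fetch.
class NewerEntryRecheckSketch {

    // Stand-in for a cache entry: a body plus the time the origin generated it.
    static final class Entry {
        final String body;
        final long dateMillis;

        Entry(String body, long dateMillis) {
            this.body = body;
            this.dateMillis = dateMillis;
        }
    }

    static Entry handleMiss(ConcurrentMap<String, Entry> storage,
                            String key,
                            Supplier<Entry> origin) {
        // First lookup: is there anything stored for this key?
        Entry cached = storage.get(key);
        if (cached != null) {
            return cached; // freshness and variant handling omitted for brevity
        }

        // Slow step: other requests (possibly on other servers) may write
        // to the shared storage while this call is in flight.
        Entry fromOrigin = origin.get();

        // Second look at storage, done atomically by the storage itself:
        // keep whatever is stored if it is newer than our response,
        // otherwise store our response.
        return storage.merge(key, fromOrigin,
                (existing, candidate) ->
                        existing.dateMillis > candidate.dateMillis ? existing : candidate);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Entry> storage = new ConcurrentHashMap<>();
        Entry result = handleMiss(storage, "/resource", () -> {
            // Simulate another request filling the cache while we wait.
            storage.put("/resource", new Entry("newer", 200L));
            return new Entry("older", 100L);
        });
        System.out.println(result.body);
    }
}
```

Note that the only coordination between requests happens inside the storage operation (`merge` here), which mirrors the point that synchronization is left to the storage implementation. Dropping the second look at storage would let the older origin response overwrite the newer entry.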

The cache *does* make an assumption that access to the cache storage layer is an order of
magnitude (or more) faster than making a request to the origin. Remember that HTTP is designed
to operate in a WAN environment. It sounds like in your case making 3 calls to the cache storage
layer is *slower* than calling the origin--is that right?

In any event, I do think there are some opportunities for improvement here. In particular,
in looking through the code again, I need to refresh my memory as to why, if we have a cache
miss, we re-check whether there are variants present before calling the backend. I believe
that might be the only cache lookup we can avoid (as the later one to check if a more recent
entry exists after getting the backend response is necessary for proper cache behavior). If
there's a patch to be had here, it certainly should be storage implementation-agnostic, as
Oleg suggests.

> Call the storage implementation only once on a cache miss
> ---------------------------------------------------------
>                 Key: HTTPCLIENT-1395
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1395
>             Project: HttpComponents HttpClient
>          Issue Type: Improvement
>          Components: HttpCache
>    Affects Versions: 4.2.5
>            Reporter: Nikola Petrov
>            Priority: Minor
>             Fix For: 4.3.1
>         Attachments: call-storage-implementation-once-4.2-branch.patch, call-storage-implementation-once.patch,
> I am trying to use the httpclient-cache component with a Cassandra backend. Everything
> seems good except that HttpCacheStorage#getEntry is called 3 times on the first request
> (a cache miss), resulting in a performance bottleneck. There might be a way to handle this
> in the storage implementation by caching the recently queried values, but I think that a
> better place is in the CachingHttpClient class. The current code expects zero latency to
> the storage backend (the current implementations are all memory based), but here is a patch
> that fixes the problem.
>
> Some notes:
> * I am using the code from the 4.2.5 release (but can port the code to the current trunk)
> * A test is provided in org.apache.http.impl.client.cache.TestCachingHttpClient
> * BasicHttpCache is patched to expose methods that check if the key is found or if a
>   proper variant is found. Without this there is no way to say whether there was a real
>   cache miss or only the specific variant is missing
> * CachingHttpClient checks if the current HttpCache implementation is BasicHttpCache
>   so it can use the new methods. I didn't want to change the interface because this would
>   add breaking changes to the API
> * This exposes the alreadyHaveNewerCacheEntry method so implementations can control if
>   the client should check for a more recent version in the cache
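
The alternative the description mentions in passing, caching recently queried values inside the storage implementation, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name, the `Function`-based backend (standing in for a slow lookup such as a Cassandra read), the string-valued entries and the injected clock. It does not implement the real HttpCacheStorage interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: remember recent lookups (including misses) for a short time.
class MemoizingLookupSketch {

    // A remembered lookup result; value is null when the backend had no entry.
    private static final class Timed {
        final String value;
        final long readAtMillis;

        Timed(String value, long readAtMillis) {
            this.value = value;
            this.readAtMillis = readAtMillis;
        }
    }

    private final Function<String, String> backend; // slow lookup, may return null
    private final long ttlMillis;
    private final LongSupplier clock;
    // Unbounded for brevity; a real implementation would need eviction.
    private final Map<String, Timed> recent = new HashMap<>();

    MemoizingLookupSketch(Function<String, String> backend, long ttlMillis, LongSupplier clock) {
        this.backend = backend;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized String getEntry(String key) {
        long now = clock.getAsLong();
        Timed remembered = recent.get(key);
        if (remembered != null && now - remembered.readAtMillis < ttlMillis) {
            return remembered.value; // answered locally, no backend round trip
        }
        String value = backend.apply(key);
        recent.put(key, new Timed(value, now));
        return value;
    }

    // Call after writing to the backend so the next read is not stale.
    synchronized void invalidate(String key) {
        recent.remove(key);
    }

    public static void main(String[] args) {
        int[] backendCalls = {0};
        MemoizingLookupSketch storage = new MemoizingLookupSketch(
                key -> { backendCalls[0]++; return null; },
                50L,
                System::currentTimeMillis);
        storage.getEntry("/resource");
        storage.getEntry("/resource");
        System.out.println("backend calls: " + backendCalls[0]);
    }
}
```

The trade-off is the one raised in the comment above: while a result is remembered, writes made by other clients sharing the same backend are not visible, so the time-to-live bounds how stale a repeated lookup can be.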
