hc-dev mailing list archives

From "Nikola Petrov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HTTPCLIENT-1395) Call the storage implementation only once on a cache miss
Date Tue, 03 Sep 2013 14:12:51 GMT

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756645#comment-13756645 ]

Nikola Petrov commented on HTTPCLIENT-1395:

Hi Jon,

I agree with you on all points. Here are my notes (some may be specific to my use case):

* Yes, making 3 calls to the cache storage layer can sometimes be slower than just sending
a request to the server (given that the network to the HTTP server is fast enough)
* {quote}
I need to refresh my memory as to why, if we have a cache miss, we re-check whether there
are variants present before calling the backend
{quote} As far as I could see, the getCachedEntry method didn't expose that information: it
returned null in *both* cases, when no variant matched and when there was no root entry
* {quote} I believe that might be the only cache lookup we can avoid (as the later one to
check if a more recent entry exists after getting the backend response is necessary for proper
cache behavior){quote}
In my case (a web crawler), another layer/component already checks whether the current URI
has been processed by another worker thread, so this lookup is not needed. I agree that the
default should be to check whether the response is older than the one in the cache, but the
API user should be able to control that check.
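
To illustrate the second point, here is a minimal sketch of a lookup that reads the storage once and still tells a missing root entry apart from a missing variant. All names here (LookupStatus, LookupResult, CountingStorage, SingleLookupCache) are hypothetical and are not part of httpclient-cache; a plain map from variant key to body stands in for a real cache entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical outcome of a single cache lookup. */
enum LookupStatus { HIT, ROOT_MISS, VARIANT_MISS }

/** Hypothetical result type: the status plus the cached body on a hit. */
final class LookupResult {
    final LookupStatus status;
    final String body; // null unless status == HIT

    LookupResult(LookupStatus status, String body) {
        this.status = status;
        this.body = body;
    }
}

/** Simplified stand-in for a slow storage backend; counts every read. */
final class CountingStorage {
    private final Map<String, Map<String, String>> roots = new HashMap<>();
    int reads = 0;

    void put(String uri, String variantKey, String body) {
        roots.computeIfAbsent(uri, k -> new HashMap<>()).put(variantKey, body);
    }

    /** Returns the variants stored under the URI, or null if there is no root entry. */
    Map<String, String> getEntry(String uri) {
        reads++;
        return roots.get(uri);
    }
}

/** Assumed cache front end that performs exactly one storage read per lookup. */
final class SingleLookupCache {
    private final CountingStorage storage;

    SingleLookupCache(CountingStorage storage) {
        this.storage = storage;
    }

    LookupResult lookup(String uri, String variantKey) {
        Map<String, String> variants = storage.getEntry(uri); // the only read
        if (variants == null) {
            return new LookupResult(LookupStatus.ROOT_MISS, null);
        }
        String body = variants.get(variantKey);
        if (body == null) {
            return new LookupResult(LookupStatus.VARIANT_MISS, null);
        }
        return new LookupResult(LookupStatus.HIT, body);
    }
}
```

With a result like this, the caller can decide what to do next from the status alone instead of going back to the storage to find out which kind of miss occurred.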

> Call the storage implementation only once on a cache miss
> ---------------------------------------------------------
>                 Key: HTTPCLIENT-1395
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1395
>             Project: HttpComponents HttpClient
>          Issue Type: Improvement
>          Components: HttpCache
>    Affects Versions: 4.2.5
>            Reporter: Nikola Petrov
>            Priority: Minor
>             Fix For: 4.3.1
>         Attachments: call-storage-implementation-once-4.2-branch.patch, call-storage-implementation-once.patch,
> I am trying to use the httpclient-cache component with a Cassandra backend. Everything
> seems good except that HttpCacheStorage#getEntry is called 3 times the first time a URI
> is requested, which results in a performance bottleneck. There might be a way to handle
> this in the storage implementation by caching recently queried values, but I think a
> better place is the CachingHttpClient class. The current code assumes zero latency to the
> storage backend (the current implementations are all memory based), but here is a patch
> that fixes the problem. Some notes:
> * I am using the code from the 4.2.5 release (but can port the code to the current trunk)
> * A test is provided in org.apache.http.impl.client.cache.TestCachingHttpClient
> * BasicHttpCache is patched to expose methods that check whether the key is found or
> whether a proper variant is found; without this there is no way to tell a real cache miss
> from a missing variant
> * CachingHttpClient checks whether the current HttpCache implementation is BasicHttpCache
> so it can use the new methods. I didn't want to change the interface because that would
> introduce breaking changes to the API
> * This exposes the alreadyHaveNewerCacheEntry method so implementations can control whether
> the client should check for a more recent version in the cache
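
The last note in the quoted description can be pictured roughly as an overridable hook. This is only a sketch under assumptions: the method name alreadyHaveNewerCacheEntry comes from the description, but the signature, the SketchCachingClient and CrawlerCachingClient classes, and the map of cached dates are hypothetical simplifications, not the code in the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client skeleton; not the real CachingHttpClient. */
class SketchCachingClient {
    // URI -> date of the cached response (epoch millis); stands in for the storage
    protected final Map<String, Long> cachedDates = new HashMap<>();
    int storageReads = 0;

    /** Default behaviour: read the storage and compare dates (assumed signature). */
    protected boolean alreadyHaveNewerCacheEntry(String uri, long responseDate) {
        storageReads++;
        Long cached = cachedDates.get(uri);
        return cached != null && cached > responseDate;
    }

    /** Stores the backend response unless a newer entry exists; returns true if stored. */
    boolean handleBackendResponse(String uri, long responseDate) {
        if (alreadyHaveNewerCacheEntry(uri, responseDate)) {
            return false;
        }
        cachedDates.put(uri, responseDate);
        return true;
    }
}

/** Crawler variant: another component already deduplicates URIs, so skip the check. */
class CrawlerCachingClient extends SketchCachingClient {
    @Override
    protected boolean alreadyHaveNewerCacheEntry(String uri, long responseDate) {
        return false; // no storage read at all
    }
}
```

The default keeps the safe behaviour, and a caller that already knows no newer entry can exist opts out of the extra storage read by overriding the hook.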
