hc-dev mailing list archives

From "Nikola Petrov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HTTPCLIENT-1395) Call the storage implementation only once on a cache miss
Date Tue, 03 Sep 2013 14:12:51 GMT

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756645#comment-13756645 ]

Nikola Petrov commented on HTTPCLIENT-1395:
-------------------------------------------

Hi Jon,

I agree with you on all points. Here are my notes (possibly somewhat specific to my use case):

* Yes, making 3 calls to the cache storage layer will sometimes be slower than just sending
a request to the server (given that the network to the HTTP server is fast enough).
* {quote}
I need to refresh my memory as to why, if we have a cache miss, we re-check whether there
are variants present before calling the backend
{quote} As far as I could see, the getCachedEntry method didn't expose that information: it
returned null in *both* cases, a missing variant and a missing root entry.
* {quote} I believe that might be the only cache lookup we can avoid (as the later one to
check if a more recent entry exists after getting the backend response is necessary for proper
cache behavior){quote}
In my case (a web crawler), another layer/component checks whether the current URI
has already been processed by another worker thread, so this check is not needed. I agree that
the default should be to check whether the response is older than the one in the cache, but the
API user should be able to control that check.
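
A minimal sketch of the distinction discussed in the second point, under stated assumptions: a lookup that reads the storage once and reports whether the root entry was missing or only the requested variant was missing, instead of returning null for both. All type and method names here (LookupStatus, LookupResult, RootEntry, CountingStorage, VariantAwareCache) are hypothetical and are not part of httpclient-cache or of the attached patch. As a simplification, variant bodies are kept inline in the root entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: the three outcomes a caller may want to tell apart.
enum LookupStatus { NO_ROOT_ENTRY, NO_MATCHING_VARIANT, HIT }

// Hypothetical result type; body is null unless status is HIT.
final class LookupResult {
    final LookupStatus status;
    final String body;

    LookupResult(LookupStatus status, String body) {
        this.status = status;
        this.body = body;
    }
}

// Hypothetical root entry: maps a variant key to a cached body.
final class RootEntry {
    final Map<String, String> variants = new HashMap<>();
}

// Hypothetical in-memory storage that counts reads, standing in for a slow backend.
final class CountingStorage {
    private final Map<String, RootEntry> entries = new HashMap<>();
    int reads = 0;

    RootEntry getEntry(String uri) {
        reads++;
        return entries.get(uri);
    }

    void putEntry(String uri, RootEntry entry) {
        entries.put(uri, entry);
    }
}

final class VariantAwareCache {
    private final CountingStorage storage;

    VariantAwareCache(CountingStorage storage) {
        this.storage = storage;
    }

    LookupResult lookup(String uri, String variantKey) {
        RootEntry root = storage.getEntry(uri); // the only storage read in this method
        if (root == null) {
            return new LookupResult(LookupStatus.NO_ROOT_ENTRY, null);
        }
        String body = root.variants.get(variantKey);
        if (body == null) {
            return new LookupResult(LookupStatus.NO_MATCHING_VARIANT, null);
        }
        return new LookupResult(LookupStatus.HIT, body);
    }
}
```

With a result like this, a caller that sees NO_ROOT_ENTRY already knows there are no variants to re-check before calling the backend, so no second lookup is needed for that question.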

-- 
Nikola
                
> Call the storage implementation only once on a cache miss
> ---------------------------------------------------------
>
>                 Key: HTTPCLIENT-1395
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1395
>             Project: HttpComponents HttpClient
>          Issue Type: Improvement
>          Components: HttpCache
>    Affects Versions: 4.2.5
>            Reporter: Nikola Petrov
>            Priority: Minor
>             Fix For: 4.3.1
>
>         Attachments: call-storage-implementation-once-4.2-branch.patch, call-storage-implementation-once.patch, call-storage-implementation-once-trunk.patch
>
>
> I am trying to use the httpclient-cache component with a Cassandra backend. Everything
> seems fine, except that HttpCacheStorage#getEntry is called 3 times on the first request
> (a cache miss), which results in a performance bottleneck. This could perhaps be handled
> in the storage implementation by caching recently queried values, but I think a better
> place is the CachingHttpClient class. The current code expects zero latency to the storage
> backend (the current implementations are all memory based), but here is a patch that fixes
> the problem.
> Some notes:
> * I am using the code from the 4.2.5 release (but can port it to the current trunk)
> * A test is provided in org.apache.http.impl.client.cache.TestCachingHttpClient
> * BasicHttpCache is patched to expose methods that check whether the key is found and
> whether a proper variant is found; without this there is no way to tell a real cache miss
> from a missing specific variant
> * CachingHttpClient checks whether the current HttpCache implementation is BasicHttpCache
> so it can use the new methods. I didn't want to change the interface because that would
> introduce breaking changes to the API
> * The patch exposes the alreadyHaveNewerCacheEntry method so implementations can control
> whether the client should check for a more recent version in the cache
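
For comparison, the alternative mentioned in the description (handling repeated lookups inside the storage implementation by caching recently queried values) might look roughly like the sketch below. It is not taken from the attached patches. EntryStorage is a hypothetical, simplified stand-in for HttpCacheStorage (string values, no exceptions, no update or remove), and InMemoryBackend and MemoizingStorage are illustrative names. Misses are remembered too, because the repeated calls in question happen on a cache miss.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-in for HttpCacheStorage.
interface EntryStorage {
    String getEntry(String key);
    void putEntry(String key, String entry);
}

// Stand-in for a remote backend such as Cassandra; counts reads for illustration.
final class InMemoryBackend implements EntryStorage {
    private final Map<String, String> data = new HashMap<>();
    int reads = 0;

    @Override
    public String getEntry(String key) {
        reads++;
        return data.get(key);
    }

    @Override
    public void putEntry(String key, String entry) {
        data.put(key, entry);
    }
}

// Decorator that remembers the most recent lookups, including misses.
final class MemoizingStorage implements EntryStorage {
    private final EntryStorage delegate;
    private final Map<String, Optional<String>> recent;

    MemoizingStorage(EntryStorage delegate, final int capacity) {
        this.delegate = delegate;
        // Access-ordered LinkedHashMap that evicts the least recently used key.
        this.recent = new LinkedHashMap<String, Optional<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Optional<String>> eldest) {
                return size() > capacity;
            }
        };
    }

    @Override
    public synchronized String getEntry(String key) {
        Optional<String> cached = recent.get(key);
        if (cached == null) {
            // Not seen recently: ask the backend once and remember the answer,
            // wrapping a null (miss) in an empty Optional.
            cached = Optional.ofNullable(delegate.getEntry(key));
            recent.put(key, cached);
        }
        return cached.orElse(null);
    }

    @Override
    public synchronized void putEntry(String key, String entry) {
        delegate.putEntry(key, entry);
        recent.put(key, Optional.ofNullable(entry)); // keep the memo consistent with writes
    }
}
```

The trade-off is that a remembered miss can go stale if another process writes the same key to the shared backend, which is one reason to prefer fixing the number of calls in the caching client itself.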

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org

