hc-dev mailing list archives

From "Jon Moore (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HTTPCLIENT-1347) gzip responses doubly cached
Date Tue, 14 Jan 2014 02:47:55 GMT

    [ https://issues.apache.org/jira/browse/HTTPCLIENT-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870335#comment-13870335 ]

Jon Moore commented on HTTPCLIENT-1347:
---------------------------------------

Hi Adam,

Sorry this has been confusing; it has clearly been confusing for us, too (it has probably been
almost three years since we last touched the variant code). I do think this part of the
implementation is due for a rewrite.

Before I dive into the response, I want to highlight one thing I noticed in your test code:
your clients wrap CachingHttpClient around DecompressingHttpClient around DefaultHttpClient.
That order results in caching unzipped responses, which you probably don't want. You should
instead wrap DecompressingHttpClient around CachingHttpClient around DefaultHttpClient. Note
that this changed in 4.3, where the processing stack is set up in the right order for you.
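Why the wrapping order matters can be modeled without HttpClient at all. The sketch below uses hypothetical stand-ins (`Client`, `origin`, `decompressing`, `caching`), not the real 4.2 classes; with the real classes, the recommended order corresponds to `new DecompressingHttpClient(new CachingHttpClient(new DefaultHttpClient()))`. It assumes an origin that always answers with gzip and shows which bytes end up in the cache under each order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StackOrder {
    // Hypothetical minimal client: fetch the body for a URL.
    interface Client {
        byte[] get(String url) throws IOException;
    }

    // Stand-in for DefaultHttpClient talking to a server that always sends gzip.
    static Client origin(byte[] plain) {
        return url -> gzip(plain);
    }

    // Stand-in for DecompressingHttpClient: gunzips whatever its backend returns.
    static Client decompressing(Client backend) {
        return url -> gunzip(backend.get(url));
    }

    // Stand-in for CachingHttpClient: stores exactly the bytes its backend returns.
    static Client caching(Client backend, Map<String, byte[]> store) {
        return url -> {
            byte[] cached = store.get(url);
            if (cached != null) return cached;
            byte[] body = backend.get(url);
            store.put(url, body);
            return body;
        };
    }

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "hello ".repeat(1000).getBytes(StandardCharsets.UTF_8);
        Map<String, byte[]> outerCache = new HashMap<>();
        Map<String, byte[]> innerCache = new HashMap<>();

        // Caching around Decompressing around origin: the cache sees unzipped bytes.
        caching(decompressing(origin(plain)), outerCache).get("/1");
        // Decompressing around Caching around origin: the cache sees the gzip bytes.
        decompressing(caching(origin(plain), innerCache)).get("/1");

        System.out.println(outerCache.get("/1").length);                // 6000
        System.out.println(innerCache.get("/1").length < plain.length); // true
    }
}
```

Both stacks hand the caller the same plain bytes; only what the cache holds differs, which is why the mistake is easy to miss.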

In your case, I believe you want to implement both a ResourceFactory (for the bodies) and an
HttpCacheStorage (for the headers). If you look in BasicHttpCache, you will see that
ResourceFactory.copy is called when storing a variant entry; you could implement it as a
lightweight clone operation (reusing a filename, or a symbolic link on the file system) so
that the body is only stored once.
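A minimal sketch of the split suggested above, assuming simplified hypothetical types (`BodyRef`, `HeaderRecord`, `SharingBodyFactory`) rather than the real ResourceFactory and HttpCacheStorage signatures. Header records live in a plain key-value map; bodies live in a factory whose `copy` hands out another reference to the same blob. The reference counting in `dispose` is an added assumption: a shared body must not be freed while another entry still points at it.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitCacheStore {
    // Hypothetical handle to a stored body (a simplified stand-in for a cache Resource).
    static final class BodyRef {
        final String blobId;
        BodyRef(String blobId) { this.blobId = blobId; }
    }

    // Hypothetical header record: response headers plus a reference to the body.
    static final class HeaderRecord {
        final Map<String, String> headers;
        final BodyRef body;
        HeaderRecord(Map<String, String> headers, BodyRef body) {
            this.headers = new HashMap<>(headers);
            this.body = body;
        }
    }

    // Body side, in the role the comment gives to a ResourceFactory: generate()
    // stores the bytes once, copy() hands out another reference to the same blob.
    static final class SharingBodyFactory {
        private final Map<String, byte[]> blobs = new HashMap<>();
        private final Map<String, Integer> refCounts = new HashMap<>();
        private int nextId = 0;

        BodyRef generate(byte[] body) {
            String id = "blob-" + nextId++;
            blobs.put(id, body.clone());
            refCounts.put(id, 1);
            return new BodyRef(id);
        }

        // The "lightweight clone": no bytes are duplicated, only the count goes up.
        BodyRef copy(BodyRef original) {
            refCounts.merge(original.blobId, 1, Integer::sum);
            return new BodyRef(original.blobId);
        }

        // Free the blob only when the last entry referring to it is disposed.
        void dispose(BodyRef ref) {
            int left = refCounts.merge(ref.blobId, -1, Integer::sum);
            if (left == 0) {
                refCounts.remove(ref.blobId);
                blobs.remove(ref.blobId);
            }
        }

        long storedBytes() {
            long total = 0;
            for (byte[] b : blobs.values()) total += b.length;
            return total;
        }
    }

    public static void main(String[] args) {
        SharingBodyFactory bodies = new SharingBodyFactory();
        // Header side, in the role of HttpCacheStorage: a plain key-value map.
        Map<String, HeaderRecord> headerStore = new HashMap<>();
        Map<String, String> headers = Map.of("Content-Encoding", "gzip", "Vary", "Accept-Encoding");

        BodyRef body = bodies.generate(new byte[1000]);
        // Root entry and variant entry: two header records, one stored body.
        // The variant key format here is made up for the example.
        headerStore.put("http://localhost/1", new HeaderRecord(headers, body));
        headerStore.put("{accept-encoding=gzip}http://localhost/1",
                new HeaderRecord(headers, bodies.copy(body)));

        System.out.println(headerStore.size());   // 2
        System.out.println(bodies.storedBytes()); // 1000
    }
}
```

With a file-backed store, `copy` would be the place to reuse the filename or create a link, as the comment suggests; the in-memory map only stands in for that.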

You are right, though, that the cache entries (headers) are stored twice. The reason is
something of a historical accident: at the time we added support for variants, we couldn't
store request headers with the entries without breaking backwards compatibility on the
HttpCacheEntry interface. So when you retrieved an entry using the URL as the cache key and
it had a Vary header, you couldn't tell whether you could return it, because you didn't know
which request headers had been used to fetch that entry in the first place, or whether they
matched the current request's headers. We ended up storing each variant under a variant cache
key built from the URL plus the relevant request headers and their values, so that we could
tell whether we had a matching variant. [Note that the contract for HttpCacheStorage is
essentially a key-value store, so the storage should not really care whether the keys are
URLs or not.]
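The variant-key idea described above can be sketched as follows: combine the URL with the values of the request headers that the cached response's Vary header names. The `{name=value&...}url` layout, and the `variantKey` helper itself, are hypothetical choices for illustration and are not claimed to be HttpClient's actual key format. Header names are lower-cased and sorted so that header order does not change the key, and values are URL-encoded so that characters like `&` or `=` cannot make two different header sets collide.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class VariantKeys {
    // Hypothetical helper: builds "{name=value&...}" + url from the Vary header
    // names and the current request's headers.
    static String variantKey(String url, List<String> varyHeaders, Map<String, String> requestHeaders) {
        // HTTP header names are case-insensitive, so look them up that way.
        Map<String, String> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byName.putAll(requestHeaders);

        // Normalize and sort the Vary names so their order does not affect the key.
        List<String> names = new ArrayList<>();
        for (String name : varyHeaders) names.add(name.trim().toLowerCase(Locale.ROOT));
        Collections.sort(names);

        StringJoiner parts = new StringJoiner("&", "{", "}");
        for (String name : names) {
            // A missing request header is recorded as an empty value.
            String value = byName.getOrDefault(name, "");
            parts.add(name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
        }
        return parts + url;
    }

    public static void main(String[] args) {
        List<String> vary = List.of("Accept-Encoding");
        System.out.println(variantKey("http://localhost/1", vary, Map.of("Accept-Encoding", "gzip")));
        // {accept-encoding=gzip}http://localhost/1
        System.out.println(variantKey("http://localhost/1", vary, Map.of()));
        // {accept-encoding=}http://localhost/1
    }
}
```

Because the storage is only a key-value contract, keys like these work just as well as plain URLs; the cost, as noted above, is a second header record per variant.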

So, long story short: the storage API of the caching module is probably due for a backwards-incompatible
overhaul. In the meantime, you may be able to get most of what you want by treating the headers
and bodies separately. Hope that helps.

> gzip responses doubly cached
> ----------------------------
>
>                 Key: HTTPCLIENT-1347
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1347
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpCache
>    Affects Versions: 4.2.5
>         Environment: Arch Linux kernel 3.8.8-1, node.js 0.8.22
>            Reporter: Adam Patacchiola
>             Fix For: 4.4 Final
>
>         Attachments: Screen Shot 2014-01-11 at 7.11.36 PM.png, Screen Shot 2014-01-13 at 3.56.19 PM.png,
>                      Showing_entry_pointer.png, httpClientCacheTest.tar.gz, httpClientTestServer.js
>
>
> Compressed responses are cached twice.
> Run the attached server (node.js 0.8.22) and client tests. Create an "assets" directory
> under the directory where you run the server and add two files named 1 and 2 (< 1000000 bytes).
> After the test runs, the cache dump output shows two sets of entries for each request,
> each containing the full content length of the file.
> Changing the HttpCacheStorage updateEntry implementation so that it does not update nonexistent
> entries (which I believe is the correct behavior) causes exceptions to be thrown.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
For additional commands, e-mail: dev-help@hc.apache.org

