lucene-solr-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
Date Sat, 26 Jan 2008 02:09:35 GMT

    [ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562803#action_12562803 ]

Hoss Man commented on SOLR-127:
-------------------------------

bq. Re 2.: Whatever we choose, two things must be linked: a changed index and/or a changed config must change the ETag and the Last-Modified.

I'm not sure that this is strictly true ... if something changes the ETag, then the Last-Modified
should also change, but if the Last-Modified changes, the ETag doesn't necessarily have to
change. Consider use cases where solrconfig.xml never changes: we can use openTime for Last-Modified
(in case we have to roll back to an older index), and indexVersion for the ETag. Bouncing
the server will change the Last-Modified because a new searcher is opened, but the ETag won't change
because the index hasn't changed.

Here's what I'm thinking...
* two new options (we can probably think of better names for these)...
*# lastModFrom="openTime|dirLastMod" ... default is dirLastMod
*# cacheHeaderSeed="[some date format]" ... default is epoch
* headers are computed as...
** Last-Modified = max(lastModFrom, cacheHeaderSeed) ... where lastModFrom is the time taken from
the specified source
** ETag is a hashcode of the indexVersion and cacheHeaderSeed
* resulting behavior...
** Users who aren't picky get the default, where slaves with identical snapshots will have identical
ETags and Last-Modified headers.
** By default, changing configs won't immediately change the ETag or Last-Modified header ... if
you've got an index that changes semi-regularly you can just touch the index to get new headers,
or you can add the cacheHeaderSeed option with a timestamp value to force new headers on startup.
** If you are super paranoid about making sure your headers always perfectly reflect
reality (even if you roll back your index to an older copy), use lastModFrom="openTime" and
update the cacheHeaderSeed option every time you change your config ... the downside being that
in multi-slave setups every machine will generate a different Last-Modified (but the ETags should
be the same).
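To make the max/hash rules above concrete, here is a minimal Java sketch of the proposed header computation. Everything in it is hypothetical: the `LastModFrom` enum, the method names, and the choice of `Objects.hash` as "a hashcode" are illustrative assumptions, and times are plain epoch milliseconds rather than whatever Solr actually exposes for openTime, the index directory's last-modified time, and indexVersion.

```java
import java.util.Objects;

public class CacheHeaderSketch {
    /** Hypothetical stand-in for the proposed lastModFrom option. */
    public enum LastModFrom { OPEN_TIME, DIR_LAST_MOD }

    /** Last-Modified = max(time chosen by lastModFrom, cacheHeaderSeed), in epoch millis. */
    public static long lastModified(LastModFrom from, long openTime, long dirLastMod, long cacheHeaderSeed) {
        long base = (from == LastModFrom.OPEN_TIME) ? openTime : dirLastMod;
        return Math.max(base, cacheHeaderSeed);
    }

    /** ETag = hash of indexVersion and cacheHeaderSeed, rendered as a quoted entity tag. */
    public static String etag(long indexVersion, long cacheHeaderSeed) {
        int h = Objects.hash(indexVersion, cacheHeaderSeed);
        return "\"" + Integer.toHexString(h) + "\"";
    }

    public static void main(String[] args) {
        long seed = 0L;                        // default cacheHeaderSeed: the epoch
        long newSeed = 1_201_400_000_000L;     // a later timestamp put in the config
        long indexVersion = 42L;               // hypothetical index version
        long dirLastMod = 1_201_000_000_000L;
        long openBefore = 1_201_300_000_000L;  // searcher opened before a restart
        long openAfter = 1_201_310_000_000L;   // new searcher after bouncing the server

        // lastModFrom="openTime": a restart moves Last-Modified.
        System.out.println(lastModified(LastModFrom.OPEN_TIME, openBefore, dirLastMod, seed)
                == lastModified(LastModFrom.OPEN_TIME, openAfter, dirLastMod, seed)); // false
        // Default lastModFrom="dirLastMod": a restart leaves Last-Modified alone.
        System.out.println(lastModified(LastModFrom.DIR_LAST_MOD, openBefore, dirLastMod, seed)
                == lastModified(LastModFrom.DIR_LAST_MOD, openAfter, dirLastMod, seed)); // true
        // The ETag never looks at openTime, so it only moves with the index or the seed.
        System.out.println("ETag: " + etag(indexVersion, seed)); // ETag: "8d7"
        System.out.println(etag(indexVersion, seed).equals(etag(indexVersion, newSeed))); // false
    }
}
```

Because the ETag depends only on indexVersion and the seed, slaves sharing a snapshot agree on it even when their open times differ, which is the multi-slave behavior described above.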

...thoughts?

bq. One comment only: change must-revalidate="" to must-revalidate="true/false". Same for no-store/no-cache.

Yeah, that's what I was thinking originally, except I wanted to leave out any special knowledge
about what the attributes were (ie: no hardcoded list of directive names) ... any XML attribute
in the config would automatically become a directive in the header value, and if it had a value
in the config, it would have a directive value in the header...

{code}
<cacheControl max-age="23" no-cache="" no-store="" must-revalidate="" private="Foo" asdf="" qwert="666" />
...becomes...
Cache-Control: max-age="23", no-cache, no-store, must-revalidate, private="Foo", asdf, qwert="666"
{code}
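A rough Java sketch of that attribute-to-directive mapping, assuming the attributes have already been read into an ordered map (a real XML parser doesn't promise attribute order, so the `LinkedHashMap` here is an assumption, as is the class name). It quotes every non-empty value, so it reproduces the `max-age="23"` quoting problem raised in the next paragraph.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CacheControlFromAttributes {
    /**
     * Naive mapping: every attribute becomes a directive; an empty value
     * yields a bare directive, and any other value is emitted quoted.
     */
    public static String toHeader(Map<String, String> attrs) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String value = e.getValue();
            joiner.add(value.isEmpty() ? e.getKey() : e.getKey() + "=\"" + value + "\"");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Attributes from the <cacheControl .../> example, in config order.
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("max-age", "23");
        attrs.put("no-cache", "");
        attrs.put("no-store", "");
        attrs.put("must-revalidate", "");
        attrs.put("private", "Foo");
        attrs.put("asdf", "");
        attrs.put("qwert", "666");
        System.out.println("Cache-Control: " + toHeader(attrs));
        // Cache-Control: max-age="23", no-cache, no-store, must-revalidate, private="Foo", asdf, qwert="666"
    }
}
```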

...that way we don't have to worry about any HTTP extensions; people can put anything they
freaking want in their Cache-Control header. What I forgot until today, though, is that the
numeric directives in the Cache-Control header aren't supposed to be quoted (ie: max-age=23
... not max-age="23") ... so that won't work very easily either.

So then I started thinking maybe we use the named list syntax, and let the data type tell us
whether the value should be quoted (<str>) or not (<int>) ... but that seems
awfully verbose for something this simple ... so now I'm wondering if maybe we should just
make it one big string and use a regex to look for max-age so we can set the Expires header
as well.

I'm liking the simple string + regex approach personally.
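A minimal Java sketch of the string + regex idea, under the assumption that the configured value is passed through verbatim as the Cache-Control header and only max-age is parsed out to derive Expires (now + max-age seconds, as an RFC 1123 date). The class, method names, and exact regex are illustrative, not anything Solr defines.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CacheControlString {
    // A max-age directive (unquoted delta-seconds) at the start or after a comma.
    private static final Pattern MAX_AGE =
        Pattern.compile("(?:^|,)\\s*max-age\\s*=\\s*(\\d+)", Pattern.CASE_INSENSITIVE);

    /** Seconds from the first max-age directive, if the string has one. */
    public static Optional<Long> maxAge(String cacheControl) {
        Matcher m = MAX_AGE.matcher(cacheControl);
        return m.find() ? Optional.of(Long.parseLong(m.group(1))) : Optional.empty();
    }

    /** Expires header value (now + max-age), formatted as an RFC 1123 date in GMT. */
    public static Optional<String> expires(String cacheControl, ZonedDateTime now) {
        return maxAge(cacheControl).map(secs ->
            DateTimeFormatter.RFC_1123_DATE_TIME.format(
                now.withZoneSameInstant(ZoneOffset.UTC).plusSeconds(secs)));
    }

    public static void main(String[] args) {
        String configured = "max-age=30, public";  // the "one big string" from the config
        ZonedDateTime now = ZonedDateTime.of(2008, 1, 26, 2, 9, 35, 0, ZoneOffset.UTC);
        System.out.println("Cache-Control: " + configured);
        expires(configured, now).ifPresent(e -> System.out.println("Expires: " + e));
        // Expires: Sat, 26 Jan 2008 02:10:05 GMT
    }
}
```

If there's no max-age in the string, `expires` returns empty and no Expires header would be set.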



> Make Solr more friendly to external HTTP caches
> -----------------------------------------------
>
>                 Key: SOLR-127
>                 URL: https://issues.apache.org/jira/browse/SOLR-127
>             Project: Solr
>          Issue Type: Wish
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>             Fix For: 1.3
>
>         Attachments: CacheUnitTest.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch,
HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch,
HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch,
HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch,
HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch
>
>
> An offhand comment I saw recently reminded me of something that really bugged me about the search solution I used *before* Solr -- it didn't play nicely with HTTP caches that might be sitting in front of it.
> At the moment, Solr doesn't put particularly useful info in the HTTP response headers to aid in caching (ie: Last-Modified), responds to all HEAD requests with a 400, and doesn't do anything special with If-Modified-Since.
> At the very least, we can set a Last-Modified based on when the current IndexReader was opened (if not the Date on the IndexReader) and use the same info to determine how to respond to If-Modified-Since requests.
> (For the record, I think the reason this hasn't occurred to me in the 2+ years I've been using Solr is that, with the internal caching, I've yet to need to put a proxy cache in front of Solr.)
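A small Java sketch of the If-Modified-Since check the issue describes, assuming `lastModified` is whatever Last-Modified value gets sent (for instance, when the current IndexReader was opened). The class and method names are hypothetical; inside a servlet you would more likely read the header with `HttpServletRequest.getDateHeader("If-Modified-Since")`, but this version parses the raw string to stay self-contained.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

public class ConditionalGetSketch {
    /** 304 if the client's copy is still current, otherwise 200 (full response). */
    public static int statusFor(String ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) return 200;
        try {
            Instant since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                                         .toInstant();
            // HTTP dates have one-second resolution, so drop sub-second precision first.
            Instant lm = lastModified.truncatedTo(ChronoUnit.SECONDS);
            return lm.isAfter(since) ? 200 : 304;
        } catch (DateTimeParseException e) {
            return 200; // ignore an unparseable date and serve the full response
        }
    }

    public static void main(String[] args) {
        Instant readerOpened = Instant.parse("2008-01-26T02:00:00.500Z");
        System.out.println(statusFor("Sat, 26 Jan 2008 02:00:00 GMT", readerOpened)); // 304
        System.out.println(statusFor("Sat, 26 Jan 2008 01:00:00 GMT", readerOpened)); // 200
        System.out.println(statusFor(null, readerOpened));                            // 200
    }
}
```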

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

