cocoon-dev mailing list archives

From: "Andrew Stevens" <at...@hotmail.com>
Subject: Re: Put Lucene generator on a diet?
Date: Mon, 27 Feb 2006 15:51:48 GMT
>From: "Antonio Fiol Bonnín" <antonio.fiol@gmail.com>
>Date: Sat, 25 Feb 2006 11:26:28 +0100
>
>2006/2/23, Andrew Stevens <ats37@hotmail.com>:
> > Looking through the Lucene block's search generator recently, it occurred to
> > me that a fair amount of the code in there was redundant - all the stuff for
> > breaking the hits up into pages, and only returning one page full of actual
> > hits, seems to me to be duplicating the FilterTransformer.  In fact, in the
> > site I've been working on most recently, we've been using just that
> > configuration - set the hits/page count on the generator to -1 (so it
> > returns everything) and add the filter transformer into the pipeline after
> > it to do the paging.
>
>I suppose that it's there because 99% of searches are paginated and
>very few could be cached... Er... Is the search generator cacheable?

Good question, I've no idea.  I don't see why it shouldn't be, though.  If 
nothing has been updated in the index files (which ought to be determinable 
using org.apache.lucene.store.Directory's list() & fileModified(String) 
methods, or from the file system timestamps), then searching for the same 
query ought to produce the same results.  Moreover, depending on how the 
pipeline's set up, I'm guessing it ought to be possible to cache all the 
hits on the first request and re-use them if the user clicks through to the 
subsequent pages? (thus avoiding calling Lucene repeatedly with the same 
query)
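
As a rough illustration of the timestamp idea, here is a minimal sketch in 
plain Java.  Everything in it is an assumption made for the example: the 
IndexFiles interface is only a stand-in for the two Directory methods 
mentioned above (it is not Lucene's own type), and CachedSearch, its 
searcher function and the String hits are hypothetical names, not anything 
in the Lucene block.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Sketch: reuse the hits for a query until some index file changes. */
class CachedSearch {

    /** Hypothetical stand-in for Directory's list() and fileModified(String). */
    interface IndexFiles {
        String[] list();
        long fileModified(String name);
    }

    /** Hits for one query plus the index timestamp they were computed against. */
    private static final class Entry {
        final long indexStamp;
        final List<String> hits;

        Entry(long indexStamp, List<String> hits) {
            this.indexStamp = indexStamp;
            this.hits = hits;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, List<String>> searcher;

    /** searcher is whatever actually runs the query (assumed, not Lucene API). */
    public CachedSearch(Function<String, List<String>> searcher) {
        this.searcher = searcher;
    }

    /** Most recent modification time of any index file, or 0 if there are none. */
    public static long newestModification(IndexFiles files) {
        long newest = 0L;
        for (String name : files.list()) {
            newest = Math.max(newest, files.fileModified(name));
        }
        return newest;
    }

    /** Runs the query only if it is not cached or the index timestamp has changed. */
    public List<String> search(String query, IndexFiles files) {
        long stamp = newestModification(files);
        Entry entry = cache.get(query);
        if (entry == null || entry.indexStamp != stamp) {
            entry = new Entry(stamp, searcher.apply(query));
            cache.put(query, entry);
        }
        return entry.hits;
    }
}
```

Note that comparing only the newest timestamp would miss the deletion of an 
older index file, so a real validity check would probably want to look at 
the file list as well.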

> > So I'm wondering, are there good reasons for the search generator to include
> > the same functionality, or would there be any interest in a patch that
> > strips it out?  The only possibility that's occurred to me so far is perhaps
> > it has this in there for performance?  The number of hits isn't likely to be
> > an issue in my particular case (less than a few hundred pages in total on
> > this site), but I guess it wouldn't be too good if Cocoon had to stream (and
> > maybe cache) several million hits.  On the other hand, I can't imagine
> > anyone would ever page through all of those anyway, so perhaps just having a
> > configurable upper limit on the hit count would be sufficient?
>
>I agree on the upper limit, if this reduces memory usage.
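
For what it's worth, the upper limit could be as simple as the sketch 
below.  It assumes the hits are available as an iterator, and it borrows 
the "-1 means everything" convention from the hits/page setting; HitLimiter 
and capHits are hypothetical names, not existing generator code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch: keep at most a configured number of hits. */
class HitLimiter {

    /**
     * Copies at most maxHits items from the iterator and then stops reading,
     * so no more than maxHits hits are ever held in memory.
     * A negative maxHits means "no limit" (an assumption for this sketch).
     */
    public static <T> List<T> capHits(Iterator<T> hits, int maxHits) {
        List<T> kept = new ArrayList<>();
        while (hits.hasNext() && (maxHits < 0 || kept.size() < maxHits)) {
            kept.add(hits.next());
        }
        return kept;
    }
}
```
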
>
>WRT removing the included pagination, I am not against doing so for
>2.2, but definitely it should not be done for 2.1.X as it would break
>backwards compatibility. Wouldn't it?

Absolutely.  Actually, I was only thinking of doing this for trunk (2.2), 
but neglected to say as much.  There are some other tweaks I've been 
considering for a 2.1.x patch (e.g. adding sitemap parameters that override 
some of the configuration settings), but they're all fully backwards 
compatible.
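
To show why such overrides stay backwards compatible, here is a small 
sketch.  A plain Map stands in for the sitemap parameters, and 
SettingResolver, resolveInt and the "hits-per-page" name used in the usage 
note are hypothetical.  When the parameter is absent or not a number, the 
configured value is used exactly as before.

```java
import java.util.Map;

/** Sketch: a per-request parameter overrides a configured default. */
class SettingResolver {

    /**
     * Returns the sitemap parameter's value if it is present and parses as an
     * integer; otherwise falls back to the configured value, so existing
     * sitemaps that never set the parameter behave exactly as they did.
     */
    public static int resolveInt(Map<String, String> sitemapParams,
                                 String name, int configured) {
        String raw = sitemapParams.get(name);
        if (raw == null) {
            return configured;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return configured;
        }
    }
}
```

For example, resolveInt(params, "hits-per-page", 10) yields 25 when the 
sitemap passes "25" and 10 when it passes nothing.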


Andrew.


