lucene-java-user mailing list archives

From "Cedric Ho" <cedric...@gmail.com>
Subject Re: performance on filtering against thousands of different publications
Date Wed, 15 Aug 2007 05:51:07 GMT
>
> Some options:
> 1) Try to minimise leaping around the disk - maybe sorting your selected terms will help.
> Look at methods in TermEnum and TermDocs which you can use to build your own bitset from your
> (sorted) list of terms.

Thanks, I'll try this method.
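
For reference, a rough sketch of the idea in option 1: sort the selected terms, then walk them in that order and OR each term's documents into one bitset. This is only a model, not Lucene code. The in-memory map, the class name SortedTermBitSet and the "pub-*" terms are made up for illustration; in a real index the per-term document lists would come from TermDocs, and I am assuming plain String ordering matches the order in which the index stores terms.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class SortedTermBitSet {

    // Hypothetical stand-in for the index: term -> ids of documents containing it.
    static BitSet build(Map<String, int[]> postings, String[] selectedTerms, int maxDoc) {
        String[] sorted = selectedTerms.clone();
        Arrays.sort(sorted); // visit terms in sorted order, as suggested in option 1

        BitSet bits = new BitSet(maxDoc);
        for (String term : sorted) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in the (pretend) index
            }
            for (int doc : docs) {
                bits.set(doc); // mark every document that contains this term
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("pub-a", new int[] {0, 3});
        postings.put("pub-b", new int[] {1});
        postings.put("pub-c", new int[] {2, 4});

        // Only pub-a and pub-c are selected, given in unsorted order.
        BitSet bits = build(postings, new String[] {"pub-c", "pub-a"}, 5);
        System.out.println(bits); // prints {0, 2, 3, 4}
    }
}
```

The resulting bitset is what a filter would hand back; whether sorting actually reduces disk seeks depends on the index layout, so it needs measuring.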


> 2) Can you add higher-level terms to your index? Are the publication sources grouped
> into stable "sets"? If so, why not index the content with a "publicationSet" field too and
> use that in filters instead of querying using the members of the set (individual publication
> sources).

Although some sets are stable, they do change from time to time. We
would then have to reindex at least those documents that change sets.
This is something we would like to avoid.


> 3) The <CachedFilter> tag in contrib's XMLQueryParser already implements an LRU
> policy for caching nested filters or queries.

I will definitely look into this one as well.
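
As a reminder of what an LRU policy means here, below is a minimal sketch of a bounded cache of filter bitsets built on java.util.LinkedHashMap. It is not how CachedFilter is implemented (I have not checked its source); the class name LruFilterCache and the "setA"/"setB"/"setC" keys are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded cache: once maxEntries is exceeded, the least recently used entry is dropped.
public class LruFilterCache extends LinkedHashMap<String, BitSet> {

    private final int maxEntries;

    public LruFilterCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, oldest access first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
        return size() > maxEntries; // evict after an insert pushes us over the limit
    }

    public static void main(String[] args) {
        LruFilterCache cache = new LruFilterCache(2);
        cache.put("setA", new BitSet());
        cache.put("setB", new BitSet());
        cache.get("setA");               // setA is now the most recently used
        cache.put("setC", new BitSet()); // evicts setB, the least recently used
        System.out.println(cache.keySet()); // prints [setA, setC]
    }
}
```

LinkedHashMap is not thread-safe, so a cache shared between search threads would need external synchronization.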

Thanks,
Cedric


