lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Optimizing unordered queries
Date Mon, 29 Jun 2009 10:28:08 GMT
On Sun, Jun 28, 2009 at 9:08 PM, Nigel wrote:
>> Unfortunately the TermInfos must still be hit to look up the
>> freq/proxOffset in the postings files.
> But for that data you only have to hit the TermInfos for the terms you're
> searching, correct?  So, assuming that there are vastly more terms in the
> index than you ever actually use in a query, we could rely on the LRU cache
> to keep the queried TermInfos around, rather than loading all of them
> up-front.  This was a hypothesis based on some tracing through the code but
> not a lot of knowledge of Lucene internals, so please steer me back to
> reality if necessary.  (-:

Right, it's only the terms in your query that need to be looked up.

There's already an LRU cache for the Term -> TermInfos lookup, in
TermInfosReader (hard wired to size 1024).  It was created as a "short
term" cache so that queries that look up the same term twice (eg once
for idf and once to get freq/prox postings position), would result in
only one real lookup.  You could try privately experimenting w/ that
value to see if you can get a reasonable hit rate?
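
To make the experiment concrete, here is a minimal sketch of a
size-bounded LRU cache that also counts hits and misses, so that
different sizes can be compared.  It is not the code in TermInfosReader:
the class name CountingLruCache, the loader callback, and the use of
LinkedHashMap are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not Lucene's actual cache class.
class CountingLruCache<K, V> {
    private final Map<K, V> map;
    private long hits;
    private long misses;

    CountingLruCache(final int maxSize) {
        // accessOrder=true makes iteration order least-recently-used first,
        // so the "eldest" entry is the LRU one.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Returns the cached value, or calls the loader (the "real lookup")
    // on a miss. A null value from the loader is never counted as a hit.
    V get(K key, Function<K, V> loader) {
        V value = map.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = loader.apply(key);
        map.put(key, value);
        return value;
    }

    long hits() { return hits; }

    long misses() { return misses; }

    // Fraction of lookups served from the cache; 0.0 before any lookup.
    double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Replaying a log of real query terms through such a cache at a few
different sizes would show whether the hit rate levels off at a size
that is affordable in RAM.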

Lucene only loads the "indexed" terms (= every 128th) into RAM, by default.
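
For illustration, a lookup against such a sparse index can be sketched
as a binary search over the in-RAM indexed terms followed by a short
sequential scan of the full dictionary.  This is a simplified model,
not Lucene's implementation: the class SparseTermIndex is hypothetical,
an in-memory array stands in for the on-disk term dictionary, and the
interval is a constructor parameter rather than the default of 128.

```java
import java.util.Arrays;

// Hypothetical model of a sparse term index; not Lucene's actual code.
class SparseTermIndex {
    private final String[] allTerms;     // stands in for the sorted on-disk dictionary
    private final String[] indexedTerms; // every interval-th term, held in RAM
    private final int interval;

    SparseTermIndex(String[] sortedTerms, int interval) {
        this.allTerms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval;
        this.indexedTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexedTerms[i] = sortedTerms[i * interval];
        }
    }

    // Returns the term's position in the full dictionary, or -1 if absent.
    int lookup(String term) {
        int pos = Arrays.binarySearch(indexedTerms, term);
        if (pos < 0) {
            // Not an indexed term: step back to the greatest indexed term <= term.
            pos = -pos - 2;
        }
        if (pos < 0) {
            return -1; // term sorts before the first term in the dictionary
        }
        int start = pos * interval;
        int end = Math.min(start + interval, allTerms.length);
        // Scan at most `interval` entries; this models the on-disk scan.
        for (int i = start; i < end; i++) {
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break; // passed the slot where the term would be
        }
        return -1;
    }
}
```

In this model a larger interval means fewer terms in RAM but a longer
scan per lookup.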


