lucene-dev mailing list archives

From Michael McCandless
Subject Re: [jira] Commented: (LUCENE-1690) Morelikethis queries are very slow compared to other search types
Date Thu, 30 Jul 2009 11:10:01 GMT
On Thu, Jul 30, 2009 at 6:28 AM, Richard Marr wrote:
> Yeah, having this stuff stored centrally behind the IndexReader seems
> like a better idea than having it in client classes. My shallow
> knowledge of the code isn't helping me explain why it's not performing
> though.
> Out of interest, how come it's a per-thread cache? I don't understand
> all the issues involved but that surprised me.

Good question... making it thread-private seems rather wasteful: at
heart this information (Term -> TermInfo) is constant across threads,
so every thread ends up holding its own copy of the same entries.
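
To make the duplication concrete, here is a minimal sketch of a thread-private LRU cache. It is not the actual Lucene code: the class name, the long[] stand-in for TermInfo, and loadFromIndex are all hypothetical, and only the 1024-entry limit comes from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a per-thread term cache; not Lucene's implementation. */
class PerThreadTermCacheSketch {
    static final int MAX_ENTRIES = 1024; // per-thread limit mentioned in this thread

    /** Counts simulated index reads so the duplication is observable. */
    static final AtomicInteger LOADS = new AtomicInteger();

    /** One private LRU map per thread: no locking, but no sharing either. */
    static final ThreadLocal<Map<String, long[]>> CACHE =
        ThreadLocal.withInitial(PerThreadTermCacheSketch::newLruMap);

    /** Access-ordered LinkedHashMap that drops its eldest entry once over the limit. */
    static Map<String, long[]> newLruMap() {
        return new LinkedHashMap<String, long[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, long[]> eldest) {
                return size() > MAX_ENTRIES;
            }
        };
    }

    /** Stand-in for the real (expensive) term dictionary lookup. */
    static long[] loadFromIndex(String term) {
        LOADS.incrementAndGet();
        return new long[] { term.hashCode() };
    }

    /** Returns the cached value for this thread, loading it on a miss. */
    static long[] lookup(String term) {
        Map<String, long[]> cache = CACHE.get();
        long[] info = cache.get(term);
        if (info == null) {
            info = loadFromIndex(term);
            cache.put(term, info);
        }
        return info;
    }
}
```

Calling PerThreadTermCacheSketch.lookup("lucene") twice on one thread loads the term once, but a second thread asking for the same term loads and stores it again in its own map.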

Also, it's a non-trivial amount of RAM that we're tying up once the
cache is full: 1024 entries times maybe ~120 bytes per TermInfo on a
64-bit JRE = ~120 KB per thread, and it's somewhat devilish/unexpected
("principle of least
surprise") for Lucene to "do this" to any threads that come through

I think one reason was to avoid having to synchronize on the lookups,
though with magic similar to LUCENE-1607 we could presumably make it

Plus, the original motivation for this (LUCENE-1195) was that
queries in general look up the same term at least twice during their
execution (once in the weight for the idf computation, once to get the
postings), and so I think we wanted to ensure that a single thread
running its query would not see its terms evicted (due to many other
threads coming through) by the second time it needed them.  But if we
made the central cache "large enough", perhaps growing it when it
detects many threads, then this (other threads evicting my entries
before I finish my query) shouldn't be a problem in practice.
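
A rough sketch of what such a central cache could look like follows. It is only an illustration under assumptions: the class and method names are hypothetical, ensureCapacityFor is told the thread count rather than detecting it, and it uses plain synchronized methods, so it does not show the lock-free lookups discussed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical shared LRU cache whose capacity scales with the thread count. */
class SharedTermCacheSketch<K, V> {
    static final int ENTRIES_PER_THREAD = 1024; // per-thread size mentioned in this thread

    private int capacity = ENTRIES_PER_THREAD;

    /** Access-ordered map; evicts the least recently used entry once over capacity. */
    private final Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > SharedTermCacheSketch.this.capacity;
        }
    };

    /** Grows (never shrinks) the capacity so each thread gets its old share. */
    synchronized void ensureCapacityFor(int threads) {
        capacity = Math.max(capacity, threads * ENTRIES_PER_THREAD);
    }

    synchronized V get(K key) {
        return lru.get(key);
    }

    synchronized void put(K key, V value) {
        lru.put(key, value);
    }

    synchronized int capacity() {
        return capacity;
    }

    synchronized int size() {
        return lru.size();
    }
}
```

With the default capacity, inserting 2048 distinct terms evicts the first one; after ensureCapacityFor(4) the cache holds 4096 entries and the first term survives. At ~120 bytes per entry that is roughly the same ~480 KB that four thread-private caches would use, but without storing any term twice.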

