lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads
Date Mon, 23 Nov 2009 13:06:39 GMT


Michael McCandless commented on LUCENE-2075:

bq. To your patch: Looks good, I would only add @Overrides to the DoubleBarrelCache.

Ahh right, will do -- not quite in Java 5 mode yet ;)

bq. What do we do with Yonik/mine's cache?

Solr's ConcurrentLRUCache makes me somewhat nervous, in that it can blow up under high
(admittedly rather synthetic, by today's standards) load.

bq. Have you tried it with NRQ, too?

I haven't; that'd be great if you could try it & report back.

bq. Robert maybe you can try this patch plus automaton patch and see if you see this same
odd behavior?

bq. confirmed, though on my machine, it is 4 second avg *N versus 6 second avg *N

Weird -- I can't explain why this full scan is faster but the skipping scan is not.

bq. I haven't looked at the code, but fyi, even the smart mode is always "in term order" traversal;
it's just skipping over terms.

Right -- but that skipping variant is now pulling from the cache.  Let's see what the NRQ results
look like... though NRQ does quite a bit less seeking than e.g. the ????NNN query.

> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>                 Key: LUCENE-2075
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>         Attachments: LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch,
LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch
> Right now each thread creates its own (thread-private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful, since if a high number of threads come
> through Lucene, you're multiplying the RAM usage.  You're also
> cutting way back on the likelihood of a cache hit (except for the
> known case where we look up a term multiple times within a query,
> which uses one thread).
> In NRT search we open new SegmentReaders (on tiny segments) often,
> and each thread must then spend CPU & RAM creating and populating
> its own cache for each of them.
> Now that we are on 1.5 we can use java.util.concurrent.*, e.g.
> ConcurrentHashMap.  One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary).  You check the cache by
> first checking primary; if that's a miss, you check secondary and if
> you get a hit you promote it to primary.  Once primary is full you
> clear secondary and swap them.
> Or... any other suggested approach?
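For illustration, here is a rough sketch of that double-barrel idea in Java 5. This is hypothetical code, not the attached patch; the class name and the slotsLeft counter are made up for the example:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: two ConcurrentHashMaps act as "barrels".  When primary
// fills up, secondary is cleared and the two are swapped, which gives
// approximate LRU behavior without any per-access locking.
public class DoubleBarrelLRUCache<K,V> {
  private final int maxSize;
  private final AtomicInteger slotsLeft;     // free slots left in primary
  private volatile ConcurrentHashMap<K,V> primary = new ConcurrentHashMap<K,V>();
  private volatile ConcurrentHashMap<K,V> secondary = new ConcurrentHashMap<K,V>();

  public DoubleBarrelLRUCache(int maxSize) {
    this.maxSize = maxSize;
    this.slotsLeft = new AtomicInteger(maxSize);
  }

  public V get(K key) {
    V v = primary.get(key);
    if (v == null) {
      v = secondary.get(key);
      if (v != null) {
        put(key, v);                         // promote the hit to primary
      }
    }
    return v;
  }

  public void put(K key, V value) {
    if (primary.put(key, value) == null      // key was new to primary
        && slotsLeft.decrementAndGet() <= 0) {
      // Primary is full: clear secondary and swap, so the old primary
      // becomes the new secondary.  The swap is racy under contention,
      // but a cache only has to be approximately right -- a lost entry
      // is just a future cache miss.
      secondary.clear();
      ConcurrentHashMap<K,V> tmp = primary;
      primary = secondary;
      secondary = tmp;
      slotsLeft.set(maxSize);
    }
  }
}
{code}

All threads could then share a single DoubleBarrelLRUCache<Term,TermInfo> instance, instead of each holding a private SimpleLRUCache.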

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

