lucene-dev mailing list archives

From Richard Marr <>
Subject Re: [jira] Commented: (LUCENE-1690) Morelikethis queries are very slow compared to other search types
Date Thu, 30 Jul 2009 10:28:59 GMT
Yeah, having this stuff stored centrally behind the IndexReader seems
like a better idea than having it in client classes. My shallow
knowledge of the code isn't helping me explain why it's not performing

Out of interest, how come it's a per-thread cache? I don't understand
all the issues involved but that surprised me.

2009/7/30 Michael McCandless (JIRA) <>:
> Michael McCandless commented on LUCENE-1690:
> --------------------------------------------
> OK now I feel silly -- this cache is in fact very similar to the caching that Lucene
> already does, internally!  Sorry I didn't catch this overlap sooner.
> In there's an LRU cache, default size 1024, that holds
> recently retrieved terms and their TermInfo.  It uses oal.util.cache.SimpleLRUCache.
> There are some important differences from this new cache in MLT.  EG, it holds the entire
> TermInfo, not just the docFreq.  Plus, it's a central cache for any & all term lookups
> that go through the SegmentReader.  Also, it's stored in thread-private storage, so each
> thread has its own cache.
> But, now I'm confused: how come you are not already seeing the benefits of this cache?
> You ought to see MLT queries going faster.  This core cache was first added in 2.4.x; it
> looks like you were testing against 2.4.1 (from the "Affects Version" on this issue).
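To make the per-thread LRU design described above concrete, here is a minimal sketch in plain JDK Java. It is not Lucene's SimpleLRUCache or its term-lookup code. The class name PerThreadLruCache is hypothetical. The sketch builds a bounded map on java.util.LinkedHashMap in access order and holds one instance per thread in a ThreadLocal:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a thread-private LRU cache; not Lucene's actual implementation.
public class PerThreadLruCache<K, V> {

    // Bounded map that evicts its least-recently-accessed entry once it grows past maxSize.
    private static final class LruMap<A, B> extends LinkedHashMap<A, B> {
        private final int maxSize;

        LruMap(int maxSize) {
            super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<A, B> eldest) {
            return size() > maxSize; // drop the head (least recently used) when over capacity
        }
    }

    // Each thread lazily gets its own private map, so lookups need no locking.
    private final ThreadLocal<LruMap<K, V>> perThread;

    public PerThreadLruCache(int maxSize) {
        this.perThread = ThreadLocal.withInitial(() -> new LruMap<>(maxSize));
    }

    public V get(K key) { return perThread.get().get(key); }

    public void put(K key, V value) { perThread.get().put(key, value); }

    public int size() { return perThread.get().size(); }

    public static void main(String[] args) throws InterruptedException {
        PerThreadLruCache<String, Integer> cache = new PerThreadLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");                      // touch "a", so "b" is now least recently used
        cache.put("c", 3);                   // over capacity: evicts "b"
        System.out.println(cache.get("b"));  // null
        System.out.println(cache.get("a"));  // 1

        // A different thread sees its own, initially empty, map.
        Thread other = new Thread(() -> System.out.println(cache.size())); // 0
        other.start();
        other.join();
    }
}
```

The main method shows the trade-off behind the per-thread question. Each thread reads and writes only its own map, so the lookup path needs no synchronization. The cost is that the same term may be cached once per thread. That is one plausible reason for the choice, not a statement of Lucene's documented rationale.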
>> Morelikethis queries are very slow compared to other search types
>> -----------------------------------------------------------------
>>                 Key: LUCENE-1690
>>                 URL:
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: contrib/*
>>    Affects Versions: 2.4.1
>>            Reporter: Richard Marr
>>            Priority: Minor
>>         Attachments: LruCache.patch, LUCENE-1690.patch, LUCENE-1690.patch
>>   Original Estimate: 2h
>>  Remaining Estimate: 2h
>> The MoreLikeThis object performs term frequency lookups for every query.  From my
>> testing that's what seems to take up the majority of time for MoreLikeThis searches.
>> For some (I'd venture many) applications it's not necessary for term statistics to
>> be looked up every time. A fairly naive opt-in caching mechanism tied to the life of the MoreLikeThis
>> object would allow applications to cache term statistics for the duration that suits them.
>> I've got this working in my test code. I'll put together a patch file when I get
>> a minute. From my testing this can improve performance by a factor of around 10.
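The opt-in cache proposed in the issue description amounts to memoizing document-frequency lookups for the lifetime of one object. This is a minimal Java sketch of that idea, not the LUCENE-1690 patch. CachingDocFreqLookup is a hypothetical name, and terms are plain strings for simplicity. The ToIntFunction<String> stands in for a real index call such as IndexReader.docFreq(Term); that mapping is an assumption made here for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical sketch of an opt-in docFreq cache tied to the life of one object.
// Not thread-safe: assumes each instance is used by a single thread.
public class CachingDocFreqLookup {
    private final ToIntFunction<String> lookup; // stand-in for an index docFreq lookup
    private final Map<String, Integer> cache;   // null when caching is disabled
    private int lookups = 0;                    // number of calls that reached the "index"

    public CachingDocFreqLookup(ToIntFunction<String> lookup, boolean cacheEnabled) {
        this.lookup = lookup;
        this.cache = cacheEnabled ? new HashMap<>() : null;
    }

    public int docFreq(String term) {
        if (cache == null) {            // caching not opted in: always hit the index
            lookups++;
            return lookup.applyAsInt(term);
        }
        Integer cached = cache.get(term);
        if (cached != null) {
            return cached;              // served from the cache, no index access
        }
        lookups++;
        int df = lookup.applyAsInt(term);
        cache.put(term, df);            // kept until this object is discarded
        return df;
    }

    public int lookups() { return lookups; }

    public static void main(String[] args) {
        ToIntFunction<String> fakeIndex = t -> t.length(); // fake docFreq, for demonstration only
        CachingDocFreqLookup mlt = new CachingDocFreqLookup(fakeIndex, true);
        for (int i = 0; i < 3; i++) {   // three "queries" that share the same terms
            mlt.docFreq("lucene");
            mlt.docFreq("cache");
        }
        System.out.println(mlt.lookups()); // 2
    }
}
```

Because the cache is never invalidated, its frequencies go stale if the index changes while the object is alive. That is why tying the cache to an object the application controls, and making it opt-in, matters in the proposal.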

Richard Marr
07976 910 515
