lucene-dev mailing list archives

From Richard Marr <richard.m...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-1690) Morelikethis queries are very slow compared to other search types
Date Thu, 30 Jul 2009 10:28:59 GMT
Yeah, having this stuff stored centrally behind the IndexReader seems
like a better idea than having it in client classes. My shallow
knowledge of the code isn't helping me explain why it's not performing
though.

Out of interest, how come it's a per-thread cache? I don't understand
all the issues involved but that surprised me.




2009/7/30 Michael McCandless (JIRA) <jira@apache.org>:
>
>    [ https://issues.apache.org/jira/browse/LUCENE-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737059#action_12737059 ]
>
> Michael McCandless commented on LUCENE-1690:
> --------------------------------------------
>
> OK now I feel silly -- this cache is in fact very similar to the
> caching that Lucene already does, internally!  Sorry I didn't catch
> this overlap sooner.
>
> In oal.index.TermInfosReader.java there's an LRU cache, default size
> 1024, that holds recently retrieved terms and their TermInfo.  It uses
> oal.util.cache.SimpleLRUCache.
>
> There are some important differences from this new cache in MLT.  EG,
> it holds the entire TermInfo, not just the docFreq.  Plus, it's a
> central cache for any & all term lookups that go through the
> SegmentReader.  Also, it's stored in thread-private storage, so each
> thread has its own cache.
>
> But, now I'm confused: how come you are not already seeing the
> benefits of this cache?  You ought to see MLT queries going faster.
> This core cache was first added in 2.4.x; it looks like you were
> testing against 2.4.1 (from the "Affects Version" on this issue).
>
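For readers unfamiliar with the layout described above, here is a minimal sketch of an LRU cache kept in thread-private storage. It is an illustration only, not the code in TermInfosReader or SimpleLRUCache: the class names LruMap and PerThreadTermCache are hypothetical, terms are plain strings, and only a docFreq-like integer is stored rather than a full TermInfo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Small LRU map: a LinkedHashMap in access order that drops its eldest entry past capacity. */
class LruMap<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruMap(int capacity) {
        super(16, 0.75f, true); // true = order entries by most recent access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }
}

/** Hypothetical per-thread term cache; an assumption for illustration, not Lucene's class. */
class PerThreadTermCache {
    // Each thread lazily gets its own LruMap, so lookups need no locking.
    private final ThreadLocal<LruMap<String, Integer>> cache;

    PerThreadTermCache(final int capacity) {
        this.cache = ThreadLocal.withInitial(() -> new LruMap<String, Integer>(capacity));
    }

    Integer get(String term) {
        return cache.get().get(term); // null on a miss
    }

    void put(String term, int docFreq) {
        cache.get().put(term, docFreq);
    }

    int size() {
        return cache.get().size();
    }
}
```

In a sketch like this, lookups take no locks, but every thread has to warm its own copy, so a value cached by one thread is not visible to another.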
>> Morelikethis queries are very slow compared to other search types
>> -----------------------------------------------------------------
>>
>>                 Key: LUCENE-1690
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1690
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>          Components: contrib/*
>>    Affects Versions: 2.4.1
>>            Reporter: Richard Marr
>>            Priority: Minor
>>         Attachments: LruCache.patch, LUCENE-1690.patch, LUCENE-1690.patch
>>
>>   Original Estimate: 2h
>>  Remaining Estimate: 2h
>>
>> The MoreLikeThis object performs term frequency lookups for every
>> query.  From my testing that's what seems to take up the majority of
>> time for MoreLikeThis searches.
>> For some (I'd venture many) applications it's not necessary for term
>> statistics to be looked up every time. A fairly naive opt-in caching
>> mechanism tied to the life of the MoreLikeThis object would allow
>> applications to cache term statistics for the duration that suits
>> them.
>> I've got this working in my test code. I'll put together a patch file
>> when I get a minute. From my testing this can improve performance by
>> a factor of around 10.
>
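The opt-in caching idea in the issue description can be sketched roughly as follows. This is not the attached patch: CachingDocFreq is a hypothetical name, and the index lookup is stood in for by a plain function from term text to document frequency, so the example runs without Lucene on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical docFreq cache that lives as long as the object holding it. */
class CachingDocFreq {
    private final ToIntFunction<String> lookup; // stand-in for the real index lookup
    private final Map<String, Integer> cache = new HashMap<String, Integer>();
    private int misses = 0;

    CachingDocFreq(ToIntFunction<String> lookup) {
        this.lookup = lookup;
    }

    /** Returns the cached value if present, otherwise asks the underlying lookup once. */
    int docFreq(String term) {
        Integer cached = cache.get(term);
        if (cached != null) {
            return cached.intValue();
        }
        misses++;
        int df = lookup.applyAsInt(term);
        cache.put(term, df);
        return df;
    }

    /** Number of times the underlying lookup was actually called. */
    int misses() {
        return misses;
    }
}
```

Because the map is unbounded and never invalidated, the cached statistics are only as fresh as the object that owns them; an application would discard the object when it wants fresh numbers.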
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



-- 
Richard Marr
richard.marr@gmail.com
07976 910 515


