lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1690) MoreLikeThis queries are very slow compared to other search types
Date Thu, 30 Jul 2009 10:01:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737059#action_12737059 ]

Michael McCandless commented on LUCENE-1690:
--------------------------------------------

OK now I feel silly -- this cache is in fact very similar to the caching Lucene already
does internally!  Sorry I didn't catch the overlap sooner.

In oal.index.TermInfosReader.java there's an LRU cache, default size 1024, that holds recently
retrieved terms and their TermInfo.  It uses oal.util.cache.SimpleLRUCache.
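
For reference, a cache of that shape can be built on java.util.LinkedHashMap with access
ordering.  A minimal sketch of the idea (not the actual SimpleLRUCache source):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch of an LRU cache in the style of oal.util.cache.SimpleLRUCache: a
    // LinkedHashMap kept in access order that evicts the eldest entry once the
    // configured capacity (1024 by default in TermInfosReader) is exceeded.
    public class LruCacheSketch<K, V> {

      private final Map<K, V> map;

      public LruCacheSketch(final int capacity) {
        // 0.75f = load factor; true = iterate in access order, not insertion order
        this.map = new LinkedHashMap<K, V>(capacity + 1, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
          }
        };
      }

      public synchronized V get(K key) {
        return map.get(key);
      }

      public synchronized void put(K key, V value) {
        map.put(key, value);
      }
    }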

There are some important differences from the new cache in MLT.  E.g., it holds the entire
TermInfo, not just the docFreq.  It's also a central cache for any and all term lookups
that go through the SegmentReader.  And it's stored in thread-private storage, so each thread
has its own cache.
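
The thread-private part is what keeps lookups free of lock contention.  Very roughly, and
reusing the LruCacheSketch above, the pattern looks like this (a sketch, not the
TermInfosReader code; TermInfoStub and readFromTermDictionary are placeholders):

    // Sketch of the thread-private caching TermInfosReader does: each thread
    // that looks up terms gets its own LRU cache, so lookups never contend on
    // a shared lock.  TermInfoStub stands in for the real TermInfo and
    // readFromTermDictionary for the real term-dictionary lookup.
    public class ThreadPrivateLookupSketch {

      private static final int CACHE_SIZE = 1024;

      // One cache instance per thread.
      private final ThreadLocal<LruCacheSketch<String, TermInfoStub>> cache =
          new ThreadLocal<LruCacheSketch<String, TermInfoStub>>() {
            @Override
            protected LruCacheSketch<String, TermInfoStub> initialValue() {
              return new LruCacheSketch<String, TermInfoStub>(CACHE_SIZE);
            }
          };

      public TermInfoStub lookup(String term) {
        LruCacheSketch<String, TermInfoStub> c = cache.get();
        TermInfoStub ti = c.get(term);
        if (ti == null) {
          ti = readFromTermDictionary(term);  // the expensive path
          c.put(term, ti);
        }
        return ti;
      }

      // Placeholders for illustration only.
      static class TermInfoStub { int docFreq; }

      private TermInfoStub readFromTermDictionary(String term) {
        return new TermInfoStub();
      }
    }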

But now I'm confused: why aren't you already seeing the benefits of this cache?  You
ought to see MLT queries going faster.  This core cache was first added in 2.4.x, and it looks
like you were testing against 2.4.1 (from the "Affects Versions" field on this issue).

> MoreLikeThis queries are very slow compared to other search types
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1690
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.4.1
>            Reporter: Richard Marr
>            Priority: Minor
>         Attachments: LruCache.patch, LUCENE-1690.patch, LUCENE-1690.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The MoreLikeThis object performs term frequency lookups for every query.  From my testing
> that's what seems to take up the majority of time for MoreLikeThis searches.
> For some (I'd venture many) applications it's not necessary for term statistics to be
> looked up every time. A fairly naive opt-in caching mechanism tied to the life of the
> MoreLikeThis object would allow applications to cache term statistics for the duration
> that suits them.
> I've got this working in my test code. I'll put together a patch file when I get a minute.
> From my testing this can improve performance by a factor of around 10.
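
For concreteness, the kind of opt-in cache the description above suggests might look roughly
like this.  This is only a sketch under my reading of the description; the names are made up
and it is not the attached patch:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;

    // Sketch of an opt-in docFreq cache tied to the life of a single object,
    // in the spirit of the proposal above.  The class and method names here
    // are illustrative only and are not taken from the attached patch.
    public class CachedDocFreqSketch {

      private final IndexReader reader;
      private final Map<Term, Integer> cache = new HashMap<Term, Integer>();

      public CachedDocFreqSketch(IndexReader reader) {
        this.reader = reader;
      }

      // Hits the index only on the first lookup of each term; repeat lookups
      // for the same (field, text) pair are served from the in-memory map.
      public int docFreq(String field, String text) throws IOException {
        Term t = new Term(field, text);
        Integer cached = cache.get(t);
        if (cached != null) {
          return cached.intValue();
        }
        int df = reader.docFreq(t);  // the expensive per-term statistics lookup
        cache.put(t, Integer.valueOf(df));
        return df;
      }
    }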

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

