lucene-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-502) TermScorer caches values unnecessarily
Date Fri, 03 Mar 2006 18:13:39 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368770 ] 

Doug Cutting commented on LUCENE-502:
-------------------------------------

It is not clear to me that your uses are typical uses.  These optimizations were added because
they made big improvements.  They were not premature.  In some cases JVMs may have evolved
so that some of them are no longer required.  But some of them may still make significant
improvements for lots of users.  We really need a benchmark suite to better understand the
effects of things like this...


> TermScorer caches values unnecessarily
> --------------------------------------
>
>          Key: LUCENE-502
>          URL: http://issues.apache.org/jira/browse/LUCENE-502
>      Project: Lucene - Java
>         Type: Improvement
>   Components: Search
>     Versions: 1.9
>     Reporter: Steven Tamm
>  Attachments: TermScorer.patch
>
> TermScorer aggressively caches the doc and freq of 32 documents at a time for each term
> scored.  When querying for a lot of terms, this creates a lot of unnecessary garbage.  The
> SegmentTermDocs from which it retrieves its information has no optimizations for bulk
> loading, so this caching is unnecessary.
> In addition, it has a SCORE_CACHE that is of limited benefit.  It caches the result of a
> sqrt that should be placed in DefaultSimilarity, and if you're only scoring a few documents
> that contain those terms, there's no need to precalculate the sqrt, especially on modern VMs.
> Enclosed is a patch that replaces TermScorer with a version that does not cache the docs
> or freqs.  When there are many queries, this saves 196 bytes per term, the unnecessary disk
> I/O, and the extra sqrt computations, all of which add up.
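The two strategies debated in this issue can be contrasted in a small self-contained model. This is a hypothetical sketch, not Lucene's TermScorer or the attached patch: `TermScorerSketch`, `CachingScorer`, and `DirectScorer` are illustrative names, the postings are plain int arrays standing in for SegmentTermDocs, and `tf` assumes the square-root tf that the report attributes to DefaultSimilarity.

```java
// Hypothetical, simplified model of the two scoring strategies in LUCENE-502.
// Not Lucene code: postings are in-memory arrays standing in for SegmentTermDocs.
public class TermScorerSketch {

    /** Square-root tf, assumed to match DefaultSimilarity's tf(freq). */
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    /** Buffers postings 32 at a time and precomputes tf(f) * weight for f < 32. */
    public static final class CachingScorer {
        static final int BUFFER_SIZE = 32;
        static final int SCORE_CACHE_SIZE = 32;
        private final int[] postingDocs, postingFreqs;      // stand-in postings source
        private final int[] docs = new int[BUFFER_SIZE];    // buffered doc ids
        private final int[] freqs = new int[BUFFER_SIZE];   // buffered term frequencies
        private final float[] scoreCache = new float[SCORE_CACHE_SIZE];
        private final float weight;
        private int source = 0;      // next posting to copy into the buffer
        private int pointer = 0;     // current position within the buffer
        private int pointerMax = 0;  // number of valid entries in the buffer

        public CachingScorer(int[] postingDocs, int[] postingFreqs, float weight) {
            this.postingDocs = postingDocs;
            this.postingFreqs = postingFreqs;
            this.weight = weight;
            for (int f = 0; f < SCORE_CACHE_SIZE; f++) {
                scoreCache[f] = tf(f) * weight;  // 32 sqrt calls paid up front, per term
            }
        }

        /** Advances to the next doc id, refilling the buffer when empty; -1 when exhausted. */
        public int next() {
            pointer++;
            if (pointer >= pointerMax) {
                pointerMax = Math.min(BUFFER_SIZE, postingDocs.length - source);
                if (pointerMax == 0) {
                    return -1;
                }
                System.arraycopy(postingDocs, source, docs, 0, pointerMax);
                System.arraycopy(postingFreqs, source, freqs, 0, pointerMax);
                source += pointerMax;
                pointer = 0;
            }
            return docs[pointer];
        }

        /** Table lookup for small freqs, direct sqrt for freqs >= 32. */
        public float score() {
            int f = freqs[pointer];
            return f < SCORE_CACHE_SIZE ? scoreCache[f] : tf(f) * weight;
        }
    }

    /** Reads one posting at a time and computes the sqrt on demand; no per-term arrays. */
    public static final class DirectScorer {
        private final int[] postingDocs, postingFreqs;
        private final float weight;
        private int i = -1;  // index of the current posting

        public DirectScorer(int[] postingDocs, int[] postingFreqs, float weight) {
            this.postingDocs = postingDocs;
            this.postingFreqs = postingFreqs;
            this.weight = weight;
        }

        /** Advances to the next doc id; -1 when exhausted. */
        public int next() {
            if (i < postingDocs.length) i++;
            return i < postingDocs.length ? postingDocs[i] : -1;
        }

        public float score() {
            return tf(postingFreqs[i]) * weight;
        }
    }

    public static void main(String[] args) {
        int n = 100;  // more than 32, so the caching scorer refills its buffer
        int[] docs = new int[n], freqs = new int[n];
        for (int k = 0; k < n; k++) {
            docs[k] = 3 * k;
            freqs[k] = 1 + (k % 40);  // some freqs >= 32 bypass the score cache
        }
        CachingScorer cached = new CachingScorer(docs, freqs, 0.5f);
        DirectScorer direct = new DirectScorer(docs, freqs, 0.5f);
        int matched = 0;
        for (int d; (d = cached.next()) != -1; matched++) {
            if (d != direct.next() || cached.score() != direct.score()) {
                throw new AssertionError("scorers disagree at doc " + d);
            }
        }
        System.out.println("both scorers agree on " + matched + " docs");
    }
}
```

Both variants return identical scores in this model; what differs is cost. The caching variant allocates three 32-entry arrays and makes 32 sqrt calls per term up front, while the direct variant does one sqrt per scored document. Which is cheaper depends on how many documents each term actually scores, which is the kind of question a benchmark suite would settle.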

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

