lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-502) TermScorer caches values unnecessarily
Date Thu, 13 Nov 2008 03:43:44 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-502:
-------------------------------

    Attachment: LUCENE-503.patch

Are we interested in this optimization?

Here is an attempted patch. 

Two issues:

1. It might be better to use IDF rather than doc freq to decide which scorer to use
(TermScorer or LowFreqTermScorer), so that doc freq doesn't need to be accessed twice.

2. I don't know at what doc freq 'level' we should switch from LowFreqTermScorer to
TermScorer. Some benchmarking may help.
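
To make the two issues concrete, here is a minimal sketch of the selection logic. It is not taken from the attached patch: the class name, the method names and the cut-off of 32 are all hypothetical, and the idf formula is assumed to be the classic DefaultSimilarity one, log(numDocs / (docFreq + 1)) + 1. Because that idf falls as doc freq rises, a doc freq cut-off can be turned into an idf cut-off, which would let the choice be made from a value that has already been computed.

```java
final class ScorerChoice {
    // Hypothetical cut-off; the right value is an open question pending benchmarks.
    static final int LOW_FREQ_CUTOFF = 32;

    // Assumed idf formula (classic DefaultSimilarity style), for illustration only.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Choice made from doc freq directly (requires reading doc freq again).
    static boolean useLowFreqByDocFreq(int docFreq) {
        return docFreq < LOW_FREQ_CUTOFF;
    }

    // Same choice made from an idf value that is already at hand:
    // a lower doc freq means a higher idf, so compare against the idf of the cut-off.
    static boolean useLowFreqByIdf(double idfValue, int numDocs) {
        return idfValue > idf(LOW_FREQ_CUTOFF, numDocs);
    }
}
```

Under these assumptions both methods agree for any given index size, so the cut-off only has to be tuned once, in whichever unit is cheaper to obtain.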

> TermScorer caches values unnecessarily
> --------------------------------------
>
>                 Key: LUCENE-502
>                 URL: https://issues.apache.org/jira/browse/LUCENE-502
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 1.9
>            Reporter: Steven Tamm
>            Priority: Minor
>         Attachments: LUCENE-503.patch, TermScorer.patch
>
>
> TermScorer aggressively caches the doc and freq of 32 documents at a time for each term
> scored.  When querying for a lot of terms, this creates a lot of unnecessary garbage.
> The SegmentTermDocs from which it retrieves its information doesn't have any
> optimizations for bulk loading, so the caching is unnecessary.
> In addition, it has a SCORE_CACHE that's of limited benefit.  It's caching the result
> of a sqrt that should be placed in DefaultSimilarity, and if you're only scoring a few documents
> that contain those terms, there's no need to precalculate the SQRT, especially on modern VMs.
> Enclosed is a patch that replaces TermScorer with a version that does not cache the docs
> or freqs.  In the case of a lot of queries, that saves 196 bytes/term, the unnecessary disk
> IO, and extra SQRTs, which add up.
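
The SCORE_CACHE trade-off described in the issue can be sketched roughly as follows. This is a simplified illustration, not the code from either patch: the class and method names are hypothetical, norms are left out, and tf is assumed to be sqrt(freq) as in DefaultSimilarity. The cached variant allocates and fills a table per term up front; the direct variant does one sqrt per document actually scored.

```java
final class TfScoring {
    // Assumed table size, matching the "32" mentioned in the issue description.
    static final int SCORE_CACHE_SIZE = 32;

    // Cached variant: precompute sqrt(freq) * weight for small freqs, once per term.
    static float[] buildScoreCache(float weight) {
        float[] cache = new float[SCORE_CACHE_SIZE];
        for (int f = 0; f < SCORE_CACHE_SIZE; f++) {
            cache[f] = (float) Math.sqrt(f) * weight;
        }
        return cache;
    }

    // Look up small freqs in the table, fall back to computing for larger ones.
    static float cachedScore(float[] cache, int freq, float weight) {
        return freq < cache.length ? cache[freq] : (float) Math.sqrt(freq) * weight;
    }

    // Uncached variant: no per-term allocation, one sqrt per scored document.
    static float directScore(int freq, float weight) {
        return (float) Math.sqrt(freq) * weight;
    }
}
```

Both variants return the same value for a given freq and weight, so the difference is purely cost: if a term matches only a few documents, filling 32 table entries costs more sqrt calls than the table saves.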

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

