lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1603) Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
Date Tue, 14 Apr 2009 17:49:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698851#action_12698851 ]

Michael McCandless commented on LUCENE-1603:
--------------------------------------------

I was thinking that this count is a good way to measure how much net work was done, hence
the switch to sum.  E.g., you could compare that count with the count you get after optimizing
the index to get a sense of how much you gained by optimizing.

Whereas now the count only shows the number of terms from the last segment searched, which is
not really useful at all.

bq. Are queries also rewritten per segment with the new Searchers? If not, one could use the
BooleanQuery variant, if he wants to have real term numbers on unoptimized index.

They are rewritten at the MultiReader level, so you're right: one could use that to get "number
of unique terms" vs. "amount of work (seeks) done".

If we do change it, how about "get/clearTotalNumberOfTerms()"?
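
To make the two numbers concrete, here is a minimal sketch of the summing behaviour. It is not
Lucene code: the class TermCountSketch, its collectSegment() method and the sample terms are
hypothetical, and only the get/clear method names come from the proposal above. The total is
the sum of terms visited per segment (work done), while the set holds the unique terms across
all segments.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the counter; not Lucene's MultiTermQuery. */
public class TermCountSketch {
    // Running sum of terms visited across all segments searched so far.
    private int totalNumberOfTerms = 0;

    /** Assumed hook: called once per segment with the terms visited there. */
    public void collectSegment(List<String> matchedTerms) {
        totalNumberOfTerms += matchedTerms.size();
    }

    public int getTotalNumberOfTerms() {
        return totalNumberOfTerms;
    }

    public void clearTotalNumberOfTerms() {
        totalNumberOfTerms = 0;
    }

    public static void main(String[] args) {
        // Three made-up segments; some terms occur in more than one segment.
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("b", "c"),
            Arrays.asList("c", "d", "e"));

        TermCountSketch counter = new TermCountSketch();
        Set<String> uniqueTerms = new HashSet<>();
        for (List<String> segment : segments) {
            counter.collectSegment(segment);
            uniqueTerms.addAll(segment);
        }

        // Sum over segments: 3 + 2 + 3 = 8 terms visited.
        System.out.println("total terms visited: " + counter.getTotalNumberOfTerms());
        // Distinct terms: a, b, c, d, e = 5.
        System.out.println("unique terms: " + uniqueTerms.size());
    }
}
```

On a single-segment (optimized) index the two numbers would coincide, which is why comparing
the total before and after optimizing gives a rough sense of the work saved.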

> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1603
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4, 2.9
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1603.patch
>
>
> This is a patch that is needed for the MultiTermQuery rewrite of TrieRange (LUCENE-1602):
> - Make the private members protected, to allow access to them from the very special TrieRangeTermEnum
> - Fix a small inconsistency (docFreq() now only returns a value if a valid term exists)
> - Improve MultiTermFilter.getDocIdSet to return DocIdSet.EMPTY_DOCIDSET if the
> TermEnum is empty (less memory usage and faster)
> - Add getLastNumberOfTerms() to MultiTermQuery for statistics on different multi-term
> queries and how many terms they affect. Using this new functionality, the improvement
> of TrieRange can be shown (extract from the test case there, 10000-doc index, long values):
> {code}
> [junit] Average number of terms during random search on 'field8':
> [junit]  Trie query: 244.2
> [junit]  Classical query: 3136.94
> [junit] Average number of terms during random search on 'field4':
> [junit]  Trie query: 38.3
> [junit]  Classical query: 3018.68
> [junit] Average number of terms during random search on 'field2':
> [junit]  Trie query: 18.04
> [junit]  Classical query: 3539.42
> {code}
> All core tests pass.
