lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements
Date Wed, 07 Oct 2009 16:48:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763118#action_12763118 ]

Robert Muir commented on LUCENE-1951:
-------------------------------------

Here are some stats for rewriting wildcard queries that are really prefix queries.
I query a field with about 10M numeric terms (a unique database id), average length 10
characters or so.
I copied the index into a RAMDirectory to try to rule out I/O a bit (it's only a 1GB index
and I use a 4GB heap).

I look for all the terms starting with "1" (about 1.5 million of these). I did 3 runs, 100
queries each.
Here are the average times for each run.

||Run||WildcardQuery("1*")||PrefixQuery("1")||
|1|1181ms|973ms|
|2|1179ms|966ms|
|3|1079ms|963ms|
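
To put the table in relative terms, the saving per run can be worked out from the two columns. The sketch below only does that arithmetic on the reported averages; the class and method names are made up for illustration. The result is roughly 11% to 18% per run.

```java
final class TimingSavingsSketch {
    // Fraction of the wildcard time saved by the prefix query (0.10 means 10% less time).
    static double savedFraction(double wildcardMs, double prefixMs) {
        return (wildcardMs - prefixMs) / wildcardMs;
    }

    public static void main(String[] args) {
        // {wildcard average ms, prefix average ms} for runs 1..3, copied from the table.
        double[][] runs = { {1181, 973}, {1179, 966}, {1079, 963} };
        for (int i = 0; i < runs.length; i++) {
            double savedMs = runs[i][0] - runs[i][1];
            double percent = 100 * savedFraction(runs[i][0], runs[i][1]);
            System.out.printf("run %d: %.0f ms saved (%.1f%%)%n", i + 1, savedMs, percent);
        }
    }
}
```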

So, it's not a big optimization, but it seems consistent, and it may matter more if the average
term length is longer: in that case WildcardQuery's comparison function might have to do even
more work.
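
To illustrate why the per-term comparison differs, here is a minimal standalone sketch. It is not Lucene's actual term enumeration code: `prefixMatch` and `wildcardMatch` are hypothetical helpers, and the wildcard matcher is a generic backtracking one that assumes `*` matches any run of characters and `?` matches exactly one. For a pattern like "1*" both accept the same terms, but the wildcard version walks the whole term while the prefix check stops after the prefix.

```java
final class TermMatchSketch {
    // Prefix check: compares only the leading characters of the term.
    static boolean prefixMatch(String prefix, String term) {
        return term.startsWith(prefix);
    }

    // Generic wildcard check: '*' matches any run (including empty), '?' matches one char.
    static boolean wildcardMatch(String pattern, String term) {
        int p = 0, t = 0;
        int starP = -1, starT = -1; // position of the last '*' and the term index it was tried at
        while (t < term.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;        // remember the star, first try matching it against nothing
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++;
                t++;
            } else if (starP >= 0) {
                p = starP + 1;      // backtrack: let the last '*' absorb one more character
                t = ++starT;
            } else {
                return false;
            }
        }
        // Any trailing '*' in the pattern can match the empty remainder.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

With these helpers, `wildcardMatch("1*", term)` and `prefixMatch("1", term)` agree for every term, which is the case the rewrite targets.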

I'll work on a patch to fix the boost/constant score handling and include a PrefixQuery
rewrite for this case.


> wildcardquery rewrite improvements
> ----------------------------------
>
>                 Key: LUCENE-1951
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1951
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Query/Scoring
>            Reporter: Robert Muir
>            Priority: Minor
>
> WildcardQuery has logic to rewrite to TermQuery if there is no wildcard character, but
> * it needs to pass along the boost if it does this
> * if the user asked for a 'constant score' rewriteMethod, it should rewrite to a constant
>   score query for consistency.
> Additionally, if the query is really a PrefixQuery, it would be nice to rewrite to
> PrefixQuery.
> Both will enumerate the same number of terms, but PrefixQuery has a simpler comparison
> function.
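
The decision described in the issue can be sketched as a small classification step. This is only an illustration under stated assumptions, not the patch itself: `classify` and the `Kind` enum are hypothetical names, `*` and `?` are assumed to be the only wildcard characters, escaping is ignored, and carrying over the boost or the constant-score rewrite method is not shown.

```java
final class WildcardRewriteSketch {
    enum Kind { TERM, PREFIX, WILDCARD }

    // Decide which simpler query, if any, a wildcard pattern could be rewritten to.
    static Kind classify(String pattern) {
        int firstWildcard = -1;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*' || c == '?') {
                firstWildcard = i;
                break;
            }
        }
        if (firstWildcard < 0) {
            return Kind.TERM;       // no wildcard character at all
        }
        // Exactly one wildcard, a '*', sitting at the very end: really a prefix query.
        if (pattern.charAt(firstWildcard) == '*' && firstWildcard == pattern.length() - 1) {
            return Kind.PREFIX;
        }
        return Kind.WILDCARD;       // anything else keeps the general wildcard matching
    }
}
```

Under these assumptions "1" classifies as TERM, "1*" as PREFIX, and "1*2" or "1?" as WILDCARD. A pattern such as "1**" is conservatively left as WILDCARD.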
