lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1424) Add ConstantScorePrefixQuery and ConstantScoreWildcardQuery
Date Sat, 25 Oct 2008 12:57:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642615#action_12642615 ]

Mark Harwood commented on LUCENE-1424:
--------------------------------------

>> Are the score differences caused by the rewrite-to-BooleanQuery implementations ever "useful"?

So we need to consider what we are losing: TF, IDF, coordination, length norm and doc boosts.

I can only think of one use case, and it relates to the coordination factor.

Suppose you have a "category" field for products, e.g. Lucene docs for these books:

Title:     Lucene in Action
Category:  /Books/Computing/Languages/Java
           /Books/Computing/InformationRetrieval

Title:     The Long Tail
Category:  /Books/Business/Internet
           /Books/Computing

You might then use a wildcard search of /Books/Computing/* and "Lucene in Action" would rank
higher than "The Long Tail" because the rewritten BooleanQuery would give it a higher
coordination factor, reflecting that LIA got more hits under this "/Books/Computing.." category.
There would still be the issue of IDF potentially skewing results, but the coordination factor
is potentially useful here.
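
To make the coordination effect concrete, here is a minimal sketch in plain Java (no Lucene
dependency). It assumes the pattern is treated as a prefix match on "/Books/Computing", so
that both books match, and that the query expands to exactly the three category terms shown.
The coord() arithmetic mirrors what DefaultSimilarity does (matched clauses divided by total
clauses); the class name, the overlap() helper and the term list are hypothetical, for
illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class CoordSketch {

    // Same arithmetic as DefaultSimilarity.coord(): matched clauses / total clauses.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical helper: how many expanded query terms occur in a doc's category values.
    static int overlap(List<String> expandedTerms, List<String> docCategories) {
        int count = 0;
        for (String term : expandedTerms) {
            if (docCategories.contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: the prefix expands to these three indexed terms,
        // each becoming one SHOULD clause of the rewritten BooleanQuery.
        List<String> expanded = Arrays.asList(
                "/Books/Computing",
                "/Books/Computing/InformationRetrieval",
                "/Books/Computing/Languages/Java");

        List<String> luceneInAction = Arrays.asList(
                "/Books/Computing/Languages/Java",
                "/Books/Computing/InformationRetrieval");
        List<String> theLongTail = Arrays.asList(
                "/Books/Business/Internet",
                "/Books/Computing");

        float liaCoord = coord(overlap(expanded, luceneInAction), expanded.size());
        float tltCoord = coord(overlap(expanded, theLongTail), expanded.size());

        System.out.println("Lucene in Action coord = " + liaCoord);
        System.out.println("The Long Tail coord    = " + tltCoord);
    }
}
```

With a constant score query both books would receive the same score, so this difference is
what would be lost.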

I think in general IDF tends to be useless for "auto-expanded" terms e.g. Wildcard, fuzzy
etc. Incidentally, we still see that IDF issue in fuzzy queries ranking rare mis-spellings
higher but that's another issue (one I resolved in contrib's FuzzyLikeThisQuery).
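
The IDF point can be sketched the same way. The formula below is the one DefaultSimilarity
uses, 1 + ln(numDocs / (docFreq + 1)); the index size, the terms and their document
frequencies are made-up numbers chosen only to show the direction of the effect.

```java
public class IdfSketch {

    // Same formula as DefaultSimilarity.idf(): rarer terms get a larger value.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 1000;      // hypothetical index size

        int dfCorrect = 200;     // hypothetical: "lucene" occurs in 200 docs
        int dfMisspelt = 2;      // hypothetical: "lucine" occurs in 2 docs

        // A fuzzy query expands to both terms; the rare misspelling
        // gets the larger IDF and so contributes more to the score.
        System.out.println("idf(lucene) = " + idf(dfCorrect, numDocs));
        System.out.println("idf(lucine) = " + idf(dfMisspelt, numDocs));
    }
}
```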

I suppose one other consideration is people who have set doc boosts, e.g. to boost by
date; those boosts would no longer affect the score.

I don't think any of these cases necessarily outweigh the benefit to be obtained from switching
wildcard/prefix queries to constant score queries.


Cheers,
Mark







> Add ConstantScorePrefixQuery and ConstantScoreWildcardQuery
> -----------------------------------------------------------
>
>                 Key: LUCENE-1424
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1424
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Mark Miller
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1424.patch
>
>
> If we want to be able to highlight these queries, they need to be added to Lucene core
> or contrib (Solr's WildCardFilter can be used to create the ConstantScoreWildcardQuery). They
> are very useful anyway.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

