lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-4516) Highlighting while querying on field:* highlights every value in the field.
Date Fri, 01 Mar 2013 02:43:12 GMT
Erick Erickson created SOLR-4516:
------------------------------------

             Summary: Highlighting while querying on field:* highlights every value in the field.
                 Key: SOLR-4516
                 URL: https://issues.apache.org/jira/browse/SOLR-4516
             Project: Solr
          Issue Type: Improvement
            Reporter: Erick Erickson
            Priority: Minor


A query like 
q=*:*&hl=on&.....

doesn't attempt to highlight anything, as well it shouldn't. But 
q=field1:*&hl=on&...

does try to highlight. Of course it highlights every last term in the highlight fields, and
is also very slow. 

Re-forming the query as 
q=*:*&fq=field1:*&hl=on&.... 
gets around the problem and is a better query anyway, but it still seems wrong to attempt
highlighting for the bare field1:* query at all.
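
To make the two request shapes concrete, here is a minimal Python sketch that only builds the query strings; it does not contact Solr. The helper names are hypothetical, and the remaining parameters elided with "...." in the examples above are left out rather than guessed.

```python
from urllib.parse import urlencode


def slow_params(field):
    # Wildcard in q: the highlighter tries to highlight every term in the field.
    return {"q": f"{field}:*", "hl": "on"}


def workaround_params(field):
    # Match-all in q, wildcard moved to a filter query (fq).
    return {"q": "*:*", "fq": f"{field}:*", "hl": "on"}


def to_query_string(params):
    # Keep '*' and ':' unescaped so the output reads like the examples above.
    return urlencode(params, safe="*:")


if __name__ == "__main__":
    print(to_query_string(slow_params("field1")))
    print(to_query_string(workaround_params("field1")))
```

The only difference between the two requests is where the field1:* clause lives: in the main query or in the filter query.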

Comments from the dev list

Jack Krupansky:
If you want to add a highlight option to suppress or limit highlighting for wildcard terms
(or any multi-term query, including fuzzy query), that would seem reasonable, but I’d hate
to lose the highlighting for useful wildcards such as field1:invest*.
 
Maybe if it was something like &hl.maxMultiTerms=15, that would provide the best of both
worlds – a reasonable default to prevent really slow highlighting, but still give reasonable
highlighting in reasonable cases, and give you the ultimate control to completely turn off
all multi-term expansion highlighting if you so choose.
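
A rough Python sketch of how such a limit might behave. hl.maxMultiTerms is only the parameter proposed above, not an existing Solr option, and this is not Solr's highlighter code. Two assumptions are made here: a wildcard that expands to more terms than the limit is not highlighted at all, and a limit of 0 switches multi-term highlighting off. All function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    # Translate '*' (any run of characters) and '?' (one character) to a regex.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def terms_to_highlight(pattern, indexed_terms, max_multi_terms=15):
    # Expand the wildcard against the known terms, then apply the limit.
    regex = wildcard_to_regex(pattern)
    matched = [t for t in indexed_terms if regex.fullmatch(t)]
    if len(matched) > max_multi_terms:
        # Assumption: too many expansions means no highlighting for this clause.
        return []
    return matched


if __name__ == "__main__":
    terms = ["invest", "investor", "investment", "bond", "equity"]
    print(terms_to_highlight("invest*", terms))   # small expansion, highlighted
    print(terms_to_highlight("*", terms, 3))      # exceeds the limit, suppressed
```

Under these assumptions field1:invest* keeps its highlighting, while field1:* on a field with many distinct terms falls over the limit and is skipped.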


Me:
I was mostly thinking of this specific case, but a more general solution makes sense. I can
still argue that the case of field:* shouldn't ever try to highlight, but field:some* could,
as you say, actually be useful....

Mostly I'm drawing attention to the difference between *:* and field:*. I think we should
be consistent across both.

Jack:
Could I subvert your “fix” by writing field1:* as field1:** or field1:?* ?
 
*:* is simply a shorthand for “MatchAllDocs”, with no implication that it is referencing
any field values, while field1:* is an explicit wildcard query, so they are not really comparable
other than at a superficial lexical level.
 
That said, somewhere there is a Jira that I filed that attempts to have * treated as a faster
filter query for matching all docs that have any value (non-null) in a field. Your proposal
makes more sense in that context since it is clear that * is semantically distinct from a
true wildcard.
 
Back to my question above, I think it’s okay if only strict single-asterisk wildcard is
covered by your change. Any other wildcard or fuzzy query would continue to behave as before
– although adding my suggested limit on term expansion might still be worthwhile. And I
might still argue that your fix should be an option even if the default is as you have suggested.
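
As an illustration of the "strict single-asterisk" rule, here is a small Python sketch of the check. It works on the raw field:value string rather than on a parsed Lucene query, ignores escaping and quoting, and the function name is hypothetical.

```python
def is_bare_star_wildcard(clause):
    # Split once on the first ':' into field name and value.
    field, sep, value = clause.partition(":")
    # '*:*' is the match-all shorthand, not a wildcard on a field, so exclude it.
    return sep == ":" and field not in ("", "*") and value == "*"


if __name__ == "__main__":
    for clause in ["field1:*", "field1:**", "field1:?*", "field1:invest*", "*:*"]:
        print(clause, is_bare_star_wildcard(clause))
```

With this check, field1:** and field1:?* are not treated as the special case, so they would continue to be highlighted as ordinary wildcard queries.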
 
But, all these comments should be placed on a Jira!


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

