lucene-dev mailing list archives

From "Erick Erickson (JIRA)"
Subject [jira] [Created] (SOLR-4516) Highlighting while querying on field:* highlights every value in the field.
Date Fri, 01 Mar 2013 02:43:12 GMT
Erick Erickson created SOLR-4516:

             Summary: Highlighting while querying on field:* highlights every value in the field.
                 Key: SOLR-4516
             Project: Solr
          Issue Type: Improvement
            Reporter: Erick Erickson
            Priority: Minor

A query like 

doesn't attempt to highlight anything, as well it shouldn't. But 

does try to highlight. Of course it highlights every last term in the highlight fields, and
is also very slow. 

Re-forming the query as 
gets around the problem and is a better query anyway, but it still seems like trying to highlight
in the above case is wrong.

Comments from the dev list

Jack Krupansky:
If you want to add a highlight option to suppress or limit highlighting for wildcard terms
(or any multi-term query, including fuzzy query), that would seem reasonable, but I’d hate
to lose the highlighting for useful wildcards such as field1:invest*.
Maybe if it was something like &hl.maxMultiTerms=15, that would provide the best of both
worlds – a reasonable default to prevent really slow highlighting, but still give reasonable
highlighting in reasonable cases, and give you the ultimate control to completely turn off
all multi-term expansion highlighting if you so choose.
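
The limit suggested above could be sketched roughly as follows. This is only an illustration in Python, not Solr code: hl.maxMultiTerms is a proposed parameter, not an existing one, and the function names and the simplified wildcard matching (only * and ?, no escaping) are assumptions made for the example.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a simplified wildcard pattern (* and ? only) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def expand_wildcard(pattern, indexed_terms):
    """Return the indexed terms that the wildcard pattern matches in full."""
    rx = wildcard_to_regex(pattern)
    return [t for t in indexed_terms if rx.fullmatch(t)]

def terms_to_highlight(pattern, indexed_terms, max_multi_terms=15):
    """Hypothetical stand-in for the proposed hl.maxMultiTerms limit.

    If the pattern expands to more terms than the limit, nothing is
    highlighted for it; a limit of 0 turns multi-term highlighting off.
    """
    expanded = expand_wildcard(pattern, indexed_terms)
    if len(expanded) > max_multi_terms:
        return []
    return expanded

if __name__ == "__main__":
    terms = ["invest", "investor", "investment", "bond", "bank"]
    print(terms_to_highlight("invest*", terms))               # small expansion
    print(terms_to_highlight("*", terms, max_multi_terms=3))  # too many terms
```

With a limit like this, field1:invest* would still be highlighted in the usual case, while field1:* over a large field would exceed the limit and be skipped.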

Erick Erickson:
I was mostly thinking of this specific case, but a more general solution makes sense. I can
still argue that the case of field:* shouldn't ever try to highlight, but field:some* could,
as you say, actually be useful....

Mostly I'm drawing attention to the difference between *:* and field:*. I think we should
be consistent across both.

Jack Krupansky:
Could I subvert your “fix” by writing field1:* as field1:** or field1:?* ?
*:* is simply a shorthand for “MatchAllDocs”, with no implication that it is referencing
any field values, while field1:* is an explicit wildcard query, so they are not really comparable
other than at a superficial lexical level.
That said, somewhere there is a Jira that I filed that attempts to have * treated as a faster
filter query for matching all docs that have any value (non-null) in a field. Your proposal
makes more sense in that context since it is clear that * is semantically distinct from a
true wildcard.
Back to my question above, I think it’s okay if only strict single-asterisk wildcard is
covered by your change. Any other wildcard or fuzzy query would continue to behave as before
– although adding my suggested limit on term expansion might still be worthwhile. And I
might still argue that your fix should be an option even if the default is as you have suggested.
But, all these comments should be placed on a Jira!
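
To make the distinction drawn above concrete, here is a rough Python sketch of a check that suppresses highlighting only for *:* and for the strict single-asterisk form field:*. It is an illustration under stated assumptions, not how Solr parses queries: the naive split on the first colon and the names classify and should_highlight are hypothetical.

```python
def classify(query):
    """Classify a single field:term clause using naive string handling."""
    if query == "*:*":
        return "match_all"       # shorthand for matching all documents
    field, sep, term = query.partition(":")
    if not sep:
        return "other"           # no field prefix; not handled here
    if term == "*":
        return "bare_star"       # strict single-asterisk wildcard
    if "*" in term or "?" in term:
        return "wildcard"        # any other wildcard, e.g. invest*, **, ?*
    return "other"

def should_highlight(query):
    """Skip highlighting only for match-all and the bare field:* form."""
    return classify(query) not in ("match_all", "bare_star")

if __name__ == "__main__":
    for q in ["*:*", "field1:*", "field1:**", "field1:?*", "field1:invest*"]:
        print(q, classify(q), should_highlight(q))
```

In this sketch field1:** and field1:?* fall through to the ordinary wildcard case, so they would behave as before, which matches the "only strict single-asterisk" reading of the proposed change.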
