lucene-dev mailing list archives

From Jan Høydahl (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-13367) Highlighting fails for Range queries on Multi-valued String fields
Date Fri, 07 Jun 2019 11:59:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858553#comment-16858553 ]

Jan Høydahl commented on SOLR-13367:
------------------------------------

Hi,

I wonder if the cause is a bug in the Admin UI query section, which sets the parameter "hightlightMultiTerm"
instead of the correct "highlightMultiTerm". Can you try your query again manually from the browser
address bar with the correct param? See [https://github.com/apache/lucene-solr/pull/704] for
a fix.
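
For trying the request by hand, here is a minimal Python sketch that builds the select URL using the "hl.highlightMultiTerm" spelling from the working Solr v5.1.0 request quoted below. The host, port, and core name ("localhost:8983", "mycore") are placeholders and not values from the report; the query and field names are copied from the issue description.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_highlight_url(base=SOLR_SELECT):
    """Build a select URL whose highlighting params use the 'hl.' prefix."""
    params = {
        "q": "MyStringField:[A TO B}",
        "fl": "MyUniqueID,MyStringField",
        "wt": "json",
        "hl": "true",
        "hl.fl": "MyStringField",
        "hl.preserveMulti": "true",
        "hl.requireFieldMatch": "true",
        "hl.usePhraseHighlighter": "true",
        "hl.highlightMultiTerm": "true",
    }
    # urlencode percent-encodes the range brackets and turns spaces into '+'.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_url())
```

Pasting the printed URL into the browser address bar bypasses the Admin UI form, so whatever parameter names the form generates no longer matter.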

> Highlighting fails for Range queries on Multi-valued String fields
> ------------------------------------------------------------------
>
>                 Key: SOLR-13367
>                 URL: https://issues.apache.org/jira/browse/SOLR-13367
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: highlighter
>    Affects Versions: 7.5, 7.7.1
>         Environment: RedHat Linux v7
> Java 1.8.0_201
>            Reporter: Karl Wolf
>            Priority: Major
>             Fix For: 5.1
>
>
> Range queries against multi-valued string fields produce useless highlighting, even though "hl.highlightMultiTerm":"true" is set.
> I have uncovered what I believe is a bug. At the very least it is a difference in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1).
> I have a multi-valued string Field defined in my schema as:
>     <fieldType name="string" class="solr.StrField" sortMissingLast="true"/> 
>     <field name="MyStringField" type="string" indexed="true" stored="true" multiValued="true"/>
> I am using a query containing a Range clause and I am using highlighting to get the list of values that actually matched the range query.
> All examples below were using the appropriate Solr Admin Server SolrCore Query page.
> ***************************************************************************
> First, a correctly working example of a range query using Solr v5.1.0 which produces useful results:
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 366,
>     "params": {
>       "q": "MyStringField:[A TO B}",
>       "hl": "true",
>       "indent": "true",
>       "hl.preserveMulti": "true",
>       "fl": "MyStringField,MyUniqueID",
>       "hl.requireFieldMatch": "true",
>       "hl.usePhraseHighlighter": "true",
>       "hl.fl": "MyStringField",
>       "wt": "json",
>       "hl.highlightMultiTerm": "true",
>       "_": "1553275722025"
>     }
>   },
>   "response": {
>     "numFound": 999,
>     "start": 0,
>     "docs": [
>       {
>         "MyStringField": [
>           "Stanley, Wendell M.",
>           "Avery, Roy"
>         ],
>         "MyUniqueID": "UniqueID1"
>       },
>       {
>         "MyStringField": [
>           "Avery, Roy"
>         ],
>         "MyUniqueID": "UniqueID2"
>       },
> *** lots more docs correctly found
>     ]
>   },
> *** we get to the highlighting portion of the response
> *** this indicates which values of each MyStringField
> *** that actually matched the query
>   "highlighting": {
>     "UniqueID1": {
>       "MyStringField": [
>         "<em>Avery, Roy</em>"
>       ]
>     },
>     "UniqueID2": {
>       "MyStringField": [
>         "<em>Avery, Roy</em>"
>       ]
>     },
>     "UniqueID3": {
>       "MyStringField": [
>         "<em>American Institute of Biological Sciences</em>",
>         "<em>Albritton, Errett C.</em>"
>       ]
>     },
> ... etc.
>  *** lots more useful highlight values. Note the two matching values
>  *** for document UniqueID3. 
> }
> ***************************************************************************
> * THE PROBLEM
> * Now using newer versions of Solr
> ***************************************************************************
> Using the exact same parameters with Solr v7.5.0 or v7.7.1, the top portion of the 
> response is basically the same including the number of documents found
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":245,
>     "params":{
>       "q":"MyStringField:[A TO B}",
>       "hl":"on",
>       "hl.preserveMulti":"true",
>       "fl":"MyUniqueID, MyStringField",
>       "hl.requireFieldMatch":"true",
>       "hl.fl":"MyStringField",
>       "hightlightMultiTerm":"true",
>       "wt":"json",
>       "_":"1553105129887",
>       "usePhraseHighLighter":"true"}},
>   "response":{"numFound":999,"start":0,"docs":[
> *** The problem is with the highlighting portion of the results, which is effectively empty.
> *** There is no way to know which values in each document actually matched the query:
>   "highlighting":{
>     "UniqueID1":{},
>     "UniqueID2":{},
>     "UniqueID3":{},
> ... etc.
> *** NOTE: The source data is the same for all of the tested Solr versions and the Solr indexes
> *** were properly rebuilt for each Solr version.
> ***************************************************************************
> After changing the request to use the "unified" highlighter ("hl.method=unified"), the highlighting looks like:
>   "highlighting":{
>     "UniqueID1":{
>       "MyStringField":[]},
>     "UniqueID2":{
>       "MyStringField":[]},
>     "UniqueID3":{
>       "MyStringField":[]},
> ... etc.
> *** The highlighting now properly lists the matching field but still no useful values are listed.
> ***************************************************************************
> NOTE: if I change the query from using a Range clause to using a Wildcard query: q="MyStringField:A*"
> the highlighting is correct in both Solr v7.5.0 and v7.7.1: These are GOOD results!
>   "highlighting":{
>     "UniqueID1": {
>       "MyStringField": ["<em>Avery, Roy</em>"]},
>     "UniqueID2": {
>       "MyStringField": ["<em>Avery, Roy</em>"]},
>     "UniqueID3": {
>       "MyStringField": [
>         "<em>American Institute of Biological Sciences</em>",
>         "<em>Albritton, Errett C.</em>"
>       ]
>     },
> ... etc.
> *** This makes me think there is some problem with the way a Range query
> *** feeds the search results to the Solr Highlighter code.
> ***************************************************************************
> All attempts to vary the hl specs or any other query parameters do not solve the problem.
> The wildcard query is my current workaround, but there is still a problem with
> range queries.
> In summary, there is some incompatibility among:
> 	1) A multi-valued string field AND
> 	2) A range query against that field AND
> 	3) The result Highlighting. It is effectively empty.
> I don't know when this issue was first introduced. I have recently been updating from 5.1.0
> to 7.5.0 in one big leap. I have attempted to read through the change logs for the intervening
> versions but I gave up to save my sanity.
> You should be able to reproduce this issue using any multi-valued, indexed and stored string field.
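
One way to check a response for a parameter-name mismatch is to inspect the names echoed under "responseHeader.params". The sketch below is a hypothetical helper, not part of Solr. Its list of expected names is taken only from the working v5.1.0 request quoted above, and the 0.85 similarity cutoff is an arbitrary assumption. It flags echoed names that are not an exact expected name but closely resemble one once the "hl." prefix is dropped and case is ignored.

```python
import difflib

# Highlighting parameter names copied from the working v5.1.0 request.
# Short names such as "hl" and "hl.fl" are left out on purpose so that
# unrelated params like "fl" are not flagged.
EXPECTED = [
    "hl.preserveMulti",
    "hl.requireFieldMatch",
    "hl.usePhraseHighlighter",
    "hl.highlightMultiTerm",
]


def suspicious_params(echoed):
    """Return {echoed_name: likely_intended_name} for near-miss names."""
    # Map the lower-cased, prefix-free form back to the expected spelling.
    by_bare = {name.split(".", 1)[-1].lower(): name for name in EXPECTED}
    found = {}
    for name in echoed:
        if name in EXPECTED:
            continue  # exact match, nothing to report
        bare = name.split(".", 1)[-1].lower()
        close = difflib.get_close_matches(bare, list(by_bare), n=1, cutoff=0.85)
        if close:
            found[name] = by_bare[close[0]]
    return found


if __name__ == "__main__":
    # Parameter names echoed in the v7.5.0 response quoted above.
    echoed = ["q", "hl", "hl.preserveMulti", "fl", "hl.requireFieldMatch",
              "hl.fl", "hightlightMultiTerm", "wt", "_", "usePhraseHighLighter"]
    for got, wanted in suspicious_params(echoed).items():
        print(f"{got!r} looks like a misspelling of {wanted!r}")
```

Run against the names echoed by the v7.5.0 response, this flags "hightlightMultiTerm" and "usePhraseHighLighter", the two entries that differ from the v5.1.0 request.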



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


