lucene-dev mailing list archives

From Jan Høydahl (JIRA) <j...@apache.org>
Subject [jira] [Updated] (SOLR-13367) Highlighting fails for Range queries on Multi-valued String fields
Date Fri, 07 Jun 2019 12:01:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-13367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-13367:
-------------------------------
    Description: 
Range queries against multi-valued string fields produce useless (effectively empty) highlighting,
even when "hl.highlightMultiTerm":"true" is set.

I have uncovered what I believe is a bug. At the very least, it is a difference in behavior
between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a multi-valued string field defined in my schema as:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="MyStringField" type="string" indexed="true" stored="true" multiValued="true"/>

My query contains a range clause, and I use highlighting to get the list of values that
actually matched the range.

All examples below were run from the Query page of the relevant core in the Solr Admin UI.

 

  was:
Range queries against multi-valued string fields produce useless (effectively empty) highlighting,
even when "hl.highlightMultiTerm":"true" is set.

I have uncovered what I believe is a bug. At the very least, it is a difference in behavior
between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a multi-valued string field defined in my schema as:

    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <field name="MyStringField" type="string" indexed="true" stored="true" multiValued="true"/>

My query contains a range clause, and I use highlighting to get the list of values that
actually matched the range.

All examples below were run from the Query page of the relevant core in the Solr Admin UI.

***************************************************************************
First, a correctly working example: a range query on Solr v5.1.0 that produces useful results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 366,
    "params": {
      "q": "MyStringField:[A TO B}",
      "hl": "true",
      "indent": "true",
      "hl.preserveMulti": "true",
      "fl": "MyStringField,MyUniqueID",
      "hl.requireFieldMatch": "true",
      "hl.usePhraseHighlighter": "true",
      "hl.fl": "MyStringField",
      "wt": "json",
      "hl.highlightMultiTerm": "true",
      "_": "1553275722025"
    }
  },
  "response": {
    "numFound": 999,
    "start": 0,
    "docs": [
      {
        "MyStringField": [
          "Stanley, Wendell M.",
          "Avery, Roy"
        ],
        "MyUniqueID": "UniqueID1"
      },
      {
        "MyStringField": [
          "Avery, Roy"
        ],
        "MyUniqueID": "UniqueID2"
      },
*** lots more docs correctly found
    ]
  },
*** we get to the highlighting portion of the response
*** this indicates which values of each MyStringField
*** that actually matched the query

  "highlighting": {
    "UniqueID1": {
      "MyStringField": [
        "<em>Avery, Roy</em>"
      ]
    },
    "UniqueID2": {
      "MyStringField": [
        "<em>Avery, Roy</em>"
      ]
    },
    "UniqueID3": {
      "MyStringField": [
        "<em>American Institute of Biological Sciences</em>",
        "<em>Albritton, Errett C.</em>"
      ]
    },
... etc.

 *** lots more useful highlight values. Note the two matching values
 *** for document UniqueID3. 
}
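To turn a highlighting section like the one above into a plain list of matched values per document, something like the following Python sketch could be used. It assumes the JSON response has already been parsed into a dict and that the default "<em>"/"</em>" highlight markers are in effect; "matched_values" is a hypothetical helper, not part of any Solr client library.

```python
import re

# Strips the default Solr highlight markers; assumes the pre/post tags were not customized.
_EM_TAG = re.compile(r"</?em>")


def matched_values(highlighting, field):
    """Map each document ID to the values the highlighter marked as matches.

    `highlighting` is the parsed "highlighting" object of a Solr JSON response.
    Documents whose entry lacks `field`, or lists no snippets, map to [].
    """
    result = {}
    for doc_id, fields in highlighting.items():
        snippets = fields.get(field, [])
        result[doc_id] = [_EM_TAG.sub("", s) for s in snippets]
    return result


if __name__ == "__main__":
    # Excerpt of the v5.1.0 highlighting shown above.
    hl = {
        "UniqueID1": {"MyStringField": ["<em>Avery, Roy</em>"]},
        "UniqueID3": {"MyStringField": [
            "<em>American Institute of Biological Sciences</em>",
            "<em>Albritton, Errett C.</em>",
        ]},
    }
    for doc_id, values in matched_values(hl, "MyStringField").items():
        print(doc_id, values)
```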


***************************************************************************
* THE PROBLEM
* Now using newer versions of Solr
***************************************************************************
Using the exact same parameters with Solr v7.5.0 or v7.7.1, the top portion of the
response is basically the same, including the number of documents found:

{
  "responseHeader":{
    "status":0,
    "QTime":245,
    "params":{
      "q":"MyStringField:[A TO B}",
      "hl":"on",
      "hl.preserveMulti":"true",
      "fl":"MyUniqueID, MyStringField",
      "hl.requireFieldMatch":"true",
      "hl.fl":"MyStringField",
      "hightlightMultiTerm":"true",
      "wt":"json",
      "_":"1553105129887",
      "usePhraseHighLighter":"true"}},
  "response":{"numFound":999,"start":0,"docs":[

*** The problem is with the highlighting portion of the results, which is effectively empty.

*** There is no way to know which values in each document actually matched the query:

  "highlighting":{
    "UniqueID1":{},
    "UniqueID2":{},
    "UniqueID3":{},
... etc.

*** NOTE: The source data is the same for all of the tested Solr versions and the Solr indexes
*** were properly rebuilt for each Solr version. 
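When comparing versions, it may help to diff the request parameters Solr echoes in "responseHeader.params". The Python sketch below ("param_diff" is a hypothetical helper) compares the two echoes shown above. It surfaces that the 7.x request spells two parameters as "hightlightMultiTerm" and "usePhraseHighLighter", without the "hl." prefix, so Solr most likely ignores them. The Solr Reference Guide documents "hl.highlightMultiTerm" and "hl.usePhraseHighlighter" as defaulting to true, so this probably does not explain the empty highlighting, but it is worth ruling out.

```python
def param_diff(old, new, ignore=("_",)):
    """Compare two echoed responseHeader.params dicts.

    Returns (keys only in old, keys only in new, keys whose values differ),
    each sorted. Keys in `ignore` (by default the Admin UI cache-buster "_")
    are skipped.
    """
    keys_old = set(old) - set(ignore)
    keys_new = set(new) - set(ignore)
    only_old = sorted(keys_old - keys_new)
    only_new = sorted(keys_new - keys_old)
    changed = sorted(k for k in keys_old & keys_new if old[k] != new[k])
    return only_old, only_new, changed


# Params echoed by the v5.1.0 and v7.x responses quoted in this report.
V5_PARAMS = {
    "q": "MyStringField:[A TO B}", "hl": "true", "indent": "true",
    "hl.preserveMulti": "true", "fl": "MyStringField,MyUniqueID",
    "hl.requireFieldMatch": "true", "hl.usePhraseHighlighter": "true",
    "hl.fl": "MyStringField", "wt": "json", "hl.highlightMultiTerm": "true",
    "_": "1553275722025",
}
V7_PARAMS = {
    "q": "MyStringField:[A TO B}", "hl": "on", "hl.preserveMulti": "true",
    "fl": "MyUniqueID, MyStringField", "hl.requireFieldMatch": "true",
    "hl.fl": "MyStringField", "hightlightMultiTerm": "true", "wt": "json",
    "_": "1553105129887", "usePhraseHighLighter": "true",
}

if __name__ == "__main__":
    only_old, only_new, changed = param_diff(V5_PARAMS, V7_PARAMS)
    print(only_old)  # ['hl.highlightMultiTerm', 'hl.usePhraseHighlighter', 'indent']
    print(only_new)  # ['hightlightMultiTerm', 'usePhraseHighLighter']
    print(changed)   # ['fl', 'hl']
```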

***************************************************************************
After switching the request to the unified highlighter ("hl.method=unified"), the highlighting
looks like:

  "highlighting":{
    "UniqueID1":{
      "MyStringField":[]},
    "UniqueID2":{
      "MyStringField":[]},
    "UniqueID3":{
      "MyStringField":[]},
... etc.

*** The highlighting now lists the matching field, but its list of values is still empty.
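Both empty shapes, {} from the default highlighter and {"MyStringField":[]} from the unified one, can be caught with one small check. A Python sketch, assuming the "highlighting" object has been parsed into a dict; "docs_without_highlights" is a hypothetical helper name. For a query that only constrains MyStringField, every returned document matched on that field, so any ID this reports suggests the problem described here.

```python
def docs_without_highlights(highlighting, field):
    """Return the sorted IDs of documents whose highlighting for `field`
    is missing ({}) or present but empty ({"field": []})."""
    return sorted(
        doc_id
        for doc_id, fields in highlighting.items()
        if not fields.get(field)  # None (missing key) and [] are both falsy
    )


if __name__ == "__main__":
    original = {"UniqueID1": {}, "UniqueID2": {}}               # default highlighter, 7.x
    unified = {"UniqueID1": {"MyStringField": []}}              # hl.method=unified, 7.x
    good = {"UniqueID1": {"MyStringField": ["<em>Avery, Roy</em>"]}}
    print(docs_without_highlights(original, "MyStringField"))  # ['UniqueID1', 'UniqueID2']
    print(docs_without_highlights(unified, "MyStringField"))   # ['UniqueID1']
    print(docs_without_highlights(good, "MyStringField"))      # []
```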

***************************************************************************
NOTE: If I change the query from a range clause to a wildcard query, q="MyStringField:A*",
the highlighting is correct in both Solr v7.5.0 and v7.7.1. These are GOOD results!

  "highlighting":{
    "UniqueID1": {
      "MyStringField": ["<em>Avery, Roy</em>"]},
    "UniqueID2": {
      "MyStringField": ["<em>Avery, Roy</em>"]},
    "UniqueID3": {
      "MyStringField": [
        "<em>American Institute of Biological Sciences</em>",
        "<em>Albritton, Errett C.</em>"
      ]
    },
... etc.

*** This makes me think there is some problem with the way a Range query
*** feeds the search results to the Solr Highlighter code.
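For this particular range the wildcard workaround is exact. A string range such as [A TO B} keeps values v with "A" <= v < "B", and since no character sorts strictly between "A" and "B", that is precisely the set of values starting with "A". For ranges without such a prefix equivalent, one possible client-side workaround is to filter each returned document's stored values yourself. The Python sketch below assumes Python's code-point string comparison agrees with the UTF-8 byte order Lucene uses for StrField terms (the two orders should coincide); "in_range" and "matching_values" are hypothetical helpers.

```python
def in_range(value, lower, upper, include_lower=True, include_upper=False):
    """Client-side check for a string range like [lower TO upper}.

    Assumes Python's code-point ordering matches the UTF-8 byte order of
    StrField terms. A bound of None means an open end (*).
    """
    if lower is not None:
        if value < lower or (value == lower and not include_lower):
            return False
    if upper is not None:
        if value > upper or (value == upper and not include_upper):
            return False
    return True


def matching_values(values, lower, upper, include_lower=True, include_upper=False):
    """Return the stored values of one multi-valued field that fall in the range."""
    return [v for v in values if in_range(v, lower, upper, include_lower, include_upper)]


if __name__ == "__main__":
    stored = ["Stanley, Wendell M.", "Avery, Roy"]
    # Equivalent of MyStringField:[A TO B} applied to UniqueID1's stored values.
    print(matching_values(stored, "A", "B"))         # ['Avery, Roy']
    # The A* workaround selects the same values for this particular range.
    print([v for v in stored if v.startswith("A")])  # ['Avery, Roy']
```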

***************************************************************************
Varying the hl parameters, or any other query parameters, does not solve the problem.
The wildcard query is my current workaround, but range queries remain broken.

In summary, there is some incompatibility among:

	1) A multi-valued string field AND
	2) A range query against that field AND
	3) Result highlighting, which comes back effectively empty.

I don't know when this issue was introduced. I recently upgraded from 5.1.0 to 7.5.0 in one
big leap. I tried to read through the change logs for the intervening versions but gave up
to save my sanity.

You should be able to reproduce this issue using any multi-valued, indexed and stored string
field.
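A minimal reproduction along those lines could be scripted as below. This Python sketch only builds the requests and makes no network calls. It assumes a hypothetical core named "mycore" on localhost:8983, a schema containing the field definition above, and "MyUniqueID" as the uniqueKey; the documents reuse values shown in this report. The update body would be sent as a POST with Content-Type: application/json, after which the highlighting for the range and wildcard queries can be compared.

```python
import json
from urllib.parse import urlencode

# Hypothetical core URL; adjust host, port and core name.
CORE = "http://localhost:8983/solr/mycore"

# Sample documents built from the values shown in this report.
DOCS = [
    {"MyUniqueID": "UniqueID1", "MyStringField": ["Stanley, Wendell M.", "Avery, Roy"]},
    {"MyUniqueID": "UniqueID2", "MyStringField": ["Avery, Roy"]},
    {"MyUniqueID": "UniqueID3", "MyStringField": [
        "American Institute of Biological Sciences", "Albritton, Errett C."]},
]


def update_request():
    """Return the POST URL and JSON body that index DOCS and commit."""
    return CORE + "/update?commit=true", json.dumps(DOCS)


def query_url(q):
    """Build a /select URL with the highlighting parameters from the report."""
    params = {
        "q": q,
        "fl": "MyUniqueID,MyStringField",
        "hl": "true",
        "hl.fl": "MyStringField",
        "hl.requireFieldMatch": "true",
        "hl.highlightMultiTerm": "true",
        "hl.preserveMulti": "true",
        "wt": "json",
    }
    return CORE + "/select?" + urlencode(params)


if __name__ == "__main__":
    url, body = update_request()
    print("POST", url)
    print(body)
    print(query_url("MyStringField:[A TO B}"))  # per the report: empty highlights on 7.5/7.7.1
    print(query_url("MyStringField:A*"))        # per the report: highlights as expected
```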



> Highlighting fails for Range queries on Multi-valued String fields
> ------------------------------------------------------------------
>
>                 Key: SOLR-13367
>                 URL: https://issues.apache.org/jira/browse/SOLR-13367
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: highlighter
>    Affects Versions: 7.5, 7.7.1
>         Environment: RedHat Linux v7
> Java 1.8.0_201
>            Reporter: Karl Wolf
>            Priority: Major
>             Fix For: 5.1
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

