lucene-solr-user mailing list archives

From Scott Derrick <sc...@tnstaafl.net>
Subject Re: Highlighting, all matches show empty {}
Date Wed, 12 Aug 2015 16:19:21 GMT
Erick

Sorry for the newbie mistakes on this.

I think I will take your "implied" advice and add some fields that
encompass the areas I want to search, thus excluding the areas I don't
want searched.
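
A minimal sketch of the "add some fields" idea, assuming a managed schema
where fields can be added through Solr's Schema API (with a classic
schema.xml the same thing would be <field> and <copyField> entries). The
names letter_text and body, and the text_general type, are hypothetical and
chosen only for illustration. The point is that the destination field is
stored, since highlighting only works on stored fields.

```python
import json


def build_schema_payloads(dest_field, source_fields, field_type="text_general"):
    """Build Schema API command bodies: one add-field, then one
    add-copy-field per source field.

    All field names passed in are illustrative; nothing here talks to Solr.
    """
    payloads = [
        {
            "add-field": {
                "name": dest_field,
                "type": field_type,   # assumed to exist in the schema
                "indexed": True,
                "stored": True,       # highlighting only works on stored fields
                "multiValued": True,  # several sources may copy into it
            }
        }
    ]
    for source in source_fields:
        payloads.append({"add-copy-field": {"source": source, "dest": dest_field}})
    return payloads


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    for payload in build_schema_payloads("letter_text", ["title", "body"]):
        print(json.dumps(payload))
```

Each payload would be POSTed as JSON to the core's /schema endpoint, and the
documents re-indexed afterwards so the new field is populated.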

And I will take a look at the FastVectorHighlighter, because these will
be large text fields.
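
For the FastVectorHighlighter, a request might be built along these lines.
This is a sketch, not a tested setup: letter_text is a hypothetical stored
field, and FVH additionally needs termVectors, termPositions and termOffsets
enabled on that field. The base URL is the one used earlier in this thread.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, hl_field, use_fvh=True):
    """Return a select URL that asks Solr to highlight `query` in `hl_field`."""
    params = {
        "q": query,
        "wt": "json",
        "hl": "true",
        "hl.fl": hl_field,  # must be a stored field, or the snippets stay empty
    }
    if use_fvh:
        # Switches from the standard highlighter to the FastVectorHighlighter.
        params["hl.useFastVectorHighlighter"] = "true"
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # letter_text is a hypothetical field name.
    print(build_highlight_url(
        "http://localhost:8983/solr/mbepp/select", "concord", "letter_text"))
```

Naming a single stored field in hl.fl, rather than *, also follows the
advice further down the thread to start as simply as possible.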

thanks,

Scott

-------- Original Message --------
Subject: Re: Highlighting, all matches show empty {}
From: Erick Erickson <erickerickson@gmail.com>
To: solr-user@lucene.apache.org
Date: 08/12/2015 10:06 AM

> bq: You mention that I was searching for concord and that it's not in
> any documents.  But the results below clearly show 3 hits
>
> Right, as you figured out I _really_ meant "concord in any stored
> fields you were including in the hl.fl parameter". That could have
> been clearer.
>
> bq: Is there a problem with storing _text_  so I can get a highlight
> fragment when a hit is found there?
>
> No, you can store the data in the _text_ field just fine, you'll have
> to re-index after the change though. It's often more useful to a user
> to see the highlights in specific fields though, so I wouldn't throw
> the rest of the highlighting away.
>
> You should probably see the FastVectorHighlighter though. If you don't
> use FVH, highlighting re-analyzes the raw text to produce the snippets
> which may be expensive for large text fields.
>
> Best,
> Erick
>
>
> On Wed, Aug 12, 2015 at 8:46 AM, Scott Derrick <scott@tnstaafl.net> wrote:
>> Erick,
>>
>> that explains it. I figured I didn't understand how solr handled highlight
>> fragments.
>>
>> Most of my documents are just text, or as Solr labels that content,
>> _text_, which is not stored by default.
>>
>> You mention that I was searching for concord and that it's not in any
>> documents.  But the results below clearly show 3 hits
>>
>>>>     "response":{"numFound":3,"start":0,"docs":[
>>
>> the problem is the hits are in _text_
>>
>> Is there a problem with storing _text_  so I can get a highlight fragment
>> when a hit is found there?
>>
>> Scott
>>
>> -------- Original Message --------
>> Subject: Re: Highlighting, all matches show empty {}
>> From: Erick Erickson <erickerickson@gmail.com>
>> To: solr-user@lucene.apache.org
>> Date: 08/12/2015 09:27 AM
>>
>>> Well, the example you just showed shouldn't show any highlighting. Your
>>> query is q=concord, so it's trying to highlight "concord", which isn't in
>>> any of your documents. hl.q can be used to highlight something other than
>>> your q parameter.
>>>
>>> I did notice in some of your other examples that you seemed to be
>>> searching for terms that were in the fields, so I suspect this isn't
>>> really your root problem though.
>>>
>>> Do note that fields _must_ be stored to have highlighting work. Is it
>>> possible that your matches are on fields that aren't stored?
>>>
>>> Let's build it up slowly though: try searching on one term in one field
>>> that you _know_ is stored and see if you get anything back. While the
>>> query with hl.fl=* and fl=field1, field2 should be fine, let's start as
>>> simply as possible and work up, maybe?
>>>
>>> Best,
>>> Erick
>>>
>>> On Wed, Aug 12, 2015 at 7:59 AM, Scott Derrick <scott@tnstaafl.net> wrote:
>>>>
>>>> I think the highlighter is actually running, but I'm not getting the
>>>> results??
>>>>
>>>> with this request
>>>>
>>>>
>>>> http://localhost:8983/solr/mbepp/select?q=concord&fl=accession%2C+title%2C+author%2C+date&wt=json&indent=true&hl=true&hl.fl=*
>>>>
>>>>
>>>> I get this response
>>>>
>>>> {
>>>>     "responseHeader":{
>>>>       "status":0,
>>>>       "QTime":3,
>>>>       "params":{
>>>>         "q":"concord",
>>>>         "hl":"true",
>>>>         "indent":"true",
>>>>         "fl":"accession, title, author, date",
>>>>         "hl.fl":"*",
>>>>         "wt":"json"}},
>>>>     "response":{"numFound":3,"start":0,"docs":[
>>>>         {
>>>>           "date":"1890-02-26",
>>>>           "author":"Mary Baker Eddy",
>>>>           "accession":"L13943",
>>>>           "title":["Mary Baker Eddy to Joseph E. Adams,"]},
>>>>         {
>>>>           "date":"1896-01-13",
>>>>           "author":"Mary Baker Eddy",
>>>>           "accession":"L03453",
>>>>           "title":["Mary Baker Eddy to Ira O. Knapp,"]},
>>>>         {
>>>>           "date":"1902-06-15",
>>>>           "author":"Mary Baker Eddy",
>>>>           "accession":"A10145",
>>>>           "title":["Message of the Pastor Emeritus to The First Church of Christ, Scientist, Boston, Mass., June 15, 1902"]}]
>>>>     },
>>>>     "highlighting":{
>>>>       "/home/scott/workspace/mbel-work/tei2html/build/web/L13943/L13943.html":{},
>>>>       "/home/scott/workspace/mbel-work/tei2html/build/web/L03453/L03453.html":{},
>>>>       "/home/scott/workspace/mbel-work/tei2html/build/web/A10145/A10145.html":{}}}
>>>>
>>>> Before running the request, I set "Watch Changes" in the admin
>>>> Plugins/Stats page.  Highlighting showed 2 changes, the GapFragmenter
>>>> and the HtmlFormatter.
>>>>
>>>> here are the reported changes
>>>>
>>>> org.apache.solr.highlight.GapFragmenter
>>>>       class: org.apache.solr.highlight.GapFragmenter
>>>>       version: 5.2.1
>>>>       description: GapFragmenter
>>>>       stats: requests: Was: 117, Now: 156, Delta: 39
>>>>
>>>> org.apache.solr.highlight.HtmlFormatter
>>>>       class: org.apache.solr.highlight.HtmlFormatter
>>>>       version:5.2.1
>>>>       description:HtmlFormatter
>>>>       stats: requests: Was: 117, Now: 156, Delta: 39
>>>>
>>>> Looks to me like 39 fragments or something were processed, yet you can
>>>> see above that the highlights are empty {}.
>>>>
>>>> All the other components in the highlighter showed no changes. They are
>>>> these:
>>>>
>>>>       org.apache.solr.highlight.BreakIteratorBoundaryScanner
>>>>       org.apache.solr.highlight.HtmlEncoder
>>>>       org.apache.solr.highlight.RegexFragmenter
>>>>       org.apache.solr.highlight.ScoreOrderFragmentsBuilder
>>>>       org.apache.solr.highlight.SimpleBoundaryScanner
>>>>       org.apache.solr.highlight.SimpleFragListBuilder
>>>>       org.apache.solr.highlight.SingleFragListBuilder
>>>>       org.apache.solr.highlight.WeightedFragListBuilder
>>>>
>>>>
>>>> Scott
>>>>
>>>> -------- Original Message --------
>>>> Subject: Highlighting, all matches show empty {}
>>>> From: Scott Derrick <scott@tnstaafl.net>
>>>> To: solr-user@lucene.apache.org
>>>> Date: 08/12/2015 08:20 AM
>>>>
>>>>> Tried submitting a field for hl.fl, still empty {}
>>>>>
>>>>> here are the query terms
>>>>>
>>>>> "responseHeader": {
>>>>>        "status": 0,
>>>>>        "QTime": 8,
>>>>>        "params": {
>>>>>          "q": "mary or calvin",
>>>>>          "hl": "true",
>>>>>          "hl.simple.post": "</em>",
>>>>>          "indent": "true",
>>>>>          "fl": "accession, title, author, date",
>>>>>          "hl.fl": "*",
>>>>>          "wt": "json",
>>>>>          "hl.simple.pre": "<em>",
>>>>>          "_": "1439388969240"
>>>>>        }
>>>>>
>>>>> here is one of the responses, there were 135
>>>>>
>>>>> {
>>>>>            "date": "1886-07-06",
>>>>>            "author": "Mary Baker Eddy",
>>>>>            "accession": "L02634",
>>>>>            "title": [
>>>>>              "Mary Baker Eddy to Josephine C. Woodbury, July 6, 1886"
>>>>>            ]
>>>>> },
>>>>>
>>>>> here is the highlight section listing the first 10 matches, still empty
>>>>> {}
>>>>>
>>>>> "highlighting": {
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./L02634/L02634.html": {},
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./A10720/A10720.html": {},
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./L07894/L07894.html": {},
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./L09828/L09828.html": {},
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./A10636D/A10636D.html": {},
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./L13943/L13943.html": {},
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./A10594/A10594.html": {},
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./A10385B/A10385B.html": {},
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./A10879/A10879.html": {},
>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./L00003/L00003.html": {}
>>>>> }
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: Re: Highlighting
>>>>> From: Scott Derrick <scott@tnstaafl.net>
>>>>> To: solr-user@lucene.apache.org
>>>>> Date: 08/12/2015 06:39 AM
>>>>>
>>>>>> I was pretty sure I tried that, though I thought if you don't specify
>>>>>> it, it just uses the search terms?
>>>>>>
>>>>>> If I just search for "calvin" and don't specify a field, what do I
>>>>>> assign to hl.fl?
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>> On 8/11/2015 7:27 PM, Erik Hatcher wrote:
>>>>>>>
>>>>>>>
>>>>>>> Scott - it doesn’t look like you’ve specified hl.fl, the parameter
>>>>>>> naming which field(s) to highlight.
>>>>>>>
>>>>>>> p.s. Erick Erickson surely likes your e-mail domain :)
>>>>>>>
>>>>>>>
>>>>>>> —
>>>>>>> Erik Hatcher, Senior Solutions Architect
>>>>>>> http://www.lucidworks.com <http://www.lucidworks.com/>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 11, 2015, at 9:02 PM, Scott Derrick <scott@tnstaafl.net>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I guess I really don't get Highlighting in Solr.
>>>>>>>>
>>>>>>>> We are transitioning from Google Custom Search, which generally sucks
>>>>>>>> but does return nicely formatted highlighted fragments.
>>>>>>>>
>>>>>>>> I turn highlighting on with hl=true in the query and I get a
>>>>>>>> highlighting section returned at the bottom of the page, each entry
>>>>>>>> identified by the document file name with an empty {}.  It doesn't
>>>>>>>> matter what I search for, plain text or a field, I get a list of
>>>>>>>> documents followed by an empty brace?
>>>>>>>>
>>>>>>>> "highlighting": {
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./A10385B/A10385B.html": {},
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./A10089/A10089.html": {},
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./L00003/L00003.html": {},
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./A10646/A10646.html": {},
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./V03482/V03482.html": {},
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./A10594/A10594.html": {},
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./645A.66.043/645A.66.043.html": {},
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./352.48.001/352.48.001.html": {},
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./144.23.001/144.23.001.html": {},
>>>>>>>>   "/home/scott/workspace/mbel-work/tei2html/build/web/./L18512/L18512.html": {}
>>>>>>>> }
>>>>>>>>
>>>>>>>> I haven't made any changes to the default settings
>>>>>>>>
>>>>>>>>      <highlighting>
>>>>>>>>         <!-- Configure the standard fragmenter -->
>>>>>>>>         <!-- This could most likely be commented out in the "default" case -->
>>>>>>>>         <fragmenter name="gap"
>>>>>>>>                     default="true"
>>>>>>>>                     class="solr.highlight.GapFragmenter">
>>>>>>>>           <lst name="defaults">
>>>>>>>>             <int name="hl.fragsize">100</int>
>>>>>>>>           </lst>
>>>>>>>>         </fragmenter>
>>>>>>>>
>>>>>>>>         <!-- A regular-expression-based fragmenter
>>>>>>>>              (for sentence extraction)
>>>>>>>>           -->
>>>>>>>>         <fragmenter name="regex"
>>>>>>>>                     class="solr.highlight.RegexFragmenter">
>>>>>>>>           <lst name="defaults">
>>>>>>>>             <!-- slightly smaller fragsizes work better because of slop -->
>>>>>>>>             <int name="hl.fragsize">70</int>
>>>>>>>>             <!-- allow 50% slop on fragment sizes -->
>>>>>>>>             <float name="hl.regex.slop">0.5</float>
>>>>>>>>             <!-- a basic sentence pattern -->
>>>>>>>>             <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
>>>>>>>>           </lst>
>>>>>>>>         </fragmenter>
>>>>>>>>
>>>>>>>>         <!-- Configure the standard formatter -->
>>>>>>>>         <formatter name="html"
>>>>>>>>                    default="true"
>>>>>>>>                    class="solr.highlight.HtmlFormatter">
>>>>>>>>           <lst name="defaults">
>>>>>>>>             <str name="hl.simple.pre"><![CDATA[<em>]]></str>
>>>>>>>>             <str name="hl.simple.post"><![CDATA[</em>]]></str>
>>>>>>>>           </lst>
>>>>>>>>         </formatter>
>>>>>>>>
>>>>>>>>         <!-- Configure the standard encoder -->
>>>>>>>>         <encoder name="html"
>>>>>>>>                  class="solr.highlight.HtmlEncoder" />
>>>>>>>>
>>>>>>>>         <!-- Configure the standard fragListBuilder -->
>>>>>>>>         <fragListBuilder name="simple"
>>>>>>>>                          class="solr.highlight.SimpleFragListBuilder"/>
>>>>>>>>
>>>>>>>>         <!-- Configure the single fragListBuilder -->
>>>>>>>>         <fragListBuilder name="single"
>>>>>>>>                          class="solr.highlight.SingleFragListBuilder"/>
>>>>>>>>
>>>>>>>>         <!-- Configure the weighted fragListBuilder -->
>>>>>>>>         <fragListBuilder name="weighted"
>>>>>>>>                          default="true"
>>>>>>>>                          class="solr.highlight.WeightedFragListBuilder"/>
>>>>>>>>
>>>>>>>>         <!-- default tag FragmentsBuilder -->
>>>>>>>>         <fragmentsBuilder name="default"
>>>>>>>>                           default="true"
>>>>>>>>                           class="solr.highlight.ScoreOrderFragmentsBuilder">
>>>>>>>>           <!--
>>>>>>>>           <lst name="defaults">
>>>>>>>>             <str name="hl.multiValuedSeparatorChar">/</str>
>>>>>>>>           </lst>
>>>>>>>>           -->
>>>>>>>>         </fragmentsBuilder>
>>>>>>>>
>>>>>>>>         <!-- multi-colored tag FragmentsBuilder -->
>>>>>>>>         <fragmentsBuilder name="colored"
>>>>>>>>                           class="solr.highlight.ScoreOrderFragmentsBuilder">
>>>>>>>>           <lst name="defaults">
>>>>>>>>             <str name="hl.tag.pre"><![CDATA[
>>>>>>>>                  <b style="background:yellow">,<b style="background:lawgreen">,
>>>>>>>>                  <b style="background:aquamarine">,<b style="background:magenta">,
>>>>>>>>                  <b style="background:palegreen">,<b style="background:coral">,
>>>>>>>>                  <b style="background:wheat">,<b style="background:khaki">,
>>>>>>>>                  <b style="background:lime">,<b style="background:deepskyblue">]]></str>
>>>>>>>>             <str name="hl.tag.post"><![CDATA[</b>]]></str>
>>>>>>>>           </lst>
>>>>>>>>         </fragmentsBuilder>
>>>>>>>>
>>>>>>>>         <boundaryScanner name="default"
>>>>>>>>                          default="true"
>>>>>>>>                          class="solr.highlight.SimpleBoundaryScanner">
>>>>>>>>           <lst name="defaults">
>>>>>>>>             <str name="hl.bs.maxScan">10</str>
>>>>>>>>             <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
>>>>>>>>           </lst>
>>>>>>>>         </boundaryScanner>
>>>>>>>>
>>>>>>>>         <boundaryScanner name="breakIterator"
>>>>>>>>                          class="solr.highlight.BreakIteratorBoundaryScanner">
>>>>>>>>           <lst name="defaults">
>>>>>>>>             <!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
>>>>>>>>             <str name="hl.bs.type">WORD</str>
>>>>>>>>             <!-- language and country are used when constructing Locale object.  -->
>>>>>>>>             <!-- And the Locale object will be used when getting instance of BreakIterator -->
>>>>>>>>             <str name="hl.bs.language">en</str>
>>>>>>>>             <str name="hl.bs.country">US</str>
>>>>>>>>           </lst>
>>>>>>>>         </boundaryScanner>
>>>>>>>>       </highlighting>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> One man's "magic" is another man's engineering. "Supernatural" is a null
>>>> word.”
>>>> Robert A. Heinlein
>>>>
>>>
>>
>> --
>> He who knows others is wise;
>> He who knows himself is enlightened.
>> Lao-tzu
>>
>

-- 
Sin makes its own hell, and goodness its own heaven.
Mary Baker Eddy

