lucene-solr-user mailing list archives

From "Teague James" <teag...@insystechinc.com>
Subject RE: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos
Date Wed, 01 Feb 2017 20:23:21 GMT
Hi Erick! Thanks for the reply. The goal is to get two-character terms like 1a, 1b, 2a, 2b,
3a, etc. highlighted in the documents. Additional testing shows that any term combining letters
and digits returns a blank highlight, regardless of length. Thus, "pr0blem" will not highlight
because of the zero in the middle of the term.

I came across a Server Fault article where it was suggested that the fieldType must be tokenized
in order for highlighting to work correctly. Setting the field type to text_general was suggested
as a solution. In my case the data is stored as a string fieldType, which is then copied using
copyField to a field that has a fieldType of text_general, but I'm still not getting a good
highlight on terms like "1a". Highlighting does work for terms that don't mix letters and digits, though.

Other articles pointed to termVectors and termOffsets, but neither seemed to help. Here's my config:

<field name="contents" type="string" indexed="true" stored="true" termPositions="true"
       termVectors="true" termOffsets="true" />
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
<copyField source="contents" dest="text"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1"
            generateNumberParts="0" generateWordParts="0" />
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true"
            expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1"
            generateNumberParts="0" generateWordParts="0" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
  </analyzer>
</fieldType>

In solrconfig.xml, highlighting is set to use the text field: <str name="hl.fl">text</str>
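For reference, a minimal sketch of where a default like this usually sits in solrconfig.xml. The handler name "/select" and the hl=true entry are assumptions for illustration; the message above only quotes the hl.fl entry.

```xml
<!-- Sketch only: the handler name and the hl=true default are assumed -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- turn highlighting on for every request to this handler -->
    <str name="hl">true</str>
    <!-- highlight against the tokenized copyField target, not the string field -->
    <str name="hl.fl">text</str>
  </lst>
</requestHandler>
```

Parameters sent with an individual request override these handler defaults.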


Thoughts?

Appreciate the help! Thanks!

-Teague

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Wednesday, February 1, 2017 2:49 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

How far into the text field are these tokens? By default the highlighter only analyzes the first
51,200 characters of a field, a limit controlled by hl.maxAnalyzedChars. It's vaguely possible that
the values happen to be farther along in the text than that. Not likely, mind you, but possible.
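A quick way to rule this out is to raise the limit temporarily and rerun the "1a" query. A sketch, assuming the same "/select" handler defaults; 1000000 is an arbitrary illustrative value, not a recommendation:

```xml
<!-- Sketch only: the handler name and the limit value are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl.fl">text</str>
    <!-- look for highlights in up to 1,000,000 characters of the field -->
    <int name="hl.maxAnalyzedChars">1000000</int>
  </lst>
</requestHandler>
```

The same parameter can also be passed on a single request (hl.maxAnalyzedChars=1000000), which overrides the handler default. If highlights stay blank for documents shorter than that limit, the character limit is not the cause.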

Best,
Erick

On Wed, Feb 1, 2017 at 8:24 AM, Teague James <teaguej@insystechinc.com> wrote:
> Hello everyone! I'm still stuck on this issue and could really use 
> some help. I have a Solr 6.0.0 instance that is storing documents 
> peppered with text like "1a", "2e", "4c", etc. If I search the 
> documents for a word such as "ms", "in", or "the", I get the correct 
> number of hits and the results are highlighted correctly in the 
> highlighting section. But when I search for "1a" or "2e" I get hits, 
> but the highlights are blank. Further testing revealed that the 
> highlighter fails to highlight any two-character value that combines
> a letter and a digit, such as n0, b1, 1z, etc.:
> <result name="response" numFound="1" start="0"> ...
> <lst name="highlighting">
>  <lst name="8667"/>
> </lst>
>
> Where "8667" is the document ID of the record that had the hit, but no 
> highlight. Other searches, "ms" for example, return:
> <result name="response" numFound="1" start="0"> ...
> <lst name="highlighting">
>  <lst name="8667">
>   <arr name="text">
>    <str>
>     <em>MS</em>
>    </str>
>   </arr>
>  </lst>
> </lst>
>
> Why does highlighting fail for "1a"-type searches? Any help is appreciated!
> Thanks!
>
> -Teague James
>

