lucene-java-user mailing list archives

From Tech Behemoth <tech.behem...@gmail.com>
Subject Re: highlighting with best text fragment from multi-value field
Date Thu, 15 Dec 2016 02:13:24 GMT
Hi all

Is there a best practice for getting highlighted fragments from a
multi-value field (Lucene 5.3.2)?

Thanks

On Mon, Dec 12, 2016 at 12:11 AM, Tech Behemoth <tech.behemoth@gmail.com>
wrote:

> Hi all
>
> How can we highlight fragments of text taken from a multi-value field
> using Lucene 5.3.2?
> Is there a known solution?
>
> 1. Our first approach: merge all the values into one single string and
> apply
>
> highlighter.getBestTextFragments(tokenStream, text, false,
> maxNumFragments);
>
> However, some of the resulting fragments cross the boundary between two
> original values, because no delimiters are added to the merged string.
>
>
> 2. Our second approach: get highlighted fragments from each value of the
> multi-value field, then build the result list from the top-scoring
> fragments.
>
> With this approach we get:
>    Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token three exceeds length of provided text sized 32
>    org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:224)
> when setStoreTermVectorOffsets is enabled for the field at index time.
>
> If the field does *not* set setStoreTermVectorOffsets, the exception is
> not thrown:
>     FieldType fType = new FieldType();
>     ...
>     // fType.setStoreTermVectorOffsets(true);
>
> However, the fragments are then much larger than the requested fragment
> size. What is the correct technique for getting highlighted fragments from
> a multi-value field?
>
>
>  Thanks
>
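
For the first approach quoted above, one way to keep fragments from crossing value boundaries is to record where each value starts when the values are merged, and then cut every fragment range at those offsets. The following is only a minimal plain-Java sketch of that bookkeeping, independent of the Lucene API; the class `ValueBoundaries`, its methods, and the `" | "` delimiter are hypothetical names chosen for illustration, and the fragment range is assumed to come from whatever highlighter produced it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: tracks value boundaries in a merged string.
final class ValueBoundaries {
    // Assumed delimiter; any string that cannot match a query term would do.
    static final String DELIMITER = " | ";

    /** Joins the values and stores the start offset of value i in starts[i]. */
    static String join(List<String> values, int[] starts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(DELIMITER);
            }
            starts[i] = sb.length();
            sb.append(values.get(i));
        }
        return sb.toString();
    }

    /**
     * Cuts the range [from, to) of the joined text into pieces that each lie
     * inside a single original value; delimiter characters are dropped.
     */
    static List<String> splitAtBoundaries(List<String> values, int[] starts,
                                          String joined, int from, int to) {
        List<String> pieces = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            int valueStart = starts[i];
            int valueEnd = valueStart + values.get(i).length();
            int s = Math.max(from, valueStart);
            int e = Math.min(to, valueEnd);
            if (s < e) {
                pieces.add(joined.substring(s, e));
            }
        }
        return pieces;
    }
}
```

With the values "one two" and "three four", the merged text is "one two | three four", and a fragment covering offsets 4 to 15 is cut into "two" and "three" rather than returned as one piece spanning both values.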

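For the second approach quoted above, the merge step (collect fragments per value, then keep the top-scoring ones) can be sketched without any Lucene classes. This is an illustration under stated assumptions, not the library's API: `ScoredFragment` is a hypothetical holder standing in for whatever the highlighter returns per value, and `topFragments` is a made-up name. It assumes each value was highlighted on its own, using a token stream built from that value's text alone; term-vector offsets for a multi-value field are presumably cumulative across values, which may be what triggers the InvalidTokenOffsetsException quoted above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: merges per-value fragments by score.
final class TopFragments {
    /** Hypothetical holder for one highlighted fragment of one field value. */
    static final class ScoredFragment {
        final int valueIndex; // position of the source value in the multi-value field
        final String text;    // highlighted fragment text
        final float score;    // fragment score reported by the highlighter

        ScoredFragment(int valueIndex, String text, float score) {
            this.valueIndex = valueIndex;
            this.text = text;
            this.score = score;
        }
    }

    /** Returns up to maxNumFragments fragments with a positive score, best first. */
    static List<ScoredFragment> topFragments(List<List<ScoredFragment>> perValue,
                                             int maxNumFragments) {
        List<ScoredFragment> all = new ArrayList<>();
        for (List<ScoredFragment> fragments : perValue) {
            for (ScoredFragment f : fragments) {
                if (f.score > 0f) { // skip fragments without any match
                    all.add(f);
                }
            }
        }
        // Stable sort, so equally scored fragments keep their field order.
        all.sort(Comparator.comparingDouble((ScoredFragment f) -> f.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(maxNumFragments, all.size())));
    }
}
```

Because every fragment is produced from a single value, no fragment can span two values, and the requested fragment size applies to each value independently.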