lucene-solr-user mailing list archives

From Scott Stults <sstu...@opensourceconnections.com>
Subject Re: Highlighting large documents
Date Tue, 08 Dec 2015 16:50:38 GMT
There are two things going on that you should be aware of. The first is
that Solr highlighting is mainly concerned with putting a representative
snippet in a results listing. If you want to highlight a whole document,
you need to make a couple of configuration changes, such as setting the
fragListBuilder to SingleFragListBuilder and raising the maxAnalyzedChars
setting you've already mentioned:

https://wiki.apache.org/solr/HighlightingParameters#hl.fragsize

Because full document highlighting is so different from highlighting
snippets in a result list you'll want to configure two different
highlighters: One for snippets and one for the full document.
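
As a rough sketch of what that split could look like with the original
highlighter, assuming hypothetical handler names "/snippets" and "/fulldoc"
and reusing the field names and values from the config quoted below (0 for
hl.fragsize means the whole field value is returned as a single fragment):

```xml
<!-- Sketch only: handler names are made up for illustration. -->

<!-- Result list: short snippets, analyze only the start of each field. -->
<requestHandler name="/snippets" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
    <str name="hl">on</str>
    <str name="hl.fl">title, content</str>
    <str name="hl.fragsize">200</str>
    <!-- 51200 is the default; stated here only to make the trade-off explicit -->
    <str name="hl.maxAnalyzedChars">51200</str>
  </lst>
</requestHandler>

<!-- Single document view: no fragmenting, analyze much more of the field. -->
<requestHandler name="/fulldoc" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
    <int name="rows">1</int>
    <str name="hl">on</str>
    <str name="hl.fl">content</str>
    <str name="hl.fragsize">0</str>
    <!-- assumed upper bound; size it to your largest document -->
    <str name="hl.maxAnalyzedChars">1000000</str>
  </lst>
</requestHandler>
```

The idea would be to send the same q to the second handler together with a
filter on the id of the document the user opened, so the expensive
highlighting runs on one document instead of every hit in the list.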

The other thing you need to know is that highlighting performance is an
active area of development. Right now the top docs in the current result
list are calculated completely separately from the snippets (highlighting),
which can lead to problems when the most relevant snippets come later in the
document.

What most people do is compromise by making the result list fast but
inaccurate, and having the full-document highlight be accurate but slower.


Hope that helps,
-Scott


On Fri, Dec 4, 2015 at 11:12 AM, Andrea Gazzarini <a.gazzarini@gmail.com>
wrote:

> No no, sorry, the project is not yet started so I didn't experience your
> issue, but I'll be a careful listener of this thread
>
> Best,
> Andrea
>
> 2015-12-04 17:04 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
>
> > Hi Andrea,
> >
> > I'm using the original highlighter.
> >
> > Below is my configuration for the highlighter in solrconfig.xml
> >
> >   <requestHandler name="/highlight" class="solr.SearchHandler">
> >     <lst name="defaults">
> >       <str name="echoParams">explicit</str>
> >       <int name="rows">10</int>
> >       <str name="wt">json</str>
> >       <str name="indent">true</str>
> >       <str name="df">text</str>
> >       <str name="fl">id, title, content_type, last_modified, url, score</str>
> >
> >       <str name="hl">on</str>
> >       <str name="hl.fl">id, title, content, author</str>
> >       <str name="hl.highlightMultiTerm">true</str>
> >       <str name="hl.preserveMulti">true</str>
> >       <str name="hl.encoder">html</str>
> >       <str name="hl.fragsize">200</str>
> >       <str name="hl.maxAnalyzedChars">1000000</str>
> >
> >       <str name="group">true</str>
> >       <str name="group.field">signature</str>
> >       <str name="group.main">true</str>
> >       <str name="group.cache.percent">100</str>
> >     </lst>
> >   </requestHandler>
> >
> >
> > Have you managed to solve the problem?
> >
> > Regards,
> > Edwin
> >
> >
> > On 4 December 2015 at 23:54, Andrea Gazzarini <a.gazzarini@gmail.com>
> > wrote:
> >
> > > Hi Zheng,
> > > just out of curiosity, because shortly I will have to deal with a similar
> > > scenario (Solr 5.3.1 + large documents + highlighting).
> > > Which highlighter are you using?
> > >
> > > Andrea
> > >
> > > 2015-12-04 16:51 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
> > >
> > > > Hi,
> > > >
> > > > I'm using Solr 5.3.0
> > > >
> > > > I found that in large documents, I sometimes face a situation where,
> > > > when I do a highlight query, the result set that is returned does not
> > > > contain the highlighted query. There are actually matches in the
> > > > documents, but they are located further back in the documents.
> > > >
> > > > I have tried increasing the value of hl.maxAnalyzedChars, as the
> > > > default value is 51200 and I have documents that are much larger than
> > > > 51200 characters. Although this method works, the performance of the
> > > > search and highlight drops when I increase this value. It can go from
> > > > less than 0.5 seconds to more than 10 seconds.
> > > >
> > > > I would like to check: is increasing the value of hl.maxAnalyzedChars
> > > > the best method to use, or are there other ways to achieve the same
> > > > result without affecting performance much?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > >
> >
>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com
