lucene-solr-user mailing list archives

From Binoy Dalal <binoydala...@gmail.com>
Subject Re: Highlight brings the content from the first pages of pdf
Date Mon, 15 Feb 2016 01:28:08 GMT
What you've done so far highlights every instance of "nietava" found in
the field and returns the entire field, with each "nietava" wrapped in
<em> tags.
If you want only the portions of the field that contain the matched terms
rather than the entire field, set the hl.snippets parameter to the number
of snippets you want (3 in this particular case), and set the hl.fragsize
parameter to the same number as your hl.maxAnalyzedChars (or a really
large number).
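
As a rough sketch of such a request (not taken from the thread), the
parameters could be assembled like this in Python. The host, port and core
name are assumptions (a default local Solr running the techproducts
example), the helper name build_highlight_query is hypothetical, and 208400
is simply the hl.maxAnalyzedChars value that appears elsewhere in this
thread.

```python
from urllib.parse import urlencode

# Assumption: a local Solr on the default port with the "techproducts" core.
SOLR_SELECT_URL = "http://localhost:8983/solr/techproducts/select"


def build_highlight_query(term, field="content", snippets=3,
                          max_chars=208400, fragsize=None):
    """Build a select URL asking for up to `snippets` highlighted fragments.

    Hypothetical helper for illustration only. By default hl.fragsize is
    set to the same value as hl.maxAnalyzedChars, as suggested above.
    """
    if fragsize is None:
        fragsize = max_chars
    params = {
        "q": term,
        "fl": "id",
        "wt": "json",
        "hl": "true",
        "hl.fl": field,
        "hl.snippets": snippets,            # maximum fragments per field
        "hl.fragsize": fragsize,            # approximate fragment size in chars
        "hl.maxAnalyzedChars": max_chars,   # how far into the field to look
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
    }
    # urlencode percent-encodes the <em> markers so the URL stays valid.
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_query("nietava"))
```

Passing a smaller fragsize (for example fragsize=200) should shorten each
returned fragment, since hl.fragsize is the approximate fragment size in
characters.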

I suggest you go through the wiki documentation for highlighting once (
https://wiki.apache.org/solr/HighlightingParameters). It should answer all
of your questions regarding the use of the standard highlighter that you
might have.

As an additional note, I also suggest that you look into the
PostingsHighlighter (
https://cwiki.apache.org/confluence/display/solr/Postings+Highlighter),
since you seem to be running highlighting on pretty big fields, and the
PostingsHighlighter is much more efficient at highlighting huge fields
than the standard highlighter.

On Mon, Feb 15, 2016 at 4:15 AM Evert R. <evert.ramos@gmail.com> wrote:

> Binoy,
>
> You are the man! =)
>
> Thank you very much!
>
> Would you by chance know how I could get the second highlight of the same
> word in the same file?
>
> Like: file_1.pdf has three occurrences of "nietava", so how can I bring
> back the highlights for all three?
>
> I am pretty new around here; should I open a new thread for this?
>
> Thanks again!
>
>
> *--Evert*
>
> 2016-02-14 16:40 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
>
> > Are you sure you've typed in the parameters correctly?
> > In your response it says flagsize instead of fragsize and
> > maxanalzyedchars instead of maxanalyzedchars.
> >
> > Ohh wait, I see that I made the analyzed typo. Awfully sorry for that,
> > I'm using my phone to send the mail out.
> >
> > On Sun, 14 Feb 2016, 23:53 Evert R. <evert.ramos@gmail.com> wrote:
> >
> > > Hi Binoy,
> > >
> > > thanks!
> > >
> > > Still not working, check the output:
> > >
> > > {
> > >   "responseHeader":{
> > >     "status":0,
> > >     "QTime":58,
> > >     "params":{
> > >       "q":"nietava",
> > >       "hl":"true",
> > >       "hl.simple.post":"</em>",
> > >       "indent":"true",
> > >       "fl":"id",
> > >       "hl.flagsize":"0",
> > >       "hl.fl":"content",
> > >       "hl.maxAnalzyedChars":"208400",
> > >       "wt":"json",
> > >       "hl.simple.pre":"<em>"}},
> > >   "response":{"numFound":1,"start":0,"docs":[
> > >       {
> > >         "id":"/home/solr/dados/teste/Emmanuel.pdf"}]
> > >   },
> > >   "highlighting":{
> > >     "/home/solr/dados/teste/Emmanuel.pdf":{}}}
> > >
> > >
> > >
> > > *--Evert*
> > >
> > > 2016-02-14 14:31 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> > >
> > > > Don't add this parameter to the searchComponent definition, because
> > > > the components where you've added it, GapFragmenter and
> > > > RegexFragmenter, simply don't use it.
> > > > Instead, add it to your request handler (/select etc.) if you've
> > > > configured highlighting in the handler, or append it to your query:
> > > > *&hl.maxAnalzyedChars=<some_really_big_number>*.
> > > > Additionally, set the *hl.fragsize parameter to 0* in a similar
> > > > fashion if your text is larger than 51200 chars, which it most
> > > > likely is.
> > > >
> > > >
> > > > On Sun, Feb 14, 2016 at 9:02 PM Evert R. <evert.ramos@gmail.com> wrote:
> > > >
> > > > > Hi Binoy,
> > > > >
> > > > > I could not find this option in my solrconfig.xml file.
> > > > >
> > > > > I tried to add this setting and nothing changed...
> > > > >
> > > > > Here is the code; I might have misplaced it:
> > > > >
> > > > > <code>
> > > > > <searchComponent class="solr.HighlightComponent" name="highlight">
> > > > >     <highlighting>
> > > > >       <!-- Configure the standard fragmenter -->
> > > > >       <!-- This could most likely be commented out in the "default" case -->
> > > > >       <fragmenter name="gap"
> > > > >                   default="true"
> > > > >                   class="solr.highlight.GapFragmenter">
> > > > >         <lst name="defaults">
> > > > >           <int name="hl.fragsize">400</int>
> > > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > > >         </lst>
> > > > >       </fragmenter>
> > > > >
> > > > >       <!-- A regular-expression-based fragmenter
> > > > >            (for sentence extraction)
> > > > >         -->
> > > > >       <fragmenter name="regex"
> > > > >                   class="solr.highlight.RegexFragmenter">
> > > > >         <lst name="defaults">
> > > > >           <!-- slightly smaller fragsizes work better because of slop -->
> > > > >           <int name="hl.fragsize">200</int>
> > > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > > >           <!-- allow 50% slop on fragment sizes -->
> > > > >           <float name="hl.regex.slop">0.5</float>
> > > > >           <!-- a basic sentence pattern -->
> > > > >           <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
> > > > >         </lst>
> > > > >       </fragmenter>
> > > > >
> > > > > </code>
> > > > >
> > > > > thanks!
> > > > >
> > > > >
> > > > > *--Evert*
> > > > >
> > > > > 2016-02-14 12:14 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> > > > >
> > > > > > From the solr wiki:
> > > > > > hl.maxAnalyzedChars
> > > > > >
> > > > > > How many characters into a document to look for suitable
> > > > > > snippets (Solr 1.3). This parameter makes sense for the original
> > > > > > Highlighter only.
> > > > > >
> > > > > > The default value is "51200".
> > > > > >
> > > > > > You can assign a large value to this parameter and use
> > > > > > hl.fragsize=0 to return highlighting in large fields that have
> > > > > > size greater than 51200 characters.
> > > > > >
> > > > > > I think this might be your hiccup.
> > > > > >
> > > > > > On Sun, 14 Feb 2016, 17:11 Evert R. <evert.ramos@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Paul,
> > > > > > >
> > > > > > > Sorry for my late reply.
> > > > > > >
> > > > > > > All the content is inside the docs. The search brings back the
> > > > > > > docs and the pdf file that has the search word in it, but the
> > > > > > > highlight does not show if the search word comes after the
> > > > > > > first few pages.
> > > > > > > Evert
> > > > > > >
> > > > > > >
> > > > > > > *--Evert*
> > > > > > >
> > > > > > > 2016-02-14 8:36 GMT-02:00 Paul Libbrecht <paul@hoplahup.net>:
> > > > > > >
> > > > > > > > This looks like the stored content is shortened. Can it be?
> > > > > > > > Can you see that inside the docs?
> > > > > > > >
> > > > > > > > paul
> > > > > > > >
> > > > > > > > > Evert R. <mailto:evert.ramos@gmail.com>
> > > > > > > > > 14 February 2016 at 11:26
> > > > > > > > > Hi There,
> > > > > > > > >
> > > > > > > > > I have a situation where I started techproducts, without any
> > > > > > > > > modification, and posted a pdf file. When searching as:
> > > > > > > > >
> > > > > > > > > q=text:search_word
> > > > > > > > > hl=true
> > > > > > > > > hl.fl=content
> > > > > > > > >
> > > > > > > > > It shows the highlight accordingly! =)
> > > > > > > > >
> > > > > > > > > BUT... *if the "search_word" is after the first pages* in my
> > > > > > > > > pdf file, such as page 15...
> > > > > > > > >
> > > > > > > > > It simply *does not show* *the HIGHLIGHT*...
> > > > > > > > >
> > > > > > > > > Has anyone faced this situation before?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > *--Evert*
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Binoy Dalal
> > > > > >
> > > > >
> > > > --
> > > > Regards,
> > > > Binoy Dalal
> > > >
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal
