lucene-solr-user mailing list archives

From "Evert R." <evert.ra...@gmail.com>
Subject Re: Highlight brings the content from the first pages of pdf
Date Sun, 14 Feb 2016 22:44:59 GMT
Binoy,

You are the man! =)

Thank you very much!

Would you happen to know how I could get the second highlight of the same
word in the same file?

For example, file_1.pdf contains the word "nietava" three times, so how can I
get highlights for all three occurrences?

I am pretty new here; should I open a new thread for this?

Thanks again!


*--Evert*
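Putting the advice from this thread together: in the failing request quoted further down, the misspelled names hl.flagsize and hl.maxAnalzyedChars were accepted with status 0 and simply had no effect. Below is a minimal Python sketch that builds the highlighting request with the names spelled correctly and checks parameter names against a small allow-list before anything is sent. The endpoint URL (a local Solr running the techproducts core), the helper names find_typos and build_highlight_url, and the allow-list itself are assumptions for illustration, not part of Solr's API.

```python
import difflib
from urllib.parse import urlencode

# Assumed endpoint: a local Solr running the techproducts example core.
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"

# Hypothetical, hand-picked allow-list of the parameter names used in this
# thread; it is NOT the full set of parameters Solr accepts.
KNOWN_PARAMS = [
    "q", "fl", "wt", "indent",
    "hl", "hl.fl", "hl.fragsize", "hl.maxAnalyzedChars",
    "hl.simple.pre", "hl.simple.post",
]


def find_typos(params):
    """Map each unknown parameter name to its closest known name (or None)."""
    typos = {}
    for name in params:
        if name not in KNOWN_PARAMS:
            match = difflib.get_close_matches(name, KNOWN_PARAMS, n=1)
            typos[name] = match[0] if match else None
    return typos


def build_highlight_url(query, field="content", max_analyzed_chars=409600):
    """Build a /select URL that highlights `field` across a long document."""
    params = {
        "q": query,
        "fl": "id",
        "wt": "json",
        "hl": "true",
        "hl.fl": field,
        # 0 disables fragmenting: the whole field value is used as one fragment.
        "hl.fragsize": 0,
        # The default is 51200; raise it so terms deep in the text are analyzed.
        "hl.maxAnalyzedChars": max_analyzed_chars,
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
    }
    # In this thread Solr returned status 0 for misspelled names, so check here.
    typos = find_typos(params)
    if typos:
        raise ValueError(f"unknown parameter names: {typos}")
    return SOLR_SELECT + "?" + urlencode(params)


# The "params" echoed in the responseHeader of the failing request in this thread.
echoed = {"q": "nietava", "hl": "true", "hl.simple.post": "</em>",
          "indent": "true", "fl": "id", "hl.flagsize": "0",
          "hl.fl": "content", "hl.maxAnalzyedChars": "208400",
          "wt": "json", "hl.simple.pre": "<em>"}
print(find_typos(echoed))
print(build_highlight_url("nietava"))
```

Running it prints a suggested correction for each misspelled name, then a URL you could paste into a browser or pass to curl. The allow-list is deliberately tiny; a real client would need every parameter your handler accepts.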

2016-02-14 16:40 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:

> Are you sure you've typed in the parameters correctly?
> In your response it says flagsize instead of fragsize and maxanalzyedchars
> instead of maxanalyzedchars.
>
> Oh wait, I see that I made the "analyzed" typo myself. Awfully sorry about
> that; I'm using my phone to send the mail out.
>
> On Sun, 14 Feb 2016, 23:53 Evert R. <evert.ramos@gmail.com> wrote:
>
> > Hi Binoy,
> >
> > thanks!
> >
> > Still not working, check the output:
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":58,
> >     "params":{
> >       "q":"nietava",
> >       "hl":"true",
> >       "hl.simple.post":"</em>",
> >       "indent":"true",
> >       "fl":"id",
> >       "hl.flagsize":"0",
> >       "hl.fl":"content",
> >       "hl.maxAnalzyedChars":"208400",
> >       "wt":"json",
> >       "hl.simple.pre":"<em>"}},
> >   "response":{"numFound":1,"start":0,"docs":[
> >       {
> >         "id":"/home/solr/dados/teste/Emmanuel.pdf"}]
> >   },
> >   "highlighting":{
> >     "/home/solr/dados/teste/Emmanuel.pdf":{}}}
> >
> >
> >
> > *--Evert*
> >
> > 2016-02-14 14:31 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> >
> > > Don't add this parameter to the searchComponent definition, because the
> > > components where you've added it, GapFragmenter and RegexFragmenter,
> > > simply don't use it.
> > > Instead, add it to your request handler (/select etc.) if you've
> > > configured highlighting in the handler, or append it to your query:
> > > *&hl.maxAnalzyedChars=<some_really_big_number>*.
> > > Additionally, set the *hl.fragsize parameter to 0* in the same way if
> > > your text is larger than 51200 chars, which it most likely is.
> > >
> > >
> > > On Sun, Feb 14, 2016 at 9:02 PM Evert R. <evert.ramos@gmail.com> wrote:
> > >
> > > > Hi Binoy,
> > > >
> > > > I could not find this option in my solrconfig.xml file.
> > > >
> > > > I tried adding this setting and nothing changed...
> > > >
> > > > Here is the code; I might have misplaced it:
> > > >
> > > > <code>
> > > > <searchComponent class="solr.HighlightComponent" name="highlight">
> > > >     <highlighting>
> > > >       <!-- Configure the standard fragmenter -->
> > > >       <!-- This could most likely be commented out in the "default" case -->
> > > >       <fragmenter name="gap"
> > > >                   default="true"
> > > >                   class="solr.highlight.GapFragmenter">
> > > >         <lst name="defaults">
> > > >           <int name="hl.fragsize">400</int>
> > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > >         </lst>
> > > >       </fragmenter>
> > > >
> > > >       <!-- A regular-expression-based fragmenter
> > > >            (for sentence extraction)
> > > >         -->
> > > >       <fragmenter name="regex"
> > > >                   class="solr.highlight.RegexFragmenter">
> > > >         <lst name="defaults">
> > > >           <!-- slightly smaller fragsizes work better because of slop -->
> > > >           <int name="hl.fragsize">200</int>
> > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > >           <!-- allow 50% slop on fragment sizes -->
> > > >           <float name="hl.regex.slop">0.5</float>
> > > >           <!-- a basic sentence pattern -->
> > > >           <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
> > > >         </lst>
> > > >       </fragmenter>
> > > >
> > > > </code>
> > > >
> > > > thanks!
> > > >
> > > >
> > > > *--Evert*
> > > >
> > > > 2016-02-14 12:14 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> > > >
> > > > > From the Solr wiki:
> > > > > hl.maxAnalyzedChars
> > > > >
> > > > > How many characters into a document to look for suitable
> > > > > snippets (available since Solr 1.3). This parameter makes sense
> > > > > for the original Highlighter only.
> > > > >
> > > > > The default value is "51200".
> > > > >
> > > > > You can assign a large value to this parameter and use hl.fragsize=0
> > > > > to return highlighting in large fields that have size greater than
> > > > > 51200 characters.
> > > > >
> > > > > I think this might be your hiccup.
> > > > >
> > > > > On Sun, 14 Feb 2016, 17:11 Evert R. <evert.ramos@gmail.com> wrote:
> > > > >
> > > > > > Hi Paul,
> > > > > >
> > > > > > Sorry for my late reply.
> > > > > >
> > > > > > All the content is inside the docs. The search returns the docs,
> > > > > > including the PDF file that contains the search word, but the
> > > > > > highlight does not show if the search word appears after the first
> > > > > > few pages.
> > > > > >
> > > > > > Evert
> > > > > >
> > > > > >
> > > > > > *--Evert*
> > > > > >
> > > > > > 2016-02-14 8:36 GMT-02:00 Paul Libbrecht <paul@hoplahup.net>:
> > > > > >
> > > > > > > This looks like the stored content is shortened. Could that be it?
> > > > > > > Can you see the full content inside the docs?
> > > > > > >
> > > > > > > paul
> > > > > > >
> > > > > > > > Evert R. <mailto:evert.ramos@gmail.com>
> > > > > > > > 14 February 2016 at 11:26
> > > > > > > > Hi There,
> > > > > > > >
> > > > > > > > I started the techproducts example without any modification
> > > > > > > > and posted a PDF file. When searching with:
> > > > > > > >
> > > > > > > > q=text:search_word
> > > > > > > > hl=true
> > > > > > > > hl.fl=content
> > > > > > > >
> > > > > > > > it shows the highlight as expected! =)
> > > > > > > >
> > > > > > > > BUT... *if the "search_word" is after the first pages* of my
> > > > > > > > PDF file, such as page 15...
> > > > > > > >
> > > > > > > > it simply *does not show* *the HIGHLIGHT*...
> > > > > > > >
> > > > > > > > Has anyone faced this situation before?
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > >
> > > > > > > > *--Evert*
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > --
> > > > > Regards,
> > > > > Binoy Dalal
> > > > >
> > > >
> > > --
> > > Regards,
> > > Binoy Dalal
> > >
> >
> --
> Regards,
> Binoy Dalal
>
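To make the cause concrete, here is a toy Python model of the limit described in the wiki excerpt above: the highlighter only looks at the first hl.maxAnalyzedChars characters (51200 by default), so a term that first appears later gets no snippet even though the document itself matches the query. The function toy_highlight and the 60000-character stand-in for "page 15" are hypothetical; real Solr highlighting works on analyzed tokens, not raw substrings.

```python
def toy_highlight(text, term, max_analyzed_chars=51200, pre="<em>", post="</em>"):
    """Toy stand-in for a highlighter that only scans the first
    max_analyzed_chars characters of a stored field.

    This is NOT Solr's implementation: it uses plain, case-sensitive
    substring matching instead of analyzed tokens, only to show the cutoff.
    """
    scanned = text[:max_analyzed_chars]
    if term not in scanned:
        # Nothing found in the scanned prefix: no snippet at all, like the
        # empty {} entry for Emmanuel.pdf in the response quoted above.
        return None
    return scanned.replace(term, pre + term + post)


# A term that first appears at character 60000, an arbitrary "page 15".
long_text = "lorem " * 10000 + "nietava"
print(toy_highlight(long_text, "nietava"))                      # None
print(toy_highlight(long_text, "nietava", 409600) is not None)  # True
```

Raising the limit (here to the 409600 used in the solrconfig.xml excerpt) is what lets the late occurrence be found, which is why the parameter name has to be spelled exactly right.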
