lucene-solr-user mailing list archives

From Mark Ehle <marke...@gmail.com>
Subject Re: Highlight brings the content from the first pages of pdf
Date Mon, 15 Feb 2016 01:47:30 GMT
Is all the text being indexed? Check to make sure that the data you are
looking for is actually in the index. Is there a setting in Tika that
limits how much is indexed? I seem to remember running into this problem
myself once, and the data I wanted just wasn't in the index because it
was never put there in the first place. Something about setMaxStringLength
or something like that.
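One way to follow Mark's suggestion is to fetch the stored field (for example with `fl=id,content`) and check whether the term is really in it. The highlighter already returns fragments from `content` for the first pages, so that field has to be stored. The sketch below is a minimal Python helper that works on a value you have already fetched from Solr. The name `check_stored_content` is hypothetical. It assumes a multiValued field comes back as a list of strings.

```python
import re
from typing import List, Union


def check_stored_content(content: Union[str, List[str]], term: str) -> dict:
    """Report the stored text length and whether `term` occurs in it (case-insensitive)."""
    # A multiValued stored field is returned as a list of strings; join the values.
    text = " ".join(content) if isinstance(content, list) else content
    found = re.search(re.escape(term), text, flags=re.IGNORECASE) is not None
    return {"stored_length": len(text), "found": found}


print(check_stored_content(["first page", "page 15 NIETAVA"], "nietava"))
```

If `found` is False although the word is visible in the PDF, the text never reached the index. That points at the extraction side rather than the highlighter.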

On Sun, Feb 14, 2016 at 8:28 PM, Binoy Dalal <binoydalal93@gmail.com> wrote:

> What you've done so far will highlight every instance of "nietava" found in
> the field and return it, i.e., your entire field will come back with all the
> "nietava"s in <em> tags.
> If you do not want the entire field, only the portions of your field
> containing the matched terms, then set the hl.snippets parameter to the
> number of snippets you want, in this particular case 3, along with the
> hl.fragsize parameter set to the same number as your hl.maxAnalyzedChars
> (or a really large number).
>
> I suggest you go through the wiki documentation for highlighting once
> (https://wiki.apache.org/solr/HighlightingParameters). It should answer any
> questions you might have about using the standard highlighter.
>
> As an additional note, I also suggest that you look into the
> PostingsHighlighter
> (https://cwiki.apache.org/confluence/display/solr/Postings+Highlighter),
> since you seem to be running highlighting on pretty big fields, and the
> postings highlighter is much more efficient than the standard highlighter
> at highlighting huge fields.
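To check whether a request like the one Binoy describes really returns all three occurrences, one can count the highlighted terms in the `highlighting` section of the JSON response. Below is a minimal Python sketch. It assumes the `<em>`/`</em>` markers used earlier in this thread. The parameter values and the helper name `count_highlighted` are illustrative and do not come from Solr itself.

```python
import re

# Illustrative request parameters following the advice above (example values only).
SNIPPET_PARAMS = {
    "q": "nietava",
    "hl": "true",
    "hl.fl": "content",
    "hl.snippets": "3",               # ask for up to three snippets per field
    "hl.maxAnalyzedChars": "208400",
    "hl.fragsize": "208400",          # same value as hl.maxAnalyzedChars, as suggested above
}


def count_highlighted(highlighting: dict, doc_id: str, field: str,
                      pre: str = "<em>", post: str = "</em>") -> int:
    """Count pre...post spans across all snippets returned for one document and field."""
    snippets = highlighting.get(doc_id, {}).get(field, [])
    pattern = re.compile(re.escape(pre) + r".*?" + re.escape(post))
    return sum(len(pattern.findall(snippet)) for snippet in snippets)
```

The `highlighting` argument is the parsed `"highlighting"` object of the response. A missing document or field simply counts as zero.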
>
> On Mon, Feb 15, 2016 at 4:15 AM Evert R. <evert.ramos@gmail.com> wrote:
>
> > Binoy,
> >
> > You are the man! =)
> >
> > Thank you very much!
> >
> > Would you by chance know how I could get the second highlight of the same
> > word in the same file?
> >
> > Like: file_1.pdf (has three words "nietava") so..., how can I bring back the
> > highlights for all three occurrences?
> >
> > I am pretty new around here; should I start a new thread (subject)?
> >
> > Thanks again!
> >
> >
> > *--Evert*
> >
> > 2016-02-14 16:40 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> >
> > > Are you sure you've typed in the parameters correctly?
> > > In your response it says flagsize instead of fragsize and maxanalzyedchars
> > > instead of maxanalyzedchars.
> > >
> > > Ohh wait, I see that I made the analyzed typo. Awfully sorry for that, I'm
> > > using my phone to send the mail out.
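Misspelled parameter names like the ones above are silently ignored, so it can help to check the echoed `params` for near-misses. Below is a minimal Python sketch using `difflib.get_close_matches` from the standard library. The list of known names is partial and only for illustration, and `suspicious_hl_params` is a hypothetical helper.

```python
import difflib

# A partial list of standard-highlighter parameter names, for illustration only.
KNOWN_HL_PARAMS = [
    "hl", "hl.fl", "hl.snippets", "hl.fragsize", "hl.maxAnalyzedChars",
    "hl.simple.pre", "hl.simple.post", "hl.fragmenter",
    "hl.requireFieldMatch", "hl.regex.slop", "hl.regex.pattern",
]


def suspicious_hl_params(params: dict) -> dict:
    """Map each unrecognized hl* parameter name to its closest known spelling (or None)."""
    by_lower = {name.lower(): name for name in KNOWN_HL_PARAMS}
    suggestions = {}
    for name in params:
        if not name.startswith("hl") or name in KNOWN_HL_PARAMS:
            continue
        # Compare case-insensitively so that case slips are flagged as well.
        match = difflib.get_close_matches(name.lower(), list(by_lower), n=1, cutoff=0.8)
        suggestions[name] = by_lower[match[0]] if match else None
    return suggestions
```

Applied to the parameters echoed in Evert's response, it maps `hl.flagsize` to `hl.fragsize` and `hl.maxAnalzyedChars` to `hl.maxAnalyzedChars`.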
> > >
> > > On Sun, 14 Feb 2016, 23:53 Evert R. <evert.ramos@gmail.com> wrote:
> > >
> > > > Hi Binoy,
> > > >
> > > > thanks!
> > > >
> > > > Still not working, check the output:
> > > >
> > > > {
> > > >   "responseHeader":{
> > > >     "status":0,
> > > >     "QTime":58,
> > > >     "params":{
> > > >       "q":"nietava",
> > > >       "hl":"true",
> > > >       "hl.simple.post":"</em>",
> > > >       "indent":"true",
> > > >       "fl":"id",
> > > >       "hl.flagsize":"0",
> > > >       "hl.fl":"content",
> > > >       "hl.maxAnalzyedChars":"208400",
> > > >       "wt":"json",
> > > >       "hl.simple.pre":"<em>"}},
> > > >   "response":{"numFound":1,"start":0,"docs":[
> > > >       {
> > > >         "id":"/home/solr/dados/teste/Emmanuel.pdf"}]
> > > >   },
> > > >   "highlighting":{
> > > >     "/home/solr/dados/teste/Emmanuel.pdf":{}}}
> > > >
> > > >
> > > >
> > > > *--Evert*
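In the output above, the document matches (`numFound` is 1) but its entry under `highlighting` is an empty object, so the highlighter produced no fragments for it. The minimal Python sketch below lists such documents from a raw JSON response. It assumes the uniqueKey is `id` and is included in `fl`, as in the output above. The helper name `docs_without_highlights` is hypothetical.

```python
import json


def docs_without_highlights(raw: str) -> list:
    """Return the ids of matching documents whose highlighting entry is missing or empty."""
    data = json.loads(raw)
    highlighting = data.get("highlighting", {})
    return [doc["id"] for doc in data["response"]["docs"]
            if not highlighting.get(doc["id"])]
```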
> > > >
> > > > 2016-02-14 14:31 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> > > >
> > > > > Don't add this parameter to the searchComponent definition, because the
> > > > > components where you've added it, GapFragmenter and RegexFragmenter,
> > > > > simply don't use it.
> > > > > Instead, add it to your request handler (/select etc.) if you've
> > > > > configured highlighting in the handler, or append it to your query:
> > > > > *&hl.maxAnalzyedChars=<some_really_big_number>*.
> > > > > Additionally, set the *hl.fragsize parameter to 0* in a similar fashion
> > > > > if your text is larger than 51200 chars, which it most likely is.
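Appending both parameters to the query, with the correct spelling `hl.maxAnalyzedChars`, can be sketched in Python with `urllib.parse.urlencode`. The base URL `http://localhost:8983/solr/techproducts/select` is an assumption: the default port plus the techproducts example from the original question. The helper `highlight_url` and the value 1000000 are illustrative.

```python
from urllib.parse import urlencode


def highlight_url(base: str, query: str, field: str, max_chars: int) -> str:
    """Build a /select URL that highlights `field` across its first `max_chars` characters."""
    params = {
        "q": query,
        "fl": "id",
        "hl": "true",
        "hl.fl": field,
        "hl.fragsize": 0,                 # 0: use the whole field value, no fragmenting
        "hl.maxAnalyzedChars": max_chars, # raise the 51200-character default
        "wt": "json",
    }
    return base + "?" + urlencode(params)


print(highlight_url("http://localhost:8983/solr/techproducts/select", "nietava", "content", 1000000))
```

Setting the same values in the request handler's defaults in solrconfig.xml would have the same effect for every request.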
> > > > >
> > > > >
> > > > > On Sun, Feb 14, 2016 at 9:02 PM Evert R. <evert.ramos@gmail.com> wrote:
> > > > >
> > > > > > Hi Binoy,
> > > > > >
> > > > > > I could not find this option in my solrconfig.xml file.
> > > > > >
> > > > > > I tried adding this setting and nothing changed...
> > > > > >
> > > > > > Here is the code; I might have misplaced it:
> > > > > >
> > > > > > <code>
> > > > > > <searchComponent class="solr.HighlightComponent" name="highlight">
> > > > > >     <highlighting>
> > > > > >       <!-- Configure the standard fragmenter -->
> > > > > >       <!-- This could most likely be commented out in the "default" case -->
> > > > > >       <fragmenter name="gap"
> > > > > >                   default="true"
> > > > > >                   class="solr.highlight.GapFragmenter">
> > > > > >         <lst name="defaults">
> > > > > >           <int name="hl.fragsize">400</int>
> > > > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > > > >         </lst>
> > > > > >       </fragmenter>
> > > > > >
> > > > > >       <!-- A regular-expression-based fragmenter
> > > > > >            (for sentence extraction)
> > > > > >         -->
> > > > > >       <fragmenter name="regex"
> > > > > >                   class="solr.highlight.RegexFragmenter">
> > > > > >         <lst name="defaults">
> > > > > >           <!-- slightly smaller fragsizes work better because of slop -->
> > > > > >           <int name="hl.fragsize">200</int>
> > > > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > > > >           <!-- allow 50% slop on fragment sizes -->
> > > > > >           <float name="hl.regex.slop">0.5</float>
> > > > > >           <!-- a basic sentence pattern -->
> > > > > >           <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
> > > > > >         </lst>
> > > > > >       </fragmenter>
> > > > > >     </highlighting>
> > > > > > </searchComponent>
> > > > > > </code>
> > > > > >
> > > > > > thanks!
> > > > > >
> > > > > >
> > > > > > *--Evert*
> > > > > >
> > > > > > 2016-02-14 12:14 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> > > > > >
> > > > > > > From the Solr wiki:
> > > > > > > hl.maxAnalyzedChars
> > > > > > >
> > > > > > > How many characters into a document to look for suitable
> > > > > > > snippets (Solr 1.3). This parameter makes sense for the original
> > > > > > > Highlighter only.
> > > > > > >
> > > > > > > The default value is "51200".
> > > > > > >
> > > > > > > You can assign a large value to this parameter and use hl.fragsize=0
> > > > > > > to return highlighting in large fields that have size greater than
> > > > > > > 51200 characters.
> > > > > > >
> > > > > > > I think this might be your hiccup.
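Binoy's diagnosis can be checked directly: occurrences of the term that end past the `hl.maxAnalyzedChars` limit (51200 by default, per the wiki text above) are the ones the standard highlighter will not see. The Python sketch below is an approximation. It measures offsets in whatever stored text you pass in, which may not match exactly how the highlighter counts characters (for example across the values of a multiValued field). The helper name `offsets_beyond_limit` is hypothetical.

```python
import re

DEFAULT_MAX_ANALYZED_CHARS = 51200  # default quoted from the wiki text above


def offsets_beyond_limit(text: str, term: str,
                         limit: int = DEFAULT_MAX_ANALYZED_CHARS) -> list:
    """Return start offsets of case-insensitive matches of `term` that end past `limit`."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text) if m.end() > limit]
```

An empty result with the default limit means the term lies within the analyzed window, so the cause is elsewhere. A non-empty result suggests raising `hl.maxAnalyzedChars` above the largest offset returned.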
> > > > > > >
> > > > > > > On Sun, 14 Feb 2016, 17:11 Evert R. <evert.ramos@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Paul,
> > > > > > > >
> > > > > > > > Sorry for my late reply.
> > > > > > > >
> > > > > > > > All the content is inside the docs. The search returns the docs,
> > > > > > > > including the PDF file that has the search word in it. But the
> > > > > > > > highlight is not shown if the search word comes after the first few
> > > > > > > > pages.
> > > > > > > >
> > > > > > > > Evert
> > > > > > > >
> > > > > > > >
> > > > > > > > *--Evert*
> > > > > > > >
> > > > > > > > 2016-02-14 8:36 GMT-02:00 Paul Libbrecht <paul@hoplahup.net>:
> > > > > > > >
> > > > > > > > > This looks like the stored content is shortened. Could that be?
> > > > > > > > > Can you see that inside the docs?
> > > > > > > > >
> > > > > > > > > paul
> > > > > > > > >
> > > > > > > > > > Evert R. <mailto:evert.ramos@gmail.com>
> > > > > > > > > > 14 February 2016 at 11:26
> > > > > > > > > > Hi There,
> > > > > > > > > >
> > > > > > > > > > I have a situation where I started the techproducts example,
> > > > > > > > > > without any modification, and posted a PDF file. When searching
> > > > > > > > > > with:
> > > > > > > > > >
> > > > > > > > > > q=text:search_word
> > > > > > > > > > hl=true
> > > > > > > > > > hl.fl=content
> > > > > > > > > >
> > > > > > > > > > it shows the highlight as expected! =)
> > > > > > > > > >
> > > > > > > > > > BUT... *if the "search_word" is after the first pages* in my PDF
> > > > > > > > > > file, such as on page 15...
> > > > > > > > > >
> > > > > > > > > > it simply *does not show* *the HIGHLIGHT*...
> > > > > > > > > >
> > > > > > > > > > Has anyone faced this situation before?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > *--Evert*
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > > Binoy Dalal
> > > > > > >
> > > > > >
> > > > > --
> > > > > Regards,
> > > > > Binoy Dalal
> > > > >
> > > >
> > > --
> > > Regards,
> > > Binoy Dalal
> > >
> >
> --
> Regards,
> Binoy Dalal
>
