lucene-solr-user mailing list archives

From Anil <anilk...@gmail.com>
Subject Re: Highlight brings the content from the first pages of pdf
Date Mon, 15 Feb 2016 13:36:04 GMT
Yes, but I have a long list of fields.

I feel that listing all the fields in fl is not good practice unless one is
interested in only a few of them. In my case, I am interested in all fields
except one.

Is there any alternative approach? Thanks in advance.



On 15 February 2016 at 17:27, Binoy Dalal <binoydalal93@gmail.com> wrote:

> If I understand correctly, you have already highlighted the field and only
> want to return the highlights and not the field itself.
> Well in that case, simply remove the field name from your fl list.
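
A minimal sketch of the suggestion above, assuming a large field named `commands` (the example name from the quoted question) and a hypothetical extra field `title`. It only builds the query string and does not contact Solr; the search term is a placeholder.

```python
from urllib.parse import urlencode, parse_qs

def build_highlight_query(term, returned_fields, highlight_field):
    """Build a Solr query string that highlights one field but does not return it."""
    # Leave the highlighted field out of fl so its stored value is not sent back.
    fl = [f for f in returned_fields if f != highlight_field]
    params = {
        "q": term,
        "fl": ",".join(fl),
        "hl": "true",
        "hl.fl": highlight_field,  # highlighting is still requested for this field
        "wt": "json",
    }
    return urlencode(params)

# "title" is a hypothetical field name used only for illustration.
query = build_highlight_query("nietava", ["id", "title", "commands"], "commands")
print(query)
```

The snippets for `commands` would then be read from the highlighting section of the response, while the docs section carries only the fields left in fl.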
>
> On Mon, 15 Feb 2016, 17:04 Anil <anilklce@gmail.com> wrote:
>
> > How can a highlighted field be excluded from the main result, given that
> > it is already available in the highlighting section?
> >
> > In my scenario, one field (let's say commands) of each Solr document
> > would be around 10 MB. I don't want to fetch that field in the response
> > when its highlight snippets are already available in the response.
> >
> > Please advise.
> >
> >
> >
> > On 15 February 2016 at 15:36, Evert R. <evert.ramos@gmail.com> wrote:
> >
> > > Hello Mark,
> > >
> > > Thanks for your reply.
> > >
> > > All text is indexed (1 pdf file). It works now.
> > >
> > > Best regards,
> > >
> > >
> > > *--Evert*
> > >
> > > 2016-02-14 23:47 GMT-02:00 Mark Ehle <markehle@gmail.com>:
> > >
> > > > Is all the text being indexed? Check to make sure that the data you
> > > > are looking for is actually in the index. Is there a setting in Tika
> > > > that limits how much is indexed? I seem to remember confronting this
> > > > problem myself once, and the data that I wanted just wasn't in the
> > > > index because it was never put there in the first place. Something
> > > > about setMaxStringLength or something.
> > > >
> > > > On Sun, Feb 14, 2016 at 8:28 PM, Binoy Dalal <binoydalal93@gmail.com> wrote:
> > > >
> > > > > What you've done so far will highlight every instance of "nietava"
> > > > > found in the field and return it, i.e., your entire field will be
> > > > > returned with all the "nietava"s in <em> tags.
> > > > > If you do not want the entire field, only the portions of your field
> > > > > containing the matched terms, then set the hl.snippets parameter to
> > > > > the number of snippets you want (in this particular case 3), along
> > > > > with the hl.fragsize parameter set to the same number as your
> > > > > hl.maxAnalyzedChars (or a really large number).
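
The parameters described above could be put together roughly as follows. This is a sketch that only assembles the request parameters; the value 208400 is copied from the response quoted further down the thread, and `content` is the field highlighted there.

```python
from urllib.parse import urlencode, parse_qs

def snippet_params(term, field, snippets, max_chars):
    """Return Solr highlighting parameters asking for several snippets of one field."""
    return {
        "q": term,
        "hl": "true",
        "hl.fl": field,
        "hl.snippets": str(snippets),           # number of snippets requested
        "hl.fragsize": str(max_chars),          # same number as hl.maxAnalyzedChars, per the advice above
        "hl.maxAnalyzedChars": str(max_chars),  # how many characters into the field to look
        "wt": "json",
    }

params = snippet_params("nietava", "content", snippets=3, max_chars=208400)
print(urlencode(params))
```

The resulting string would be appended to the request handler URL (for example /select).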
> > > > >
> > > > > I suggest you go through the wiki documentation for highlighting
> > > > > once (https://wiki.apache.org/solr/HighlightingParameters). It should
> > > > > answer all of the questions you might have regarding the use of the
> > > > > standard highlighter.
> > > > >
> > > > > As an additional note, I also suggest that you look into the
> > > > > PostingsHighlighter
> > > > > (https://cwiki.apache.org/confluence/display/solr/Postings+Highlighter),
> > > > > since you seem to be running highlighting on pretty big fields and
> > > > > postings is much more efficient at highlighting huge fields than the
> > > > > standard highlighter.
> > > > >
> > > > > On Mon, Feb 15, 2016 at 4:15 AM Evert R. <evert.ramos@gmail.com> wrote:
> > > > >
> > > > > > Binoy,
> > > > > >
> > > > > > You are the man! =)
> > > > > >
> > > > > > Thank you very much!
> > > > > >
> > > > > > Would you by chance know how I could get the second highlight of
> > > > > > the same word in the same file?
> > > > > >
> > > > > > For example, file_1.pdf has three occurrences of the word "nietava",
> > > > > > so how can I bring back the highlights for all three occurrences?
> > > > > >
> > > > > > I am pretty new around here; should I open another subject for this?
> > > > > >
> > > > > > Thanks again!
> > > > > >
> > > > > >
> > > > > > *--Evert*
> > > > > >
> > > > > > 2016-02-14 16:40 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> > > > > >
> > > > > > > Are you sure you've typed in the parameters correctly?
> > > > > > > In your response it says flagsize instead of fragsize and
> > > > > > > maxAnalzyedChars instead of maxAnalyzedChars.
> > > > > > >
> > > > > > > Ohh wait, I see that I made the "analyzed" typo. Awfully sorry for
> > > > > > > that, I'm using my phone to send the mail out.
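
Misspelled parameter names like these are easy to miss, since the response quoted below reports status 0 while its highlighting section stays empty. A small check along the following lines could flag them before the request is sent. It is only a sketch: the list of known names is limited to the parameters that appear in this thread and is not a complete list of Solr's highlighting options.

```python
import difflib

# Parameter names taken from this thread; not an exhaustive list.
KNOWN_HL_PARAMS = [
    "hl", "hl.fl", "hl.snippets", "hl.fragsize", "hl.maxAnalyzedChars",
    "hl.simple.pre", "hl.simple.post", "hl.regex.slop", "hl.regex.pattern",
]

def suspicious_params(params):
    """Map each unrecognised hl* parameter name to its closest known spelling (or None)."""
    suggestions = {}
    for name in params:
        if name.startswith("hl") and name not in KNOWN_HL_PARAMS:
            close = difflib.get_close_matches(name, KNOWN_HL_PARAMS, n=1)
            suggestions[name] = close[0] if close else None
    return suggestions

# The parameters as they appear in the quoted response.
request = {"q": "nietava", "hl": "true", "hl.fl": "content",
           "hl.flagsize": "0", "hl.maxAnalzyedChars": "208400"}
result = suspicious_params(request)
print(result)
```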
> > > > > > >
> > > > > > > On Sun, 14 Feb 2016, 23:53 Evert R. <evert.ramos@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Binoy,
> > > > > > > >
> > > > > > > > thanks!
> > > > > > > >
> > > > > > > > Still not working, check the output:
> > > > > > > >
> > > > > > > > {
> > > > > > > >   "responseHeader":{
> > > > > > > >     "status":0,
> > > > > > > >     "QTime":58,
> > > > > > > >     "params":{
> > > > > > > >       "q":"nietava",
> > > > > > > >       "hl":"true",
> > > > > > > >       "hl.simple.post":"</em>",
> > > > > > > >       "indent":"true",
> > > > > > > >       "fl":"id",
> > > > > > > >       "hl.flagsize":"0",
> > > > > > > >       "hl.fl":"content",
> > > > > > > >       "hl.maxAnalzyedChars":"208400",
> > > > > > > >       "wt":"json",
> > > > > > > >       "hl.simple.pre":"<em>"}},
> > > > > > > >   "response":{"numFound":1,"start":0,"docs":[
> > > > > > > >       {
> > > > > > > >         "id":"/home/solr/dados/teste/Emmanuel.pdf"}]
> > > > > > > >   },
> > > > > > > >   "highlighting":{
> > > > > > > >     "/home/solr/dados/teste/Emmanuel.pdf":{}}}
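
For reference, a response like the one above can be checked programmatically for matched documents that came back without any highlight. The sketch below uses a trimmed copy of that response and assumes the unique key is `id` and that it is included in fl, as it is here.

```python
import json

# Trimmed copy of the response quoted above.
raw = '''
{
  "response": {"numFound": 1, "start": 0, "docs": [
    {"id": "/home/solr/dados/teste/Emmanuel.pdf"}]},
  "highlighting": {
    "/home/solr/dados/teste/Emmanuel.pdf": {}}
}
'''

def docs_without_highlights(response):
    """Return ids of matched documents whose highlighting entry is missing or empty."""
    highlighting = response.get("highlighting", {})
    return [doc["id"] for doc in response["response"]["docs"]
            if not highlighting.get(doc["id"])]

missing = docs_without_highlights(json.loads(raw))
print(missing)
```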
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > *--Evert*
> > > > > > > >
> > > > > > > > 2016-02-14 14:31 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> > > > > > > >
> > > > > > > > > Don't add this parameter to the searchComponent definition,
> > > > > > > > > because the components where you've added it, GapFragmenter and
> > > > > > > > > RegexFragmenter, simply don't use it.
> > > > > > > > > Instead, add it to your request handler (/select etc.) if you've
> > > > > > > > > configured highlighting in the handler, or append it to your query:
> > > > > > > > > *&hl.maxAnalzyedChars=<some_really_big_number>*.
> > > > > > > > > Additionally, set the *hl.fragsize parameter to 0* in a similar
> > > > > > > > > fashion if your text is larger than 51200 chars, which it most
> > > > > > > > > likely is.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Feb 14, 2016 at 9:02 PM Evert R. <evert.ramos@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Binoy,
> > > > > > > > > >
> > > > > > > > > > I could not find this option in my solrconfig.xml file.
> > > > > > > > > >
> > > > > > > > > > I tried to add this setting and nothing changed...
> > > > > > > > > >
> > > > > > > > > > Here is the code; I might have misplaced it:
> > > > > > > > > >
> > > > > > > > > > <code>
> > > > > > > > > > <searchComponent class="solr.HighlightComponent" name="highlight">
> > > > > > > > > >     <highlighting>
> > > > > > > > > >       <!-- Configure the standard fragmenter -->
> > > > > > > > > >       <!-- This could most likely be commented out in the "default" case -->
> > > > > > > > > >       <fragmenter name="gap"
> > > > > > > > > >                   default="true"
> > > > > > > > > >                   class="solr.highlight.GapFragmenter">
> > > > > > > > > >         <lst name="defaults">
> > > > > > > > > >           <int name="hl.fragsize">400</int>
> > > > > > > > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > > > > > > > >         </lst>
> > > > > > > > > >       </fragmenter>
> > > > > > > > > >
> > > > > > > > > >       <!-- A regular-expression-based fragmenter
> > > > > > > > > >            (for sentence extraction)
> > > > > > > > > >         -->
> > > > > > > > > >       <fragmenter name="regex"
> > > > > > > > > >                   class="solr.highlight.RegexFragmenter">
> > > > > > > > > >         <lst name="defaults">
> > > > > > > > > >           <!-- slightly smaller fragsizes work better because of slop -->
> > > > > > > > > >           <int name="hl.fragsize">200</int>
> > > > > > > > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > > > > > > > >           <!-- allow 50% slop on fragment sizes -->
> > > > > > > > > >           <float name="hl.regex.slop">0.5</float>
> > > > > > > > > >           <!-- a basic sentence pattern -->
> > > > > > > > > >           <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
> > > > > > > > > >         </lst>
> > > > > > > > > >       </fragmenter>
> > > > > > > > > >
> > > > > > > > > > </code>
> > > > > > > > > >
> > > > > > > > > > thanks!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > *--Evert*
> > > > > > > > > >
> > > > > > > > > > 2016-02-14 12:14 GMT-02:00 Binoy Dalal <binoydalal93@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > > From the Solr wiki:
> > > > > > > > > > > hl.maxAnalyzedChars
> > > > > > > > > > >
> > > > > > > > > > > How many characters into a document to look for suitable
> > > > > > > > > > > snippets (Solr 1.3). This parameter makes sense for the original
> > > > > > > > > > > Highlighter only.
> > > > > > > > > > >
> > > > > > > > > > > The default value is "51200".
> > > > > > > > > > >
> > > > > > > > > > > You can assign a large value to this parameter and use
> > > > > > > > > > > hl.fragsize=0 to return highlighting in large fields that have
> > > > > > > > > > > size greater than 51200 characters.
> > > > > > > > > > >
> > > > > > > > > > > I think this might be your hiccup.
> > > > > > > > > > >
> > > > > > > > > > > On Sun, 14 Feb 2016, 17:11 Evert R. <evert.ramos@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Paul,
> > > > > > > > > > > >
> > > > > > > > > > > > Sorry for my late reply.
> > > > > > > > > > > >
> > > > > > > > > > > > All the content is inside the docs. It brings back the docs and
> > > > > > > > > > > > the pdf file that has the search word in it. But the highlight is
> > > > > > > > > > > > not shown if the search word appears after the first few pages.
> > > > > > > > > > > >
> > > > > > > > > > > > Evert
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > *--Evert*
> > > > > > > > > > > >
> > > > > > > > > > > > 2016-02-14 8:36 GMT-02:00 Paul Libbrecht <paul@hoplahup.net>:
> > > > > > > > > > > >
> > > > > > > > > > > > > This looks like the stored content is shortened. Can it be?
> > > > > > > > > > > > > Can you see that inside the docs?
> > > > > > > > > > > > >
> > > > > > > > > > > > > paul
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Evert R. <mailto:evert.ramos@gmail.com>
> > > > > > > > > > > > > > 14 February 2016 at 11:26
> > > > > > > > > > > > > > Hi There,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have a situation where I started the techproducts example,
> > > > > > > > > > > > > > without any modification, and posted a pdf file. When searching
> > > > > > > > > > > > > > with:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > q=text:search_word
> > > > > > > > > > > > > > hl=true
> > > > > > > > > > > > > > hl.fl=content
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It shows the highlight accordingly! =)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > BUT... *if the "search_word" appears after the first pages* of
> > > > > > > > > > > > > > my pdf file, such as on page 15...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It simply *does not show* *the HIGHLIGHT*...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Has anyone faced this situation before?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > *--Evert*
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Regards,
> > > > > > > > > > > Binoy Dalal
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > > Binoy Dalal
> > > > > > > > >
> > > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > > Binoy Dalal
> > > > > > >
> > > > > >
> > > > > --
> > > > > Regards,
> > > > > Binoy Dalal
> > > > >
> > > >
> > >
> >
> --
> Regards,
> Binoy Dalal
>
