lucene-solr-user mailing list archives

From Binoy Dalal <binoydala...@gmail.com>
Subject Re: Highlight brings the content from the first pages of pdf
Date Tue, 16 Feb 2016 08:13:01 GMT
Yes, the default fl.
Under <lst name="defaults"> in the request handler definition, add an entry like so:
<str name="fl">fields</str>
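
For illustration, a handler definition along these lines should work. This is only a sketch: the handler name /select and the field names id, title and author are placeholders I made up, and "content" stands in for your large highlighted field, so substitute your own. I've also assumed you want the highlighting parameters discussed earlier in this thread applied to every request, so they sit in the same defaults block:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Hypothetical field list: every field you want returned, minus the large one -->
    <str name="fl">id,title,author</str>
    <!-- Highlighting defaults; "content" is a placeholder for the large field -->
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
    <str name="hl.snippets">3</str>
  </lst>
</requestHandler>
```

Anything under defaults can still be overridden by passing the same parameter on an individual request, so a one-off query can ask for a different fl without touching solrconfig.xml.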

On Tue, 16 Feb 2016, 13:00 Anil <anilklce@gmail.com> wrote:

> You mean the default fl?
>
> On 16 February 2016 at 12:57, Binoy Dalal <binoydalal93@gmail.com> wrote:
>
> > Oh wait. We don't append the fl parameter to the query.
> > We've configured it in the request handler in solrconfig.xml
> > Maybe that is something that you can do.
> >
> > On Tue, 16 Feb 2016, 12:39 Anil <anilklce@gmail.com> wrote:
> >
> > > Thanks for your response, Binoy.
> > >
> > > Yes, I am looking for an alternative to this. With a long list of fields,
> > > the URL will become long and might lead to a "URL too long" exception when
> > > using an HTTP request.
> > >
> > > On 16 February 2016 at 11:01, Binoy Dalal <binoydalal93@gmail.com> wrote:
> > >
> > > > Filling in the fl parameter with all the required fields is what we do at
> > > > my project as well, and I don't think there is any alternative to this.
> > > >
> > > > Maybe somebody else can advise on this?
> > > >
> > > > On Tue, 16 Feb 2016, 10:30 Anil <anilklce@gmail.com> wrote:
> > > >
> > > > > Any help on this ? Thanks.
> > > > >
> > > > > On 15 February 2016 at 19:06, Anil <anilklce@gmail.com> wrote:
> > > > >
> > > > > > Yes, but I have a long list of fields.
> > > > > >
> > > > > > I feel that adding all the fields to fl is not good practice unless
> > > > > > one is interested in only a few fields. In my case, I am interested in
> > > > > > all fields except one.
> > > > > >
> > > > > > Is there any alternative approach? Thanks in advance.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On 15 February 2016 at 17:27, Binoy Dalal <binoydalal93@gmail.com> wrote:
> > > > > > >
> > > > > > >> If I understand correctly, you have already highlighted the field and only
> > > > > > >> want to return the highlights and not the field itself.
> > > > > > >> Well in that case, simply remove the field name from your fl list.
> > > > > >>
> > > > > > >> On Mon, 15 Feb 2016, 17:04 Anil <anilklce@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > How can the highlighted field be excluded from the main result, given
> > > > > > >> > that it is available in the highlight section?
> > > > > > >> >
> > > > > > >> > In my scenario, one field (let's say commands) of each Solr document
> > > > > > >> > would be around 10 MB. I don't want to fetch that field in the response
> > > > > > >> > when its highlight snippets are available in the response.
> > > > > > >> >
> > > > > > >> > Please advise.
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > > >> > On 15 February 2016 at 15:36, Evert R. <evert.ramos@gmail.com> wrote:
> > > > > >> >
> > > > > >> > > Hello Mark,
> > > > > >> > >
> > > > > > >> > > Thanks for your reply.
> > > > > > >> > >
> > > > > > >> > > All text is indexed (1 pdf file). It works now.
> > > > > > >> > >
> > > > > > >> > > Best regards,
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > *--Evert*
> > > > > >> > >
> > > > > >> > > 2016-02-14 23:47 GMT-02:00 Mark Ehle <markehle@gmail.com>:
> > > > > >> > >
> > > > > > >> > > > Is all the text being indexed? Check to make sure that there's actually
> > > > > > >> > > > the data you are looking for in the index. Is there a setting in Tika that
> > > > > > >> > > > limits how much is indexed? I seem to remember confronting this problem
> > > > > > >> > > > myself once, and the data that I wanted just wasn't in the index because it
> > > > > > >> > > > was never put there in the first place. Something about setMaxStringLength
> > > > > > >> > > > or something.
> > > > > >> > > >
> > > > > > >> > > > On Sun, Feb 14, 2016 at 8:28 PM, Binoy Dalal <binoydalal93@gmail.com> wrote:
> > > > > >> > > >
> > > > > > >> > > > > What you've done so far will highlight every instance of "nietava" found in
> > > > > > >> > > > > the field, and return it, i.e., your entire field will return with all the
> > > > > > >> > > > > "nietava"s in <em> tags.
> > > > > > >> > > > > If you do not want the entire field, only portions of your field containing
> > > > > > >> > > > > the matched terms, then use the hl.snippets parameter = the number of snippets
> > > > > > >> > > > > you want, in this particular case 3, along with the hl.fragsize parameter
> > > > > > >> > > > > set to the same number as your hl.maxAnalyzedChars (or a really large
> > > > > > >> > > > > number).
> > > > > >> > > > >
> > > > > > >> > > > > I suggest you go through the wiki documentation for highlighting once (
> > > > > > >> > > > > https://wiki.apache.org/solr/HighlightingParameters). It should answer all
> > > > > > >> > > > > of your questions regarding the use of the standard highlighter that you
> > > > > > >> > > > > might have.
> > > > > >> > > > >
> > > > > > >> > > > > As an additional note, I also suggest that you look into the
> > > > > > >> > > > > PostingsHighlighter (
> > > > > > >> > > > > https://cwiki.apache.org/confluence/display/solr/Postings+Highlighter),
> > > > > > >> > > > > since you seem to be running highlighting on pretty big fields and postings
> > > > > > >> > > > > is much more efficient at highlighting huge fields as compared to the
> > > > > > >> > > > > standard highlighter.
> > > > > >> > > > >
> > > > > > >> > > > > On Mon, Feb 15, 2016 at 4:15 AM Evert R. <evert.ramos@gmail.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Binoy,
> > > > > >> > > > > >
> > > > > >> > > > > > You are the man! =)
> > > > > >> > > > > >
> > > > > >> > > > > > Thank you very much!
> > > > > >> > > > > >
> > > > > > >> > > > > > Would you by chance know how I could get the second highlight of the same
> > > > > > >> > > > > > word in the same file?
> > > > > > >> > > > > >
> > > > > > >> > > > > > Like: file_1.pdf (has three words "nietava") so..., how can I bring the
> > > > > > >> > > > > > highlights for the three occurrences?
> > > > > > >> > > > > >
> > > > > > >> > > > > > I am pretty new around here, should I open another subject?
> > > > > >> > > > > >
> > > > > >> > > > > > Thanks again!
> > > > > >> > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > > *--Evert*
> > > > > >> > > > > >
> > > > > >> > > > > > 2016-02-14 16:40 GMT-02:00 Binoy
Dalal <
> > > > > binoydalal93@gmail.com
> > > > > >> >:
> > > > > >> > > > > >
> > > > > > >> > > > > > > Are you sure you've typed in the parameters correctly?
> > > > > > >> > > > > > > In your response it says flagsize instead of fragsize and maxanalzyedchars
> > > > > > >> > > > > > > instead of maxanalyzedchars.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Ohh wait, I see that I made the analyzed typo. Awfully sorry for that, I'm
> > > > > > >> > > > > > > using my phone to send the mail out.
> > > > > >> > > > > > >
> > > > > > >> > > > > > > On Sun, 14 Feb 2016, 23:53 Evert R. <evert.ramos@gmail.com> wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Hi Binoy,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > thanks!
> > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Still not working, check the output:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > {
> > > > > >> > > > > > > >   "responseHeader":{
> > > > > >> > > > > > > >     "status":0,
> > > > > >> > > > > > > >     "QTime":58,
> > > > > >> > > > > > > >     "params":{
> > > > > >> > > > > > > >       "q":"nietava",
> > > > > >> > > > > > > >       "hl":"true",
> > > > > >> > > > > > > >       "hl.simple.post":"</em>",
> > > > > >> > > > > > > >       "indent":"true",
> > > > > >> > > > > > > >       "fl":"id",
> > > > > >> > > > > > > >       "hl.flagsize":"0",
> > > > > >> > > > > > > >       "hl.fl":"content",
> > > > > >> > > > > > > >       "hl.maxAnalzyedChars":"208400",
> > > > > >> > > > > > > >       "wt":"json",
> > > > > >> > > > > > > >       "hl.simple.pre":"<em>"}},
> > > > > >> > > > > > > >   "response":{"numFound":1,"start":0,"docs":[
> > > > > >> > > > > > > >       {
> > > > > >> > > > > > > >
>  "id":"/home/solr/dados/teste/Emmanuel.pdf"}]
> > > > > >> > > > > > > >   },
> > > > > >> > > > > > > >   "highlighting":{
> > > > > >> > > > > > > >     "/home/solr/dados/teste/Emmanuel.pdf":{}}}
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > *--Evert*
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > 2016-02-14 14:31 GMT-02:00
Binoy Dalal <
> > > > > >> binoydalal93@gmail.com
> > > > > >> > >:
> > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > Don't add this parameter to the searchComponent definition, because the
> > > > > > >> > > > > > > > > components where you've added it, GapFragmenter and RegexFragmenter, simply
> > > > > > >> > > > > > > > > don't use it.
> > > > > > >> > > > > > > > > Instead, add it to your request handler (/select etc.) if you've configured
> > > > > > >> > > > > > > > > highlighting in the handler or append it to your query:
> > > > > > >> > > > > > > > > *&hl.maxAnalzyedChars=<some_really_big_number>*.
> > > > > > >> > > > > > > > > Additionally also set the *hl.fragsize parameter to 0*, if your text is
> > > > > > >> > > > > > > > > larger than 51200 chars which it mostly is, in a similar fashion.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > On Sun, Feb 14, 2016 at 9:02 PM Evert R. <evert.ramos@gmail.com> wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > Hi Binoy,
> > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > I could not find this option in my solrconfig.xml file.
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > I tried to add this setting and nothing changed...
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Here is the code, I might have misplaced it:
> > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > <code>
> > > > > > >> > > > > > > > > > <searchComponent class="solr.HighlightComponent" name="highlight">
> > > > > > >> > > > > > > > > >     <highlighting>
> > > > > > >> > > > > > > > > >       <!-- Configure the standard fragmenter -->
> > > > > > >> > > > > > > > > >       <!-- This could most likely be commented out in the "default" case -->
> > > > > > >> > > > > > > > > >       <fragmenter name="gap"
> > > > > > >> > > > > > > > > >                   default="true"
> > > > > > >> > > > > > > > > >                   class="solr.highlight.GapFragmenter">
> > > > > > >> > > > > > > > > >         <lst name="defaults">
> > > > > > >> > > > > > > > > >           <int name="hl.fragsize">400</int>
> > > > > > >> > > > > > > > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > > > > >> > > > > > > > > >         </lst>
> > > > > > >> > > > > > > > > >       </fragmenter>
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >       <!-- A regular-expression-based fragmenter
> > > > > > >> > > > > > > > > >            (for sentence extraction)
> > > > > > >> > > > > > > > > >         -->
> > > > > > >> > > > > > > > > >       <fragmenter name="regex"
> > > > > > >> > > > > > > > > >                   class="solr.highlight.RegexFragmenter">
> > > > > > >> > > > > > > > > >         <lst name="defaults">
> > > > > > >> > > > > > > > > >           <!-- slightly smaller fragsizes work better because of slop -->
> > > > > > >> > > > > > > > > >           <int name="hl.fragsize">200</int>
> > > > > > >> > > > > > > > > >           <int name="hl.maxAnalyzedChars">409600</int>
> > > > > > >> > > > > > > > > >           <!-- allow 50% slop on fragment sizes -->
> > > > > > >> > > > > > > > > >           <float name="hl.regex.slop">0.5</float>
> > > > > > >> > > > > > > > > >           <!-- a basic sentence pattern -->
> > > > > > >> > > > > > > > > >           <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
> > > > > > >> > > > > > > > > >         </lst>
> > > > > > >> > > > > > > > > >       </fragmenter>
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > </code>
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > thanks!
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > *--Evert*
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > 2016-02-14
12:14 GMT-02:00 Binoy Dalal <
> > > > > >> > > binoydalal93@gmail.com
> > > > > >> > > > >:
> > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > > From the solr wiki:
> > > > > > >> > > > > > > > > > > hl.maxAnalyzedChars
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > How many characters into a document to look for suitable
> > > > > > >> > > > > > > > > > > snippets (Solr1.3). This parameter makes sense for the original Highlighter
> > > > > > >> > > > > > > > > > > only.
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > The default value is "51200".
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > You can assign a large value to this parameter and use hl.fragsize=0 to
> > > > > > >> > > > > > > > > > > return highlighting in large fields that have size greater than 51200
> > > > > > >> > > > > > > > > > > characters.
> > > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > I think this might be your hiccup.
> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > On Sun, 14 Feb 2016, 17:11 Evert R. <evert.ramos@gmail.com> wrote:
> > > > > >> > > > > > > > > > >
> > > > > > >> > > > > > > > > > > > Hi Paul,
> > > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > > > > > > > > Sorry for my late reply.
> > > > > > >> > > > > > > > > > > >
> > > > > > >> > > > > > > > > > > > All the content is inside the docs. It brings the docs and the pdf file
> > > > > > >> > > > > > > > > > > > that has the search word in it. But the highlight is not showing if the
> > > > > > >> > > > > > > > > > > > search word is after a few pages.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Evert
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > *--Evert*
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > 2016-02-14
8:36 GMT-02:00 Paul Libbrecht <
> > > > > >> > > > paul@hoplahup.net
> > > > > >> > > > > >:
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > >
This looks like the stored content is
> > > > shortened.
> > > > > >> Can
> > > > > >> > it
> > > > > >> > > > be?
> > > > > >> > > > > > > > > > > > >
Can you see that inside the docs?
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > >
paul
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > >
> Evert R. <mailto:
> evert.ramos@gmail.com>
> > > > > >> > > > > > > > > > > > >
> 14 February 2016 at 11:26
> > > > > >> > > > > > > > > > > > >
> Hi There,
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
> I have a situation where started a
> > > > > techproducts,
> > > > > >> > > > without
> > > > > >> > > > > > any
> > > > > >> > > > > > > > > > > > >
modification,
> > > > > >> > > > > > > > > > > > >
> post a pdf file. When searching as:
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
> q=text:search_word
> > > > > >> > > > > > > > > > > > >
> hl=true
> > > > > >> > > > > > > > > > > > >
> hl.fl=content
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
> It show the highlight accordingly! =)
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
> BUT... *if the "search_word" is after
> > the
> > > > > first
> > > > > >> > > pages*
> > > > > >> > > > in
> > > > > >> > > > > > my
> > > > > >> > > > > > > > pdf
> > > > > >> > > > > > > > > > > file,
> > > > > >> > > > > > > > > > > > >
> such
> > > > > >> > > > > > > > > > > > >
> as page 15...
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
> It simply *does not show* *the
> > > HIGHLIGHT*...
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
> Does anyone has faced this situation
> > > before?
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
> Thanks!
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
> *--Evert*
> > > > > >> > > > > > > > > > > > >
>
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > >
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > --
> > > > > >> > > > > > > > > > > Regards,
> > > > > > >> > > > > > > > > > > Binoy Dalal
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > --
> > > > > >> > > > > > > > > Regards,
> > > > > >> > > > > > > > > Binoy Dalal
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > --
> > > > > >> > > > > > > Regards,
> > > > > >> > > > > > > Binoy Dalal
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > > --
> > > > > >> > > > > Regards,
> > > > > >> > > > > Binoy Dalal
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >> --
> > > > > >> Regards,
> > > > > >> Binoy Dalal
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > > --
> > > > Regards,
> > > > Binoy Dalal
> > > >
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal
