lucene-java-user mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: highlighting
Date Tue, 26 Sep 2006 01:30:41 GMT
See below...

"Stelios Eliakis" <eliakis@gmail.com> wrote on 25/09/2006 15:48:10:
> You are right!
> 1) As far as Example 1 is concerned, I don't want these 2 fragments to have
> the same score. Do you know how I could do this?

This behavior is not configurable, as far as I can understand, at least not
without changing the code of the QueryScorer (that you are using). I will
raise this question in the developers mailing list.
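
To illustrate why the two fragments in Example 1 tie, here is a toy model in
plain Java. It is not the QueryScorer code; the class and method names are
made up, and it assumes my "unique terms" explanation is right. The first
method counts each distinct query term once per fragment, the second is a
hypothetical alternative that counts every occurrence:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UniqueTermScoreSketch {

    // Toy model: each distinct query term counts at most once per fragment.
    public static float uniqueTermScore(Set<String> queryTerms, String fragment) {
        Set<String> seen = new HashSet<>();
        for (String token : fragment.split("\\s+")) {
            if (queryTerms.contains(token)) {
                seen.add(token);
            }
        }
        return seen.size();
    }

    // Hypothetical alternative: every occurrence of a query term adds to the score.
    public static float occurrenceScore(Set<String> queryTerms, String fragment) {
        float score = 0f;
        for (String token : fragment.split("\\s+")) {
            if (queryTerms.contains(token)) {
                score += 1f;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("xy"));
        String early = "aa xy bb";        // "xy" appears once
        String later = "xy cc xy dd xy";  // "xy" appears three times

        // Unique-term scoring: both fragments get the same score.
        System.out.println(uniqueTermScore(query, early));
        System.out.println(uniqueTermScore(query, later));

        // Occurrence scoring: the later fragment scores higher.
        System.out.println(occurrenceScore(query, early));
        System.out.println(occurrenceScore(query, later));
    }
}
```

With the first method the fragments tie, so whichever comes first wins. Only
something like the second method would rank the later fragment higher.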

>
> 2) Furthermore, if I try to get the fragment score:
>
> Scorer fragmentScore= highlighter.getFragmentScorer();
> float fragmentScoreFloat=fragmentScore.getFragmentScore();
>
> I get 0.0. Why?
>

The QueryScorer maintains the score of the 'currently handled fragment' as
it processes text fragments serially. Each time a new fragment starts, the
score maintained for it by the scorer is reset to zero. So it really
depends on how and when you call this API. If you invoke it after the call to
getBestFragments*(), it reflects the last processed fragment, so it
could be 0 or not. This is consistent with the javadoc:
  /** Called when the highlighter has no more tokens for
   * the current fragment - the scorer returns
   * the weighting it has derived for the most
   * recent fragment, typically based on the tokens
   * passed to getTokenScore().
   **/
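
A small stand-alone sketch of that behavior, in plain Java with made-up names
(ToyScorer, scoreAfterProcessing). It is only a model of what I described
above, not the actual Lucene implementation. The scorer keeps a single
running score, which is reset whenever a new fragment starts:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FragmentScoreResetSketch {

    // Toy scorer: keeps one running score, for the current fragment only.
    public static class ToyScorer {
        private final Set<String> queryTerms;
        private final Set<String> seen = new HashSet<>();
        private float score;

        public ToyScorer(String... terms) {
            this.queryTerms = new HashSet<>(Arrays.asList(terms));
        }

        // A new fragment starts: the previous fragment's score is discarded.
        public void startFragment() {
            seen.clear();
            score = 0f;
        }

        // Adds 1 the first time a query term is seen in the current fragment.
        public void addToken(String token) {
            if (queryTerms.contains(token) && seen.add(token)) {
                score += 1f;
            }
        }

        public float getFragmentScore() {
            return score;
        }
    }

    // Feeds the fragments to the scorer in order, then reads the score afterwards.
    public static float scoreAfterProcessing(ToyScorer scorer, String... fragments) {
        for (String fragment : fragments) {
            scorer.startFragment();
            for (String token : fragment.split("\\s+")) {
                scorer.addToken(token);
            }
        }
        return scorer.getFragmentScore();
    }

    public static void main(String[] args) {
        ToyScorer scorer = new ToyScorer("xy");
        // Last fragment has no match, so the score read afterwards is 0.0.
        System.out.println(scoreAfterProcessing(scorer, "aa xy bb", "cc dd ee"));
        // Last fragment has a match, so the score read afterwards is 1.0.
        System.out.println(scoreAfterProcessing(scorer, "cc dd ee", "aa xy bb"));
    }
}
```

In this model the value read at the end says nothing about the best
fragment, only about the one that happened to be processed last.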

> 3) Moreover, for some docs Lucene doesn't return any fragment even if the
> query exists in the document. Why? :)

I can't see how this happens... Do you have a sample - doc text and query -
that demonstrates this behavior?

>
> Thanks in advance
> Stelios Eliakis
>
>
> On 9/26/06, Doron Cohen <DORONC@il.ibm.com> wrote:
> >
> >
> > "Stelios Eliakis" <eliakis@gmail.com> wrote on 23/09/2006 02:39:27:
> > > I want to extract the Best Fragment (passage) from a text file.
> > > When I use the following code I get the first fragment that contains my
> > > query. Nevertheless, the JavaDoc says that the function getBestFragment
> > > returns the best fragment. Am I doing something wrong?
> >
> > That code seems fine to me.
> >
> > A possible explanation (which I think might be the case here but not sure)
> > is that getBestFragment*() only accumulates fragments scores for matches of
> > "unique terms" in the fragment.
> >
> > Example 1: query = "xy", and the term "xy" appears once in an early
> > fragment but 3 times in a later fragment. In this case both fragments would
> > be scored equally, and hence the early fragment would be selected "best"
> > just because of how the sorting works.
> >
> > Example 2: query = "xy zw", and the early fragment contains "xy" but a
> > later fragment contains both "xy" and "zw". In this case the later fragment
> > would be selected "best".
> >
> > Does this explain what you see in your program?
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> --
> Stelios Eliakis



